Chapter 3
Descriptive Statistics: Graphical and Numerical Summaries of Data

UNIT OBJECTIVES
At the conclusion of this unit you should be able to:
1) Construct graphs that appropriately describe data.
2) Calculate and interpret numerical summaries of a data set.
3) Combine numerical methods with graphical methods to analyze a data set.
4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.
5) Apply software and/or calculators to automate graphical and numerical summary procedures.
Section 3.1 Displaying Categorical Data

“Sometimes you can see a lot just by looking.”
Yogi Berra, Hall of Fame Catcher, NY Yankees

The three rules of data analysis won’t be difficult to remember:
1. Make a picture. It reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.
2. Make a picture, to show important features of and patterns in the data. You may also see things that you did not expect: the extraordinary (possibly wrong) data values or unexpected patterns.
3. Make a picture. The best way to tell others about your data is with a well-chosen picture.
Bar charts show counts or relative frequencies for each category.
Example: Titanic passenger/crew distribution
[Figure: bar chart "Titanic Passengers by Class". Counts: Crew 885, First 325, Second 285, Third 706]

Pie charts show proportions of the whole in each category.
Example: Titanic passenger/crew distribution
[Figure: pie chart "Titanic Passengers by Class". Crew 40%, First 15%, Second 13%, Third 32%]
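The pie-chart percentages are the bar-chart counts divided by their total. A minimal Python sketch of that conversion follows; the function name `relative_frequencies` is only an illustrative choice, not something from the slides.

```python
# Counts taken from the "Titanic Passengers by Class" bar chart.
counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}


def relative_frequencies(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


shares = relative_frequencies(counts)
for category, pct in shares.items():
    # Count (bar chart) next to its share of the whole (pie chart).
    print(f"{category:<7}{counts[category]:>5}{pct:>7.1f}%")
```

Rounded to whole percents, these shares match the labels on the pie chart.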
Example: Top 10 causes of death in the United States

Rank  Cause of death         Count     % of top 10s   % of total deaths
1     Heart disease          700,142   37             28
2     Cancer                 553,768   29             22
3     Cerebrovascular        163,538   9              6
4     Chronic respiratory    123,013   6              5
5     Accidents              101,537   5              4
6     Diabetes mellitus      71,372    4              3
7     Flu and pneumonia      62,034    3              2
8     Alzheimer’s disease    53,852    3              2
9     Kidney disorders       39,480    2              2
10    Septicemia             32,238    2              1
      All other causes       629,967                  25

For each individual who died in the United States, we record the cause of death. The table above is a summary of that information.
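The two percentage columns can be reproduced from the counts. The sketch below assumes that total deaths equal the top-10 counts plus "All other causes"; the table does not state the total, so that is an assumption, and the variable names are illustrative.

```python
# Counts from the table above.
top10 = {
    "Heart disease": 700142,
    "Cancer": 553768,
    "Cerebrovascular": 163538,
    "Chronic respiratory": 123013,
    "Accidents": 101537,
    "Diabetes mellitus": 71372,
    "Flu and pneumonia": 62034,
    "Alzheimer's disease": 53852,
    "Kidney disorders": 39480,
    "Septicemia": 32238,
}
other_causes = 629967

top10_total = sum(top10.values())
# Assumption: total deaths = top-10 deaths + all other causes.
all_deaths = top10_total + other_causes


def percent(part, whole):
    """Share of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


for cause, count in top10.items():
    print(f"{cause:<22}{percent(count, top10_total):>4}{percent(count, all_deaths):>4}")
print(f"{'All other causes':<22}{'':>4}{percent(other_causes, all_deaths):>4}")
```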
[Figure: bar graph "Top 10 causes of deaths in the United States"; vertical axis: counts (x1000), 0 to 800]

Top 10 causes of death: bar graph
Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category.
The number of individuals who died of an accident is approximately 100,000.

[Figure: the same bar graph sorted by rank. Easy to analyze.]
[Figure: the same bar graph sorted alphabetically. Much less useful.]
1. United States $158
2. China $64.4
3. Japan $54
4. Germany $24.4
5. Britain $23.5
6. France $19.3
7. Brazil $14.2
8. Italy $13.1
9. Australia $12.8
10. India $11.9

1. United States $137.9
2. Japan $23.4
3. Germany $20
4. Britain $16.8
5. France $12.6
6. Canada $7.3
7. Italy $6.3
8. China $5.4
9. Netherlands $5.4
10. Australia $4.8
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC’s z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
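One way to work a question like this is to apply the z-score definition directly: subtract the mean, then divide by the standard deviation. A minimal sketch, with `z_score` as an illustrative helper name:

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Figures from the question: US mean $6185, standard deviation $1804, NC mean $4320.
nc_z = z_score(4320, mean=6185, sd=1804)
print(round(nc_z, 2))
```

The sign matters: a value below the mean gives a negative z-score, and the raw difference in dollars is not a z-score until it is divided by the standard deviation.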
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
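Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Figure: boxplot of the pulse rates, with the values 45, 63, 70 and 78 and the fences 40.5 and 100.5 marked]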
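The fence arithmetic above can be wrapped in a small helper. This is a sketch of the 1.5 x IQR rule as used on this slide; the function names are illustrative, and the sample values passed to `is_outlier` are hypothetical, not class data.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """A value is flagged as an outlier if it falls outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


lower, upper = outlier_fences(63, 78)
print(lower, upper)

# Hypothetical pulse values, used only to exercise the rule.
print(is_outlier(45, 63, 78), is_outlier(104, 63, 78))
```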
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Figure: boxplot "Pass Catching Yards by Receivers"; horizontal axis from 0 to 1369 yards]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot

Automating Boxplot Construction
Excel “out of the box” does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions
Marginal distribution of survival:
  Alive: 710/2201 = 32.3%
  Dead: 1491/2201 = 67.7%
Marginal distribution of class:
  Crew: 885/2201 = 40.2%
  First: 325/2201 = 14.8%
  Second: 285/2201 = 12.9%
  Third: 706/2201 = 32.1%
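A marginal distribution is a row or column total divided by the grand total. The sketch below computes both marginals from the cell counts; the nested-dictionary layout is just one convenient representation, chosen for illustration.

```python
# Cell counts from the Titanic contingency table: table[survival][class].
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in table.items()
}

# Marginal distribution of class: column totals / grand total.
classes = ["Crew", "First", "Second", "Third"]
class_marginal = {
    c: sum(table[status][c] for status in table) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * share:>6.1f}%")
```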
[Figure: marginal distribution of class, bar chart]
[Figure: marginal distribution of class, pie chart]
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
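The "% of col" rows are conditional distributions: each cell count divided by its column (class) total rather than by the grand total. A minimal sketch, with illustrative variable names:

```python
# Cell counts from the Titanic table: table[survival][class].
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Conditional distribution of survival given class:
# divide each cell by the total for that class (the column total).
survival_given_class = {}
for c in classes:
    column_total = sum(table[status][c] for status in table)
    survival_given_class[c] = {
        status: table[status][c] / column_total for status in table
    }

for c, dist in survival_given_class.items():
    print(f"{c:<7} alive {100 * dist['Alive']:5.1f}%   dead {100 * dist['Dead']:5.1f}%")
```

Within each class the two percentages sum to 100%, which is what makes them a distribution.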
[Figure: conditional distributions, segmented bar chart]
Contingency Tables for Bivariate Categorical Data - 3
Questions (using the counts in the Titanic table above):
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
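All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group the question is about. A short sketch with the totals taken from the Titanic table; the variable names are illustrative.

```python
from fractions import Fraction

first_class_survivors = 202   # cell count: Alive and First
survivors = 710               # row total: Alive
first_class = 325             # column total: First
everyone = 2201               # grand total

# Same numerator, three different denominators.
of_survivors = Fraction(first_class_survivors, survivors)      # conditional on surviving
of_everyone = Fraction(first_class_survivors, everyone)        # joint: first class AND survived
of_first_class = Fraction(first_class_survivors, first_class)  # conditional on first class

for label, frac in [
    ("of survivors", of_survivors),
    ("of all on board", of_everyone),
    ("of first class", of_first_class),
]:
    print(f"{label:<16}{float(frac):.3f}")
```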
TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
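The strength of the linear pattern in such a scatterplot is summarized by the correlation coefficient r. The sketch below computes r for the 16 (beers, BAC) pairs using the usual sum-of-products formula. The BAC values are read with their decimal points restored (for example 0.10 and 0.095), which is an assumption about the original slide, and `correlation` is an illustrative helper name.

```python
import math

# (x, y) pairs for the 16 students: x = beers, y = blood alcohol content.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def correlation(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

For these values r comes out at roughly 0.89, a strong positive linear association: students who drank more beers tended to have higher BAC.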
The correlation is due to a third “lurking” variable: playing time.
Correlation r = 0.935
End of Chapter 3
The three rules of data analysis wonrsquot be difficult to remember
1 Make a picture mdashreveals aspects not obvious in the raw data enables you to think clearly about the patterns and relationships that may be hiding in your data
2 Make a picture mdashto show important features of and patterns in the data You may also see things that you did not expect the extraordinary (possibly wrong) data values or unexpected patterns
3 Make a picture mdashthe best way to tell others about your data is with a well-chosen picture
Bar Charts show counts or relative frequency for
each category Example Titanic passengercrew distribution
Titanic Passengers by Class
885
325285
706
000
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
Crew First Second Third
Pie Charts shows proportions of the
whole in each category Example Titanic passengercrew
distribution Titanic Passengers by Class
Crew40
First15
Second13
Third32
Example Top 10 causes of death in the United States
Rank Causes of death Counts of top 10s
of total deaths
1 Heart disease 700142 37 28
2 Cancer 553768 29 22
3 Cerebrovascular 163538 9 6
4 Chronic respiratory 123013 6 5
5 Accidents 101537 5 4
6 Diabetes mellitus 71372 4 3
7 Flu and pneumonia 62034 3 2
8 Alzheimerrsquos disease 53852 3 2
9 Kidney disorders 39480 2 2
10 Septicemia 32238 2 1
All other causes 629967 25
For each individual who died in the United States we record what was the
cause of death The table above is a summary of that information
0100200300400500600700800
Counts
(x1000)
Top 10 causes of deaths in the United States
Top 10 causes of death bar graphEach category is represented by one bar The barrsquos height shows the count (or
sometimes the percentage) for that particular category
The number of individuals who died of an accident in is approximately 100000
0100200300400500600700800
Counts
(x1000)
Bar graph sorted by rank Easy to analyze
Top 10 causes of deaths in the United States
0100200300400500600700800
Cou
nts
(x10
00)
Sorted alphabetically Much less useful
1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119
1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48
Z-scores add to zeroStudentInstitutional Support to Athletic Depts For the 9 Public ACC
Schools 2013 ($ millions)
School Support y - ybar Z-score
Maryland 155 64 179
UVA 131 40 112
Louisville 109 18 050
UNC 92 01 003
VaTech 79 -12 -034
FSU 79 -12 -034
GaTech 71 -20 -056
NCSU 65 -26 -073
Clemson 38 -53 -147
Mean=91000 s=35697
Sum = 0 Sum = 0
Recently the mean tuition at 4-yr public collegesuniversities in the US was $6185 with a standard deviation of $1804 In NC the mean tuition was $4320 What is NCrsquos z-score
1 103
2 -103
3 239
4 1865
5 -1865
Section 34Measures of Position (also called Measures of Relative Standing)
79 years so 79 is an outlier The line from the top
end of the box is drawn to the biggest number in the
data that is less than 705
ATM Withdrawals by Day Month Holidays
Beg of class pulses (n=138) Q1 = 63 Q3 = 78 IQR=78 63=15
15(IQR)=15(15)=225
Q1 - 15(IQR) 63 ndash 225=405
Q3 + 15(IQR) 78 + 225=1005
7063 78405 100545
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who
gained at least 50 yards What is the approximate value of Q3
0 136273
410547
684821
9581095
12321369
Pass Catching Yards by Receivers
1 450
2 750
3 215
4 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel ldquoout of the boxrdquo does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (httpstatcrunchstatncsuedu) draws box plots
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example Survival and class on the Titanic
Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201
Marginal distributions marg dist of survival
7102201 323
14912201 677
marg dist of class
8852201 402
3252201 148
2852201 129
7062201 321
Marginal distribution of classBar chart
Marginal distribution of class Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributionsGiven the class of a passenger what is the chance the passenger survived
ClassCrew First Second Third Total
Alive Count 212 202 118 178 710Survival of col 240 622 414 252 323
Dead Count 673 123 167 528 1491 of col 760 378 586 748 677
Total Count 885 325 285 706 2201
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions (using the same table of counts and column percentages):
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
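All three answers share the numerator 202; only the denominator changes with the group the question conditions on. A short Python sketch of the three fractions, with hypothetical variable names and the counts taken from the Titanic table:

```python
alive_first = 202   # first-class passengers who survived
alive_total = 710   # all survivors
first_total = 325   # all first-class passengers
grand_total = 2201  # everyone on board

# Among survivors, the share who were in first class.
share_of_survivors = alive_first / alive_total

# Among everyone on board, the share who were first class AND survived.
joint_share = alive_first / grand_total

# Among first-class passengers, the share who survived.
survival_rate_first = alive_first / first_total

print(f"{share_of_survivors:.3f} {joint_share:.3f} {survival_rate_first:.3f}")
```

Reading the question for its "of ..." phrase identifies the denominator: "of survivors" uses the Alive row total, "of passengers" uses the grand total, and "of the first class passengers" uses the First column total.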
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
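Each point on the scatterplot is one student's (beers, BAC) pair. The sketch below builds those pairs and, as an illustration of the correlation named in this section's outline, computes the Pearson correlation coefficient from its usual definition. The BAC values are the table entries with decimal points as read here, and the function name `pearson_r` is a hypothetical helper, not something from the slides.

```python
import math

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Bivariate data: one (x, y) point per student, x = beers, y = BAC.
points = list(zip(beers, bac))

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(beers, bac)
print(points[:3])
print(f"r = {r:.3f}")
```

A value of r close to +1 would match a scatterplot whose points rise from lower left to upper right in a roughly straight band.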
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
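The table can be checked with a short Python sketch that standardizes each value as (y - ybar)/s. It assumes s is the sample standard deviation (dividing by n - 1), which agrees with the s = 3.5697 shown. Because the inputs are the rounded support figures from the table, an individual z-score may differ from the printed one in the last digit.

```python
import statistics

# Support ($ millions) for the 9 public ACC schools, as listed in the table.
support = [15.5, 13.1, 10.9, 9.2, 7.9, 7.9, 7.1, 6.5, 3.8]

mean = statistics.mean(support)
s = statistics.stdev(support)  # sample standard deviation (n - 1 divisor)

deviations = [y - mean for y in support]
z_scores = [d / s for d in deviations]

print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations):.10f}")
print(f"sum of z-scores   = {sum(z_scores):.10f}")
```

The deviations from the mean always sum to zero, and dividing every deviation by the same constant s keeps the sum at zero, which is why the z-scores add to zero for any data set.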
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865
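A z-score measures how many standard deviations a value lies above or below the mean. As a sketch of the arithmetic for this question, with a hypothetical helper named `z_score`:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# NC mean tuition compared with the US mean and standard deviation.
z_nc = z_score(4320, 6185, 1804)
print(round(z_nc, 2))
```

The difference 4320 - 6185 = -1865 is in dollars; dividing by the standard deviation of 1804 dollars gives a unitless z-score, and the negative sign indicates that NC's tuition is below the US mean.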
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
Boxplot of beginning-of-class pulses (values marked: 40.5, 45, 63, 70, 78, 100.5)
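The fence calculation for the pulse data can be written as a small Python function. This is a sketch under stated assumptions: `iqr_fences` is a hypothetical name, and the list `sample_pulses` is made up purely to show how values are compared with the fences; it is not the class data.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the 1.5(IQR) rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for the beginning-of-class pulses.
lower, upper = iqr_fences(63, 78)
print(lower, upper)

# Hypothetical readings, for illustration only (not the actual class data).
sample_pulses = [38, 45, 63, 70, 78, 100, 104]
outliers = [p for p in sample_pulses if p < lower or p > upper]
print(outliers)
```

Values outside the fences are plotted individually as outliers, and each whisker stops at the most extreme data value that is still inside its fence.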
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
Box plot: Pass Catching Yards by Receivers (axis from 0 to 1369)
1) 450
2) 750
3) 215
4) 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel ldquoout of the boxrdquo does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (httpstatcrunchstatncsuedu) draws box plots
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example Survival and class on the Titanic
Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201
Marginal distributions marg dist of survival
7102201 323
14912201 677
marg dist of class
8852201 402
3252201 148
2852201 129
7062201 321
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                     Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201
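The "% of col" rows are conditional distributions: within each class, the count is divided by that class's column total rather than by the grand total. A minimal Python sketch of that calculation follows, again using the Titanic counts; the variable names are illustrative assumptions.

```python
# Counts by class, taken from the Titanic table.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# divide each count by its own column (class) total.
pct_alive_given_class = {
    cls: 100 * alive[cls] / (alive[cls] + dead[cls]) for cls in alive
}
pct_dead_given_class = {cls: 100 - pct for cls, pct in pct_alive_given_class.items()}

for cls in alive:
    print(f"{cls}: {pct_alive_given_class[cls]:.1f}% alive, "
          f"{pct_dead_given_class[cls]:.1f}% dead")
```

Because the survival percentages differ so much from class to class (about 62% in first class versus about 25% in third class), survival and class are clearly associated in these data.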
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions (using the Titanic table above; each question uses the same count, 202, with a different denominator):
What fraction of survivors were in first class? 202/710 (about 28.5%)
What fraction of passengers were in first class and survivors? 202/2201 (about 9.2%)
What fraction of the first class passengers survived? 202/325 (about 62.2%)
TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
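To make the idea of plotting pairs concrete, the Python sketch below builds the 16 (beers, BAC) points that a scatterplot would display. As a preview of the correlation named in the section title, it also computes Pearson's r with a small hand-written helper. The helper name `pearson_r` and the restored decimal points in the BAC values are assumptions of this example.

```python
from math import sqrt

# One (x, y) pair per student: x = number of beers, y = blood alcohol content.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

points = list(zip(beers, bac))  # the points a scatterplot would show

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(beers, bac)
print(points[:3])
print(f"r = {r:.3f}")
```

A value of r close to +1 corresponds to points that rise from lower left to upper right along a roughly straight line, so it is worth checking against the picture.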
Z-scores add to zero
Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
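The table can be reproduced in a few lines of Python. This sketch uses the standard library's `statistics.mean` and `statistics.stdev` (the sample standard deviation, with n - 1 in the denominator) and checks that the deviations and the z-scores each sum to zero, up to floating-point rounding. The dictionary layout is an assumption of the example.

```python
from statistics import mean, stdev

# Support to athletic departments, 2013 ($ millions), from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (n - 1 denominator)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The z-scores sum to zero because the deviations from the mean always do, and dividing every deviation by the same s does not change that.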
Recently the mean tuition at 4-yr public collegesuniversities in the US was $6185 with a standard deviation of $1804 In NC the mean tuition was $4320 What is NCrsquos z-score
1 103
2 -103
3 239
4 1865
5 -1865
Section 34Measures of Position (also called Measures of Relative Standing)
79 years so 79 is an outlier The line from the top
end of the box is drawn to the biggest number in the
data that is less than 705
ATM Withdrawals by Day Month Holidays
Beg of class pulses (n=138) Q1 = 63 Q3 = 78 IQR=78 63=15
15(IQR)=15(15)=225
Q1 - 15(IQR) 63 ndash 225=405
Q3 + 15(IQR) 78 + 225=1005
7063 78405 100545
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who
gained at least 50 yards What is the approximate value of Q3
0 136273
410547
684821
9581095
12321369
Pass Catching Yards by Receivers
1 450
2 750
3 215
4 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel ldquoout of the boxrdquo does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (httpstatcrunchstatncsuedu) draws box plots
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201
Marginal distributions
Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%
Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
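The marginal percentages can be reproduced from the cell counts of the Titanic table. This is a minimal sketch in plain Python. The variable names (`counts`, `survival_marginal`, `class_marginal`) are illustrative assumptions, not part of the original slides.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * p:5.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.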
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?
                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201
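The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. The sketch below shows one way to compute them. The function name `conditional_on_class` and the nested-dictionary layout are assumptions made for illustration.

```python
# Counts keyed by class (column), then by survival status.
titanic = {
    "Crew":   {"Alive": 212, "Dead": 673},
    "First":  {"Alive": 202, "Dead": 123},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
}


def conditional_on_class(table):
    """For each class, divide each count by that class's column total."""
    result = {}
    for cls, by_status in table.items():
        col_total = sum(by_status.values())
        result[cls] = {status: n / col_total for status, n in by_status.items()}
    return result


cond = conditional_on_class(titanic)
for cls, dist in cond.items():
    shares = ", ".join(f"{status} {100 * p:.1f}%" for status, p in dist.items())
    print(f"{cls:<7}{shares}")
```

Comparing the "Alive" share across classes (instead of the raw counts) is what makes the differences in survival between classes visible.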
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201
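The three questions share the same numerator (the 202 first-class survivors) and differ only in the denominator, that is, in which group is taken as the whole. A worked version of the arithmetic, with decimal values rounded to three places:

```latex
\begin{aligned}
\text{first class among survivors:}\quad
  & \frac{202}{710} \approx 0.285 \\
\text{first class and survived, among all on board:}\quad
  & \frac{202}{2201} \approx 0.092 \\
\text{survivors among first class:}\quad
  & \frac{202}{325} \approx 0.622
\end{aligned}
```

The first and third are conditional proportions (a row total and a column total as denominators), while the second is a joint proportion out of the grand total.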
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots the most frequently used method to graphically describe the relationship between 2 quantitative variables
1 1 2 2bivariate data ( ) ( ) ( )n nx y x y x y
Student Beers BAC
1 5 01
2 2 003
3 9 019
4 7 0095
5 3 007
6 3 002
7 4 007
8 5 0085
9 8 012
10 3 004
11 5 006
12 5 005
13 6 01
14 7 009
15 1 001
16 4 005
Scatterplot Blood Alcohol Content vs Number of Beers
In a scatterplot one axis is used to represent each of the
variables and the data are plotted as points on the graph
Z-scores add to zeroStudentInstitutional Support to Athletic Depts For the 9 Public ACC
Schools 2013 ($ millions)
School Support y - ybar Z-score
Maryland 155 64 179
UVA 131 40 112
Louisville 109 18 050
UNC 92 01 003
VaTech 79 -12 -034
FSU 79 -12 -034
GaTech 71 -20 -056
NCSU 65 -26 -073
Clemson 38 -53 -147
Mean=91000 s=35697
Sum = 0 Sum = 0
Recently the mean tuition at 4-yr public collegesuniversities in the US was $6185 with a standard deviation of $1804 In NC the mean tuition was $4320 What is NCrsquos z-score
1 103
2 -103
3 239
4 1865
5 -1865
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Number line marked at 40.5, 45, 63, 70, 78, 100.5]
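The fence arithmetic above can be wrapped in a small helper. This is a minimal sketch of the 1.5(IQR) rule as the slide applies it; the function names `outlier_fences` and `is_outlier` are hypothetical, chosen only for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper

# Quartiles from the beginning-of-class pulse data (n = 138)
lower, upper = outlier_fences(63, 78)
print(lower, upper)  # prints 40.5 100.5
```

Any pulse below 40.5 or above 100.5 would be plotted as an individual outlier, and the whiskers would stop at the most extreme observations inside the fences.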
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; horizontal axis from 0 to 1369 yards]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
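The quantities a boxplot displays can also be computed in a few lines of code. The sketch below uses Python's standard `statistics.quantiles`, with the beers-per-student counts from the beers/BAC table as sample input. The choice of data and the `method="inclusive"` quartile convention are assumptions made for illustration; other software may use a slightly different quartile rule and give slightly different values.

```python
from statistics import quantiles

# Number of beers for the 16 students in the beers/BAC table
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

low, q1, median, q3, high = five_number_summary(beers)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences would be drawn individually as outliers
outliers = [v for v in beers if v < lower_fence or v > upper_fence]

print("five-number summary:", (low, q1, median, q3, high))
print("fences:", (lower_fence, upper_fence))
print("outliers:", outliers)
```

The five numbers give the ends of the whiskers (when there are no outliers), the edges of the box, and the line inside the box.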
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit, e.g., the height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201
Marginal distributions
Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%
Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
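Marginal distributions come from the row and column totals of the contingency table. A minimal sketch of that calculation for the Titanic counts follows; the variable names are illustrative assumptions, and the counts are those in the table above.

```python
# Titanic counts by survival (rows) and class (columns), from the table above
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total
class_marginal = {
    cls: sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, dist in [("survival", survival_marginal), ("class", class_marginal)]:
    print(name, {k: f"{100 * v:.1f}%" for k, v in dist.items()})
```

Each marginal distribution ignores the other variable entirely, so its proportions sum to 1.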
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?
                          Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
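The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total rather than by the grand total. A short sketch of that calculation, with illustrative variable names that are not part of the original slides:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each count by its column (class) total, not by the grand total.
survival_given_class = {}
for cls, a, d in zip(classes, alive, dead):
    col_total = a + d
    survival_given_class[cls] = {"Alive": a / col_total, "Dead": d / col_total}

for cls, dist in survival_given_class.items():
    print(f"{cls:<7} alive {100 * dist['Alive']:.1f}%  dead {100 * dist['Dead']:.1f}%")
```

Comparing these conditional percentages across classes is what reveals the association between class and survival; if the two variables were unrelated, every column would show roughly the overall 32.3% survival rate.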
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
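All three questions use the same count, the 202 first class survivors, and differ only in the denominator, that is, in which group the fraction is taken out of. Written out with decimal approximations (the wording of each label is a paraphrase, not the slide's own):

```latex
\begin{aligned}
\text{of all survivors, fraction in first class} &= \frac{202}{710} \approx 0.285 \\
\text{of everyone on board, fraction in first class and surviving} &= \frac{202}{2201} \approx 0.092 \\
\text{of first class passengers, fraction surviving} &= \frac{202}{325} \approx 0.622
\end{aligned}
```

The first and third are conditional proportions (conditioning on survival and on class, respectively), while the second is a joint proportion out of the grand total.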
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Previous slidesNext
Student Beers Blood Alcohol
1 5 01
2 2 003
3 9 019
4 7 0095
5 3 007
6 3 002
7 4 007
8 5 0085
9 8 012
10 3 004
11 5 006
12 5 005
13 6 01
14 7 009
15 1 001
16 4 005
Here we have two quantitative
variables for each of 16 students
1) How many beers
they drank and
2) Their blood alcohol
level (BAC)
We are interested in the
relationship between the
two variables How is
one affected by changes
in the other one
Scatterplots the most frequently used method to graphically describe the relationship between 2 quantitative variables
1 1 2 2bivariate data ( ) ( ) ( )n nx y x y x y
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
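As a rough illustration, the 16 (beers, BAC) pairs can be assembled and plotted as follows. This is only a sketch: the helper `draw_scatterplot` is a hypothetical name, and it assumes the third-party matplotlib package is installed. The slides themselves do not say which software produced the plot.

```python
# Data from the student table: one (beers, BAC) pair per student.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Each student becomes one point (x_i, y_i) on the graph.
points = list(zip(beers, bac))
print(len(points), "points; student 3 is", points[2])


def draw_scatterplot(xs, ys):
    """Plot BAC (vertical axis) against beers (horizontal axis).

    Hypothetical helper; assumes matplotlib is installed.
    """
    import matplotlib.pyplot as plt  # imported lazily, only when plotting

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content (BAC)")
    ax.set_title("Blood Alcohol Content vs Number of Beers")
    plt.show()


# draw_scatterplot(beers, bac)  # uncomment to display the plot
```

Putting beers on the horizontal axis follows the slide title, which names BAC "vs" number of beers.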
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47
Sum                           0         0
Mean = 9.1000   s = 3.5697
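The two zero sums can be checked numerically. A minimal sketch in Python using only the standard library and the support figures from the table; `stdev` uses the n - 1 divisor, which is assumed here because it reproduces s = 3.5697:

```python
from statistics import mean, stdev

# Support in $ millions, from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}
values = list(support.values())

ybar = mean(values)
s = stdev(values)  # sample standard deviation (n - 1 divisor)

# Deviation from the mean, then z-score = deviation / s.
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding.
print(abs(sum(deviations.values())) < 1e-9)
print(abs(sum(z_scores.values())) < 1e-9)
```

The deviations always sum to zero because the mean is the balance point of the data; dividing every deviation by the same s keeps the sum at zero.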
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865
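One way to work the tuition question is to apply the z-score definition directly: subtract the mean, then divide by the standard deviation. A short sketch in Python; the function name `z_score` is illustrative:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# NC tuition compared with the national mean and standard deviation.
z_nc = z_score(4320, 6185, 1804)
print(f"{z_nc:.2f}")  # → -1.03
```

The negative sign reflects that NC tuition is below the national mean, by about one standard deviation.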
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day Month Holidays
Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Boxplot of the pulse data with fences marked at 40.5 and 100.5]
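The fence arithmetic above can be sketched in Python as follows. The quartiles come from the slide; the list `sample_pulses` is made-up data used only to show the outlier and whisker rules, and the function names are illustrative:

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def upper_whisker(data, upper_fence):
    """Largest data value below the upper fence (where the top whisker ends)."""
    return max(x for x in data if x < upper_fence)


low, high = fences(63, 78)
print(low, high)  # → 40.5 100.5

# Hypothetical pulse readings, not from the slides.
sample_pulses = [45, 58, 63, 70, 78, 92, 110]
outliers = [x for x in sample_pulses if x < low or x > high]
print(outliers)                             # → [110]
print(upper_whisker(sample_pulses, high))   # → 92
```

Values beyond either fence are plotted individually as outliers, and each whisker stops at the most extreme value still inside the fences.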
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; horizontal axis from 0 to 1369]
1) 450
2) 750
3) 215
4) 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic

          Crew   First  Second   Third   Total
Alive      212     202     118     178     710
Dead       673     123     167     528    1491
Total      885     325     285     706    2201
Marginal distributions
Marginal distribution of survival:
Alive    710/2201 = 32.3%
Dead    1491/2201 = 67.7%
Marginal distribution of class:
Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
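The marginal percentages are the row totals and column totals divided by the grand total. A minimal Python sketch using the counts from the Titanic table; the names are illustrative only:

```python
# Counts from the Titanic table (rows: survival, columns: class).
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of survival: row totals over the grand total.
marg_survival = {s: sum(row.values()) / grand_total for s, row in table.items()}

# Marginal distribution of class: column totals over the grand total.
marg_class = {c: sum(table[s][c] for s in table) / grand_total for c in classes}

for name, p in {**marg_survival, **marg_class}.items():
    print(f"{name:<7}{100 * p:5.1f}%")
```

Each marginal distribution ignores the other variable entirely, so each one adds to 100% on its own.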
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributionsGiven the class of a passenger what is the chance the passenger survived
ClassCrew First Second Third Total
Alive Count 212 202 118 178 710Survival of col 240 622 414 252 323
Dead Count 673 123 167 528 1491 of col 760 378 586 748 677
Total Count 885 325 285 706 2201
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical
Data - 3Questions What fraction of survivors were in first class What fraction of passengers were in first class and
survivors What fraction of the first class passengers
survived ClassCrew First Second Third Total
Alive Count 212 202 118 178 710Survival of col 240 622 414 252 323
Dead Count 673 123 167 528 1491 of col 760 378 586 748 677
Total Count 885 325 285 706 2201
202710
2022201
202325
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
  1. 8.0%
  2. 23.5%
  3. 58.2%
  4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
  1. 41.8%
  2. 38.8%
  3. 51.2%
  4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
  1. 45.2%
  2. 48.8%
  3. 26.8%
  4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
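The section title pairs scatterplots with correlation. As a hedged sketch, the Python below computes the correlation coefficient r for the 16 (beers, BAC) pairs, assuming the usual Pearson formula r = Sxy / sqrt(Sxx * Syy); that formula does not appear on these slides, and the function name `pearson_r` is illustrative.

```python
import math

# (beers, BAC) for the 16 students, in the order given in the table.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (assumed formula)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(f"r = {r:.3f}")
```

A value of r close to +1 corresponds to points that cluster tightly around an upward-sloping line in the scatterplot.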
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47
Sum                           0         0

Mean = 9.1000, s = 3.5697
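The table can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that s is the sample standard deviation (divisor n - 1), which matches the value s = 3.5697 shown above; the dictionary name `support` is illustrative.

```python
import statistics

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

y_bar = statistics.mean(support.values())
s = statistics.stdev(support.values())  # sample standard deviation (n - 1)

# z-score: how many standard deviations a value lies from the mean.
z_scores = {school: (y - y_bar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{support[school]:>6.1f}{support[school] - y_bar:>7.1f}{z:>7.2f}")

# Deviations from the mean always sum to zero, so the z-scores do too
# (up to floating-point rounding).
print(f"sum of z-scores = {sum(z_scores.values()):.10f}")
```

Individual z-scores computed this way can differ from the table in the last printed digit, since the slide values are rounded.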
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
  1. 1.03
  2. -1.03
  3. 2.39
  4. 1865
  5. -1865
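For reference, the calculation can be written out as below, assuming the same z-score definition used in the ACC table (value minus mean, divided by the standard deviation):

```latex
\[
z = \frac{y - \bar{y}}{s} = \frac{4320 - 6185}{1804} = \frac{-1865}{1804} \approx -1.03
\]
```

The negative sign indicates that NC's mean tuition is about one standard deviation below the national mean; the raw difference of -1865 dollars is not itself a z-score.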
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Boxplot of pulses; labeled values: 40.5, 45, 63, 70, 78, 100.5]
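The fence arithmetic above follows the 1.5(IQR) rule and is easy to automate. A minimal Python sketch is shown here; the function names `iqr_fences` and `is_outlier` are hypothetical helpers, not part of any package mentioned in the slides.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies strictly outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


# Beginning-of-class pulses from the slide: Q1 = 63, Q3 = 78.
lower, upper = iqr_fences(63, 78)
print(lower, upper)

# A pulse of 45 is inside the fences; a pulse of 110 would be flagged.
print(is_outlier(45, 63, 78), is_outlier(110, 63, 78))
```

The whiskers of the boxplot then extend to the most extreme data values that are still inside these fences.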
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; horizontal axis from 0 to 1369 yards]
  1. 450
  2. 750
  3. 215
  4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example Survival and class on the Titanic
Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201
Marginal distributions marg dist of survival
7102201 323
14912201 677
marg dist of class
8852201 402
3252201 148
2852201 129
7062201 321
Marginal distribution of classBar chart
Marginal distribution of class Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributionsGiven the class of a passenger what is the chance the passenger survived
ClassCrew First Second Third Total
Alive Count 212 202 118 178 710Survival of col 240 622 414 252 323
Dead Count 673 123 167 528 1491 of col 760 378 586 748 677
Total Count 885 325 285 706 2201
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?
                            Class
Survival             Crew   First  Second   Third   Total
Alive  Count          212     202     118     178     710
       % of col      24.0    62.2    41.4    25.2    32.3
Dead   Count          673     123     167     528    1491
       % of col      76.0    37.8    58.6    74.8    67.7
Total  Count          885     325     285     706    2201
Answers, in order:
202/710
202/2201
202/325
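The three questions share the numerator 202 and differ only in the denominator. A small sketch, with variable names chosen here for illustration, makes the distinction explicit:

```python
first_class_survivors = 202   # alive AND first class (one cell of the table)
all_survivors = 710           # row total for "Alive"
all_first_class = 325         # column total for "First"
everyone_on_board = 2201      # grand total

# Conditional on surviving: share of survivors who were in first class.
first_given_alive = first_class_survivors / all_survivors
# Joint: share of everyone on board who was both first class and a survivor.
first_and_alive = first_class_survivors / everyone_on_board
# Conditional on class: share of first-class passengers who survived.
alive_given_first = first_class_survivors / all_first_class

print(f"{first_given_alive:.3f}")   # 0.285
print(f"{first_and_alive:.3f}")     # 0.092
print(f"{alive_given_first:.3f}")   # 0.622
```

The wording of each question determines which total belongs in the denominator: "of survivors" points to the row total, "of passengers" to the grand total, and "of the first class passengers" to the column total.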
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables
bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
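To make the idea of "one axis per variable" concrete, here is a crude text-only scatterplot of the 16 (beers, BAC) pairs. This is only a sketch: the function `text_scatter`, its bin width of 0.02 BAC, and the character-grid layout are assumptions made for illustration, and a plotting package would normally be used instead.

```python
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

points = list(zip(beers, bac))  # bivariate data (x_i, y_i), one pair per student

def text_scatter(points, y_step_milli=20, y_max_milli=200):
    """Draw x (beers) across the page and y (BAC) up the page on a character grid."""
    x_max = max(x for x, _ in points)
    # Work in integer thousandths of BAC so the bin edges are exact.
    scaled = [(x, round(y * 1000)) for x, y in points]
    rows = []
    for hi in range(y_max_milli, 0, -y_step_milli):   # top row = largest BAC
        lo = hi - y_step_milli
        cells = [
            "*" if any(px == x and lo < py <= hi for px, py in scaled) else "."
            for x in range(1, x_max + 1)
        ]
        rows.append(f"{hi / 1000:5.2f} | " + " ".join(cells))
    # Horizontal axis: number of beers.
    rows.append(" " * 8 + " ".join(str(x) for x in range(1, x_max + 1)))
    return "\n".join(rows)

print(text_scatter(points))
```

Students who fall in the same grid cell share a single `*`, so the grid shows fewer marks than there are students; a real scatterplot would place each point at its exact coordinates.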
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)
School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47
Mean = 9.1000, s = 3.5697
Sum                         0         0
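The claim that z-scores add to zero can be checked numerically. The sketch below uses the support figures as printed in the table and assumes s is the sample standard deviation (dividing by n - 1), which reproduces the stated s = 3.5697. Because the table entries are rounded to one decimal, an individual recomputed z-score may differ from the slide in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

# z-score: how many standard deviations each value lies from the mean.
z = {school: (y - ybar) / s for school, y in support.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print("deviations sum to zero:", abs(sum(y - ybar for y in values)) < 1e-9)
print("z-scores sum to zero:", abs(sum(z.values())) < 1e-9)
```

The z-scores sum to zero because the deviations y - ybar always sum to zero, and dividing every deviation by the same s does not change that.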
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865
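A z-score of this kind is the distance from the mean measured in standard deviations. The helper `z_score` below is a hypothetical name used for illustration; the three dollar figures come from the question.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# NC mean tuition compared with the US mean and standard deviation.
nc = z_score(4320, 6185, 1804)
print(round(nc, 2))  # -1.03
```

The sign matters: NC's tuition is below the US mean, so the z-score is negative, and the raw difference of -1865 dollars is not itself a z-score until it is divided by the standard deviation.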
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day Month Holidays
Beg of class pulses (n=138) Q1 = 63 Q3 = 78 IQR=78 63=15
15(IQR)=15(15)=225
Q1 - 15(IQR) 63 ndash 225=405
Q3 + 15(IQR) 78 + 225=1005
7063 78405 100545
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who
gained at least 50 yards What is the approximate value of Q3
0 136273
410547
684821
9581095
12321369
Pass Catching Yards by Receivers
1 450
2 750
3 215
4 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel ldquoout of the boxrdquo does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (httpstatcrunchstatncsuedu) draws box plots
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201
Marginal distributions
marg. dist. of survival:
710/2201 = 32.3%
1491/2201 = 67.7%
marg. dist. of class:
885/2201 = 40.2%
325/2201 = 14.8%
285/2201 = 12.9%
706/2201 = 32.1%
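The marginal percentages come from dividing each row total or column total by the grand total. A minimal Python sketch follows; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Titanic counts: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    cls: sum(counts[status][cls] for status in counts) / grand_total
    for cls in counts["Alive"]
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7} {100 * share:5.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from, and each one sums to 100%.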
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?
                                 Class
Survival               Crew    First   Second   Third   Total
Alive    Count          212      202      118     178     710
         % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count          673      123      167     528    1491
         % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count          885      325      285     706    2201
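The "% of col" rows are conditional distributions: each count is divided by its own column total rather than by the grand total. A short Python sketch, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class: count / column total.
pct_alive = {}
pct_dead = {}
for name, a, d in zip(classes, alive, dead):
    column_total = a + d
    pct_alive[name] = 100 * a / column_total
    pct_dead[name] = 100 * d / column_total
    print(f"{name:<7} alive {pct_alive[name]:5.1f}%   dead {pct_dead[name]:5.1f}%")
```

Within each class the two percentages add to 100%, so the columns can be compared directly even though the classes have very different sizes.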
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?
                                 Class
Survival               Crew    First   Second   Third   Total
Alive    Count          212      202      118     178     710
         % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count          673      123      167     528    1491
         % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count          885      325      285     706    2201
Answers, in order:
202/710
202/2201
202/325
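All three fractions share the numerator 202 (first-class survivors); only the denominator changes with the group the question is about. A brief Python sketch, with illustrative variable names:

```python
first_class_survivors = 202
all_survivors = 710      # row total: Alive
all_first_class = 325    # column total: First
all_passengers = 2201    # grand total

# Of the survivors, what fraction were in first class? (condition on survival)
survivors_in_first = first_class_survivors / all_survivors

# Of everyone on board, what fraction were first class AND survived? (joint)
first_and_survived = first_class_survivors / all_passengers

# Of the first-class passengers, what fraction survived? (condition on class)
first_who_survived = first_class_survivors / all_first_class

print(round(survivors_in_first, 3))   # 0.285
print(round(first_and_survived, 3))   # 0.092
print(round(first_who_survived, 3))   # 0.622
```

Reading the question for the word that follows "of" is a reliable way to pick the denominator.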
TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
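To make the idea concrete, the sketch below pairs each student's number of beers (x) with their BAC (y), which is all a scatterplot needs. It reads the BAC values with decimal points restored (for example 0.10 and 0.095). The optional `draw_scatter` helper assumes matplotlib is installed, and the function name is illustrative.

```python
# Beers drunk and blood alcohol content for the 16 students, in student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Bivariate data: one (x, y) pair per student.
points = list(zip(beers, bac))
print(points[:3])

def draw_scatter(x, y):
    """Optional: plot y against x (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content (BAC)")
    ax.set_title("Blood Alcohol Content vs Number of Beers")
    plt.show()

# draw_scatter(beers, bac)
```

Each student contributes exactly one point, which is why the two measurements must come from the same sample unit.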
Z-scores add to zeroStudentInstitutional Support to Athletic Depts For the 9 Public ACC
Schools 2013 ($ millions)
School Support y - ybar Z-score
Maryland 155 64 179
UVA 131 40 112
Louisville 109 18 050
UNC 92 01 003
VaTech 79 -12 -034
FSU 79 -12 -034
GaTech 71 -20 -056
NCSU 65 -26 -073
Clemson 38 -53 -147
Mean=91000 s=35697
Sum = 0 Sum = 0
Recently the mean tuition at 4-yr public collegesuniversities in the US was $6185 with a standard deviation of $1804 In NC the mean tuition was $4320 What is NCrsquos z-score
1 103
2 -103
3 239
4 1865
5 -1865
Section 34Measures of Position (also called Measures of Relative Standing)
79 years so 79 is an outlier The line from the top
end of the box is drawn to the biggest number in the
data that is less than 705
ATM Withdrawals by Day Month Holidays
Beg of class pulses (n=138) Q1 = 63 Q3 = 78 IQR=78 63=15
15(IQR)=15(15)=225
Q1 - 15(IQR) 63 ndash 225=405
Q3 + 15(IQR) 78 + 225=1005
7063 78405 100545
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; axis from 0 to 1369 yards]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
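The numbers a boxplot is drawn from can also be computed with a short script. The sketch below uses Python's standard `statistics.quantiles`; the helper name `boxplot_summary` and the pulse readings are invented for illustration (they are not the class data), and quartile conventions vary between packages, so other software may report slightly different Q1 and Q3.

```python
import statistics


def boxplot_summary(data):
    """Quartiles, 1.5(IQR) fences, whisker ends, and outliers for a boxplot."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Made-up pulse readings, for illustration only.
pulses = [45, 58, 63, 66, 70, 72, 78, 80, 110]
summary = boxplot_summary(pulses)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Any value outside the fences is reported as an outlier, and each whisker ends at the most extreme data value that is still inside the fences.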
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201
Marginal distributions
Marginal distribution of survival:
Alive:  710/2201 = 32.3%
Dead:   1491/2201 = 67.7%
Marginal distribution of class:
Crew:   885/2201 = 40.2%
First:  325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third:  706/2201 = 32.1%
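The marginal percentages above come from dividing each row total or column total by the grand total. A minimal Python sketch of that calculation, with the Titanic counts typed in by hand and variable names chosen only for this example:

```python
# Titanic counts: survival (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{share:6.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.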
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?
                        Class
Survival            Crew    First   Second  Third   Total
Alive   Count        212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count        673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count        885     325     285     706    2201
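The "% of col" rows divide each count by its own column total rather than by the grand total. A short Python sketch of the conditional distribution of survival given class, using the same counts (variable names are illustrative only):

```python
# Titanic counts: survival (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: number of people in each class.
col_totals = {c: counts["Alive"][c] + counts["Dead"][c] for c in classes}

# Conditional distribution of survival given class: count / column total.
survival_given_class = {c: counts["Alive"][c] / col_totals[c] for c in classes}

for c in classes:
    alive = survival_given_class[c]
    print(f"{c:<7} alive {alive:6.1%}   dead {1 - alive:6.1%}")
```

Comparing these percentages across classes is what the segmented bar chart displays: if survival did not depend on class, every column would show roughly the overall 32.3% survival rate.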
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions (using the Titanic table above):
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
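All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group the question asks about. A small Python sketch that evaluates the three fractions, with variable names chosen for this example:

```python
first_alive = 202    # first-class passengers who survived
total_alive = 710    # all survivors
total_first = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# Among survivors, the share in first class (conditional on survival).
share_of_survivors = first_alive / total_alive

# Among everyone, the share who were first class AND survived (joint).
joint_share = first_alive / grand_total

# Among first-class passengers, the share who survived (conditional on class).
survival_rate_first = first_alive / total_first

print(f"{share_of_survivors:.1%}  {joint_share:.1%}  {survival_rate_first:.1%}")
```

Reading the question for the phrase that names the group ("of survivors", "of passengers", "of the first class passengers") identifies which total belongs in the denominator.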
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Student Beers Blood Alcohol
1 5 01
2 2 003
3 9 019
4 7 0095
5 3 007
6 3 002
7 4 007
8 5 0085
9 8 012
10 3 004
11 5 006
12 5 005
13 6 01
14 7 009
15 1 001
16 4 005
Here we have two quantitative
variables for each of 16 students
1) How many beers
they drank and
2) Their blood alcohol
level (BAC)
We are interested in the
relationship between the
two variables How is
one affected by changes
in the other one
Scatterplots the most frequently used method to graphically describe the relationship between 2 quantitative variables
1 1 2 2bivariate data ( ) ( ) ( )n nx y x y x y
Student Beers BAC
1 5 01
2 2 003
3 9 019
4 7 0095
5 3 007
6 3 002
7 4 007
8 5 0085
9 8 012
10 3 004
11 5 006
12 5 005
13 6 01
14 7 009
15 1 001
16 4 005
Scatterplot Blood Alcohol Content vs Number of Beers
In a scatterplot one axis is used to represent each of the
variables and the data are plotted as points on the graph
Z-scores add to zero
Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47
Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; Sum of z-scores = 0
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
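A worked version of the calculation, assuming the usual z-score definition (value minus mean, divided by standard deviation); the symbols y, ybar and s follow the column headers of the ACC example:

```latex
z = \frac{y - \bar{y}}{s} = \frac{4320 - 6185}{1804} = \frac{-1865}{1804} \approx -1.03
```

NC's mean tuition is about one standard deviation below the national mean, which corresponds to choice 2.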
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Boxplot of the pulses; values marked on the axis: 40.5, 45, 63, 70, 78, 100.5]
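The fence arithmetic above can be sketched in a few lines. This assumes the multiplier is 1.5 (so that 1.5 x 15 = 22.5) and that a value counts as an outlier when it falls strictly outside the fences; outlier_fences and is_outlier are hypothetical helper names.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True when the value lies strictly outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return value < low or value > high


# Beginning-of-class pulses: Q1 = 63, Q3 = 78
low, high = outlier_fences(63, 78)
print(low, high)                # 40.5 100.5
print(is_outlier(45, 63, 78))   # False: 45 is inside the fences
print(is_outlier(110, 63, 78))  # True: 110 is above the upper fence
```

The whiskers of the boxplot stop at the most extreme data values that are still inside these fences; anything beyond is plotted individually.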
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Tuition, 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201
Marginal distributions
Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%
Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
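The marginal distributions come from the row and column totals of the Titanic table divided by the grand total. A minimal sketch, with the counts typed in from the table (the variable names are illustrative only):

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())  # 2201 people

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {
    cls: total / grand_total for cls, total in zip(classes, class_totals)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * share:5.1f}%")
```

Each marginal distribution ignores the other variable entirely, which is why its percentages add to 100%.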
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0     62.2     41.4    25.2    32.3
Dead    Count          673      123      167     528    1491
        % of col      76.0     37.8     58.6    74.8    67.7
Total   Count          885      325      285     706    2201
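The "% of col" rows are conditional distributions: each count is divided by its own column (class) total rather than by the grand total. A minimal sketch using the counts from the table (names are illustrative only):

```python
# Counts by class, typed in from the Titanic table.
table = {
    "Crew": {"Alive": 212, "Dead": 673},
    "First": {"Alive": 202, "Dead": 123},
    "Second": {"Alive": 118, "Dead": 167},
    "Third": {"Alive": 178, "Dead": 528},
}

# Conditional distribution of survival given class:
# divide each cell by the total for that class.
conditional = {
    cls: {status: n / sum(cells.values()) for status, n in cells.items()}
    for cls, cells in table.items()
}

for cls, dist in conditional.items():
    print(f"{cls:<7} alive {100 * dist['Alive']:5.1f}%   dead {100 * dist['Dead']:5.1f}%")
```

Comparing the "alive" percentages across classes (about 62% for first class against about 25% for third class and 24% for crew) is what shows that survival depended on class.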
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                              Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0     62.2     41.4    25.2    32.3
Dead    Count          673      123      167     528    1491
        % of col      76.0     37.8     58.6    74.8    67.7
Total   Count          885      325      285     706    2201

Survivors who were in first class: 202/710
Passengers who were in first class and survivors: 202/2201
First class passengers who survived: 202/325
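All three answers use the same numerator, the 202 first class survivors; only the denominator changes with the group the question is about. A small sketch of that point (variable names are illustrative, and "passengers" here means everyone in the table, crew included, as on the slide):

```python
first_class_survivors = 202   # Alive and First
all_survivors = 710           # row total: Alive
everyone_on_board = 2201      # grand total
first_class_total = 325       # column total: First

# Of the survivors, what fraction were in first class? (condition on the row)
survivors_in_first = first_class_survivors / all_survivors

# Of everyone, what fraction were first class AND survived? (joint fraction)
first_and_survived = first_class_survivors / everyone_on_board

# Of first class, what fraction survived? (condition on the column)
first_who_survived = first_class_survivors / first_class_total

print(f"{survivors_in_first:.3f} {first_and_survived:.3f} {first_who_survived:.3f}")
```

Reading the question for the phrase that names the group ("of survivors", "of passengers", "of the first class passengers") is the quickest way to choose the denominator.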
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Previous slidesNext
Student Beers Blood Alcohol
1 5 01
2 2 003
3 9 019
4 7 0095
5 3 007
6 3 002
7 4 007
8 5 0085
9 8 012
10 3 004
11 5 006
12 5 005
13 6 01
14 7 009
15 1 001
16 4 005
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
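A minimal sketch of how the beers and BAC columns become the plotted points, with a numerical companion to the picture. The correlation formula is not stated in this part of the slides; the standard Pearson correlation is assumed, and the function name `pearson_r` is illustrative.

```python
from math import sqrt

# Data for the 16 students, in table order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the product of sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Each (x, y) tuple is one point on the scatterplot: x = beers, y = BAC.
points = list(zip(beers, bac))
r = pearson_r(beers, bac)
print(points[:3])
print(f"r = {r:.3f}")
```

The list `points` can be passed to any plotting tool; putting beers on the horizontal axis treats it as the explanatory variable.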
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland         15.5        6.4      1.79
UVA              13.1        4.0      1.12
Louisville       10.9        1.8      0.50
UNC               9.2        0.1      0.03
VaTech            7.9       -1.2     -0.34
FSU               7.9       -1.2     -0.34
GaTech            7.1       -2.0     -0.56
NCSU              6.5       -2.6     -0.73
Clemson           3.8       -5.3     -1.47
                         Sum = 0   Sum = 0
Mean = 9.1000, s = 3.5697
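The table can be reproduced in a few lines. This is a sketch assuming the sample standard deviation (divisor n - 1), which matches the s shown on the slide; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, copied from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)   # sample mean
s = stdev(values)      # sample standard deviation (divides by n - 1)

# z-score: how many standard deviations each value lies from the mean.
z_scores = {school: (y - y_bar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{support[school]:>6.1f}{support[school] - y_bar:>7.1f}{z:>7.2f}")
print(f"mean = {y_bar:.4f}, s = {s:.4f}, sum of z = {sum(z_scores.values()):.6f}")
```

Because the deviations from the mean always sum to zero, so do the z-scores. Values computed from the rounded inputs may differ from the slide in the last printed digit.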
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
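As a worked check, using the same definition as the z-score table (deviation from the mean divided by the standard deviation), with $y$ standing for the NC value:

```latex
\[
z = \frac{y - \bar{y}}{s} = \frac{4320 - 6185}{1804} = \frac{-1865}{1804} \approx -1.03
\]
```

The negative sign indicates that NC tuition lies about one standard deviation below the national mean.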
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Boxplot of the pulse data; values marked on the axis: 40.5, 45, 63, 70, 78, 100.5]
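The fence arithmetic above can be wrapped in a small helper. A minimal sketch: the function names `outlier_fences` and `is_outlier` are hypothetical, and only Q1 = 63 and Q3 = 78 come from the slide.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """A value is flagged as an outlier if it falls outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


# Beginning-of-class pulse data: Q1 = 63, Q3 = 78.
lower, upper = outlier_fences(63, 78)
print(lower, upper)  # 40.5 100.5
```

The whiskers of the boxplot extend to the most extreme data values that still lie inside these fences; anything beyond them is plotted individually.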
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Boxplot: Pass Catching Yards by Receivers; axis ticks from 0 to 1369 yards]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Student   Beers   Blood Alcohol
 1          5       0.10
 2          2       0.03
 3          9       0.19
 4          7       0.095
 5          3       0.07
 6          3       0.02
 7          4       0.07
 8          5       0.085
 9          8       0.12
10          3       0.04
11          5       0.06
12          5       0.05
13          6       0.10
14          7       0.09
15          1       0.01
16          4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
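As a rough illustration, the 16 students can be paired into the (x, y) points a scatterplot would display, and the strength of the linear relationship summarized with a correlation coefficient. The slides name correlation only in the section title, so the formula below (the usual Pearson correlation) and the function name `pearson_r` are assumptions, not something taken from the slides.

```python
from math import sqrt

# Beers (x) and blood alcohol content (y) for the 16 students in the table.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Each student contributes one point (x_i, y_i) to the scatterplot.
points = list(zip(beers, bac))


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(f"n = {len(points)} points, r = {r:.3f}")
```

A plotting library could draw `points` directly, with beers on the horizontal axis and BAC on the vertical axis.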
Z-scores add to zero
Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)
School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47
Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0, sum of Z-scores = 0
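The table can be reproduced with a few lines of code. This is a minimal sketch using the standard-library `statistics` module; the variable names are illustrative, and the sample standard deviation (divisor n - 1) is assumed because it matches the s = 3.5697 shown on the slide.

```python
from statistics import mean, stdev

# Support to athletic departments, in $ millions, from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)   # sample mean
s = stdev(values)      # sample standard deviation (divides by n - 1)

# z = (y - ybar) / s for each school.
z_scores = {school: (y - y_bar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{support[school]:>6.1f}{support[school] - y_bar:>7.1f}{z:>7.2f}")
print(f"mean = {y_bar:.4f}, s = {s:.4f}, sum of z = {sum(z_scores.values()):.2f}")
```

The deviations from the mean always sum to zero, so the z-scores do too; rounded z-scores may miss zero by a small amount, while the unrounded ones sum to zero up to floating-point error.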
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day, Month, Holidays
Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Box plot of the pulses; values marked on the plot: 70, 63, 78, 40.5, 100.5, 45]
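The fence arithmetic above can be sketched in code. The function names `iqr_fences` and `find_outliers` are illustrative, and the short list of pulse values is hypothetical (the 138 actual pulses are not given in the slides); only Q1 = 63 and Q3 = 78 come from the source.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles from the pulse example.
lower, upper = iqr_fences(63, 78)
print(lower, upper)

# Hypothetical pulse values, used only to show how the fences flag outliers.
sample_pulses = [45, 63, 70, 78, 104]
print(find_outliers(sample_pulses, 63, 78))
```

Values inside the fences are covered by the box and whiskers; anything beyond a fence is drawn as an individual outlier point.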
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; horizontal axis from 0 to 1369]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example Survival and class on the Titanic
Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201
Marginal distributions marg dist of survival
7102201 323
14912201 677
marg dist of class
8852201 402
3252201 148
2852201 129
7062201 321
Marginal distribution of classBar chart
Marginal distribution of class Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?
                            Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count          673     123     167     528    1491
        % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count          885     325     285     706    2201
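The "% of col" rows are conditional distributions of survival given class: each count divided by its column total. A short sketch under the same Titanic counts (`column_percentages` is a hypothetical helper name, not from the slides):

```python
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def column_percentages(table):
    """For each column, the percentage of that column falling in each row."""
    columns = list(next(iter(table.values())))
    result = {}
    for col in columns:
        col_total = sum(table[row][col] for row in table)
        result[col] = {row: 100 * table[row][col] / col_total for row in table}
    return result


given_class = column_percentages(counts)
for cls, dist in given_class.items():
    print(f"{cls:7s} alive {dist['Alive']:5.1f}%   dead {dist['Dead']:5.1f}%")
```

Comparing the columns shows how strongly survival depended on class: the survival percentage for first class is far higher than for crew or third class.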
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions (using the Titanic counts: 202 first-class survivors, 710 survivors, 325 first-class passengers, 2201 people in all)
1) What fraction of survivors were in first class? 202/710
2) What fraction of passengers were in first class and survivors? 202/2201
3) What fraction of the first class passengers survived? 202/325
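The three fractions share the numerator 202 and differ only in the denominator, which is what separates a joint proportion from the two conditional ones. A small sketch to make the contrast concrete (the variable names are illustrative):

```python
alive_first = 202   # first-class passengers who survived
survivors = 710     # row total: all survivors
first_class = 325   # column total: all first-class passengers
everyone = 2201     # grand total

share_of_survivors = alive_first / survivors      # conditional on surviving
joint_share = alive_first / everyone              # first class AND survived
survival_rate_first = alive_first / first_class   # conditional on first class

print(f"{share_of_survivors:.3f} {joint_share:.3f} {survival_rate_first:.3f}")
```

Reading the question for the words "of survivors", "of passengers", or "of the first class passengers" identifies which total belongs in the denominator.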
TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?
1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%
TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?
1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%
TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
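A sketch of how the 16 students become the (x, y) points of the scatterplot, using the beers and BAC values from the table. The section title also mentions correlation; the function below assumes that means Pearson's correlation coefficient r, which is an assumption of this example. The name `pearson_r` is illustrative.

```python
import math

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Bivariate data: one (x, y) pair per student, x = beers, y = BAC
pairs = list(zip(beers, bac))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(f"n = {len(pairs)}, r = {r:.3f}")
```

Each tuple in `pairs` is one plotted point, with beers on the horizontal axis and BAC on the vertical axis.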
Z-scores add to zero
Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)
School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47
Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
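The claim that z-scores add to zero can be checked numerically. This sketch uses the rounded support figures from the table, so individual z-scores may differ from the table in the last digit. The sample standard deviation (divisor n - 1) is assumed because it reproduces s = 3.5697.

```python
import statistics

support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

# z = (y - ybar) / s for each school
z_scores = {school: (y - y_bar) / s for school, y in support.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {sum(z_scores.values()):.10f}")
```

The deviations y - ybar always sum to zero by the definition of the mean, and dividing every deviation by the same s keeps the sum at zero, apart from floating-point rounding.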
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865
Section 34Measures of Position (also called Measures of Relative Standing)
79 years so 79 is an outlier The line from the top
end of the box is drawn to the biggest number in the
data that is less than 705
ATM Withdrawals by Day Month Holidays
Beg of class pulses (n=138) Q1 = 63 Q3 = 78 IQR=78 63=15
15(IQR)=15(15)=225
Q1 - 15(IQR) 63 ndash 225=405
Q3 + 15(IQR) 78 + 225=1005
7063 78405 100545
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who
gained at least 50 yards What is the approximate value of Q3
0 136273
410547
684821
9581095
12321369
Pass Catching Yards by Receivers
1 450
2 750
3 215
4 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel ldquoout of the boxrdquo does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (httpstatcrunchstatncsuedu) draws box plots
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions
Marginal distribution of survival:
Alive   710/2201    32.3%
Dead    1491/2201   67.7%
Marginal distribution of class:
Crew    885/2201    40.2%
First   325/2201    14.8%
Second  285/2201    12.9%
Third   706/2201    32.1%
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
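The marginal distributions come from the row and column totals divided by the grand total. A short Python sketch of that calculation follows; the dictionary layout and variable names are just one convenient choice for illustration.

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row total / grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total
    for status, row in counts.items()
}

# Marginal distribution of class: column total / grand total.
classes = list(counts["Alive"])
class_marginal = {
    cls: 100 * sum(row[cls] for row in counts.values()) / grand_total
    for cls in classes
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{pct:5.1f}%")
```

Each marginal distribution ignores the other variable entirely, which is why its percentages add to 100%.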
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                           Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201

Conditional distributions: segmented bar chart
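The "% of col" rows are conditional distributions: each count is divided by its own column total instead of the grand total. A minimal Python sketch of that step, with illustrative variable names:

```python
# Titanic counts by class.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# divide each count by the total for that class (the column total).
pct_alive = {}
pct_dead = {}
for cls in alive:
    column_total = alive[cls] + dead[cls]
    pct_alive[cls] = 100 * alive[cls] / column_total
    pct_dead[cls] = 100 * dead[cls] / column_total

for cls in alive:
    print(f"{cls:<7} alive {pct_alive[cls]:5.1f}%   dead {pct_dead[cls]:5.1f}%")
```

Comparing these percentages across classes (62.2% of first class survived versus 25.2% of third class) is what the segmented bar chart shows visually.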
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                           Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201

Answers (in order): 202/710, 202/2201, 202/325
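The three answers share the numerator 202 (first class survivors) and differ only in the denominator, that is, in which group is being described. Written out with decimal values (the rounding to three places is added here for illustration):

```latex
\begin{aligned}
\text{fraction of survivors who were in first class}
  &= \frac{202}{710} \approx 0.285 \\
\text{fraction of all passengers who were first class and survived}
  &= \frac{202}{2201} \approx 0.092 \\
\text{fraction of first class passengers who survived}
  &= \frac{202}{325} \approx 0.622
\end{aligned}
```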
TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?
1. 80
2. 235
3. 582
4. 277
TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?
1. 418
2. 388
3. 512
4. 198
TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 452
2. 488
3. 268
4. 277
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
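The strength of the linear pattern in this scatterplot can be summarized by the correlation coefficient r, the next topic in this section. As a hedged sketch, the Python below computes r for the beers and BAC data with the usual Pearson formula. The function name `pearson_r` is made up for illustration, and the BAC values assume the decimal points as restored in the table.

```python
from math import sqrt

# (x, y) pairs for the 16 students: x = beers, y = blood alcohol content.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(f"r = {r:.3f}")
```

A value of r close to +1 matches what the scatterplot shows: students who drank more beers tended to have higher BAC.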
The correlation is due to a third "lurking" variable: playing time.
Correlation r = 0.935
End of Chapter 3
Z-scores add to zero
Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
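As a worked check of the question above, the z-score definition (observation minus mean, divided by standard deviation) can be applied directly to the stated figures. The symbols y, \bar{y}, and s follow the notation used in the z-score table; treating the NC mean tuition as the observation y is how the question is read here.

```latex
z = \frac{y - \bar{y}}{s} = \frac{4320 - 6185}{1804} = \frac{-1865}{1804} \approx -1.03
```

The negative sign indicates that NC tuition lies about one standard deviation below the national mean.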
Section 3.4 Measures of Position (also called Measures of Relative Standing)
79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
ATM Withdrawals by Day Month Holidays
Beg. of class pulses (n=138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Boxplot of pulse rates; labeled values: 40.5, 45, 63, 70, 78, 100.5]
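The fence arithmetic above can be sketched in a few lines. This is an illustrative sketch, not the course's software: the function names `outlier_fences` and `is_outlier` are hypothetical, and the sample pulse values passed to `is_outlier` are made up for demonstration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """A value is flagged as an outlier if it lies outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


# Quartiles from the beginning-of-class pulse data (n = 138).
lower, upper = outlier_fences(63, 78)
print(lower, upper)

# Hypothetical pulse readings, used only to exercise the rule.
for pulse in (38, 45, 70, 104):
    print(pulse, is_outlier(pulse, 63, 78))
```

Values inside the fences are covered by the box and whiskers; values outside are plotted individually as outliers.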
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; axis from 0 to 1369]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions
Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%
Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                              Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
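The marginal, conditional, and joint fractions for the Titanic table can be reproduced from the raw counts. The sketch below uses plain Python dictionaries; the variable names are illustrative, and the counts are those of the contingency table (Crew, First, Second, Third by Alive/Dead).

```python
# Counts from the Titanic contingency table: survival (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

row_totals = {status: sum(row.values()) for status, row in counts.items()}
col_totals = {c: sum(counts[status][c] for status in counts) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distributions: row or column totals divided by the grand total.
marginal_survival = {status: t / grand_total for status, t in row_totals.items()}
marginal_class = {c: t / grand_total for c, t in col_totals.items()}

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {c: counts["Alive"][c] / col_totals[c] for c in classes}

# The three questions about first class differ only in the denominator.
first_among_survivors = counts["Alive"]["First"] / row_totals["Alive"]   # 202/710
first_and_survived = counts["Alive"]["First"] / grand_total              # 202/2201
survived_among_first = counts["Alive"]["First"] / col_totals["First"]    # 202/325

for c in classes:
    print(f"{c:<7} marginal {marginal_class[c]:6.1%}   P(alive | class) {survival_given_class[c]:6.1%}")
```

The same count, 202, answers all three questions; what changes is whether it is divided by a row total, the grand total, or a column total.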
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
 1          5       0.10
 2          2       0.03
 3          9       0.19
 4          7       0.095
 5          3       0.07
 6          3       0.02
 7          4       0.07
 8          5       0.085
 9          8       0.12
10          3       0.04
11          5       0.06
12          5       0.05
13          6       0.10
14          7       0.09
15          1       0.01
16          4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs. Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
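To accompany the scatterplot, the strength of the linear relationship between beers and BAC can be summarized with the correlation coefficient r. The sketch below computes r from its definition using only the standard library. The BAC values assume the decimal points shown in the reconstructed table (for example 0.095, not 95), and `pearson_r` is an illustrative helper name.

```python
from math import sqrt

# One (beers, BAC) pair per student, in student order 1..16.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(f"n = {len(beers)}, r = {r:.3f}")

# The points a scatterplot would show: x = beers, y = BAC.
for x, y in sorted(zip(beers, bac)):
    print(f"({x}, {y})")
```

A value of r near +1 corresponds to points that cluster tightly around an upward-sloping line, which is the pattern these data show.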
The correlation is due to a third "lurking" variable: playing time
correlation r = 0.935
End of Chapter 3
79 years so 79 is an outlier The line from the top
end of the box is drawn to the biggest number in the
data that is less than 705
ATM Withdrawals by Day Month Holidays
Beg of class pulses (n=138) Q1 = 63 Q3 = 78 IQR=78 63=15
15(IQR)=15(15)=225
Q1 - 15(IQR) 63 ndash 225=405
Q3 + 15(IQR) 78 + 225=1005
7063 78405 100545
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who
gained at least 50 yards What is the approximate value of Q3
0 136273
410547
684821
9581095
12321369
Pass Catching Yards by Receivers
1 450
2 750
3 215
4 545
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Excel ldquoout of the boxrdquo does not draw boxplots
Many add-ins are available on the internet that give Excel the capability to draw box plots
Statcrunch (httpstatcrunchstatncsuedu) draws box plots
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201
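Marginal distributions
Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%
Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
[Figure: Marginal distribution of class, bar chart]
[Figure: Marginal distribution of class, pie chart]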
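The marginal distributions come from dividing row totals and column totals by the grand total. The sketch below reproduces that arithmetic in Python from the Titanic counts. The variable names and the dictionary layout are illustrative choices, not something taken from the slides.

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    cls: round(100 * sum(row[i] for row in counts.values()) / grand_total, 1)
    for i, cls in enumerate(classes)
}

print(survival_marginal)
print(class_marginal)
```

Each marginal distribution ignores the other variable entirely, which is why its percentages are built only from the totals in the margins of the table.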
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: given the class of a passenger, what is the chance the passenger survived?
                     Class
Survival             Crew  First  Second  Third  Total
Alive   Count         212    202     118    178    710
        % of col     24.0   62.2    41.4   25.2   32.3
Dead    Count         673    123     167    528   1491
        % of col     76.0   37.8    58.6   74.8   67.7
Total   Count         885    325     285    706   2201
[Figure: Conditional distributions, segmented bar chart]
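The column percentages can be checked with a short Python sketch: within each class, divide the alive and dead counts by that class's own total. The function name `conditional_on_class` is a hypothetical helper introduced only for this illustration.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def conditional_on_class(alive, dead):
    """Return (% alive, % dead) within each class, i.e. column percentages."""
    result = []
    for a, d in zip(alive, dead):
        col_total = a + d
        result.append((round(100 * a / col_total, 1),
                       round(100 * d / col_total, 1)))
    return result


for cls, (pct_alive, pct_dead) in zip(classes, conditional_on_class(alive, dead)):
    print(f"{cls:<7} alive {pct_alive:5.1f}%  dead {pct_dead:5.1f}%")
```

Because each class is scaled by its own total, the two percentages in every column add to 100, which is what lets a segmented bar chart compare classes of very different sizes.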
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
                     Class
Survival             Crew  First  Second  Third  Total
Alive   Count         212    202     118    178    710
        % of col     24.0   62.2    41.4   25.2   32.3
Dead    Count         673    123     167    528   1491
        % of col     76.0   37.8    58.6   74.8   67.7
Total   Count         885    325     285    706   2201
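The three questions share the numerator 202 and differ only in the denominator. A small Python sketch makes the contrast explicit; the variable names are illustrative, and the counts are read from the Titanic table.

```python
first_class_alive = 202   # first-class passengers who survived
all_alive = 710           # all survivors
first_class_total = 325   # all first-class passengers
everyone = 2201           # everyone on board

# Same numerator, three denominators, three different questions.
share_of_survivors = first_class_alive / all_alive            # conditional on surviving
joint_share = first_class_alive / everyone                    # first class AND survived
survival_rate_first = first_class_alive / first_class_total   # conditional on first class

for label, value in [
    ("survivors who were in first class", share_of_survivors),
    ("passengers in first class and alive", joint_share),
    ("first-class passengers who survived", survival_rate_first),
]:
    print(f"{label}: {value:.1%}")
```

Reading the wording of a question for its reference group (all survivors, everyone on board, or all first-class passengers) is what determines which total belongs in the denominator.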
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201
Marginal distributions
Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%
Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
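The marginal percentages come from dividing each row or column total by the grand total of 2201. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the counts are the ones in the Titanic table.

```python
# Counts from the Titanic table: rows are survival, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals over the grand total.
survival_marginal = {k: sum(row) / grand_total for k, row in counts.items()}

# Marginal distribution of class: column totals over the grand total.
column_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {c: t / grand_total for c, t in zip(classes, column_totals)}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100%, which is a quick check that no category was dropped.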
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?
Survival             Crew   First  Second  Third  Total
Alive  Count          212     202     118    178    710
       % of col      24.0    62.2    41.4   25.2   32.3
Dead   Count          673     123     167    528   1491
       % of col      76.0    37.8    58.6   74.8   67.7
Total  Count          885     325     285    706   2201
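The "% of col" rows are conditional distributions: within each class, the alive and dead counts are divided by that class's column total. A short Python sketch, with illustrative names and the counts from the table:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def percent_of_column(alive_count, dead_count):
    """Return (% alive, % dead) within one class (one table column)."""
    total = alive_count + dead_count
    return 100 * alive_count / total, 100 * dead_count / total


conditional = {
    c: percent_of_column(a, d) for c, a, d in zip(classes, alive, dead)
}

for c, (pct_alive, pct_dead) in conditional.items():
    print(f"{c:<7} alive {pct_alive:5.1f}%  dead {pct_dead:5.1f}%")
```

Because the denominator changes from column to column, the survival percentages can be compared across classes even though the classes differ in size.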
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Survival             Crew   First  Second  Third  Total
Alive  Count          212     202     118    178    710
       % of col      24.0    62.2    41.4   25.2   32.3
Dead   Count          673     123     167    528   1491
       % of col      76.0    37.8    58.6   74.8   67.7
Total  Count          885     325     285    706   2201
Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
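All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. The sketch below uses Python's `fractions.Fraction` to keep the fractions exact; the variable names are illustrative.

```python
from fractions import Fraction

first_class_survivors = 202   # cell: Alive, First
all_survivors = 710           # row total: Alive
all_first_class = 325         # column total: First
all_passengers = 2201         # grand total

answers = {
    # Among survivors, the share who were first class.
    "first class, given survived": Fraction(first_class_survivors, all_survivors),
    # Among everyone on board, the share who were first class AND survived.
    "first class and survived": Fraction(first_class_survivors, all_passengers),
    # Among first class passengers, the share who survived.
    "survived, given first class": Fraction(first_class_survivors, all_first_class),
}

for label, frac in answers.items():
    print(f"{label}: {frac} = {float(frac):.3f}")
```

`Fraction` reduces to lowest terms, so 202/710 prints as 101/355.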
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1 80
2 235
3 582
4 277
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1 418
2 388
3 512
4 198
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1 452
2 488
3 268
4 277
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student  Beers  Blood Alcohol
 1       5      0.10
 2       2      0.03
 3       9      0.19
 4       7      0.095
 5       3      0.07
 6       3      0.02
 7       4      0.07
 8       5      0.085
 9       8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: how is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
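This section pairs scatterplots with correlation, so the strength of the linear pattern in the beers/BAC plot can also be summarized numerically. The sketch below computes the standard Pearson correlation coefficient for the 16 students. It assumes the BAC values read with their decimal points restored (for example 0.095 for student 4), and the function name `pearson_r` is illustrative; the slides' own formula for r is not shown in this excerpt, so this is the usual textbook definition rather than a quotation of it.

```python
import math

# (beers, BAC) for students 1 through 16, decimal points restored.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(f"r = {r:.3f}")
```

The result is about 0.89, a strong positive linear association: students who drank more beers tended to have higher BAC.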
The correlation is due to a third "lurking" variable: playing time
correlation r = 0.935
End of Chapter 3
Chapter 3 Descriptive Statistics Graphical and Numerical Summa
Section 3.1 Displaying Categorical Data
The three rules of data analysis won't be difficult to remember
Bar Charts show counts or relative frequency for each category
Pie Charts shows proportions of the whole in each category
Example Top 10 causes of death in the United States
Slide 7
Slide 8
Slide 9
Slide 10
Slide 11
Internships
Trend Student Debt by State (grads of public 4 yr or more)
Slide 14
Slide 15
Unnecessary dimension in a pie chart
Section 3.1 continued: Displaying Quantitative Data
Frequency Histograms
Relative Frequency Histogram of Exam Grades
Histograms
Histograms Showing Different Centers
Histograms - Same Center Different Spread
Histograms Shape
Shape (cont): Female heart attack patients in New York state
Shape (cont): outliers. All 200 m Races 20.2 secs or less
Shape (cont) Outliers
Excel Example 2012-13 NFL Salaries
Statcrunch Example 2012-13 NFL Salaries
Heights of Students in Recent Stats Class (Bimodal)
Example Grades on a statistics exam
Example-2 Frequency Distribution of Grades
Example-3 Relative Frequency Distribution of Grades
Relative Frequency Histogram of Grades
Based on the histogram about what percent of the values are b
Stem and leaf displays
Example employee ages at a small company
Suppose a 95 yr old is hired
Number of TD passes by NFL teams 2012-2013 season (stems are 1
Pulse Rates n = 138
Advantages/Disadvantages of Stem-and-Leaf Displays
Population of 185 US cities with between 100,000 and 500,000
Back-to-back stem-and-leaf displays TD passes by NFL teams 19
Below is a stem-and-leaf display for the pulse rates of 24 wome
Other Graphical Methods for Data
Unemployment Rate by Educational Attainment
Water Use During Super Bowl XLV (Packers 31 Steelers 25)
Heat Maps
Word Wall (customer feedback)
Section 3.2 Describing the Center of Data
2 characteristics of a data set to measure
Notation for Data Values and Sample Mean
Simple Example of Sample Mean
Population Mean
Connection Between Mean and Histogram
The median another measure of center
Student Pulse Rates (n=62)
The median splits the histogram into 2 halves of equal area
Mean balance point Median 50 area each half mean 5526 year
Medians are used often
Examples
Below are the annual tuition charges at 7 public universities
Below are the annual tuition charges at 7 public universities (2)
Properties of Mean Median
Example class pulse rates
2010 2014 baseball salaries
Disadvantage of the mean
Mean Median Maximum Baseball Salaries 1985 - 2014
Skewness comparing the mean and median
Skewed to the left negatively skewed
Symmetric data
Section 3.3 Describing Variability of Data
Recall 2 characteristics of a data set to measure
Ways to measure variability
Example
The Sample Standard Deviation a measure of spread around the m
Calculations …
Slide 77
Population Standard Deviation
Remarks
Remarks (cont)
Remarks (cont) (2)
Review Properties of s and s
Summary of Notation
Section 3.3 (cont) Using the Mean and Standard Deviation Toget
68-95-99.7 rule
The 68-95-99.7 rule: If the histogram of the data is approximat
68-95-99.7 rule: 68% within 1 stan dev of the mean
68-95-99.7 rule: 95% within 2 stan dev of the mean
Example textbook costs
Example textbook costs (cont)
Example textbook costs (cont) (2)
Example textbook costs (cont) (3)
The best estimate of the standard deviation of the men's weight
Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
Z-scores Standardized Data Values
z-score corresponding to y
Slide 97
Comparing SAT and ACT Scores
Z-scores add to zero
Recently the mean tuition at 4-yr public colleges/universities
Section 3.4 Measures of Position (also called Measures of Relat
Slide 102
Quartiles and median divide data into 4 pieces
Quartiles are common measures of spread
Rules for Calculating Quartiles
Example (2)
Pulse Rates n = 138 (2)
Below are the weights of 31 linemen on the NCSU football team
Interquartile range another measure of spread
Example beginning pulse rates
Below are the weights of 31 linemen on the NCSU football team (2)
5-number summary of data
Slide 113
Boxplot display of 5-number summary
Slide 115
ATM Withdrawals by Day Month Holidays
Slide 117
Beg of class pulses (n=138)
Below is a box plot of the yards gained in a recent season by t
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Basic Terminology
Contingency Tables for Bivariate Categorical Data
Marginal distribution of class Bar chart
Marginal distribution of class Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
TV viewers during the Super Bowl in 2013 What is the marginal
TV viewers during the Super Bowl in 2013 What percentage watch
TV viewers during the Super Bowl in 2013 Given that a viewer d
Section 3.5 Bivariate Descriptive Statistics (2)
Slide 135
Scatterplot Blood Alcohol Content vs Number of Beers
Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
The correlation coefficient r
Correlation Fuel Consumption vs Car Weight
Properties: r ranges from -1 to +1
Properties (cont) High correlation does not imply cause and ef
Properties Cause and Effect
Properties Cause and Effect
End of Chapter 3
Example (2)
Pulse Rates n = 138 (2)
Below are the weights of 31 linemen on the NCSU football team
Interquartile range another measure of spread
Example beginning pulse rates
Below are the weights of 31 linemen on the NCSU football team (2)
5-number summary of data
Slide 113
Boxplot display of 5-number summary
Slide 115
ATM Withdrawals by Day Month Holidays
Slide 117
Beg of class pulses (n=138)
Below is a box plot of the yards gained in a recent season by t
Rock concert deaths histogram and boxplot
Automating Boxplot Construction
Tuition 4-yr Colleges
Section 35 Bivariate Descriptive Statistics
Basic Terminology
Contingency Tables for Bivariate Categorical Data
Marginal distribution of class Bar chart
Marginal distribution of class Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
TV viewers during the Super Bowl in 2013 What is the marginal
TV viewers during the Super Bowl in 2013 What percentage watch
TV viewers during the Super Bowl in 2013 Given that a viewer d
Section 35 Bivariate Descriptive Statistics (2)
Slide 135
Scatterplot Blood Alcohol Content vs Number of Beers
Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
The correlation coefficient r
Correlation Fuel Consumption vs Car Weight
Properties r ranges from -1 to+1
Properties (cont) High correlation does not imply cause and ef
Properties Cause and Effect
Properties Cause and Effect
End of Chapter 3
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Box plot: Pass Catching Yards by Receivers; horizontal axis runs from 0 to 1369 yards]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
Excel “out of the box” does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
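A boxplot is a picture of the 5-number summary, so any tool that can compute that summary can automate most of the work. The following is a minimal Python sketch, not the method used by Excel or Statcrunch. The function name `five_number_summary` and the sample values are hypothetical, and the "inclusive" quartile convention is an assumption; other software and hand rules may give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" is one of several quartile conventions (an assumption
    # here); other tools and hand-calculation rules can differ slightly.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data, for illustration only (not taken from the slides).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))
```

The box of the boxplot spans Q1 to Q3, with a line at the median; the whiskers reach toward the minimum and maximum.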
Tuition 4-yr Colleges
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).
Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201
Marginal distributions
Marginal distribution of survival:
Alive    710/2201 = 32.3%
Dead    1491/2201 = 67.7%
Marginal distribution of class:
Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
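The marginal percentages above are row totals and column totals divided by the grand total of 2201. As a rough sketch of that arithmetic in Python (the dictionary layout and the function name `marginal_distributions` are illustrative choices, not part of the slides):

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def marginal_distributions(table):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    # Row marginals: each row total divided by the grand total.
    row_pct = {row: 100 * sum(cells.values()) / grand_total
               for row, cells in table.items()}
    # Column marginals: accumulate column totals, then divide by the grand total.
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    col_pct = {col: 100 * n / grand_total for col, n in col_totals.items()}
    return row_pct, col_pct

survival_pct, class_pct = marginal_distributions(counts)
for name, pct in {**survival_pct, **class_pct}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.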
Marginal distribution of class: Bar chart
Marginal distribution of class: Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?
Survival by Class
                     Crew    First   Second    Third    Total
Alive   Count         212      202      118      178      710
        % of col    24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count         673      123      167      528     1491
        % of col    76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count         885      325      285      706     2201
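The "% of col" rows divide each count by its own column total rather than by the grand total. A small Python sketch of that calculation, with the data layout and the name `conditional_on_column` chosen here only for illustration:

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def conditional_on_column(table):
    """For each column, return each row's share of that column's total, in percent."""
    columns = list(next(iter(table.values())).keys())
    result = {}
    for col in columns:
        col_total = sum(table[row][col] for row in table)
        result[col] = {row: 100 * table[row][col] / col_total for row in table}
    return result

survival_given_class = conditional_on_column(counts)
for cls, dist in survival_given_class.items():
    print(cls, {row: f"{pct:.1f}%" for row, pct in dist.items()})
```

Within each class the two percentages add to 100%, because each column is treated as its own small population.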
Conditional distributions: segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
Survival by Class
                     Crew    First   Second    Third    Total
Alive   Count         212      202      118      178      710
        % of col    24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count         673      123      167      528     1491
        % of col    76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count         885      325      285      706     2201
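All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A short Python sketch of the three fractions, with variable names that are illustrative rather than taken from the slides:

```python
# Counts read from the Titanic contingency table.
alive_and_first = 202   # first-class passengers who survived
alive_total = 710       # all survivors
first_total = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Of the survivors, what fraction were in first class? (condition on survival)
first_given_alive = alive_and_first / alive_total
# Of everyone, what fraction were both first class and survivors? (joint)
first_and_alive = alive_and_first / grand_total
# Of first class, what fraction survived? (condition on class)
alive_given_first = alive_and_first / first_total

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

Reading the question carefully for the words "of" and "given" is usually enough to pick the right denominator.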
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
 1        5       0.10
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
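The strength of the linear pattern in such a scatterplot is summarized by the correlation coefficient r. The sketch below computes r for the 16 (beers, BAC) pairs using the standard Pearson formula. The function name `pearson_r` is an illustrative choice, and the BAC values assume the decimal points lost in the transcript belong where shown (for example 0.10 and 0.095).

```python
import math

# Beers and BAC for the 16 students, in student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the summed squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(beers, bac)
print(f"r = {r:.3f}")
```

For these data r comes out close to 0.89, a strong positive linear association: students who drank more beers tended to have higher BAC.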
The correlation is due to a third “lurking” variable: playing time.
correlation r = 0.935
End of Chapter 3
Tuition 4-yr Colleges
Section 35Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data
Basic Terminology Univariate data 1 variable is measured
on each sample unit or population unit For example height of each student in a sample
Bivariate data 2 variables are measured on each sample unit or population uniteg height and GPA of each student in a sample (caution data from 2 separate samples is not bivariate data)
Contingency Tables for Bivariate Categorical Data
Example Survival and class on the Titanic
Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201
Marginal distributions marg dist of survival
7102201 323
14912201 677
marg dist of class
8852201 402
3252201 148
2852201 129
7062201 321
Marginal distribution of classBar chart
Marginal distribution of class Pie chart
Contingency Tables for Bivariate Categorical Data - 2
Conditional distributionsGiven the class of a passenger what is the chance the passenger survived
ClassCrew First Second Third Total
Alive Count 212 202 118 178 710Survival of col 240 622 414 252 323
Dead Count 673 123 167 528 1491 of col 760 378 586 748 677
Total Count 885 325 285 706 2201
Conditional distributions segmented bar chart
Contingency Tables for Bivariate Categorical Data - 3
Questions (using the Titanic table from the previous slide):
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
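The three questions share the numerator 202 (first-class survivors) and differ only in the denominator. A small sketch, with illustrative variable names that are not from the slides, makes the distinction explicit:

```python
# Counts taken from the Titanic table.
first_class_alive = 202   # first class AND survived
total_alive = 710         # all survivors
total_first_class = 325   # all first-class passengers
grand_total = 2201        # everyone on board

# Conditional on survival: of the survivors, what fraction were in first class?
p_first_given_alive = first_class_alive / total_alive

# Joint: of all passengers, what fraction were in first class and survived?
p_first_and_alive = first_class_alive / grand_total

# Conditional on class: of the first-class passengers, what fraction survived?
p_alive_given_first = first_class_alive / total_first_class

print(f"{100 * p_first_given_alive:.1f}%")
print(f"{100 * p_first_and_alive:.1f}%")
print(f"{100 * p_alive_given_first:.1f}%")
```

The wording of the question ("of survivors", "of passengers", "of the first class passengers") identifies the group that goes in the denominator.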
TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1 80
2 235
3 582
4 277
TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1 418
2 388
3 512
4 198
TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1 452
2 488
3 268
4 277
Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)
Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05
Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)
We are interested in the relationship between the two variables: How is one affected by changes in the other one?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
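To go with the scatterplot, the strength of the linear relationship between beers and BAC can be summarized by the correlation coefficient r. The formula for r is not given in this part of the transcript, so the sketch below assumes the standard Pearson formula; the function name `pearson_r` is illustrative, and the BAC values assume the decimal points lost in extraction (for example, 0095 read as 0.095).

```python
import math

# (beers, BAC) pairs for the 16 students, copied from the table above.
data = [
    (5, 0.10), (2, 0.03), (9, 0.19), (7, 0.095),
    (3, 0.07), (3, 0.02), (4, 0.07), (5, 0.085),
    (8, 0.12), (3, 0.04), (5, 0.06), (5, 0.05),
    (6, 0.10), (7, 0.09), (1, 0.01), (4, 0.05),
]

def pearson_r(pairs):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the sums of squared deviations."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(data)
print(f"r = {r:.3f}")  # prints "r = 0.894"
```

A value this close to +1 matches what the scatterplot suggests: students who drank more beers tended to have higher BAC, in a roughly linear pattern.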
The correlation is due to a third “lurking” variable: playing time
correlation r = 0.935
End of Chapter 3