HUDM4122 Probability and Statistical Inference January 26, 2015
Dec 17, 2015
ASSISTments
• Did everyone get an account for the ASSISTments system?
• Did anyone have difficulties setting up an account?
• First homework is due in a week
Today
• Ch. 1 in Mendenhall, Beaver, & Beaver
• Variables and Variable Types
• Graphing Data
• Basic Exploratory Data Analysis
Variables
• What is a variable?
• “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” – MBB p. 8
Which of these are examples of variables?
• GPA
• Shoe size
• Age
• Number of correct answers in ASSISTments
• Number of times gamed the system in ASSISTments
• Favorite vegetable
• Favorite type of pie
• Pi
What is a measurement?
• A measurement is the result of measuring a variable on a single experimental unit
– A person, if you are studying people
– A class, if you are studying classes
– A pizza, if you are studying pizzas
A measurement
• Person furthest towards my left in the front row, what is your name?
Now I have a measurement
A measurement
• Person furthest towards my right in the second row, what is your name?
Now I have data
• A set of measurements
• Note that in stats class or education journals, the word “data” is plural
• I only know one exception
Everyone repeat after me
• “My data are in this Excel file.”
• “Your data aren’t evidence for that conclusion.”
• “His data were hard to collect.”
However…
• I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner
Any questions or concerns?
Univariate Data
• A single variable is collected
Height
5’11”
5’11”
5’10”
5’6”
Bivariate Data
• Two variables are collected (for the same data point)
Height    Drum-Playing Skill
5’11”     1
5’11”     2
5’10”     4
5’6”      8
Multivariate Data
• 3+ variables are collected
Name              Height    Drum-Playing Skill
John Lennon       5’11”     1
Paul McCartney    5’11”     2
George Harrison   5’10”     4
Ringo Starr       5’6”      8
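The three tables above can be sketched in Python as a list of records, one per experimental unit. This is only an illustration, not part of the course materials; the field names (`name`, `height`, `drum_skill`) and the helper `select` are assumptions chosen to mirror the column headers.

```python
# Each dict is one experimental unit (a person); each key is a variable.
# Field names are illustrative assumptions mirroring the slide's columns.
beatles = [
    {"name": "John Lennon", "height": "5'11\"", "drum_skill": 1},
    {"name": "Paul McCartney", "height": "5'11\"", "drum_skill": 2},
    {"name": "George Harrison", "height": "5'10\"", "drum_skill": 4},
    {"name": "Ringo Starr", "height": "5'6\"", "drum_skill": 8},
]


def select(records, *variables):
    """Keep only the named variables from each record, as tuples."""
    return [tuple(r[v] for v in variables) for r in records]


univariate = select(beatles, "height")                           # one variable
bivariate = select(beatles, "height", "drum_skill")              # two variables
multivariate = select(beatles, "name", "height", "drum_skill")   # 3+ variables
```

Each tuple is still one measurement per person; only the number of variables recorded per person changes.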
Any questions or concerns?
Types of Variables
Quantitative/Numerical Data
• Data that can be expressed as numbers
What are some examples
• Of numerical data?
Ordinal Data
• Refers to data where there is a known order, but either
– The data clearly aren’t numbers
– The space between values is not guaranteed to be equal
Examples of Ordinal Data
• Months of the year: January, February, March, April, …
• Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
• Quality of university: Highly selective, selective, somewhat selective, non-selective
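One way to see what "a known order, but no guaranteed spacing" means is to store the order explicitly and compare positions. A minimal Python sketch using the agreement scale above; the names `AGREEMENT_ORDER` and `rank` and the sample responses are illustrative assumptions.

```python
# Ordered from lowest to highest agreement (the slide lists them high to low).
AGREEMENT_ORDER = [
    "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree",
]


def rank(response):
    """Position of a response in the ordering; only comparisons are meaningful."""
    return AGREEMENT_ORDER.index(response)


# Hypothetical responses, invented for illustration.
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree"]

# Sorting and comparing are valid operations on an ordinal variable.
sorted_responses = sorted(responses, key=rank)
agrees_more = rank("Agree") > rank("Neutral")

# Averaging the ranks would assume equal spacing between levels,
# which an ordinal scale does not guarantee, so it is not done here.
```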
Other examples of ordinal data?
Nominal data
• Values have no order or spacing
• Name
• State of Residence
– New Jersey is not greater or less than New York
– Although my brother might disagree
Other Examples of Nominal Data?
Another name
• Nominal data are often also called categorical data
• Technically, ordinal data are also categorical, but no one ever uses the term that way
Any questions or concerns?
Exploratory Data Analysis
• “Analyzing data sets to summarize their main characteristics”
• “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”
Goal
• Generate hypotheses
• Understand your data better
Often (but not always) done with graphs
Which of these is your favorite type of graph?
• Pie chart
• Bar graph
• Frequency histogram
• Line graph
• Scatterplot
• Stem-and-leaf plot
• Box plot
• Other
Pie Chart
• Take a set of categories that add to 100%
• Show the proportion each category has
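The arithmetic behind a pie chart is just each category's share of the total. A minimal Python sketch, assuming a made-up list of votes (these are not class data); the variable names are illustrative.

```python
from collections import Counter

# Hypothetical votes, invented for illustration (not class data).
votes = ["Apple", "Pumpkin", "Apple", "Cherry", "Apple",
         "Pumpkin", "Rhubarb", "Banana Cream", "Apple", "Cherry"]

counts = Counter(votes)
total = sum(counts.values())

# Each slice's share of the whole pie, as a percentage.
shares = {pie: 100 * n / total for pie, n in counts.items()}

# Largest slice first.
for pie, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{pie:<13} {pct:5.1f}%")
```

Because every vote falls in exactly one category, the shares sum to 100%, which is the condition a pie chart needs.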
Pie Chart: Example
[Pie chart: “What is everyone's favorite pie?” Slices: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]
Interpret This Graph Please
[Pie chart: “What is everyone's favorite pie?” Slices: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]
Never Ever Do This: Completely Visually Misleading
[Figure shown under fair use, for critique]
Let’s make a pie chart
• Using the “your favorite graph” data
Any questions?
Alternative: Bar Graphs
[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: ticks from 0 to 30]
Interpret this graph please
[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: ticks from 0 to 30]
What are the advantages/disadvantages relative to pie chart?
[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: ticks from 0 to 30]
By the way: X and Y axes
[Bar graph: “What is everyone's favorite pie?” with the axes labeled. X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: ticks from 0 to 30]
Strengths of bar graphs
• Categories don’t have to add to 100%
• Easier to see small differences between categories
• You can compare variables too
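To illustrate the first point, here is a minimal Python sketch that draws a text bar graph from counts that do not represent parts of one whole. The counts are invented for illustration, and `text_bar_graph` is a hypothetical helper, not something from the course.

```python
# Hypothetical counts, invented for illustration. Respondents could name
# more than one pie, so the counts need not sum to the number of people.
counts = {"Pumpkin": 12, "Apple": 25, "Cherry": 9, "Rhubarb": 3, "Banana Cream": 8}


def text_bar_graph(counts, symbol="#"):
    """Return one line per category: padded label, bar, and count."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * n} {n}" for label, n in counts.items()]


lines = text_bar_graph(counts)
print("\n".join(lines))
```

Bar length encodes the count directly, so a difference of a few responses between two categories shows up as a visibly longer bar.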
Two-group bar graph
[Two-group bar graph: “School Rankings”. X axis: Football Team, Chess Team, Spiderman Team. Y axis: Quality (Higher is Better), ticks from 0 to 60. Groups: Midtown High, Harlem Success Academy]
Let’s make a bar graph
• Using the “your favorite graph” data
Any questions?
Some suggest always using bar graphs instead of pie charts
• “The only thing worse than a pie chart is several of them.” – Edward Tufte
• “Save the pies for dessert.” – Stephen Few
But they’re wrong
• Pie charts are good for representing part-whole relationships in really easy to see ways
• Pie charts are good at representing overall proportions
Nice example (Gabrielle, 2013)
Any questions?
Frequency Histogram
• A type of bar graph
– But usually when people say “bar graph”, they do not mean “frequency histogram”
– Also: by convention, no space between bars
• X axis shows values or ranges of a quantitative variable
• Y axis shows how many data points have that value or range for the quantitative variable
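The two axes above can be sketched in Python: group each value into a range (X axis), then count how many data points fall in each range (Y axis). This is a minimal sketch assuming whole-number grades of at least 1 and five-point bins such as 71-75; the grades are invented for illustration and `bin_label` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical exam grades, invented for illustration.
grades = [52, 58, 63, 67, 68, 71, 74, 75, 78, 79, 80, 82, 83, 85, 88, 91, 94, 100]


def bin_label(grade, width=5):
    """Map a whole-number grade to a range label such as '71-75'."""
    low = ((grade - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


# Frequency of each bin: how many grades fall in that range.
frequencies = Counter(bin_label(g) for g in grades)

# Print bins in ascending order; bar length is the frequency (the Y axis).
for label in sorted(frequencies, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} | {'#' * frequencies[label]}")
```

Bins with no grades are simply absent from this printout; a drawn histogram would show them as gaps at height zero.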
Example from the book
Visits to Starbucks
Another Example
[Frequency histogram. X axis: Exam Grade, in bins from 51-55 to 96-100. Y axis: Frequency, ticks from 0 to 18]
Was this an easy exam or a hard exam?
[Frequency histogram. X axis: Exam Grade, in bins from 51-55 to 96-100. Y axis: Frequency, ticks from 0 to 18]
Would you rather be in the blue class or the orange class?
[Two frequency histograms, one for the blue class and one for the orange class. X axis: Exam Grade, in bins starting at 51-55. Y axis: Frequency, ticks from 0 to 18]
By the way: outliers
[Frequency histogram with one region marked OUTLIER. X axis: Exam Grade, in bins from 31-35 to 96-100. Y axis: Frequency, ticks from 0 to 18]
If there’s time, let’s make a frequency histogram
• Everybody: What’s your height in feet-inches?
• (Example: I’m 5’9”)
Any questions?
Line Graph
• Shows trends from left to right
• The trend is usually over time
• But it doesn’t have to be…
Example Line Graph
http://www.wilderdom.com/personality/L4-1IntelligenceNatureVsNurture.html
Used under Creative Commons License
Example Line Graph (VanLehn, 2011)
(This graph shows perceptions, not data on effectiveness.)
Any questions?
Not going to discuss today
• Stem-and-leaf plot
• Very, very rare to see in actual use
• Quite poor for any sizable data set
• If you want to learn about them, see the book
Future Classes
• Scatterplot
• Box plot
Upcoming Classes
• 1/28 Describing Data with Numerical Measures
– Ch. 2
• 2/2 Describing Bivariate Data (Asgn. 1 due)
– Ch. 3
• 2/4 Introduction to Probability
– Ch. 4
Questions? Comments?