HUDM4122 Probability and Statistical Inference January 26, 2015

Dec 17, 2015


Holly Tyler
Page 1: HUDM4122 Probability and Statistical Inference January 26, 2015.

HUDM4122 Probability and Statistical Inference

January 26, 2015

Page 2: HUDM4122 Probability and Statistical Inference January 26, 2015.

ASSISTments

• Did everyone get an account for the ASSISTments system?

• Did anyone have difficulties setting up an account?

• First homework is due in a week

Page 3: HUDM4122 Probability and Statistical Inference January 26, 2015.

Today

• Ch. 1 in Mendenhall, Beaver, & Beaver

• Variables and Variable Types

• Graphing Data

• Basic Exploratory Data Analysis

Page 5: HUDM4122 Probability and Statistical Inference January 26, 2015.

Variables

• What is a variable?

• “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” – MBB p. 8

Page 6: HUDM4122 Probability and Statistical Inference January 26, 2015.

Which of these are examples of variables?

• GPA

• Shoe size

• Age

• Number of correct answers in ASSISTments

• Number of times gamed the system in ASSISTments

• Favorite vegetable

• Favorite type of pie

• Pi

Page 8: HUDM4122 Probability and Statistical Inference January 26, 2015.

What is a measurement?

• A measurement is the result of measuring a variable on a single experimental unit
– A person, if you are studying people
– A class, if you are studying classes
– A pizza, if you are studying pizzas

Page 9: HUDM4122 Probability and Statistical Inference January 26, 2015.

A measurement

• Person furthest towards my left in the front row, what is your name?

Page 10: HUDM4122 Probability and Statistical Inference January 26, 2015.

Now I have a measurement

Page 11: HUDM4122 Probability and Statistical Inference January 26, 2015.

A measurement

• Person furthest towards my right in the second row, what is your name?

Page 15: HUDM4122 Probability and Statistical Inference January 26, 2015.

Now I have data

• A set of measurements

• Note that in stats class or education journals, the word “data” is plural

• I only know one exception

Page 19: HUDM4122 Probability and Statistical Inference January 26, 2015.

Everyone repeat after me

• “My data are in this Excel file.”

• “Your data aren’t evidence for that conclusion.”

• “His data were hard to collect.”

Page 21: HUDM4122 Probability and Statistical Inference January 26, 2015.

However…

• I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner

Page 22: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions or concerns?

Page 23: HUDM4122 Probability and Statistical Inference January 26, 2015.

Univariate Data

• A single variable is collected

Height
5’11”
5’11”
5’10”
5’6”

Page 24: HUDM4122 Probability and Statistical Inference January 26, 2015.

Bivariate Data

• Two variables are collected (for the same data point)

Height  Drum-Playing Skill
5’11”   1
5’11”   2
5’10”   4
5’6”    8

Page 25: HUDM4122 Probability and Statistical Inference January 26, 2015.

Multivariate Data

• 3+ variables are collected

Name             Height  Drum-Playing Skill
John Lennon      5’11”   1
Paul McCartney   5’11”   2
George Harrison  5’10”   4
Ringo Starr      5’6”    8
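
The univariate, bivariate, and multivariate views of this table can be sketched in Python. This is only an illustration: the record layout, the `select` helper, and the conversion of heights to inches are assumptions made for the example, not part of the lecture.

```python
# Records built from the slide's table. Heights are converted to inches
# (5'11" = 71, 5'10" = 70, 5'6" = 66); field names are hypothetical.
beatles = [
    {"name": "John Lennon", "height_in": 71, "drum_skill": 1},
    {"name": "Paul McCartney", "height_in": 71, "drum_skill": 2},
    {"name": "George Harrison", "height_in": 70, "drum_skill": 4},
    {"name": "Ringo Starr", "height_in": 66, "drum_skill": 8},
]


def select(records, *variables):
    """Return one tuple per experimental unit, keeping only the named variables."""
    return [tuple(record[v] for v in variables) for record in records]


univariate = select(beatles, "height_in")                  # one variable
bivariate = select(beatles, "height_in", "drum_skill")     # two variables
multivariate = select(beatles, "name", "height_in", "drum_skill")  # 3+ variables
```

Each tuple is one set of measurements on a single experimental unit (here, a person); only the number of variables kept per unit changes.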

Page 26: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions or concerns?

Page 27: HUDM4122 Probability and Statistical Inference January 26, 2015.

Types of Variables

Page 28: HUDM4122 Probability and Statistical Inference January 26, 2015.

Quantitative/Numerical Data

• Data that can be expressed as numbers

Page 29: HUDM4122 Probability and Statistical Inference January 26, 2015.

What are some examples

• Of numerical data?

Page 30: HUDM4122 Probability and Statistical Inference January 26, 2015.

Ordinal Data

• Refers to data where there is a known order, but either
– The data clearly aren’t numbers, or
– The spacing between values is not guaranteed to be equal

Page 31: HUDM4122 Probability and Statistical Inference January 26, 2015.

Examples of Ordinal Data

• Months of the year: January, February, March, April, …

• Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree

• Quality of university: Highly selective, selective, somewhat selective, non-selective
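
One way to see what "known order, but no guaranteed spacing" means is to store the order explicitly and use it only for comparisons. The sketch below uses the agreement levels from the slide, listed from lowest to highest; the helper names and the sample responses are hypothetical.

```python
# Agreement levels from the slide, lowest to highest.
AGREEMENT_ORDER = [
    "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree",
]
# Ranks are positions in the order only; they are NOT measured distances.
RANK = {level: position for position, level in enumerate(AGREEMENT_ORDER)}


def sort_responses(responses):
    """Sort responses by their position in the known order."""
    return sorted(responses, key=lambda level: RANK[level])


def agrees_more(a, b):
    """True if response a sits higher in the order than response b."""
    return RANK[a] > RANK[b]


responses = ["Agree", "Strongly Disagree", "Neutral", "Agree", "Disagree"]  # hypothetical
ordered = sort_responses(responses)
```

Sorting and comparing are meaningful here; subtracting or averaging the ranks would quietly assume equal spacing between levels, which ordinal data do not guarantee.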

Page 32: HUDM4122 Probability and Statistical Inference January 26, 2015.

Other examples of ordinal data?

Page 34: HUDM4122 Probability and Statistical Inference January 26, 2015.

Nominal data

• Values have no order or spacing

• Name

• State of Residence
– New Jersey is not greater or less than New York
– Although my brother might disagree

Page 35: HUDM4122 Probability and Statistical Inference January 26, 2015.

Other Examples of Nominal Data?

Page 37: HUDM4122 Probability and Statistical Inference January 26, 2015.

Another name

• Nominal data are often also called categorical data

• Technically ordinal data are also categorical, but no one ever uses the term that way

Page 38: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions or concerns?

Page 39: HUDM4122 Probability and Statistical Inference January 26, 2015.

Exploratory Data Analysis

• “Analyzing data sets to summarize their main characteristics”

• “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”

Page 40: HUDM4122 Probability and Statistical Inference January 26, 2015.

Goal

• Generate hypotheses

• Understand your data better

Page 41: HUDM4122 Probability and Statistical Inference January 26, 2015.

Often (but not always) done with graphs

Page 42: HUDM4122 Probability and Statistical Inference January 26, 2015.

Which of these is your favorite type of graph?

• Pie chart

• Bar graph

• Frequency histogram

• Line graph

• Scatterplot

• Stem-and-leaf plot

• Box plot

• Other

Page 43: HUDM4122 Probability and Statistical Inference January 26, 2015.

Pie Chart

• Take a set of categories that add to 100%

• Show the proportion each category has
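
The two steps above amount to turning counts into shares of the whole. A minimal Python sketch, assuming made-up votes (the actual counts behind the slides are not in this transcript) and a hypothetical `proportions` helper:

```python
from collections import Counter


def proportions(votes):
    """Map each category to its share of all votes, as a percentage."""
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100 * n / total for category, n in counts.items()}


# Hypothetical votes for illustration only.
votes = ["Pumpkin"] * 5 + ["Apple"] * 10 + ["Cherry"] * 3 + ["Rhubarb"] * 2
shares = proportions(votes)
```

Because every vote falls into exactly one category, the shares add to 100%, which is the condition a pie chart needs.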

Page 44: HUDM4122 Probability and Statistical Inference January 26, 2015.

Pie Chart: Example

[Pie chart: “What is everyone's favorite pie?” Slices: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]

Page 45: HUDM4122 Probability and Statistical Inference January 26, 2015.

Interpret This Graph Please

[Pie chart: “What is everyone's favorite pie?” Slices: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]

Page 46: HUDM4122 Probability and Statistical Inference January 26, 2015.

Never Ever Do This: Completely Visually Misleading

Fair use; critique

Page 47: HUDM4122 Probability and Statistical Inference January 26, 2015.

Let’s make a pie chart

• Using the “your favorite graph” data

Page 48: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions?

Page 49: HUDM4122 Probability and Statistical Inference January 26, 2015.

Alternative: Bar Graphs

[Bar graph: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Count scale: 0 to 30.]

Page 50: HUDM4122 Probability and Statistical Inference January 26, 2015.

Interpret this graph please

[Bar graph: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Count scale: 0 to 30.]

Page 51: HUDM4122 Probability and Statistical Inference January 26, 2015.

What are the advantages/disadvantages relative to pie chart?

[Bar graph: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Count scale: 0 to 30.]

Page 52: HUDM4122 Probability and Statistical Inference January 26, 2015.

By the way: X and Y axes

[Bar graph: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Count scale: 0 to 30. Annotations mark the X axis and the Y axis.]

Page 53: HUDM4122 Probability and Statistical Inference January 26, 2015.

Strengths of bar graphs

• Categories don’t have to add to 100%

• Easier to see small differences between categories

• You can compare variables too
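
As a rough illustration of the first two points, a bar graph can be drawn straight from raw counts, with no need to convert to shares of a whole. The sketch below prints a text-only bar graph; the counts and the `text_bar_graph` helper are hypothetical.

```python
def text_bar_graph(counts, symbol="#"):
    """Return one line per category, with a bar whose length equals the count."""
    width = max(len(label) for label in counts)
    return "\n".join(
        f"{label.ljust(width)} | {symbol * n} {n}" for label, n in counts.items()
    )


# Hypothetical counts for illustration only; they do not need to add to 100.
favorite_pie = {"Pumpkin": 5, "Apple": 10, "Cherry": 3, "Rhubarb": 2, "Banana Cream": 4}
graph = text_bar_graph(favorite_pie)
print(graph)
```

Bars share a common baseline, so a difference of one vote (Cherry versus Rhubarb here) shows up as a visibly longer bar, which is harder to judge from slice angles.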

Page 54: HUDM4122 Probability and Statistical Inference January 26, 2015.

Two-group bar graph

[Grouped bar graph: “School Rankings”. Categories: Football Team, Chess Team, Spiderman Team. Groups: Midtown High, Harlem Success Academy. Axis label: Quality (Higher is Better), scale 0 to 60.]

Page 55: HUDM4122 Probability and Statistical Inference January 26, 2015.

Let’s make a bar graph

• Using the “your favorite graph” data

Page 56: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions?

Page 58: HUDM4122 Probability and Statistical Inference January 26, 2015.

Some suggest always using bar graphs instead of pie charts

• “The only thing worse than a pie chart is several of them.” – Edward Tufte

• “Save the pies for dessert.” – Stephen Few

Page 60: HUDM4122 Probability and Statistical Inference January 26, 2015.

But they’re wrong

• Pie charts are good for representing part-whole relationships in really easy to see ways

• Pie charts are good at representing overall proportions

Page 61: HUDM4122 Probability and Statistical Inference January 26, 2015.

Nice example (Gabrielle, 2013)

Page 62: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions?

Page 63: HUDM4122 Probability and Statistical Inference January 26, 2015.

Frequency Histogram

• A type of bar graph
– But usually when people say “bar graph”, they do not mean “frequency histogram”
– Also: by convention, no space between bars

• X axis shows values or ranges of a quantitative variable

• Y axis shows how many data points have that value or range for the quantitative variable
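
The X-axis ranges and Y-axis counts described above can be computed before anything is drawn. A minimal sketch, assuming integer exam grades and equal-width ranges such as 51-55, 56-60, and so on; the `frequency_table` helper and the grades themselves are hypothetical.

```python
def frequency_table(values, start, width, n_bins):
    """Count integer values per range: start to start+width-1, and so on."""
    labels = [
        f"{start + i * width}-{start + (i + 1) * width - 1}" for i in range(n_bins)
    ]
    counts = dict.fromkeys(labels, 0)
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_bins:  # values outside every range are ignored
            counts[labels[index]] += 1
    return counts


# Hypothetical grades for illustration only.
grades = [52, 67, 71, 74, 78, 80, 83, 85, 88, 91, 99, 100]
table = frequency_table(grades, start=51, width=5, n_bins=10)
```

Each key of `table` is one bar's range on the X axis and each value is that bar's height on the Y axis; ranges with a count of zero still appear, which keeps the bars adjacent with no gaps in the scale.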

Page 64: HUDM4122 Probability and Statistical Inference January 26, 2015.

Example from the book

Visits to Starbucks

Page 65: HUDM4122 Probability and Statistical Inference January 26, 2015.

Another Example

[Frequency histogram. X axis: Exam Grade, in ranges 51-55 through 96-100. Y axis: Frequency, 0 to 18.]

Page 66: HUDM4122 Probability and Statistical Inference January 26, 2015.

Was this an easy exam or a hard exam?

[Frequency histogram. X axis: Exam Grade, in ranges 51-55 through 96-100. Y axis: Frequency, 0 to 18.]

Page 67: HUDM4122 Probability and Statistical Inference January 26, 2015.

Would you rather be in the blue class or the orange class?

[Two frequency histograms, one for the blue class and one for the orange class. X axis: Exam Grade (labeled ranges 51-55 through 91-95). Y axis: Frequency, 0 to 18.]

Page 68: HUDM4122 Probability and Statistical Inference January 26, 2015.

By the way: outliers

[Frequency histogram. X axis: Exam Grade, in ranges 31-35 through 96-100. Y axis: Frequency, 0 to 18. An annotation reads OUTLIER.]

Page 69: HUDM4122 Probability and Statistical Inference January 26, 2015.

If there’s time, let’s make a frequency histogram

• Everybody: What’s your height in feet-inches?

• (Example: I’m 5’9”)

Page 70: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions?

Page 71: HUDM4122 Probability and Statistical Inference January 26, 2015.

Line Graph

• Shows trends from left to right

• The trend is usually over time

• But it doesn’t have to be…

Page 72: HUDM4122 Probability and Statistical Inference January 26, 2015.

Example Line Graph

http://www.wilderdom.com/personality/L4-1IntelligenceNatureVsNurture.html
Used under Creative Commons License

Page 73: HUDM4122 Probability and Statistical Inference January 26, 2015.

Example Line Graph (VanLehn, 2011)

(This graph shows perceptions, not data on effectiveness.)

Page 74: HUDM4122 Probability and Statistical Inference January 26, 2015.

Any questions?

Page 75: HUDM4122 Probability and Statistical Inference January 26, 2015.

Not going to discuss today

• Stem-and-leaf plot

• Very, very rare to see in actual use

• Quite poor for any sizable data set

• If you want to learn about them, see the book

Page 76: HUDM4122 Probability and Statistical Inference January 26, 2015.

Future Classes

• Scatterplot

• Box plot

Page 77: HUDM4122 Probability and Statistical Inference January 26, 2015.

Upcoming Classes

• 1/28 Describing Data with Numerical Measures
– Ch. 2

• 2/2 Describing Bivariate Data (Asgn. 1 due)
– Ch. 3

• 2/4 Introduction to Probability
– Ch. 4

Page 78: HUDM4122 Probability and Statistical Inference January 26, 2015.

Questions? Comments?