Which of these two relationships is “tighter?”

Post on 25-Feb-2016


1

Which of these two relationships is “tighter?”

Left                    Right
Factor   Outcome        Factor   Outcome
10       11             5        10
9        9              5        11
6        6              10       -5
1        1              2        20
-2       -2             6        8
3        4              1        23
2        2              6        7
5        3              1        22
10       9              1        21
7        7              9        -3
8        10             6        8
2        1              7        4
5        4              9        -3
1        3              5        9
8        7              3        17
8        9              8        2
2        4              3        17
10       11             2        20
7        6              7        5
10       9              9        -2

2

The relationship on the left appears “tighter” for three reasons:


1. Cognition bias. Simple linear relationships are easier to “eyeball” than complex relationships.

2. Information bias. Rounding masks information.

3. Confirmation bias. Tendency to focus on observations that confirm beliefs and ignore observations that contradict beliefs.

3

[Scatter plot of the left-hand data: Factor (horizontal axis, -4 to 12) vs. Outcome (vertical axis, -4 to 12)]

4

[Scatter plot of the right-hand data: Factor (horizontal axis, 0 to 12) vs. Outcome (vertical axis, -10 to 25)]

5

Lesson #1: Never trust your eyes.
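Lesson #1 can be checked against numbers instead of the eye. The minimal Python sketch below computes the Pearson correlation of each data set from slide 1. The (Factor, Outcome) pairs are a reconstruction of the two flattened slide tables, so treat them as an assumption; the function name `pearson_r` is illustrative.

```python
from math import sqrt

# (Factor, Outcome) pairs, as reconstructed from the two slide-1 tables.
LEFT = [(10, 11), (9, 9), (6, 6), (1, 1), (-2, -2), (3, 4), (2, 2), (5, 3), (10, 9), (7, 7),
        (8, 10), (2, 1), (5, 4), (1, 3), (8, 7), (8, 9), (2, 4), (10, 11), (7, 6), (10, 9)]
RIGHT = [(5, 10), (5, 11), (10, -5), (2, 20), (6, 8), (1, 23), (6, 7), (1, 22), (1, 21), (9, -3),
         (6, 8), (7, 4), (9, -3), (5, 9), (3, 17), (8, 2), (3, 17), (2, 20), (7, 5), (9, -2)]


def pearson_r(pairs):
    """Pearson correlation between the first and second element of each pair."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r_left, r_right = pearson_r(LEFT), pearson_r(RIGHT)
print(f"left:  r = {r_left:+.3f}, R^2 = {r_left ** 2:.3f}")
print(f"right: r = {r_right:+.3f}, R^2 = {r_right ** 2:.3f}")
```

On these reconstructed values the right-hand set is the tighter one (r of about -0.996 versus about 0.950 on the left), even though the left-hand set looks cleaner.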

6

Corollary: Don’t trust summary statistics either.

Anscombe’s quartet: four data sets that yield identical summary statistics.

7

Anscombe’s quartet

               I             II            III            IV
            x      y      x      y      x      y      x      y
           10   8.04     10   9.14     10   7.46      8   6.58
            8   6.95      8   8.14      8   6.77      8   5.76
           13   7.58     13   8.74     13  12.74      8   7.71
            9   8.81      9   8.77      9   7.11      8   8.84
           11   8.33     11   9.26     11   7.81      8   8.47
           14   9.96     14   8.10     14   8.84      8   7.04
            6   7.24      6   6.13      6   6.08      8   5.25
            4   4.26      4   3.10      4   5.39     19  12.50
           12  10.84     12   9.13     12   8.15      8   5.56
            7   4.82      7   7.26      7   6.42      8   7.91
            5   5.68      5   4.74      5   5.73      8   6.89
Mean     9.00   7.50   9.00   7.50   9.00   7.50   9.00   7.50
Stdev    3.32   2.03   3.32   2.03   3.32   2.03   3.32   2.03
Corr        0.82          0.82          0.82          0.82
alpha hat   3.00          3.00          3.00          3.00
beta hat    0.50          0.50          0.50          0.50
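A minimal Python sketch that recomputes the quartet's summary rows from the raw values listed on the slide. It assumes "Stdev" means the sample (n - 1) standard deviation and that alpha hat and beta hat are the intercept and slope of an ordinary least-squares line of y on x; the function name `summary` is illustrative.

```python
from math import sqrt

# Anscombe's quartet as listed on the slide; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Means, sample standard deviations, correlation, and OLS intercept/slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "sd_x": sqrt(sxx / (n - 1)), "sd_y": sqrt(syy / (n - 1)),
        "corr": sxy / sqrt(sxx * syy),
        "alpha_hat": my - slope * mx, "beta_hat": slope,
    }


for name, (x, y) in QUARTET.items():
    print(name, {k: round(v, 2) for k, v in summary(x, y).items()})
```

All four sets print the same rounded statistics, which is the point of the slide: the numbers cannot tell the sets apart, but a plot of each set can.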

8

9

Lesson #1: Never trust your eyes.

(Don’t trust summary statistics either)

Lesson #2: Always employ sanity checks.

10

[Chart, 1991 to 2002: Conventional Mortgage Rates (6.0% to 10.0%) and Mystery Variable from 2 Years Prior (1.5 to 2.5)]

11

[Chart, 1991 to 2002: Conventional Mortgage Rates (6.0% to 10.0%) and Mystery Variable from 2 Years Prior (1.5 to 2.5)]

Mystery variable explains 57% of the variation in mortgage rates ($R^2 = 0.57$).

Relationship: $\text{Rate} = 0.03 + 0.02 \times (\text{Mystery Variable})$

12

Mystery variable is Algeria’s GDP-relative-to-Trade

Spurious Results

An infinite number of factors can attempt to explain a given outcome.

Look hard enough and you are guaranteed to find a perfect predictor.

If the factor is “spurious,” what you are observing is random chance.
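The search effect described above can be simulated. This is a hedged sketch, not the author's analysis: it assumes random-walk series (running sums of independent normal steps) as hypothetical stand-ins for economic data such as the mystery variable, generates one "outcome" series and many unrelated "factor" series, and reports the best fit found as the search widens. Exact numbers depend on the seed; nothing connects any candidate to the outcome.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation: share of the variation in y explained by a line in x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def random_walk(n, rng):
    """Hypothetical series: running sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(2016)  # fixed seed so the run is repeatable
YEARS = 12  # hypothetical: about the length of the 1991-2002 window on the slide
outcome = random_walk(YEARS, rng)  # stands in for the "mortgage rate"
candidates = [random_walk(YEARS, rng) for _ in range(1000)]  # unrelated "factors"

scores = [r_squared(c, outcome) for c in candidates]
for searched in (1, 10, 100, 1000):
    print(f"best R^2 after searching {searched:4d} unrelated series: {max(scores[:searched]):.2f}")
```

The best R² can only rise as more candidates are searched, which is why a wide enough search always turns up an impressive-looking predictor.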

13

Mystery variable is Algeria’s GDP-relative-to-Trade.

[Chart, 1977 to 2003: Conventional Mortgage Rates (4.0% to 18.0%) and Mystery Variable from 2 Years Prior (1 to 4)]

By random chance, the mystery variable predicts mortgage rates over this period.

14

[Diagram: 200,000 letters predicting “DJIA will be down tomorrow!” and 200,000 letters predicting “DJIA will be up tomorrow!”, followed by rounds of 100,000, 50,000, and 25,000 letters of each prediction]

If you wait long enough, randomness will tell you anything you want to hear.
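The letter counts in the diagram follow a halving pattern. A minimal sketch, assuming the diagram depicts the classic mail-order scheme in which only recipients who received a correct call get the next letter; the function `letter_scheme` and its parameters are hypothetical.

```python
def letter_scheme(letters_per_call=200_000, rounds=4):
    """Hypothetical model of the mailing scheme in the diagram.

    Each round, the current mailing list is split in half: one half is told
    "DJIA will be down tomorrow!", the other "DJIA will be up tomorrow!".
    Whatever the market does, exactly one half received a correct call, and
    only that half is kept for the next round.
    Returns the number of recipients with an unbroken record after each round.
    """
    mailing_list = 2 * letters_per_call  # both piles of letters in round one
    survivors = []
    for _ in range(rounds):
        mailing_list //= 2  # keep only the half that got the right call
        survivors.append(mailing_list)
    return survivors


for k, n in enumerate(letter_scheme(), start=1):
    print(f"after round {k}: {n:,} recipients have seen {k} correct call(s) in a row")
```

Under this assumption, 25,000 people end up having watched the sender call the market correctly four times in a row, with no forecasting skill involved at all.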

15

[Chart, 1960 to 1980: Number of Sunspots in the Current Year (left axis, 0 to 180) and Number of Republicans in the Senate 1 Year in the Future (right axis, 10 to 60)]

Sources: ftp.ngdc.noaa.gov/stp/solar_data/sunspot_numbers/yearly and www.senate.gov/pagelayout/history/one_item_and_teasers/partydiv.htm

16

Counterargument:

Spurious or not, sunspots would have been useful for predicting the number of Republicans in the Senate.

Fallacy:

We see the correlation only in hindsight. To be useful, we need to detect the correlation before it ceases to exist.
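The hindsight fallacy can be sketched the same way, again with hypothetical random-walk series rather than the actual sunspot or Senate data: choose the factor whose fitted line works best over one period, then apply that same line to the following period. In-sample R² is high by construction; the out-of-sample value (1 - SSE/SST, which can be negative) measures whether the factor would actually have been useful. All names and the seed are illustrative assumptions.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """1 - SSE/SST for the line intercept + slope * x; negative if worse than the mean of y."""
    my = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst


def random_walk(n, rng):
    """Hypothetical series: running sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(1960)  # fixed seed so the run is repeatable
HALF = 21  # hypothetical: 21 periods to find the pattern, 21 more to use it
outcome = random_walk(2 * HALF, rng)
candidates = [random_walk(2 * HALF, rng) for _ in range(500)]

# Hindsight: fit every candidate on the first half and keep the best fit.
fits = [fit_line(c[:HALF], outcome[:HALF]) for c in candidates]
in_sample = [r_squared(c[:HALF], outcome[:HALF], *f) for c, f in zip(candidates, fits)]
best = max(range(len(candidates)), key=lambda i: in_sample[i])

# Foresight: reuse that same fitted line on the second half, which it never saw.
out_of_sample = r_squared(candidates[best][HALF:], outcome[HALF:], *fits[best])
print(f"in-sample R^2 of the chosen factor: {in_sample[best]:.2f}")
print(f"out-of-sample R^2 of the same line: {out_of_sample:.2f}")
```

The specific values vary with the seed; the design choice that matters is that the second period plays no part in choosing or fitting the factor.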

17

[Two charts, titled “1960 to 1980” and “1981 to 2005”: Number of Sunspots in the Current Year (left axis, 0 to 180) and Number of Republicans in the Senate 1 Year in the Future (right axis; 10 to 60 for 1960 to 1980, 20 to 80 for 1981 to 2005)]

Sources: ftp.ngdc.noaa.gov/stp/solar_data/sunspot_numbers/yearly and www.senate.gov/pagelayout/history/one_item_and_teasers/partydiv.htm

18

19

20

21

22

23

Lesson #1: Never trust your eyes.

(Don’t trust summary statistics either)

Lesson #2: Always employ sanity checks.

Lesson #3: An observation is meaningless.

Corollary: An anecdote is both meaningless and dangerous.

24

Left half of room: Don’t look.
Right half of room: Write what you read.

25

The average person in Benin earns an annual income of $750 (in U.S. dollars).

26

Right half of room: Don’t look.
Left half of room: Write what you read.

27

The average person in Andorra earns an annual income of $40,000 (in U.S. dollars).

28

The average person on planet Earth earns what annual income (in U.S. dollars)?

29

Anchoring: When we see a piece of information, we evaluate subsequent information in light of the first piece of information.

Information: News interview of a single mother working three jobs to support her family.

Policy Question: Do we need welfare reform?

Problem: How common is this example?

30

Left half of room: Don’t look.
Right half of room: Read and answer.

31

Should we require school districts to pay to install seat belts on school buses?

1   2   3   4   5   (1 = Definitely not!, 5 = Absolutely!)

32

Right half of room: Don’t look.
Left half of room: Read and answer.

33

Every year in the U.S., 17,000 children are treated for injuries sustained in school bus accidents. Most of these injuries could have been avoided had the children been wearing seat belts.

Should we require school districts to pay to install seat belts on school buses?

1   2   3   4   5   (1 = Definitely not!, 5 = Absolutely!)

34

Availability: It’s easier to see what’s in front of us than it is to see what isn’t.

Information: News report showing the benefit of school bus seat belts.

Policy Question: Should we require seat belts in school buses?

Problem: What is the expected benefit and what are the tradeoffs?

35

Lesson #1: Never trust your eyes.

Lesson #2: Always employ sanity checks.

Lesson #3: An observation is meaningless.

Corollary: An anecdote is both meaningless and dangerous.

Lesson #4: Not everything that appears random is.

[Diagram: Y, X1, X2]

$y = \beta_0 + \beta_1 X_1 + u$

$\hat{\beta}_0 = 50.01\ (8.65), \quad \hat{\beta}_1 = 0.11\ (0.14), \quad R^2 = 0.01$


$y = \beta_0 + \beta_1 X_2 + u$

$\hat{\beta}_0 = 1.18\ (7.56), \quad \hat{\beta}_1 = 0.50\ (0.06), \quad R^2 = 0.55$


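$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$

$\hat{\beta}_0 = 0.00\ (0.00), \quad \hat{\beta}_1 = 1.00\ (0.00), \quad \hat{\beta}_2 = 1.00\ (0.00), \quad R^2 = 1.00$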
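A toy illustration of Lesson #4, using four hypothetical data points (not the slides' data) constructed so that $Y = X_1 + X_2$ exactly, with $X_1$ negatively correlated with $X_2$. As in the slides' regressions, $X_1$ alone appears to explain nothing, $X_2$ alone explains part of $Y$, and the two together explain all of it. Helper names are illustrative.

```python
# Hypothetical toy data (not the slides' data), built so that Y = X1 + X2 exactly.
X1 = [5.0, 3.0, 7.0, 5.0]
X2 = [12.0, 12.0, 8.0, 8.0]
Y = [a + b for a, b in zip(X1, X2)]


def center(v):
    """Deviations from the mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x1, x2, y = center(X1), center(X2), center(Y)

# Simple regressions: with one regressor, R^2 is the squared correlation.
r2_x1 = dot(x1, y) ** 2 / (dot(x1, x1) * dot(y, y))
r2_x2 = dot(x2, y) ** 2 / (dot(x2, x2) * dot(y, y))

# Multiple regression on both: solve the 2x2 normal equations in deviation form.
s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
det = s11 * s22 - s12 ** 2
b1 = (dot(x1, y) * s22 - dot(x2, y) * s12) / det
b2 = (dot(x2, y) * s11 - dot(x1, y) * s12) / det
b0 = sum(Y) / len(Y) - b1 * sum(X1) / len(X1) - b2 * sum(X2) / len(X2)
residuals = [yi - (b0 + b1 * a + b2 * b) for yi, a, b in zip(Y, X1, X2)]
r2_both = 1 - dot(residuals, residuals) / dot(y, y)

print(f"R^2 on X1 alone: {r2_x1:.2f}, on X2 alone: {r2_x2:.2f}, on both: {r2_both:.2f}")
print(f"fitted Y = {b0:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```

It prints R² values of 0.00, 0.50, and 1.00 and recovers the coefficients 0, 1, and 1: a variable that looks like pure noise on its own can be an exact part of the true relationship.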


217

Regression: Why do we do this?

218

A trucking company wants to be able to predict the round-trip travel time of its trucks. Use the data below to predict the round-trip travel time for a truck that will be travelling 200 miles and making 3 deliveries.

Approach #1: Calculate Average Time per Mile

Trucks in the data set required a total of 87 hours to travel a total of 4,000 miles. Dividing hours by miles gives an average of about 0.02 hours per mile traveled (87 / 4,000 = 0.02175).

Miles Traveled   Deliveries   Travel Time (hours)
500              4            11.3
250              3            6.8
500              4            10.9
500              2            8.5
250              2            6.2
400              2            8.2
375              3            9.4
325              4            8.0
450              3            9.6
450              2            8.1

(0.02 hours per mile) (200 miles) = 4 hours

219


Approach #2: Calculate Average Time per Delivery

Trucks in the data set required a total of 87 hours to make 29 deliveries. Dividing hours by deliveries, we find an average of 3 hours per delivery.


(3 hours per delivery) (3 deliveries) = 9 hours

220


Approach #3: Combine Average Time per Mile and Average Time per Delivery

Trucks in the data set averaged 0.02 hours per mile traveled and 3 hours per delivery.


(0.02 hours per mile) (200 miles) + (3 hours per delivery) (3 deliveries) = 13 hours

221


Problems

1. Combining average time per delivery and average time per mile double-counts time: each average already attributes all 87 hours to a single factor.

2. We have ignored a possible fixed effect – an amount of “overhead” time that is required regardless of the number of miles and deliveries.
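A minimal Python sketch of the three averaging approaches, using the ten trips from the data table (as reconstructed from the flattened slide text). It keeps the unrounded averages, so Approach #1 gives about 4.35 hours rather than the 4 hours obtained with 0.02. The last lines check Problem 1 by applying the combined rule to the trips already in the data set.

```python
# Trips as reconstructed from the flattened slide table: (miles, deliveries, hours).
TRIPS = [
    (500, 4, 11.3), (250, 3, 6.8), (500, 4, 10.9), (500, 2, 8.5), (250, 2, 6.2),
    (400, 2, 8.2), (375, 3, 9.4), (325, 4, 8.0), (450, 3, 9.6), (450, 2, 8.1),
]

total_miles = sum(m for m, _, _ in TRIPS)       # 4,000
total_deliveries = sum(d for _, d, _ in TRIPS)  # 29
total_hours = sum(t for _, _, t in TRIPS)       # 87 (up to float rounding)

hours_per_mile = total_hours / total_miles            # 0.02175, rounded to 0.02 on the slide
hours_per_delivery = total_hours / total_deliveries   # 3.0

approach_1 = hours_per_mile * 200     # Approach #1: about 4.35 hours
approach_2 = hours_per_delivery * 3   # Approach #2: 9.0 hours
approach_3 = approach_1 + approach_2  # Approach #3: about 13.35 hours

# Problem 1: apply the combined rule to the trips we already have.
# Each average already accounts for all 87 hours, so the total comes out doubled.
combined_total = sum(hours_per_mile * m + hours_per_delivery * d for m, d, _ in TRIPS)

print(f"Approach #1: {approach_1:.2f} h, #2: {approach_2:.2f} h, #3: {approach_3:.2f} h")
print(f"combined rule over the data set: {combined_total:.1f} h vs. {total_hours:.1f} h recorded")
```

The combined rule "predicts" 174 hours for trips that actually took 87, which is the double counting in numeric form.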


222


$\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$

$\hat{\beta}_0 = 5.38, \quad \hat{\beta}_1 = 1.14$

5.38 hours + (1.14 hours per delivery) (3 deliveries) = 8.8 hours
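As a worked check (computed here from the ten trips in the data table, not shown on the slide), the coefficients follow from the usual least-squares formulas, with $d_i$ the deliveries and $t_i$ the hours of trip $i$, $\bar d = 2.9$ and $\bar t = 8.7$:

$$\hat\beta_1 = \frac{\sum_i (d_i - \bar d)(t_i - \bar t)}{\sum_i (d_i - \bar d)^2} = \frac{7.90}{6.90} \approx 1.145, \qquad \hat\beta_0 = \bar t - \hat\beta_1 \bar d \approx 8.7 - 1.145 \times 2.9 \approx 5.38$$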

223


$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$

$\hat{\beta}_0 = 3.27, \quad \hat{\beta}_1 = 0.01$

3.27 hours + (0.01 hours per mile) (200 miles) = 5.27 hours
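The same formulas applied to miles ($m_i$, with $\bar m = 400$) show how much rounding $\hat\beta_1$ to 0.01 matters for this model. These figures are computed from the data table rather than taken from the slide:

$$\hat\beta_1 = \frac{\sum_i (m_i - \bar m)(t_i - \bar t)}{\sum_i (m_i - \bar m)^2} = \frac{1{,}170}{86{,}250} \approx 0.013565, \qquad \hat\beta_0 = \bar t - \hat\beta_1 \bar m \approx 8.7 - 0.013565 \times 400 \approx 3.274$$

$$\widehat{\text{Time}} = 3.274 + 0.013565 \times 200 \approx 5.99 \text{ hours}$$

With the slope rounded to 0.01, the prediction falls to 5.27 hours, about 0.7 hours lower.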

224


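$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$

$\hat{\beta}_0 = 1.13, \quad \hat{\beta}_1 = 0.01, \quad \hat{\beta}_2 = 0.92$

1.13 hours + (0.01 hours per mile) (200 miles) + (0.92 hours per delivery) (3 deliveries) = 5.89 hours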

225


$\text{Time}_i = \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$

$\hat{\beta}_1 = 0.01, \quad \hat{\beta}_2 = 1.07$

(0.01 hours per mile) (200 miles) + (1.07 hours per delivery) (3 deliveries) = 5.21 hours

226


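Method                                 Hours per Mile   Hours per Delivery   Fixed Hours   Estimated Hours
Average time per mile                  0.02             -                    -             4.00
Average time per delivery              -                3.00                 -             9.00
Both averages combined                 0.02             3.00                 -             13.00
Regression on deliveries               -                1.14                 5.38          8.80
Regression on miles                    0.01             -                    3.27          5.27
Regression on miles and deliveries     0.01             0.92                 1.13          5.89
Same, without a fixed effect           0.01             1.07                 -             5.21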
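The regression rows of the table can be reproduced with a short pure-Python least-squares sketch. It assumes the trip data as reconstructed from the flattened slide table and solves the normal equations directly; `solve`, `ols`, and `MODELS` are illustrative names, not from the source. Because it keeps unrounded coefficients, its predictions (about 8.81, 5.99, 6.35, and 5.98 hours) differ from the table, whose arithmetic uses the rounded coefficients.

```python
# Trips as reconstructed from the flattened slide table: (miles, deliveries, hours).
TRIPS = [
    (500, 4, 11.3), (250, 3, 6.8), (500, 4, 10.9), (500, 2, 8.5), (250, 2, 6.2),
    (400, 2, 8.2), (375, 3, 9.4), (325, 4, 8.0), (450, 3, 9.6), (450, 2, 8.1),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


hours = [t for _, _, t in TRIPS]
# Each model: (design-matrix rows, regressors for the new 200-mile, 3-delivery trip).
# A leading 1.0 column is the fixed effect (intercept).
MODELS = {
    "deliveries": ([[1.0, d] for _, d, _ in TRIPS], [1.0, 3]),
    "miles": ([[1.0, m] for m, _, _ in TRIPS], [1.0, 200]),
    "miles + deliveries": ([[1.0, m, d] for m, d, _ in TRIPS], [1.0, 200, 3]),
    "no intercept": ([[m, d] for m, d, _ in TRIPS], [200, 3]),
}

results = {}
for name, (rows, new_trip) in MODELS.items():
    coef = ols(rows, hours)
    prediction = sum(c * x for c, x in zip(coef, new_trip))
    results[name] = (coef, prediction)
    print(f"{name:20s} coef={[round(c, 4) for c in coef]} prediction={prediction:.2f} h")
```

Rounded to two decimals, the fitted coefficients match the table; the gap in the predictions is another instance of rounding masking information.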
