
Probability Theory: Paradoxes and Pitfalls

Great Theoretical Ideas In Computer Science

Steven Rudich, Anupam Gupta
CS 15-251 Spring 2004

Lecture 19, March 23, 2004
Carnegie Mellon University

Probability Distribution

A (finite) probability distribution D consists of
• a finite set S of elements (samples), the “sample space”
• a probability p(x) ∈ [0,1] for each x ∈ S

The weights must sum to 1.

[Figure: sample space S with sample weights 0.3, 0.3, 0.2, 0.1, 0.05, 0.05, 0]


An “Event” is a subset A ⊆ S.

[Figure: event A drawn as a region of S]

Pr[A] is the total weight of the samples in A; here Pr[A] = 0.55.


Total money = 1

Conditional probabilities

[Figure: sample space S containing events A and B]

Pr[x | A] = 0 for x outside A

Pr[y | A] = Pr[y] / Pr[A] for y inside A

For any event B:

$$\Pr[B \mid A] = \sum_{x \in B} \Pr[x \mid A] = \sum_{x \in A \cap B} \Pr[x \mid A] = \sum_{x \in A \cap B} \frac{\Pr[x]}{\Pr[A]} = \frac{\Pr[A \cap B]}{\Pr[A]}$$
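A minimal Python sketch of these definitions using exact fractions. The sample names a through g and the members of events A and B are hypothetical (the figure does not label them); A is chosen only so that Pr[A] = 0.55, as on the slide. Pr[B | A] is computed as the sum of Pr[x | A] over B and checked against Pr[A ∩ B] / Pr[A].

```python
from fractions import Fraction as F

# Hypothetical names for the seven samples in the figure, with their weights.
p = {"a": F("0.3"), "b": F("0.3"), "c": F("0.2"), "d": F("0.1"),
     "e": F("0.05"), "f": F("0.05"), "g": F(0)}
assert sum(p.values()) == 1  # weights must sum to 1

def pr(event):
    """Pr[E]: total weight of the samples in E."""
    return sum((p[x] for x in event), F(0))

def pr_point_given(x, a):
    """Pr[x | A]: 0 outside A, Pr[x] / Pr[A] inside A."""
    return p[x] / pr(a) if x in a else F(0)

def pr_given(b, a):
    """Pr[B | A] as the sum of Pr[x | A] over x in B."""
    return sum((pr_point_given(x, a) for x in b), F(0))

A = {"a", "c", "e"}  # hypothetical event with Pr[A] = 0.55
B = {"c", "d", "e"}  # hypothetical second event

print(pr(A))               # 11/20
print(pr_given(B, A))      # 5/11
print(pr(A & B) / pr(A))   # 5/11, the same value
```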

Now, on to some fun puzzles!

You have 3 dice

2 players each roll a die.

The player with the higher number wins.

[Figure: three dice labeled A, B, C]

Which die is best to have: A, B, or C?

A is better than B

When rolled, there are 9 equally likely outcomes (A's roll vs B's roll):

7 vs 9   7 vs 5   7 vs 1
6 vs 9   6 vs 5   6 vs 1
2 vs 9   2 vs 5   2 vs 1

A beats B 5/9 of the time.

B is better than C

Again, 9 equally likely outcomes (B's roll vs C's roll):

1 vs 3   1 vs 4   1 vs 8
5 vs 3   5 vs 4   5 vs 8
9 vs 3   9 vs 4   9 vs 8

B beats C 5/9 of the time.

A beats B with Prob. 5/9
B beats C with Prob. 5/9

Q) If you choose first, which die would you take?

Q) If you choose second, which die would you take?

C is better than A!

Alas, the same story! (C's roll vs A's roll):

3 vs 2   3 vs 6   3 vs 7
4 vs 2   4 vs 6   4 vs 7
8 vs 2   8 vs 6   8 vs 7

C beats A 5/9 of the time!
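A short Python check of the three 5/9 claims. The face values are read off the outcome tables above (A: 2, 6, 7; B: 1, 5, 9; C: 3, 4, 8), with each listed face assumed equally likely; a six-sided die showing each of its values twice would give the same odds. The helper name `beats` is our own.

```python
from fractions import Fraction
from itertools import product

# Face values read off the outcome tables.
DICE = {"A": (2, 6, 7), "B": (1, 5, 9), "C": (3, 4, 8)}

def beats(x, y):
    """Pr[die x rolls higher than die y], all face pairs equally likely."""
    fx, fy = DICE[x], DICE[y]
    wins = sum(1 for a, b in product(fx, fy) if a > b)
    return Fraction(wins, len(fx) * len(fy))

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(x, "beats", y, "with probability", beats(x, y))
```

Every die is beaten by some other die with probability 5/9, so whoever chooses second can always pick a better die.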

First Moral

“Obvious” properties, such as transitivity, associativity, commutativity, etc., need to be rigorously argued.

Because sometimes they are

FALSE.

Second Moral

Stay on your toes!

When reasoning about probabilities…

Third Moral

To make money from a sucker in a bar, offer him the first choice of die.

(Allow him to change to your “lucky” die any time he wants.)

Coming up next…

More of the pitfalls of probability.

Name a body part that almost everyone on earth has an above average number of.

FINGERS !!

• Almost everyone has 10
• More people are missing some than have extras (# fingers missing > # of extras)
• Average: 9.99 …

A Puzzle…

Almost everyone can be above average!
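A toy check of the finger puzzle. The population counts below are hypothetical, invented only to mirror the slide's description (most people have 10, a few are missing some, fewer have extras); they are not real data.

```python
from fractions import Fraction

# Hypothetical toy population: finger count -> number of people.
population = {10: 990, 9: 8, 11: 2}

people = sum(population.values())
mean = Fraction(sum(f * n for f, n in population.items()), people)
above = sum(n for f, n in population.items() if f > mean)

print(float(mean))               # 9.994
print(Fraction(above, people))   # 124/125, i.e. 99.2% are above average
```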

Is a simple average a good statistic?

Several years ago Berkeley faced a lawsuit …

1. % of male applicants admitted to graduate school was 10%
2. % of female applicants admitted to graduate school was 5%

Grounds for discrimination?

SUIT

Berkeley did a survey of its departments to find out which ones were at fault.

The result was

SHOCKING…

Every department was more likely to admit a female than a male:

$$\frac{\#\text{ of females accepted to department } X}{\#\text{ of female applicants to department } X} > \frac{\#\text{ of males accepted to department } X}{\#\text{ of male applicants to department } X}$$

How can this be?

Answer

Women tend to apply to departments that admit a smaller percentage of their applicants.

Dept    Women applied   Women accepted   Men applied   Men accepted
A       99              4                1             0
B       1               1                99            10
Total   100             5                100           10
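The numbers in the table above can be checked directly. This sketch compares the per-department acceptance rates with the pooled rates; the dictionary layout and helper names are our own.

```python
from fractions import Fraction

# (applied, accepted) per department, from the table above.
data = {
    "A": {"women": (99, 4), "men": (1, 0)},
    "B": {"women": (1, 1), "men": (99, 10)},
}

def rate(applied, accepted):
    return Fraction(accepted, applied)

def overall(group):
    """Pooled acceptance rate of a group across all departments."""
    applied = sum(dept[group][0] for dept in data.values())
    accepted = sum(dept[group][1] for dept in data.values())
    return Fraction(accepted, applied)

for name, dept in data.items():
    print(name, "women", float(rate(*dept["women"])), "men", float(rate(*dept["men"])))
print("overall women", float(overall("women")), "men", float(overall("men")))
```

Women have the higher rate in each department (about 4% vs 0% in A, 100% vs about 10% in B), yet the pooled rates are 5% for women and 10% for men.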

Newspapers would publish these data…

Meaningless junk!

A single summary statistic (such as an average, or a median) may not summarize the data well!

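Try to get a white ball

Choose one box and pick a random ball from it. Maximize the chance of getting a white ball…

First pair of boxes (white balls / total balls): 5/11 > 3/7, so the first box is better.

Second pair of boxes: 6/9 > 9/14, so again the first box is better.

Now put the two better boxes together into one box (11 white out of 20) and the two worse boxes into another (12 white out of 21):

11/20 < 12/21 !!!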
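The box example is the same phenomenon in its barest form. This sketch uses the (white, total) counts implied by the fractions on the slides; the helper names are our own.

```python
from fractions import Fraction

# (white balls, total balls) per box, from the fractions on the slides.
better_boxes = [(5, 11), (6, 9)]   # the winning box of each pair
worse_boxes = [(3, 7), (9, 14)]

def chance(white, total):
    return Fraction(white, total)

def merge(boxes):
    """Pour several boxes into one: add up white balls and total balls."""
    return (sum(w for w, _ in boxes), sum(t for _, t in boxes))

for b1, b2 in zip(better_boxes, worse_boxes):
    print(chance(*b1), ">", chance(*b2), chance(*b1) > chance(*b2))
print(chance(*merge(better_boxes)), "<", chance(*merge(worse_boxes)))
```

Note that `Fraction` reduces automatically, so 12/21 prints as 4/7.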

Simpson’s Paradox

Arises all the time…

Be careful when you interpret numbers

The Department of Transportation requires that each month all airlines report their “on-time record”:

$$\text{on-time record} = \frac{\#\text{ of on-time flights landing at the nation's 30 busiest airports}}{\#\text{ of total flights into those airports}}$$

http://www.bts.gov/programs/oai/

Different airlines serve different airports with different frequency.

An airline sending most of its planes into fair weather airports will crush an airline flying mostly into foggy airports.

It can even happen that an airline has a better record at each airport, but gets a worse overall rating by this method.

Airport     Alaska Airlines            America West
            % on time   # flights      % on time   # flights
LA          88.9        559            85.6        811
Phoenix     94.8        233            92.1        5255
San Diego   91.7        232            85.5        448
SF          83.1        605            71.3        449
Seattle     85.8        2146           76.7        262
OVERALL     86.7        3775           89.1        7225

Alaska Air beats America West at each airport, but America West has a better overall rating!
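A sketch that recomputes the OVERALL row from the per-airport rows as a flight-weighted average. Because the per-airport percentages are themselves rounded, the recomputed figures match the table only to one decimal place.

```python
# airport: (Alaska % on time, Alaska flights, America West % on time, America West flights)
TABLE = {
    "LA": (88.9, 559, 85.6, 811),
    "Phoenix": (94.8, 233, 92.1, 5255),
    "San Diego": (91.7, 232, 85.5, 448),
    "SF": (83.1, 605, 71.3, 449),
    "Seattle": (85.8, 2146, 76.7, 262),
}

def overall(pct_col, flights_col):
    """Flight-weighted on-time percentage and total flights for one airline."""
    on_time = sum(row[pct_col] / 100 * row[flights_col] for row in TABLE.values())
    flights = sum(row[flights_col] for row in TABLE.values())
    return 100 * on_time / flights, flights

alaska = overall(0, 1)
america_west = overall(2, 3)
print("Alaska:       %.1f%% of %d flights" % alaska)
print("America West: %.1f%% of %d flights" % america_west)
```

Most America West flights go to Phoenix, its best airport, which is what lifts its overall rating.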

An average may have several different possible explanations…

“Physicians are growing in number, but not in pay”
US News and World Report (’83)

Year   # Doctors   Average salary (1982)
1970   334,000     $103,900
1982   480,000     $99,950

Thrust of article: Market forces are at work.

Here’s another possibility: doctors earn more than ever, but many old doctors have retired and been replaced with younger ones, which pulls the average salary down.

Rare Disease

A person is selected at random and given a test for the rare disease “painanosufulitis”.

Only 1/10,000 people have it.

The test is 99% accurate: it gives the wrong answer (positive/negative) only 1% of the time.

The person tests POSITIVE!!!

Does he have the disease?

What is the probability that he has the disease?

Disease Probability

• Suppose there are k people in the population.
• At most k/10,000 have the disease (sufferers ≤ k/10,000).
• But k/100 have false test results.

So at least k/100 − k/10,000 have false test results but have no disease! Each of them is a false positive, while at most k/10,000 people are sufferers who test positive.

It’s about 100 times more likely that he got a false positive!!

And we thought 99% accuracy was pretty good.
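The same reasoning as an exact Bayes' rule computation, a minimal sketch assuming the 1% error rate applies equally to sick and healthy people, as the slide states.

```python
from fractions import Fraction

prevalence = Fraction(1, 10_000)   # Pr[disease]
error = Fraction(1, 100)           # Pr[wrong answer], for sick and healthy alike

true_pos = prevalence * (1 - error)    # sick and tests positive
false_pos = (1 - prevalence) * error   # healthy but tests positive

posterior = true_pos / (true_pos + false_pos)   # Pr[disease | positive]
print(posterior, float(posterior))   # 1/102, about 0.0098
print(false_pos / true_pos)          # 101
```

A positive result is 101 times more likely to be a false positive than a true one, so the chance he has the disease is under 1%.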

Conditional Probabilities

You walk into a pet shop…

Shop A: there are two parrots in a cage.
The owner says “At least one parrot is male.”
What is the chance that you get two males?

Shop B: again two parrots in a cage.
The owner says “The darker one is male.”

What is the chance they are both male?

Pet Shop Quiz

The four equally likely cases (darker parrot listed first): FF, FM, MF, MM.

Shop owner A says “At least one of the two is male”: only FF is ruled out, leaving FM, MF, MM, so there is a 1/3 chance they are both male.

Shop owner B says “The dark one is male”: only MF and MM remain, so there is a 1/2 chance they are both male.
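A brute-force enumeration of the pet shop cases, assuming each parrot is independently male or female with probability 1/2. Each owner's statement is modeled as a filter on the four outcomes; the helper name is our own.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (darker parrot, lighter parrot); all four equally likely.
outcomes = list(product("MF", repeat=2))

def pr_both_male(condition):
    """Pr[both male | condition], by counting equally likely outcomes."""
    kept = [o for o in outcomes if condition(o)]
    return Fraction(sum(o == ("M", "M") for o in kept), len(kept))

shop_a = pr_both_male(lambda o: "M" in o)      # "at least one is male"
shop_b = pr_both_male(lambda o: o[0] == "M")   # "the darker one is male"
print(shop_a, shop_b)  # 1/3 1/2
```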

Intuition in probability

Playing Alice and Bob

You beat Alice with probability 1/3.
You beat Bob with probability 5/6.
You need to win two consecutive games out of 3.

Should you play Bob Alice Bob or Alice Bob Alice?

Look closely

To win, we need to
• win the middle game: must beat the second player (for sure)
• win one of {first, last} game: must beat the first player once in two tries

Bob Alice Bob:

Pr[{WWW, WWL, LWW}] = 1/3 · (1 − 1/6 · 1/6) = 35/108

Alice Bob Alice:

Pr[{WWW, WWL, LWW}] = 5/6 · (1 − 2/3 · 2/3) = 50/108

So Alice Bob Alice is better, even though it means playing the stronger opponent twice!
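A sketch that enumerates all 8 win/loss sequences instead of using the shortcut, assuming games are independent. `series_win_prob` is our own helper name.

```python
from fractions import Fraction
from itertools import product

P_WIN = {"Alice": Fraction(1, 3), "Bob": Fraction(5, 6)}   # your chance of beating each

def series_win_prob(order):
    """Pr[you win two consecutive games] when you play the opponents in `order`."""
    total = Fraction(0)
    for results in product((True, False), repeat=3):   # True = you win that game
        prob = Fraction(1)
        for opponent, won in zip(order, results):
            prob *= P_WIN[opponent] if won else 1 - P_WIN[opponent]
        first, middle, last = results
        if middle and (first or last):                  # WWW, WWL or LWW
            total += prob
    return total

print(series_win_prob(("Bob", "Alice", "Bob")))    # 35/108
print(series_win_prob(("Alice", "Bob", "Alice")))  # 25/54, i.e. 50/108
```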

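Bridge Hands have 13 cards

What distribution of the 4 suits is most likely: 5 3 3 2? 4 4 3 2? 4 3 3 3?

Count the hands with each distribution: choose which suits get which lengths, then choose the cards within each suit.

$$\#(4333) = 4\binom{13}{4}\binom{13}{3}^3$$

$$\#(4432) = 12\binom{13}{4}^2\binom{13}{3}\binom{13}{2}$$

$$\#(5332) = 12\binom{13}{5}\binom{13}{3}^2\binom{13}{2}$$

So #(4432) > #(5332) > #(4333), with probabilities of about 0.216, 0.155 and 0.105: 4 4 3 2 is the most likely, and the most even split, 4 3 3 3, is the least likely of the three.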
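A quick check of the three counts with Python's `math.comb`. The helper `hands_with` is a hypothetical name; it multiplies the number of distinct ways to assign the suit lengths to the four suits by the number of ways to choose the cards within each suit.

```python
from itertools import permutations
from math import comb, prod

def hands_with(pattern):
    """Number of 13-card hands whose suit lengths are `pattern` in some order."""
    orders = set(permutations(pattern))   # distinct ways to assign lengths to suits
    return len(orders) * prod(comb(13, k) for k in pattern)

total = comb(52, 13)
for pattern in [(4, 3, 3, 3), (4, 4, 3, 2), (5, 3, 3, 2)]:
    n = hands_with(pattern)
    print(pattern, n, round(n / total, 4))
```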

Intuition could be wrong

Work out the math to be 100% sure

“Law of Averages”

I flip a coin 10 times. It comes up heads each time!

What are the chances that my next coin flip is also heads?

“Law of Averages”?

“The number of heads and tails have to even out…”

Be Careful

For a fair coin the chance is still 1/2: the coin has no memory of the first 10 flips. Though the sample average gets closer to ½, the deviation from the average (|# heads − # flips/2|) may grow!

After 100: 52 heads, sample average 0.52, deviation = 2
After 1000: 511 heads, sample average 0.511, deviation = 11
After 10000: 5096 heads, sample average 0.5096, deviation = 96
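The counts above are one particular run. As a sketch, this computes the exact expectations for n fair flips: E|#heads − n/2| keeps growing (roughly in proportion to √n), while the expected distance of the sample average from ½ shrinks. The helper name is our own.

```python
from fractions import Fraction

def expected_deviations(n):
    """For n fair flips: (E|heads - n/2|, E|heads/n - 1/2|), computed exactly."""
    total = 0
    c = 1  # C(n, k), updated incrementally
    for k in range(n + 1):
        total += abs(2 * k - n) * c    # 2 * |k - n/2| * C(n, k)
        c = c * (n - k) // (k + 1)     # C(n, k+1) from C(n, k)
    dev = Fraction(total, 2 ** (n + 1))   # E|heads - n/2|
    return float(dev), float(dev / n)

for n in (100, 1000, 10000):
    dev, avg_dev = expected_deviations(n)
    print(n, round(dev, 2), round(avg_dev, 4))
```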

A voting puzzle

N (odd) people, each of whom has a random bit (50/50) on his/her forehead.

No communication allowed. Each person goes to a private voting booth and casts a vote for 1 or 0.

If the outcome of the election coincides with the parity of the N bits, the voters “win” the election.

Example:

N = 5, with bits 1 0 1 1 0

Parity = 1

If they vote 1 0 0 1 1, then majority = 1, they win.

If they vote 0 0 1 1 0, then majority = 0, they lose.

How do voters maximize the probability of winning?

Note that each individual has no information about the parity.

“Since each individual is wrong half the time, the outcome of the election is wrong half the time.”

Beware of the Fallacy!

Solution

Note: knowing the parity is equivalent to knowing the bit on your forehead.

STRATEGY: Each person assumes the bit on his/her head is the same as the majority of bits he/she sees, and votes for the parity that assumption implies (in the case of an even split, vote 0).

Two cases:
• difference of (# of 1’s) and (# of 0’s) > 1
• difference = 1

Analysis

ANALYSIS: The strategy works so long as the difference between the number of 1’s and the number of 0’s is at least two. Then every voter sees the same majority bit as the whole group, so every voter whose own bit is the majority bit guesses that bit, and hence the parity, correctly; these voters form a majority. Since N is odd, the difference is exactly 1 only when the number of 1’s is (N−1)/2 or (N+1)/2, so

Probability of winning:

$$\Pr[\text{win}] \ge 1 - \binom{N}{\frac{N-1}{2}}\frac{1}{2^{N-1}} = 1 - O\left(\frac{1}{\sqrt{N}}\right)$$
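An exhaustive check of the strategy for small N. One assumption to flag: we read "in the case of even split, vote 0" literally, as casting a vote of 0. The bound only relies on the cases with difference at least 3, so it holds under either reading. All helper names are our own.

```python
from fractions import Fraction
from itertools import product
from math import comb

def vote(seen):
    """One voter's vote, given the N-1 bits he/she can see."""
    ones = sum(seen)
    zeros = len(seen) - ones
    if ones == zeros:
        return 0                        # even split: vote 0 (our literal reading)
    guess = 1 if ones > zeros else 0    # assume own bit = majority of bits seen
    return (ones + guess) % 2           # parity implied by that guess

def win_probability(n):
    """Exact Pr[majority vote == parity of all n bits], n odd."""
    wins = 0
    for bits in product((0, 1), repeat=n):
        votes = [vote(bits[:i] + bits[i + 1:]) for i in range(n)]
        outcome = 1 if sum(votes) > n // 2 else 0
        if outcome == sum(bits) % 2:
            wins += 1
    return Fraction(wins, 2 ** n)

def lower_bound(n):
    """1 - C(N, (N-1)/2) / 2^(N-1): probability that the difference is >= 3."""
    return 1 - Fraction(comb(n, (n - 1) // 2), 2 ** (n - 1))

for n in (3, 5, 7, 9, 11):
    print(n, float(win_probability(n)), float(lower_bound(n)))
```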

A Final Game

Greater or Smaller?

Alice and Bob play a game.

Alice picks two distinct random numbers x and y between 0 and 1.

Bob chooses to see either one of them, say x.

Now, Bob has to tell whether x < y or x > y.

If Bob guesses at random, his chance of winning is 50%.

Can Bob improve his chances of winning?

Bob picks a number between 0 and 1 at random, say z.

If x > z, he says x is greater.

If x < z, he says x is smaller.

Analysis

[Figure: number line from 0 to 1 with Bob's z between x and y]

If z lies between x and y, Bob’s answer is correct.

If z does not lie between x and y, Bob’s answer is wrong 50% of the time (the number he sees is equally likely to be the larger or the smaller one).

Since x and y are distinct, there is a non-zero probability for z to lie between x and y.

Hence, Bob’s probability of winning is more than 50%.
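A Monte Carlo sketch of Bob's strategy. It assumes Alice's numbers are independent and uniform on [0, 1) and that Bob looks at either one with equal chance; neither detail is fixed by the slides. Under these assumptions, for given x and y Bob wins with probability 1/2 + |x − y|/2, which averages to 2/3.

```python
import random

def bob_wins(rng):
    """One round; Alice's numbers are assumed uniform on [0, 1)."""
    a, b = rng.random(), rng.random()
    seen, hidden = (a, b) if rng.random() < 0.5 else (b, a)   # Bob looks at one of them
    z = rng.random()                                           # Bob's random threshold
    says_seen_is_greater = seen > z
    return says_seen_is_greater == (seen > hidden)

def estimate(trials=200_000, seed=2004):
    rng = random.Random(seed)
    return sum(bob_wins(rng) for _ in range(trials)) / trials

print(estimate())   # close to 2/3 under these assumptions
```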

Final Lesson for today…

Keep your mind open towards new possibilities!
