Page 1: Decision Making Under Uncertainty Russell and Norvig: ch 16 CMSC421 – Fall 2006.

Decision Making Under Uncertainty

Russell and Norvig: ch 16

CMSC421 – Fall 2006

Page 2

Utility-Based Agent

[Diagram: a utility-based agent coupled to its environment: percepts come in through sensors, actions go out through actuators, and a "?" marks the decision component in between.]

Page 3

Non-deterministic vs. Probabilistic Uncertainty

Non-deterministic model: an action has a set of possible outcomes {a, b, c}; choose the decision that is best for the worst case (~ adversarial search).

Probabilistic model: the outcomes carry probabilities, {a(pa), b(pb), c(pc)}; choose the decision that maximizes expected utility value.

Page 4

Expected Utility

Random variable X with n values x1,…,xn and distribution (p1,…,pn). E.g., Xi is Resulti(A) | Do(A), E: the state reached after doing action A, given E, what we know about the current state.

Function U of X. E.g., U is the utility of a state.

The expected utility of A is:

EU[A|E] = Σi=1,…,n P(xi|A) U(xi)
        = Σi=1,…,n P(Resulti(A) | Do(A), E) U(Resulti(A))
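The formula above can be sketched directly in code (a minimal illustration; the function name and the (probability, utility) encoding of outcomes are mine, not the deck's):

```python
# Minimal sketch of EU[A|E] = sum_i P(Result_i(A)|Do(A),E) * U(Result_i(A)).
# An action's outcome model is a list of (probability, utility) pairs,
# assumed to sum to probability 1.

def expected_utility(outcomes):
    """Return the sum of p * u over the action's (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# A quick check on a toy action with three outcomes:
eu = expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)])
# 0.2*100 + 0.7*50 + 0.1*70 = 20 + 35 + 7 = 62
```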

Page 5

One State/One Action Example

From state s0, action A1 leads to states s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70 respectively.

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62

Page 6

One State/Two Actions Example

From s0, action A1 leads to s1, s2, s3 as above (probabilities 0.2, 0.7, 0.1; utilities 100, 50, 70). A second action A2 leads to s2 with probability 0.2 and to a new state s4 (utility 80) with probability 0.8.

• U1(S0) = 62
• U2(S0) = 74
• U(S0) = max{U1(S0),U2(S0)} = 74
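A sketch of the MEU choice on this slide. One hedge: the transcript does not show which state A2's 0.2 branch reaches, so the 50-utility outcome below is an assumption chosen to match the slide's U2(S0) = 74:

```python
# Each action maps to (probability, utility) pairs; pick the max-EU action.
def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

actions = {
    "A1": [(0.2, 100), (0.7, 50), (0.1, 70)],  # U1(S0) = 62
    "A2": [(0.2, 50), (0.8, 80)],              # U2(S0) = 74 (0.2 branch assumed)
}
eus = {name: expected_utility(outs) for name, outs in actions.items()}
best_action = max(eus, key=eus.get)            # "A2", with U(S0) = 74
```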

Page 7

Introducing Action Costs

Same two actions, but now executing A1 costs 5 and executing A2 costs 25.

• U1(S0) = 62 – 5 = 57
• U2(S0) = 74 – 25 = 49
• U(S0) = max{U1(S0),U2(S0)} = 57
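The same computation with action costs subtracted (the helper and the assumption that A2's 0.2 branch has utility 50 are mine, chosen to match the slide's numbers):

```python
def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

# (outcome model, execution cost) per action.
actions = {
    "A1": ([(0.2, 100), (0.7, 50), (0.1, 70)], 5),   # EU 62, cost 5
    "A2": ([(0.2, 50), (0.8, 80)], 25),              # EU 74, cost 25
}
net = {name: expected_utility(outs) - cost
       for name, (outs, cost) in actions.items()}    # A1: 57, A2: 49
best_action = max(net, key=net.get)                  # costs flip the choice to "A1"
```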

Page 8

MEU Principle

A rational agent should choose the action that maximizes the agent’s expected utility. This is the basis of the field of decision theory: a normative criterion for rational choice of action.

Page 9

Not quite…

Must have a complete model of:
• Actions
• Utilities
• States

Even with a complete model, decision making will be computationally intractable. In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality). Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.

Page 10

We’ll look at

Decision-Theoretic Reasoning
• Simple decision making (ch. 16)
• Sequential decision making (ch. 17)

Page 11

Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes

Lottery L = [p, A; (1 – p), B]

Notation:
A > B : A preferred to B
A ~ B : indifference between A and B
A ≥ B : B not preferred to A

Page 12

Rational Preferences

Idea: preferences of a rational agent must obey constraints

Axioms of Utility Theory
1. Orderability:
   (A > B) ∨ (B > A) ∨ (A ~ B)
2. Transitivity:
   (A > B) ∧ (B > C) ⇒ (A > C)
3. Continuity:
   A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B
4. Substitutability:
   A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
5. Monotonicity:
   A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])

Page 13

Rational Preferences

Violating the constraints leads to irrational behavior

E.g., an agent with intransitive preferences can be induced to give away all its money:

if B > C, then an agent who has C would pay some amount, say $1, to get B

if A > B, then an agent who has B would pay, say, $1 to get A

if C > A, then an agent who has A would pay, say, $1 to get C

….oh, oh!
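The money-pump argument above can be simulated in a few lines (a toy sketch; the $10 starting budget and the cyclic trade table are illustrative):

```python
# Intransitive preferences A > B > C > A: each single trade looks
# locally rational, yet the agent pays $1 per trade until it is broke.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (x, y) means x preferred to y
trade_up = {"C": "B", "B": "A", "A": "C"}        # what the agent buys next

money, holding, trades = 10, "C", 0
while money > 0:
    nxt = trade_up[holding]
    assert (nxt, holding) in prefers             # each trade is "preferred"...
    money -= 1                                   # ...but costs $1 every time
    holding, trades = nxt, trades + 1
# The agent is broke after 10 trades and still holds an item it would
# pay to swap away.
```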

Page 14

Rational Preferences ⇒ Utility

Theorem (Ramsey, 1931, von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that

U(A) ≥ U(B) ⇔ A ≥ B
U([p1,S1; …; pn,Sn]) = Σi pi U(Si)

MEU principle: Choose the action that maximizes expected utility

Page 15

Utility Assessment

Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has:
• the best possible prize with probability p
• the worst possible catastrophe (e.g., instant death) with probability (1 – p)

Adjust the lottery probability p until A ~ Lp.
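The adjust-p procedure can be sketched as a bisection search, with a simulated respondent standing in for a human (the helper name and the normalization U(best) = 1, U(worst) = 0 are my assumptions):

```python
# Find the p at which the respondent is indifferent between state A and
# the standard lottery L_p = [p, best; 1-p, worst]. The respondent is
# simulated by a known utility u_state; with U(best)=1 and U(worst)=0,
# the indifference point is exactly p = U(A).
def assess_indifference_p(u_state, tol=1e-6):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if p * 1.0 + (1 - p) * 0.0 < u_state:   # EU(L_p) still below U(A)
            lo = p
        else:
            hi = p
    return (lo + hi) / 2

p = assess_indifference_p(0.75)   # converges to ~0.75
```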

Page 16

Aside: Money Utility function

Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)); i.e., people are risk-averse.

Would you rather have $1,000,000 for sure, or a lottery with [0.5, $0; 0.5, $3,000,000]?
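A worked sketch of that question, assuming a concave (square-root) utility of money; the utility function itself is an illustrative assumption, not from the slides:

```python
import math

def u(dollars):                 # concave utility => risk-averse agent
    return math.sqrt(dollars)

# Lottery L = [0.5, $0; 0.5, $3,000,000]
emv = 0.5 * 0 + 0.5 * 3_000_000                 # EMV(L) = $1,500,000
eu_lottery = 0.5 * u(0) + 0.5 * u(3_000_000)    # ~866 utils
eu_sure = u(1_000_000)                          # 1000 utils
# eu_sure > eu_lottery: the sure $1,000,000 wins despite the lower EMV,
# i.e., U(L) < U(EMV(L)) for this agent.
```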

Page 17

Decision Networks

• Extend BNs to handle actions and utilities
• Also called influence diagrams
• Make use of BN inference
• Can do Value of Information calculations

Page 18

Decision Networks cont.

• Chance nodes: random variables, as in BNs
• Decision nodes: actions that the decision maker can take
• Utility/value nodes: the utility of the outcome state

Page 19

R&N example

Page 20

Prenatal Testing Example

Page 21

Umbrella Network

Nodes: rain (chance), Take Umbrella (decision: take / don’t take), umbrella (chance), happiness (utility).

P(rain) = 0.4

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

P(umb | take) = 1.0
P(~umb | ~take) = 1.0

Page 22

Evaluating Decision Networks

• Set the evidence variables for the current state.
• For each possible value of the decision node:
  – Set the decision node to that value.
  – Calculate the posterior probability of the parent nodes of the utility node, using BN inference.
  – Calculate the resulting utility for the action.
• Return the action with the highest utility.
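The procedure above, applied to the noisy umbrella network from the later slides (P(rain) = 0.4, P(umb|take) = 0.8, P(umb|~take) = 0.1); plain enumeration stands in for general BN inference, and the action names are mine:

```python
P_RAIN = 0.4
P_UMB = {"take": 0.8, "dont_take": 0.1}    # P(umb | decision)
UTIL = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}  # (umb, rain)

def eu(decision):
    """Expected utility of a decision, by enumerating (umb, rain)."""
    total = 0.0
    for umb in (0, 1):
        p_umb = P_UMB[decision] if umb else 1 - P_UMB[decision]
        for rain in (0, 1):
            p_rain = P_RAIN if rain else 1 - P_RAIN
            total += p_umb * p_rain * UTIL[(umb, rain)]
    return total

best = max(("take", "dont_take"), key=eu)
# eu("take") = -4, eu("dont_take") = 17, so MEU picks "dont_take".
```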

Page 23

Umbrella Network

Nodes: rain (chance), Take Umbrella (decision: take / don’t take), umbrella (chance), happiness (utility).

P(rain) = 0.4

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

P(umb | take) = 1.0
P(umb | ~take) = 0

Page 24

Umbrella Network

Same network, but now the umbrella node is noisy:

P(rain) = 0.4

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

P(umb | take) = 0.8
P(umb | ~take) = 0.1

umb  rain  P(umb, rain | take)
0    0     0.2 x 0.6 = 0.12
0    1     0.2 x 0.4 = 0.08
1    0     0.8 x 0.6 = 0.48
1    1     0.8 x 0.4 = 0.32

#1: EU(take) = 100 x 0.12 + (-100) x 0.08 + 0 x 0.48 + (-25) x 0.32 = -4

Page 25

Umbrella Network

Same noisy network:

P(rain) = 0.4

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

P(umb | take) = 0.8
P(umb | ~take) = 0.1

umb  rain  P(umb, rain | ~take)
0    0     0.9 x 0.6 = 0.54
0    1     0.9 x 0.4 = 0.36
1    0     0.1 x 0.6 = 0.06
1    1     0.1 x 0.4 = 0.04

#2: EU(~take) = 100 x 0.54 + (-100) x 0.36 + 0 x 0.06 + (-25) x 0.04 = 17

So, in this case I would… not take the umbrella: EU(~take) = 17 > EU(take) = -4.

Page 26

Value of Information

Idea: compute the expected value of acquiring possible evidence.

Example: buying oil drilling rights
• Two blocks, A and B; exactly one of them has oil, worth k
• Prior probability 0.5 for each
• Current price of each block is k/2

What is the value of getting a survey of A done? The survey will say ‘oil in A’ or ‘no oil in A’, each with probability 0.5.

Compute the expected value of information (VOI): the expected value of the best action given the information, minus the expected value of the best action without the information.

VOI(Survey) = [0.5 x value of “buy A” given oil in A] + [0.5 x value of “buy B” given no oil in A] – 0
            = 0.5 x k/2 + 0.5 x k/2 = k/2
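The oil-rights arithmetic, written out with k = 1.0 standing in for the symbolic value (the variable names are mine):

```python
k = 1.0                     # value of the block that has oil

# Without the survey: either block has oil with prob 0.5 and costs k/2,
# so buying is worth 0.5 * k - k / 2 = 0 in expectation.
ev_without = 0.5 * k - k / 2

# With the (perfect) survey of A: each answer arrives with prob 0.5 and
# the agent then buys the block known to contain oil, paying k/2 for it.
ev_with = 0.5 * (k - k / 2) + 0.5 * (k - k / 2)

voi_survey = ev_with - ev_without     # = k / 2
```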

Page 27

Value of Information (VOI)

Suppose the agent’s current knowledge is E. The value of the current best action α is:

EU(α | E) = maxA Σi P(Resulti(A) | Do(A), E) U(Resulti(A))

The value of the new best action (after new evidence E’ is obtained) is:

EU(α’ | E, E’) = maxA Σi P(Resulti(A) | Do(A), E, E’) U(Resulti(A))

The value of information for E’ is:

VOI(E’) = Σk P(E’ = ek | E) EU(αek | E, E’ = ek) – EU(α | E)

Page 28

Umbrella Network

Nodes: rain (chance), forecast (chance), Take Umbrella (decision: take / don’t take), umbrella (chance), happiness (utility).

P(rain) = 0.4

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

P(umb | take) = 0.8
P(umb | ~take) = 0.1

R   P(F=rainy | R)
0   0.2
1   0.7

Page 29

VOI

VOI(forecast) = P(rainy) EU(αrainy | rainy) + P(~rainy) EU(α~rainy | ~rainy) – EU(α), where αf is the best action given forecast f and α is the best action with no forecast.
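Putting the formula to work on the umbrella network from the previous slide (P(rain) = 0.4, P(F=rainy|rain) = 0.7, P(F=rainy|~rain) = 0.2, P(umb|take) = 0.8, P(umb|~take) = 0.1); a hedged sketch by full enumeration, using Bayes' rule for the posterior on rain:

```python
P_RAIN = 0.4
P_RAINY_GIVEN_RAIN = {1: 0.7, 0: 0.2}          # P(F=rainy | rain)
P_UMB = {"take": 0.8, "dont_take": 0.1}        # P(umb | decision)
UTIL = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}  # (umb, rain)

def eu(decision, p_rain):
    total = 0.0
    for umb in (0, 1):
        p_umb = P_UMB[decision] if umb else 1 - P_UMB[decision]
        for rain in (0, 1):
            p_r = p_rain if rain else 1 - p_rain
            total += p_umb * p_r * UTIL[(umb, rain)]
    return total

def best_eu(p_rain):
    return max(eu(d, p_rain) for d in ("take", "dont_take"))

# P(F=rainy) and the posteriors P(rain | F) by Bayes' rule.
p_rainy = (P_RAINY_GIVEN_RAIN[1] * P_RAIN
           + P_RAINY_GIVEN_RAIN[0] * (1 - P_RAIN))          # 0.4
post_rain_rainy = P_RAINY_GIVEN_RAIN[1] * P_RAIN / p_rainy  # 0.7
post_rain_clear = ((1 - P_RAINY_GIVEN_RAIN[1]) * P_RAIN
                   / (1 - p_rainy))                         # 0.2

voi = (p_rainy * best_eu(post_rain_rainy)
       + (1 - p_rainy) * best_eu(post_rain_clear)
       - best_eu(P_RAIN))
# = 0.4 * (-22) + 0.6 * 53.5 - 17 = 6.3: the forecast is worth having.
```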

Page 30

Four subproblems (joint probabilities left as an exercise):

umb  rain  P(umb, rain | take, rainy)
0    0
0    1
1    0
1    1
#1: EU(take | rainy)

umb  rain  P(umb, rain | ~take, rainy)
0    0
0    1
1    0
1    1
#2: EU(~take | rainy)

umb  rain  P(umb, rain | take, ~rainy)
0    0
0    1
1    0
1    1
#3: EU(take | ~rainy)

umb  rain  P(umb, rain | ~take, ~rainy)
0    0
0    1
1    0
1    1
#4: EU(~take | ~rainy)

Page 31

Umbrella Network

Nodes: rain (chance), forecast (chance), Take Umbrella (decision: take / don’t take), umbrella (chance), happiness (utility). Here the network is parameterized in the other direction, with a prior on the forecast:

P(F=rainy) = 0.4

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

P(umb | take) = 0.8
P(umb | ~take) = 0.1

F   P(R=rain | F)
0   0.2
1   0.7

Page 32

The same four subproblems, under the reparameterized network (joint probabilities left as an exercise):

umb  rain  P(umb, rain | take, rainy)
0    0
0    1
1    0
1    1
#1: EU(take | rainy)

umb  rain  P(umb, rain | ~take, rainy)
0    0
0    1
1    0
1    1
#2: EU(~take | rainy)

umb  rain  P(umb, rain | take, ~rainy)
0    0
0    1
1    0
1    1
#3: EU(take | ~rainy)

umb  rain  P(umb, rain | ~take, ~rainy)
0    0
0    1
1    0
1    1
#4: EU(~take | ~rainy)

Page 33

VOI

VOI(forecast) = P(rainy) EU(αrainy | rainy) + P(~rainy) EU(α~rainy | ~rainy) – EU(α)

Page 34

Summary: Simple Decision Making

• Decision Theory = Probability Theory + Utility Theory
• A rational agent operates by MEU
• Decision Networks
• Value of Information