Page 1

Artificial Intelligence

CS 165A

Tuesday, November 20, 2007

Knowledge Representation (Ch 10)
Uncertainty (Ch 13)

Page 2

Notes

• HW #4 due by noon tomorrow

• Reminder: Final exam December 14, 4-7pm
 – Review in class on Dec. 6th

Page 3

Situation Calculus – actions, events

• “Situation Calculus” is a way of describing change over time in first-order logic
 – Fluents: Functions or predicates that can vary over time have an extra argument, Si (the situation argument)

   Predicate(args, Si) – location of an agent, aliveness, changing properties, ...

 – The Result function is used to represent change from one situation to another resulting from an action (or action sequence)

   Result(GoForward, Si) = Sj

   “Sj is the situation that results from the action GoForward applied to situation Si”

   Result() indicates the relationship between situations

Review

Page 4

Situation Calculus

Represents the world in different “situations” and the relationship between situations

Review


Page 6

Examples

• How would you interpret the following sentences in First-Order Logic using situation calculus?

∀x, s  Studying(x, s) ⇒ Failed(x, Result(TakeTest, s))

∀x, s  TurnedOn(x, s) ∧ LightSwitch(x) ⇒ TurnedOff(x, Result(FlipSwitch, s))

Review

If you’re studying and then you take the test, you will fail.

(or) Studying a subject implies that you will fail the test for that subject.

If you flip the light switch when it is turned on, it will then be turned off.
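As a concrete illustration (not from the slides), here is a minimal Python sketch of Result() and fluents; the dict-based encoding and the Switch1 object are assumptions for demonstration:

```python
# Hypothetical sketch: fluents as predicates indexed by situation,
# with result() mapping (action, situation) -> new situation.
# Names (FlipSwitch, Switch1) follow the slide examples; the
# encoding itself is an illustrative assumption.

situations = {"S0": {"TurnedOn(Switch1)": True}}
counter = 0

def result(action, s):
    """Return the situation produced by applying `action` in situation `s`."""
    global counter
    counter += 1
    new_s = f"S{counter}"
    fluents = dict(situations[s])        # carry fluents forward (frame assumption)
    if action == "FlipSwitch":
        fluents["TurnedOn(Switch1)"] = not fluents["TurnedOn(Switch1)"]
    situations[new_s] = fluents
    return new_s

s1 = result("FlipSwitch", "S0")
print(situations["S0"]["TurnedOn(Switch1)"])  # True  (before the action)
print(situations[s1]["TurnedOn(Switch1)"])    # False (after FlipSwitch)
```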

Page 7

There are other ways to deal with time

• Event calculus
 – Based on points in time rather than situations
 – Designed to allow reasoning over periods of time
   Can represent actions with duration, overlapping actions, etc.

• Generalized events
 – Parts of a general “space-time chunk”

• Processes
 – Not just discrete events

• Intervals
 – Moments and durations of time

• Objects with state fluents
 – Not just events; objects can also have time-varying properties

Page 8

Event calculus relations

• Initiates(e, f, t)
 – Event e at time t causes fluent f to become true

• Terminates(e, f, t)
 – Event e at time t causes fluent f to no longer be true

• Happens(e, t)
 – Event e happens at time t

• Clipped(f, t1, t2)
 – f is terminated by some event sometime between t1 and t2
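A tiny interpreter for these relations, sketched in Python; the TurnOn/TurnOff facts and the holds_at() query are assumed for illustration:

```python
# Illustrative sketch (not from the slides): a minimal event-calculus
# query over the four relations above.

initiates = {("TurnOn", "LightOn")}        # event e makes fluent f true
terminates = {("TurnOff", "LightOn")}      # event e makes fluent f false
happens = [("TurnOn", 1), ("TurnOff", 5)]  # (event, time) facts

def clipped(f, t1, t2):
    """True if some event terminating f happens strictly between t1 and t2."""
    return any((e, f) in terminates and t1 < t < t2 for e, t in happens)

def holds_at(f, t):
    """f holds at t if an earlier event initiated it and it was not clipped since."""
    return any((e, f) in initiates and ti < t and not clipped(f, ti, t)
               for e, ti in happens)

print(holds_at("LightOn", 3))  # True  (initiated at 1, not yet clipped)
print(holds_at("LightOn", 7))  # False (clipped at 5)
```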

Page 9

Generalized events

• An ontology of time that allows for reasoning about various temporal events, subevents, durations, processes, intervals, etc.

[Figure: a “space-time chunk” – the spatial extent of Australia plotted against time]

Page 10

Time interval predicates

Ex:

 After(ReignOf(ElizabethII), ReignOf(GeorgeVI))

 Overlap(Fifties, ReignOf(Elvis))

 Start(Fifties) = Start(AD1950)

 Meet(Fifties, Sixties)
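A hedged sketch of how such predicates could be checked, treating each interval as a (start, end) pair; the numeric spans assigned to the named intervals are illustrative assumptions:

```python
# Time intervals as (start, end) pairs, with the predicates from the
# examples above. The year ranges below are assumed for illustration.

def after(i, j):   return i[0] >= j[1]           # i begins at or after j ends
def overlap(i, j): return i[0] < j[1] and j[0] < i[1]
def meet(i, j):    return i[1] == j[0]           # i ends exactly where j begins

Fifties, Sixties = (1950, 1960), (1960, 1970)
ReignOfElvis = (1954, 1977)

print(overlap(Fifties, ReignOfElvis))  # True
print(meet(Fifties, Sixties))          # True
print(after(Sixties, Fifties))         # True
```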

Page 11

Objects with state fluents

President(USA) – e.g., an object whose value (the person holding the office) changes over time

Page 12

Knowledge representation

• Chapter 10 covers many topics in knowledge representation, many of which are important to real, sophisticated AI reasoning systems
 – We’re only scratching the surface of this topic

– Best covered in depth in an advanced AI course and in context of particular AI problems

– Read through the Internet shopping world example in 10.5

• Now we move on to probabilistic reasoning, a different way of representing and manipulating knowledge
 – Chapters 13 and 14

Page 13

Quick Review of Probability

From here on we will assume that you know this…

Page 14

Probability notation and notes

• Probabilities of propositions

– P(A), P(the sun is shining)

• Probabilities of random variables

– P(X = x1), P(Y = y1), P(x1 < X < x2)

• P(A) usually means P(A = True) (A is a proposition, not a variable)

– This is a probability value

– Technically, P(A) is a probability function

• P(X = x1)

– This is a probability value (P(X) is a probability function)

• P(X)
 – This is a probability function or a probability density function

• Technically, if X is a variable, we should not write P(X) = 0.5

– But rather P(X = x1) = 0.5

Page 15

Discrete and continuous probabilities

• Discrete: Probability function P(X, Y) is described by an M×N matrix of probabilities
 – Possible values of each: P(X=x1, Y=y1) = p1
 – Σi Σj P(X=xi, Y=yj) = 1
 – P(X, Y, Z) is an M×N×P matrix

• Continuous: Probability density function (pdf) P(X, Y) is described by a 2D function
 – P(x1 < X < x2, y1 < Y < y2) = p1
 – ∫∫ P(X, Y) dX dY = 1
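Both normalization conditions are easy to check numerically. In the sketch below, the discrete joint matches the P(X,Y) example used later in these slides, and the continuous case is an assumed uniform density on [0, 4]:

```python
import numpy as np

# Discrete: an MxN joint that must sum to 1 (matrix reused from a later slide).
P_xy = np.array([[0.2, 0.1, 0.1],    # rows: y1, y2; columns: x1, x2, x3
                 [0.1, 0.2, 0.3]])
assert np.isclose(P_xy.sum(), 1.0)   # sum_i sum_j P(X=xi, Y=yj) = 1

# Continuous: assumed uniform pdf p(x) = 1/4 on [0, 4], integrated numerically.
xs = np.linspace(0.0, 4.0, 100_001)
pdf = np.full_like(xs, 0.25)
dx = xs[1] - xs[0]
print((pdf * dx).sum())              # ~1.0  (numerical check of the integral)
```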

Page 16

Discrete probability distribution

[Figure: bar chart of a discrete distribution p(X) over X = 1, …, 12, with bar heights between 0 and 0.2]

 Σi p(X = xi) = 1

Page 17

Continuous probability distribution

[Figure: a continuous density p(X) plotted over X = 1, …, 12, with values between 0 and 0.4]

 ∫ p(X) dX = 1

Page 18

Continuous probability distribution

[Figure: the same continuous density p(X), with the area under the curve between X = 6 and X = 8 shaded]

 P(6 < X < 8) = a

 P(X = 5) = ???

 For a continuous distribution, any single value has zero probability: P(X = 5) = 0, and in general P(X = x1) = 0.

Page 19

Three Axioms of Probability

1. The probability of every event must be nonnegative
 – For any event A, P(A) ≥ 0

2. Valid propositions have probability 1
 – P(True) = 1
 – P(A ∨ ¬A) = 1

3. For disjoint events A1, A2, …
 – P(A1 ∨ A2 ∨ …) = P(A1) + P(A2) + …

• From these axioms, all other properties of probabilities can be derived
 – E.g., derive P(A) + P(¬A) = 1

Page 20

Some consequences of the axioms

• Unsatisfiable propositions have probability 0
 – P(False) = 0
 – P(A ∧ ¬A) = 0

• For any two events A and B
 – P(A ∨ B) = P(A) + P(B) – P(A ∧ B)

• For the complement Ac of event A
 – P(Ac) = 1 – P(A)

• For any event A
 – 0 ≤ P(A) ≤ 1

• For independent events A and B
 – P(A ∧ B) = P(A) P(B)
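These consequences can be verified on a finite sample space; a small sketch using a fair six-sided die (an assumed example, not from the slides):

```python
from fractions import Fraction

# Events are subsets of a fair die's sample space; P is uniform.
omega = set(range(1, 7))
def P(event): return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is at least 4"
C = {3, 6}      # "roll is divisible by 3" (independent of A here)

assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion
assert P(omega - A) == 1 - P(A)             # complement rule
assert 0 <= P(A) <= 1
assert P(A & C) == P(A) * P(C)              # 1/6 == (1/2)(1/3): independence
```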

Page 21

Venn Diagram

[Figure: Venn diagram – the universe True containing overlapping circles A and B]

Visualize: P(True), P(False), P(A), P(B), P(¬A), P(¬B), P(A ∧ B), P(A ∨ B), P(A ∧ ¬B), …

Page 22

Joint Probabilities

• A complete probability model is a single joint probability distribution over all propositions/variables in the domain

– P(X1, X2, …, Xi, …)

• A particular instance of the world has the probability

– P(X1=x1 ∧ X2=x2 ∧ … ∧ Xi=xi ∧ …) = p

• Rather than stating knowledge as
 – Raining ⇒ WetGrass

• We can state it as
 – P(Raining, WetGrass) = 0.15
 – P(Raining, ¬WetGrass) = 0.01
 – P(¬Raining, WetGrass) = 0.04
 – P(¬Raining, ¬WetGrass) = 0.8

              WetGrass   ¬WetGrass
 Raining        0.15        0.01
 ¬Raining       0.04        0.80
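A short sketch of the same joint as an array, with P(Raining) and P(WetGrass) obtained by summing rows and columns:

```python
import numpy as np

# The joint P(Raining, WetGrass) from the table above.
joint = np.array([[0.15, 0.01],    # Raining:  [WetGrass, ~WetGrass]
                  [0.04, 0.80]])   # ~Raining

P_raining  = joint[0].sum()        # P(Raining)  = 0.16
P_wetgrass = joint[:, 0].sum()     # P(WetGrass) = 0.19
print(P_raining, P_wetgrass)
assert np.isclose(joint.sum(), 1.0)
```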

Page 23

Conditional Probability

• Unconditional, or Prior, Probability
 – Probabilities associated with a proposition or variable, prior to any evidence
 – E.g., P(WetGrass), P(Raining)

• Conditional, or Posterior, Probability
 – Probabilities after evidence is gathered
 – P(A | B) – “The probability of A given that we know B”
 – After (posterior to) procuring evidence
 – E.g., P(WetGrass | Raining)

P(X | Y) = P(X, Y) / P(Y)    or    P(X, Y) = P(X | Y) P(Y)

(Assumes P(Y) nonzero)
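Continuing the Raining/WetGrass numbers, a sketch applying the definition P(X|Y) = P(X,Y) / P(Y):

```python
import numpy as np

joint = np.array([[0.15, 0.01],    # Raining:  [WetGrass, ~WetGrass]
                  [0.04, 0.80]])   # ~Raining

P_raining = joint[0].sum()                      # P(Raining) = 0.16
P_wet_given_raining = joint[0, 0] / P_raining   # 0.15 / 0.16
print(P_wet_given_raining)                      # 0.9375
```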

Page 24

The chain rule

By the chain rule:

 P(X, Y) = P(X | Y) P(Y)

 P(X, Y, Z) = P(X | Y, Z) P(Y, Z)
            = P(X | Y, Z) P(Y | Z) P(Z)

 or equivalently,

 P(X, Y, Z) = P(X) P(Y | X) P(Z | X, Y)

Notes:
• Precedence: ‘|’ is lowest
• E.g., P(X | Y, Z) means P(X | (Y, Z)), not P((X | Y), Z)
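A numeric sanity check of the chain rule, using a randomly generated joint (an illustrative assumption) and verifying P(X|Y,Z) P(Y|Z) P(Z) = P(X,Y,Z):

```python
import numpy as np

# Assumed example: a random 2x2x2 joint distribution P(X, Y, Z).
rng = np.random.default_rng(0)
P = rng.random((2, 2, 2))
P /= P.sum()                       # normalize so the joint sums to 1

Pz  = P.sum(axis=(0, 1))           # P(Z)
Pyz = P.sum(axis=0)                # P(Y, Z)

x, y, z = 1, 0, 1
chain = (P[x, y, z] / Pyz[y, z]) * (Pyz[y, z] / Pz[z]) * Pz[z]
assert np.isclose(chain, P[x, y, z])   # P(X|Y,Z) P(Y|Z) P(Z) = P(X,Y,Z)
```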

Page 25

Joint probability distribution

From P(X,Y), we can always calculate:

 P(X), P(Y), P(X|Y), P(Y|X),
 P(X=x1), P(Y=y2), P(X|Y=y1), P(Y|X=x1), P(X=x1|Y), etc.

P(X,Y):

        x1     x2     x3
 y1    0.2    0.1    0.1
 y2    0.1    0.2    0.3

Page 26

P(X,Y):
        x1     x2     x3
 y1    0.2    0.1    0.1
 y2    0.1    0.2    0.3

P(X):
        x1     x2     x3
       0.3    0.3    0.4

P(Y):
 y1    0.4
 y2    0.6

P(X|Y):
        x1     x2     x3
 y1    0.5    0.25   0.25
 y2    0.167  0.333  0.5

P(Y|X):
        x1     x2     x3
 y1    0.667  0.333  0.25
 y2    0.333  0.667  0.75

P(X=x1, Y=y2) = ?
P(X=x1) = ?
P(Y=y2) = ?
P(X|Y=y1) = ?
P(X=x1|Y) = ?
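A sketch that derives every table above from the joint matrix and answers the questions:

```python
import numpy as np

P_xy = np.array([[0.2, 0.1, 0.1],   # y1
                 [0.1, 0.2, 0.3]])  # y2   (columns: x1, x2, x3)

P_x = P_xy.sum(axis=0)              # P(X) = [0.3, 0.3, 0.4]
P_y = P_xy.sum(axis=1)              # P(Y) = [0.4, 0.6]
P_x_given_y = P_xy / P_y[:, None]   # P(X|Y): each row sums to 1
P_y_given_x = P_xy / P_x[None, :]   # P(Y|X): each column sums to 1

print(P_xy[1, 0])                   # P(X=x1, Y=y2) = 0.1
print(P_x[0], P_y[1])               # P(X=x1) = 0.3, P(Y=y2) = 0.6
print(P_x_given_y[0])               # P(X|Y=y1) = [0.5, 0.25, 0.25]
print(P_x_given_y[:, 0])            # P(X=x1|Y) = [0.5, 0.167]
```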

Page 27

Probability Distributions

                Discrete vars    Continuous vars
 P(X)           M vector         Function of one variable
 P(X=x)         Scalar           Scalar*
 P(X,Y)         M×N matrix       Function of two variables
 P(X|Y)         M×N matrix       Function of two variables
 P(X|Y=y)       M vector         Function of one variable
 P(X=x|Y)       N vector         Function of one variable
 P(X=x|Y=y)     Scalar           Scalar*

* – actually zero. Should be P(x1 < X < x2)

Page 28

Bayes’ Rule

• Since

 P(X, Y) = P(X | Y) P(Y)   and   P(X, Y) = P(Y | X) P(X)

• Then

 P(X | Y) P(Y) = P(Y | X) P(X)

 P(X | Y) = P(Y | X) P(X) / P(Y)      ← Bayes’ Rule
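A quick numeric confirmation of Bayes’ rule, reusing the earlier Raining/WetGrass joint:

```python
import numpy as np

joint = np.array([[0.15, 0.01],    # Raining:  [WetGrass, ~WetGrass]
                  [0.04, 0.80]])   # ~Raining

P_r = joint[0].sum()                     # P(Raining)  = 0.16
P_w = joint[:, 0].sum()                  # P(WetGrass) = 0.19
P_w_given_r = joint[0, 0] / P_r          # P(WetGrass | Raining) = 0.9375

P_r_given_w = P_w_given_r * P_r / P_w    # Bayes' rule
assert np.isclose(P_r_given_w, joint[0, 0] / P_w)   # = 0.15/0.19 ≈ 0.789
```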

Page 29

Bayes’ Rule

• Similarly, P(X) conditioned on two variables:

 P(X | Y, Z) = P(Y | X, Z) P(X | Z) / P(Y | Z)

 or

 P(X | Y, Z) = P(Z | X, Y) P(X | Y) / P(Z | Y)

• Or N variables:

 P(X1 | X2, X3, …, XN) = P(X2 | X1, X3, …, XN) P(X1 | X3, …, XN) / P(X2 | X3, …, XN)

Page 30

Bayes’ Rule

P(Hi | D) = P(D | Hi) P(Hi) / P(D)

 – P(Hi | D): posterior probability (diagnostic knowledge)
 – P(D | Hi): likelihood (causal knowledge)
 – P(Hi): prior probability
 – P(D): normalizing constant

• This simple equation is very useful in practice
 – Usually framed in terms of hypotheses (H) and data (D)
   Which of the hypotheses is best supported by the data?

 P(Hi | D) = k P(D | Hi) P(Hi)
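In practice the normalizing constant k is computed by summing the unnormalized products P(D|Hi) P(Hi); a sketch with made-up numbers:

```python
# Assumed example: three hypotheses with made-up priors and likelihoods.
priors      = {"H1": 0.7, "H2": 0.2, "H3": 0.1}   # P(Hi)
likelihoods = {"H1": 0.1, "H2": 0.5, "H3": 0.9}   # P(D | Hi)

unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
k = 1.0 / sum(unnormalized.values())              # k = 1 / P(D)
posterior = {h: k * v for h, v in unnormalized.items()}

print(posterior)
print(max(posterior, key=posterior.get))          # hypothesis best supported by D
```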

Page 31

Bayes’ rule example: Medical diagnosis

• Meningitis causes a stiff neck 50% of the time

• A patient comes in with a stiff neck – what is the probability that he has meningitis?

• Need to know two things:
 – The prior probability of a patient having meningitis (1/50,000)
 – The prior probability of a patient having a stiff neck (1/20)

• Apply Bayes’ rule:

 P(M | S) = P(S | M) P(M) / P(S) = (0.5)(0.00002) / (0.05) = 0.0002

Page 32

Example (cont.)

• Suppose that we also know about whiplash
 – P(W) = 1/1000
 – P(S | W) = 0.8

• What is the relative likelihood of whiplash and meningitis?
 – P(W | S) / P(M | S)

 P(W | S) = P(S | W) P(W) / P(S) = (0.8)(0.001) / 0.05 = 0.016

So the relative likelihood of whiplash vs. meningitis is (0.016/0.0002) = 80
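Both diagnostic computations in a few lines, using the numbers from these two slides:

```python
# Meningitis vs. whiplash, given a stiff neck (numbers from the slides).
P_S = 1 / 20                              # P(stiff neck)
P_M, P_S_given_M = 1 / 50_000, 0.5        # meningitis prior and likelihood
P_W, P_S_given_W = 1 / 1_000, 0.8         # whiplash prior and likelihood

P_M_given_S = P_S_given_M * P_M / P_S     # 0.0002
P_W_given_S = P_S_given_W * P_W / P_S     # 0.016
print(P_W_given_S / P_M_given_S)          # 80.0 -- whiplash 80x more likely
```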

Page 33

A useful Bayes rule example

A test for a new, deadly strain of anthrax (that has no symptoms) is known to be 99.9% accurate. Should you get tested? The chances of having this strain are one in a million.

What are the random variables?
 A – you have anthrax (boolean)
 T – you test positive for anthrax (boolean)

Notation: Instead of P(A=True) and P(A=False), we will write P(A) and P(¬A)

What do we want to compute?
 P(A|T)

What else do we need to know or assume?
 Priors: P(A), P(¬A)
 Given: P(T|A), P(¬T|A), P(T|¬A), P(¬T|¬A)

Possibilities: A ∧ T, A ∧ ¬T, ¬A ∧ T, ¬A ∧ ¬T

Page 34

Example (cont.)

We know:
 Given: P(T|A) = 0.999, P(¬T|A) = 0.001, P(T|¬A) = 0.001, P(¬T|¬A) = 0.999
 Prior knowledge: P(A) = 10⁻⁶, P(¬A) = 1 − 10⁻⁶

Want to know P(A|T):
 P(A|T) = P(T|A) P(A) / P(T)

Calculate P(T) by marginalization:
 P(T) = P(T|A) P(A) + P(T|¬A) P(¬A) = (0.999)(10⁻⁶) + (0.001)(1 − 10⁻⁶) ≈ 0.001

So P(A|T) = (0.999)(10⁻⁶) / 0.001 ≈ 0.001

Therefore P(¬A|T) ≈ 0.999

What if you work at a Post Office?
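The full computation in code; note that the post-office question amounts to raising the prior P(A), which is the only change needed (the 10⁻³ prior below is an assumed illustration):

```python
# P(T) by marginalization, then Bayes' rule, for the anthrax test.
# The postal-worker prior at the end is an assumed value for illustration.

def p_anthrax_given_positive(p_a, p_t_given_a=0.999, p_t_given_not_a=0.001):
    p_t = p_t_given_a * p_a + p_t_given_not_a * (1 - p_a)   # marginalization
    return p_t_given_a * p_a / p_t                          # Bayes' rule

print(p_anthrax_given_positive(1e-6))   # ~0.001 (general population)
print(p_anthrax_given_positive(1e-3))   # 0.5    (higher prior, e.g. postal worker)
```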

Page 35

[Figure: Venn diagram of all people, split into people with anthrax and people without anthrax, with regions for correct test results (good T) and incorrect test results (bad T, 0.1%)]