Page 1

CPSC 422, Lecture 18 Slide 1

Intelligent Systems (AI-2)

Computer Science cpsc422, Lecture 18

Oct 21, 2016

Slide Sources: Raymond J. Mooney, University of Texas at Austin

D. Koller, Stanford CS - Probabilistic Graphical Models

Page 2

CPSC 422, Lecture 17 2

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Recap one application

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 3

Parameterization of Markov Networks

CPSC 422, Lecture 17 Slide 3

Factors define the local interactions (like CPTs in Bnets)

What about the global model? What did we do in BNets?


Page 4

How do we combine local models? As in BNets: by multiplying them!

CPSC 422, Lecture 17 Slide 4
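As a concrete illustration of that global model, here is a minimal Python sketch: multiply all the factors touching an assignment, then renormalize by the partition function Z. The factor tables and variable names below are invented for illustration, not taken from the slides.

from itertools import product

# Two made-up factors over binary variables A, B, C.
phi_AB = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}
phi_BC = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}

# Unnormalized measure: the product of all local factors.
def unnormalized(a, b, c):
    return phi_AB[(a, b)] * phi_BC[(b, c)]

# Partition function Z: sum of the product over every assignment.
Z = sum(unnormalized(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Global model: P(a, b, c) = (1/Z) * product of factors.
def joint(a, b, c):
    return unnormalized(a, b, c) / Z

# The normalized products form a proper distribution.
assert abs(sum(joint(*v) for v in product((0, 1), repeat=3)) - 1.0) < 1e-9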

Page 5

Step Back…. From structure to factors/potentials

In a Bnet the joint is factorized….

CPSC 422, Lecture 17 Slide 5

In a Markov Network you have one factor for each maximal clique
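In symbols, the two factorizations the slide contrasts are:

\[ P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big) \qquad \text{(BNet: one CPT per node)} \]

\[ P(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{C} \phi_C(X_C), \qquad Z = \sum_{x_1, \ldots, x_n} \prod_{C} \phi_C(x_C) \qquad \text{(Markov network: one factor per maximal clique)} \]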

Page 6

General definitions

Two nodes in a Markov network are independent if and only if every path between them is cut off by evidence.

So the Markov blanket of a node is…?

E.g., for C

E.g., for A and C

CPSC 422, Lecture 17 6

Page 7

CPSC 422, Lecture 17 7

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Applications of Markov Networks

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 8

Markov Networks Applications (1): Computer Vision

Called Markov Random Fields

• Stereo Reconstruction

• Image Segmentation

• Object recognition

CPSC 422, Lecture 17 8

Typically pairwise MRF

• Each variable corresponds to a pixel (or superpixel)

• Edges (factors) correspond to interactions between adjacent pixels in the image

• E.g., in segmentation: factors range from generically penalizing discontinuities to encoding specific relations, such as road appearing under car (a minimal sketch of the generic case follows)
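Here is that sketch: a Potts-style pairwise factor on a tiny grid, where agreeing neighbors get exponentially more weight. The grid size and the strength BETA are invented for illustration.

import itertools, math

H, W = 2, 2            # a tiny 2x2 "image", one binary label per pixel
BETA = 2.0             # illustrative smoothness strength

def edges(h, w):
    """4-connected neighboring pixel pairs."""
    for r, c in itertools.product(range(h), range(w)):
        if r + 1 < h:
            yield (r, c), (r + 1, c)
        if c + 1 < w:
            yield (r, c), (r, c + 1)

def pairwise(li, lj):
    """Potts factor: agreeing neighbors weigh e^BETA, disagreeing weigh 1."""
    return math.exp(BETA) if li == lj else 1.0

def unnormalized(labels):
    """Product of pairwise factors over all edges (unary factors omitted)."""
    p = 1.0
    for i, j in edges(H, W):
        p *= pairwise(labels[i], labels[j])
    return p

smooth = {(r, c): 0 for r, c in itertools.product(range(H), range(W))}
checker = {(r, c): (r + c) % 2 for r, c in itertools.product(range(H), range(W))}
print(unnormalized(smooth), unnormalized(checker))  # smooth labelings dominate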

Page 9

Image segmentation

CPSC 422, Lecture 17 9

Page 10

Image segmentation

CPSC 422, Lecture 17 10

Page 11

CPSC 422, Lecture 17 12

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Applications of Markov Networks

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 12

CPSC 422, Lecture 17 Slide 13

Variable elimination algorithm for Bnets

To compute P(Z | Y1=v1, …, Yj=vj):

1. Construct a factor for each conditional probability.

2. Set the observed variables to their observed values.

3. Given an elimination ordering, simplify/decompose sum of products

4. Perform products and sum out Zi

5. Multiply the remaining factors.

6. Normalize: divide the resulting factor f(Z) by \(\sum_Z f(Z)\).

Variable elimination algorithm for Markov Networks…..

Given a network representing P(Z, Y1, …, Yj, Z1, …, Zi):
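As a concrete reference point, a compact sum-product sketch (factors, scopes, and values are invented; for a Markov network, step 1 builds a factor per potential and the final normalization is always required, since factors are unnormalized):

from itertools import product

# Factors as (scope, table): scope is a tuple of variable names, table maps
# an assignment of the scope (binary here) to a potential value. Made up.
phi1 = (("A", "B"), {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0})
phi2 = (("B", "Z"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0})

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (f[1][tuple(asg[v] for v in f[0])]
                       * g[1][tuple(asg[v] for v in g[0])])
    return scope, table

def sum_out(f, var):
    """Eliminate var by summing it out of factor f."""
    i = f[0].index(var)
    scope = f[0][:i] + f[0][i + 1:]
    table = {}
    for vals, p in f[1].items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return scope, table

# Eliminate A then B, then normalize the remaining factor over Z (step 6).
f = sum_out(sum_out(multiply(phi1, phi2), "A"), "B")
z = sum(f[1].values())
print({k: v / z for k, v in f[1].items()})  # P(Z)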

Page 13

Gibbs sampling for Markov Networks

Example: P(D | C=0)

Resample non-evidence variables in a pre-defined order or a random order

Suppose we begin with A

[Figure: Markov network over A, B, C, D, E, F]

Current state:
A B C D E F
1 0 0 1 1 0

Note: never change evidence!

What do we need to sample?

A. P(A | B=0)
B. P(A | B=0, C=0)
C. P(B=0, C=0 | A)

CPSC 422, Lecture 17 14

Page 14

Example: Gibbs sampling

Compute the resampling distribution P(A | B=0, C=0)

[Figure: the same network; the factors touching A are ϕ1, ϕ2, ϕ3]

ϕ(A, C):
        A=1   A=0
C=1      1     2
C=0      3     4

ϕ(A, B):
        A=1   A=0
B=1      1     5
B=0      4.3   0.2

Current state (evidence B=0, C=0; resampling A):
A B C D E F
1 0 0 1 1 0
? 0 0 1 1 0

Φ1 × Φ2 × Φ3:
A=1: 12.9   A=0: 0.8

Normalized result:
A=1: 0.95   A=0: 0.05

CPSC 422, Lecture 17 15

Page 15

Example: Gibbs sampling

Compute the resampling distribution P(B | A=1, D=1)

[Figure: the same network; the factors touching B are ϕ1, ϕ2, ϕ4]

ϕ(B, D):
        D=1   D=0
B=1      1     2
B=0      2     1

ϕ(A, B):
        A=1   A=0
B=1      1     5
B=0      4.3   0.2

Current state (previous samples, then resampling B):
A B C D E F
1 0 0 1 1 0
1 0 0 1 1 0
1 ? 0 1 1 0

Φ1 × Φ2 × Φ4:
B=1: 1   B=0: 8.6

Normalized result:
B=1: 0.11   B=0: 0.89

CPSC 422, Lecture 17 16
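These two updates can be checked in a few lines of Python. The factor tables are the ones from the slides; the helper function and the key convention (first variable, second variable) are ours:

import random

# Factor tables from the slides; keys are (value_of_first_var, value_of_second_var).
phi_AC = {(1, 1): 1.0, (0, 1): 2.0, (1, 0): 3.0, (0, 0): 4.0}   # phi(A, C)
phi_AB = {(1, 1): 1.0, (0, 1): 5.0, (1, 0): 4.3, (0, 0): 0.2}   # phi(A, B)
phi_BD = {(1, 1): 1.0, (0, 1): 2.0, (1, 0): 2.0, (0, 0): 1.0}   # phi(B, D)

def resample(weights):
    """Normalize the factor product and draw the new value."""
    z = sum(weights.values())
    probs = {v: w / z for v, w in weights.items()}
    value = random.choices(list(probs), weights=list(probs.values()))[0]
    return probs, value

# Resample A given B=0, C=0: product of the factors touching A.
print(resample({a: phi_AB[(a, 0)] * phi_AC[(a, 0)] for a in (1, 0)})[0])
# -> {1: 0.94..., 0: 0.05...}: 12.9 vs 0.8 normalized (the slide rounds to 0.95/0.05)

# Resample B given A=1, D=1: product of the factors touching B.
print(resample({b: phi_AB[(1, b)] * phi_BD[(b, 1)] for b in (1, 0)})[0])
# -> {1: 0.10..., 0: 0.89...}: 1 vs 8.6 normalized, the slide's 0.11/0.89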

Page 16

CPSC 422, Lecture 17 17

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Applications of Markov Networks

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 17

We want to model P(Y1 | X1 … Xn), where all the Xi are always observed

• Which model is simpler, MN or BN?

[Figure: two candidate models over Y1 and X1, X2, …, Xn: a Markov network (MN) and a Bayesian network (BN)]

• Naturally aggregates the influence of the different parents

CPSC 422, Lecture 18 Slide 18

Page 18

Conditional Random Fields (CRFs)

• Model P(Y1 .. Yk | X1.. Xn)

• Special case of Markov Networks where all the Xi are always observed

• Simple case P(Y1| X1…Xn)

CPSC 422, Lecture 18 Slide 19
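Written out (the standard CRF definition; note that the partition function now depends on the observed values):

\[ P(Y_1, \ldots, Y_k \mid x_1, \ldots, x_n) = \frac{1}{Z(x_1, \ldots, x_n)} \prod_{j} \phi_j(D_j), \qquad Z(x_1, \ldots, x_n) = \sum_{y_1, \ldots, y_k} \prod_{j} \phi_j(D_j) \]

where each factor scope \(D_j\) may mix target variables Y and observed features x.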

Page 19

What are the Parameters?

CPSC 422, Lecture 18 Slide 20

Page 20

Let’s derive the probabilities we need

CPSC 422, Lecture 18 Slide 21

\(\phi_i(X_i, Y_1) = \exp\{\, w_i \,\mathbf{1}\{X_i = 1,\; Y_1 = 1\}\,\}\)

\(\phi_0(Y_1) = \exp\{\, w_0 \,\mathbf{1}\{Y_1 = 1\}\,\}\)

[Figure: naïve Markov model, Y1 linked to X1, X2, …, Xn]


Page 23

Let’s derive the probabilities we need

CPSC 422, Lecture 18 Slide 24

\(P(Y_1 = 1, x_1, \ldots, x_n) = \exp\big(w_0 + \sum_{i=1}^{n} w_i x_i\big)\)

\(P(Y_1 = 0, x_1, \ldots, x_n) = 1\)

(These are unnormalized measures: when \(Y_1 = 0\) every indicator feature is 0, so each potential equals \(e^0 = 1\); the shared normalizer cancels in the conditional.)

\(P(Y_1 = 1 \mid x_1, \ldots, x_n) = \dfrac{\exp\big(w_0 + \sum_{i=1}^{n} w_i x_i\big)}{1 + \exp\big(w_0 + \sum_{i=1}^{n} w_i x_i\big)}\)

[Figure: naïve Markov model, Y1 linked to X1, X2, …, Xn]

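A quick numeric check of this derivation, with invented weights: normalizing the two measures gives exactly the sigmoid of the linear score.

import math

w0 = -1.0
w = [2.0, -0.5, 1.5]              # illustrative weights, one per feature
x = [1, 0, 1]                     # one observed configuration

score = w0 + sum(wi * xi for wi, xi in zip(w, x))

# Unnormalized measures from the factor products:
#   Y1 = 1 -> exp(w0 + sum_i w_i x_i); Y1 = 0 -> exp(0) = 1.
p1 = math.exp(score) / (math.exp(score) + 1.0)
sigmoid = 1.0 / (1.0 + math.exp(-score))
print(p1, sigmoid)                # identical values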

Page 25

Sigmoid Function used in Logistic Regression

• Great practical interest

• Number of parameters w_i is linear instead of exponential in the number of parents

• Natural model for many real-world applications

• Naturally aggregates the influence of different parents

CPSC 422, Lecture 18 Slide 26

[Figure: the naïve Markov model (MN) and the corresponding BN over Y1 and X1, X2, …, Xn]

Page 26

Logistic Regression as a Markov Net (CRF)

Logistic regression is a simple Markov Net (a CRF), aka a naïve Markov model

[Figure: Y linked to X1, X2, …, Xn]

• But it only models the conditional distribution P(Y | X), not the full joint P(X, Y)

CPSC 422, Lecture 18 Slide 27

Page 27

Naïve Bayes vs. Logistic Regression

[Figure: Naïve Bayes as a BN with Y pointing to X1, X2, …, Xn; Logistic Regression (Naïve Markov) as an MN with Y linked to X1, X2, …, Xn]

Naïve Bayes: Generative

Logistic Regression (Naïve Markov): Discriminative (Conditional)

CPSC 422, Lecture 18 Slide 28
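The contrast in a runnable sketch (synthetic data; scikit-learn's BernoulliNB and LogisticRegression as stand-ins for the two models):

import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary features: each feature fires more often when y = 1.
n = 500
y = rng.integers(0, 2, size=n)
X = (rng.random((n, 3)) < np.where(y[:, None] == 1, 0.8, 0.3)).astype(int)

nb = BernoulliNB().fit(X, y)          # generative: models P(X, Y)
lr = LogisticRegression().fit(X, y)   # discriminative: models P(Y | X) only

x_new = np.array([[1, 0, 1]])
print(nb.predict_proba(x_new))        # via Bayes rule on the joint
print(lr.predict_proba(x_new))        # directly from the conditional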

Page 28

CPSC 422, Lecture 18 Slide 29

Learning Goals for today’s class

You can:

• Perform Exact and Approx. Inference in Markov Networks

• Describe a few applications of Markov Networks

• Describe a natural parameterization for a Naïve Markov model (which is a simple CRF)

• Derive how P(Y|X) can be computed for a Naïve Markov model

• Explain the discriminative vs. generative distinction and its implications

Page 29

CPSC 422, Lecture 18 Slide 30

Next class (Mon): Revise generative temporal models (HMM); Linear-chain CRFs

Midterm: Wed, Oct 26; we will start at 9am sharp

To Do / How to prepare….

• Go to Office Hours

• Learning Goals (look at the end of the slides for each lecture; the complete list has been posted)

• Revise all the clicker questions and practice exercises

• More practice material has been posted

• Check questions and answers on Piazza

Page 30

Slide 31

Generative vs. Discriminative Models

Generative models (like Naïve Bayes): not directly designed to maximize performance on classification. They model the joint distribution P(X,Y).

Classification is then done using Bayesian inference. But a generative model can also be used to perform any other inference task, e.g., P(X1 | X2, …, Xn). "Jack of all trades, master of none."

Discriminative models (like CRFs): specifically designed and trained to maximize performance of classification. They only model the conditional distribution P(Y | X ).

By focusing on modeling the conditional distribution, they generally perform better on classification than generative models when given a reasonable amount of training data.

CPSC 422, Lecture 18

Page 31

On Fri: Sequence Labeling

[Figure: HMM as a directed chain Y1 → Y2 → … → YT with each Yt pointing to Xt; Linear-chain CRF as an undirected chain Y1 - Y2 - … - YT with each Yt linked to Xt]

HMM: Generative

Linear-chain CRF: Discriminative (Conditional)

CPSC 422, Lecture 18, Slide 32
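For reference, the standard linear-chain CRF that Friday's class previews (general background, not taken from these slides):

\[ P(y_1, \ldots, y_T \mid x_1, \ldots, x_T) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\Big( \sum_{k} w_k\, f_k(y_t, y_{t-1}, x, t) \Big) \]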

Page 32

CPSC 422, Lecture 18 33

Lecture Overview

• Indicator function

• P(X,Y) vs. P(X|Y) and Naïve Bayes

• Model P(Y|X) explicitly with Markov Networks

• Parameterization

• Inference

• Generative vs. Discriminative models

Page 33

P(X,Y) vs. P(Y|X)

Assume that you always observe a set of variables

X = {X1…Xn}

and you want to predict one or more variables

Y = {Y1…Ym}

You can model P(X,Y) and then infer P(Y|X) = P(X,Y) / P(X), where P(X) = ∑Y P(X,Y)

CPSC 422, Lecture 18, Slide 34

Page 34

P(X,Y) vs. P(Y|X)

With a Bnet we can represent a joint as the product of Conditional Probabilities

With a Markov Network we can represent a joint as the product of factors

CPSC 422, Lecture 18, Slide 35

We will see that Markov Networks are also suitable for representing the conditional probability P(Y|X) directly

Page 35

Directed vs. Undirected Factorization

CPSC 422, Lecture 18 Slide 36

Page 36

Naïve Bayesian Classifier P(Y,X)

A very simple and successful BNet that allows classifying entities into a set of classes Y1, given a set of features (X1…Xn)

Example:

• Determine whether an email is spam (only two classes spam=T and spam=F)

• Useful attributes of an email?

Assumptions:

• The value of each attribute depends on the classification

• (Naïve) The attributes are independent of each other given the classification

P("bank" | "account", spam=T) = P("bank" | spam=T)

CPSC 422, Lecture 18, Slide 37
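A sketch of the resulting classifier with invented CPT numbers (the four features match the structure on the next slide; the prior and conditionals are made up for illustration):

# Invented parameters: P(Spam=T) and P(feature present | class).
P_SPAM = 0.4
P_GIVEN_SPAM = {"free": 0.60, "money": 0.50, "ubc": 0.05, "midterm": 0.02}
P_GIVEN_HAM  = {"free": 0.10, "money": 0.10, "ubc": 0.40, "midterm": 0.30}

def p_spam_given(features):
    """P(Spam=T | X1..Xn) via Bayes rule under the naive independence assumption."""
    ps, ph = P_SPAM, 1.0 - P_SPAM
    for feat, present in features.items():
        ps *= P_GIVEN_SPAM[feat] if present else 1.0 - P_GIVEN_SPAM[feat]
        ph *= P_GIVEN_HAM[feat] if present else 1.0 - P_GIVEN_HAM[feat]
    return ps / (ps + ph)

# "free money for you now" -> contains "free" and "money" only.
print(p_spam_given({"free": True, "money": True, "ubc": False, "midterm": False}))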

Page 37

Naïve Bayesian Classifier for Email Spam

• What is the structure?

[Figure: BNet with class node Email Spam and feature nodes Email contains "free", Email contains "money", Email contains "ubc", Email contains "midterm"]

The corresponding BNet represents: P(Y1, X1…Xn)

CPSC 422, Lecture 18, Slide 38

Page 38

NB Classifier for Email Spam: Usage

Can we derive P(Y1 | X1…Xn) for any x1…xn? E.g., "free money for you now"

[Figure: the same BNet: Email Spam with the four "Email contains …" feature nodes]

But you can also perform any other inference, e.g., P(X1 | X3)

CPSC 422, Lecture 18, Slide 39


Page 40

CPSC 422, Lecture 18 Slide 41