Probabilistic Graphical Models: Learning Overview
PGM Learning Tasks and Metrics

Source slides: spark-university.s3.amazonaws.com/stanford-pgm/slides/5.1.1-Learn...

Jul 27, 2020

Page 1

Probabilistic Graphical Models
Learning: Overview

PGM Learning Tasks and Metrics

Daphne Koller

Page 2

Learning

[Figure: a network is obtained either by elicitation from a domain expert or by learning from data.]

• True distribution P* (maybe corresponding to a PGM M*)
• Data: dataset of instances D = {d[1], ..., d[M]} sampled from P*
• Domain expert: elicitation
• Network: the result of elicitation or learning

Page 3

Known Structure, Complete Data

[Figure: an initial network over X1, X2, Y and the input data are passed to an Inducer, which outputs a network over X1, X2, Y with the CPD P(Y|X1,X2).]

Input Data:

X1     X2     Y
x1^0   x2^1   y^0
x1^1   x2^0   y^0
x1^0   x2^1   y^1
x1^0   x2^0   y^0
x1^1   x2^1   y^1
x1^0   x2^1   y^1
x1^1   x2^0   y^0

P(Y|X1,X2):

X1     X2     y^0    y^1
x1^0   x2^0   1      0
x1^0   x2^1   0.2    0.8
x1^1   x2^0   0.1    0.9
x1^1   x2^1   0.02   0.98
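For the known-structure, complete-data case, one simple way an inducer could fill in P(Y|X1,X2) is maximum-likelihood estimation by counting. The sketch below is illustrative only: the encoding of rows as `(x1, x2, y)` integers and the function name `estimate_cpd` are assumptions, not part of the slides. The estimates it produces from these seven rows need not match the CPD shown on the slide, which appears to be illustrative.

```python
from collections import Counter

# Complete data from the slide, encoded as (x1, x2, y) with values 0 or 1.
DATA = [
    (0, 1, 0),
    (1, 0, 0),
    (0, 1, 1),
    (0, 0, 0),
    (1, 1, 1),
    (0, 1, 1),
    (1, 0, 0),
]


def estimate_cpd(data):
    """Maximum-likelihood estimate of P(Y | X1, X2) from complete data."""
    joint = Counter(data)                                # counts M[x1, x2, y]
    parents = Counter((x1, x2) for x1, x2, _ in data)    # counts M[x1, x2]
    cpd = {}
    for (x1, x2), total in parents.items():
        # Relative frequency of each y among rows with these parent values.
        cpd[(x1, x2)] = {y: joint[(x1, x2, y)] / total for y in (0, 1)}
    return cpd


cpd = estimate_cpd(DATA)
for parent_values in sorted(cpd):
    print(parent_values, cpd[parent_values])
```

Parent assignments that never occur in the data get no entry here; with only seven rows, several estimates come out as exactly 0 or 1, which is the kind of behavior the later slides on overfitting address.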

Page 4

Unknown Structure, Complete Data

[Figure: an initial network over X1, X2, Y and the input data are passed to an Inducer, which outputs a network over X1, X2, Y with the CPD P(Y|X1,X2).]

Input Data and P(Y|X1,X2): identical to the tables on Page 3.

Page 5

Known Structure, Incomplete Data

[Figure: an initial network over X1, X2, Y and the input data are passed to an Inducer, which outputs a network over X1, X2, Y with the CPD P(Y|X1,X2).]

Input Data (? marks a missing value):

X1     X2     Y
?      x2^1   y^0
x1^1   ?      y^0
?      x2^1   ?
x1^0   x2^0   y^0
?      x2^1   y^1
x1^0   x2^1   ?
x1^1   ?      y^0

P(Y|X1,X2):

X1     X2     y^0    y^1
x1^0   x2^0   1      0
x1^0   x2^1   0.2    0.8
x1^1   x2^0   0.1    0.9
x1^1   x2^1   0.02   0.98
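To see why incomplete data makes learning harder than plain counting, the sketch below encodes the slide's incomplete data set with `None` for each "?" and checks how much of it is fully observed. The encoding and the helper names `complete_cases` and `missing_fraction` are assumptions made for illustration.

```python
# Incomplete data from the slide as (x1, x2, y); None marks a missing ("?") value.
INCOMPLETE = [
    (None, 1, 0),
    (1, None, 0),
    (None, 1, None),
    (0, 0, 0),
    (None, 1, 1),
    (0, 1, None),
    (1, None, 0),
]


def complete_cases(data):
    """Rows in which every variable is observed."""
    return [row for row in data if None not in row]


def missing_fraction(data):
    """Fraction of all cells that are missing."""
    cells = [value for row in data for value in row]
    return sum(value is None for value in cells) / len(cells)


print(len(complete_cases(INCOMPLETE)), "of", len(INCOMPLETE), "rows are complete")
print("fraction of missing cells:", missing_fraction(INCOMPLETE))
```

Only one of the seven rows is fully observed, so simply discarding incomplete rows would throw away almost all of the data.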

Page 6

Unknown Structure, Incomplete Data

[Figure: an initial network over X1, X2, Y and the input data are passed to an Inducer, which outputs a network over X1, X2, Y with the CPD P(Y|X1,X2).]

Input Data and P(Y|X1,X2): identical to the tables on Page 5.

Page 7

Latent Variables, Incomplete Data

[Figure: an initial network and the input data are passed to an Inducer, which outputs a network with the CPD P(Y|X1,X2); a hidden variable H appears in the network alongside X1, X2, Y.]

Input Data and P(Y|X1,X2): identical to the tables on Page 5.

Page 8

PGM Learning Tasks I

• Goal: Answer general probabilistic queries about new instances
• Simple metric: Training set likelihood
  – P(D : M) = Π_m P(d[m] : M)
• But we really care about new data
  – Evaluate on test set likelihood
  – P(D' : M)
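The likelihood metric P(D : M) = Π_m P(d[m] : M) is usually computed in log space to avoid underflow. The sketch below evaluates it on a training set and a held-out set. The CPD values are taken from the earlier slides; the uniform marginals for X1 and X2, the held-out rows in `TEST`, and all function names are hypothetical choices made only for illustration.

```python
import math

# CPD P(Y | X1, X2) from the earlier slides, indexed as CPD[(x1, x2)][y].
CPD = {
    (0, 0): {0: 1.0, 1: 0.0},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.1, 1: 0.9},
    (1, 1): {0: 0.02, 1: 0.98},
}
# Assumption: uniform marginals for X1 and X2 (not given on the slides).
P_X1 = {0: 0.5, 1: 0.5}
P_X2 = {0: 0.5, 1: 0.5}


def instance_prob(d):
    """P(d : M) for one instance d = (x1, x2, y)."""
    x1, x2, y = d
    return P_X1[x1] * P_X2[x2] * CPD[(x1, x2)][y]


def log_likelihood(data):
    """log P(D : M) = sum over m of log P(d[m] : M); -inf if any instance has probability 0."""
    total = 0.0
    for d in data:
        p = instance_prob(d)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


TRAIN = [(0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]
TEST = [(0, 1, 1), (1, 1, 1), (0, 0, 0)]  # hypothetical held-out instances

# Report per-instance averages so that sets of different size are comparable.
print("train:", log_likelihood(TRAIN) / len(TRAIN))
print("test: ", log_likelihood(TEST) / len(TEST))
```

A single test instance that the model assigns probability 0 drives the whole test log-likelihood to negative infinity, which is one reason the later slides recommend parameter priors.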

Page 9

PGM Learning Tasks II

• Goal: Specific prediction task on new instances
  – Predict target variables y from observed variables x
  – E.g., image segmentation, speech recognition
• Often care about specialized objective
  – E.g., pixel-level segmentation accuracy
• Often convenient to select model to optimize
  – likelihood Π_m P(d[m] : M) or
  – conditional likelihood Π_m P(y[m] | x[m] : M)
• Model evaluated on "true" objective over test data
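The three quantities on this slide (likelihood, conditional likelihood, and a task-specific objective) can differ for the same model and data. The sketch below computes all three for a small model, using prediction accuracy as a stand-in for the "true" objective. The CPD comes from the earlier slides; the uniform marginals for X1 and X2 and every function name are assumptions for illustration.

```python
import math

CPD = {
    (0, 0): {0: 1.0, 1: 0.0},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.1, 1: 0.9},
    (1, 1): {0: 0.02, 1: 0.98},
}
P_X1 = {0: 0.5, 1: 0.5}  # assumption: not given on the slides
P_X2 = {0: 0.5, 1: 0.5}  # assumption: not given on the slides

DATA = [(0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]


def joint(x1, x2, y):
    """P(x1, x2, y : M)."""
    return P_X1[x1] * P_X2[x2] * CPD[(x1, x2)][y]


def conditional(y, x1, x2):
    """P(y | x1, x2 : M), obtained by normalizing the joint over y."""
    norm = sum(joint(x1, x2, v) for v in (0, 1))
    return joint(x1, x2, y) / norm


def log_likelihood(data):
    return sum(math.log(joint(*d)) for d in data)


def conditional_log_likelihood(data):
    return sum(math.log(conditional(y, x1, x2)) for x1, x2, y in data)


def accuracy(data):
    """Fraction of instances where argmax_y P(y | x) equals the observed y."""
    hits = 0
    for x1, x2, y in data:
        prediction = max((0, 1), key=lambda v: conditional(v, x1, x2))
        hits += prediction == y
    return hits / len(data)


print("log-likelihood:            ", log_likelihood(DATA))
print("conditional log-likelihood:", conditional_log_likelihood(DATA))
print("accuracy:                  ", accuracy(DATA))
```

The conditional likelihood ignores how well the model explains x, so it is often the closer surrogate when only predictions of y matter; accuracy itself is not differentiable, which is one reason a likelihood-based surrogate is convenient for training.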

Page 10

PGM Learning Tasks III

• Goal: Knowledge discovery of M*
  – Distinguish direct vs indirect dependencies
  – Possibly directionality of edges
  – Presence and location of hidden variables
• Often train using likelihood
  – Poor surrogate for structural accuracy
• Evaluate by comparing to prior knowledge
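One possible way to compare a learned structure to prior knowledge is to score its edges against a reference edge set, once with direction and once without. The sketch below is an assumption about how such a comparison might be set up; the edge sets, the precision/recall measure, and the function names are all hypothetical and do not come from the slides.

```python
def skeleton(edges):
    """Undirected version of a set of directed (parent, child) edges."""
    return {frozenset(edge) for edge in edges}


def edge_precision_recall(learned, reference):
    """Precision and recall of the learned edge set against a reference edge set."""
    true_positives = len(learned & reference)
    precision = true_positives / len(learned) if learned else 1.0
    recall = true_positives / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical prior knowledge and hypothetical learned structure.
reference = {("X1", "Y"), ("X2", "Y")}
learned = {("X1", "Y"), ("Y", "X2")}

directed = edge_precision_recall(learned, reference)
undirected = edge_precision_recall(skeleton(learned), skeleton(reference))
print("directed:  ", directed)
print("undirected:", undirected)
```

In this example the learned structure recovers every dependency but reverses one edge, so it scores perfectly on the skeleton and only partially on directionality. This separates the two discovery goals listed above.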

Page 11

Avoiding Overfitting

• Selecting M to optimize training set likelihood overfits to statistical noise
• Parameter overfitting
  – Parameters fit random noise in training data
  – Use regularization / parameter priors
• Structure overfitting
  – Training likelihood always increases for more complex structures
  – Bound or penalize model complexity
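Structure overfitting can be seen on the small complete data set from the earlier slides: giving Y the parents X1 and X2 raises the training log-likelihood relative to giving Y no parents. The sketch below compares the two structures, then applies a complexity penalty. The penalty used here, 0.5 · log(M) per free parameter (a BIC-style term), is an assumption chosen for illustration; the slide only says to bound or penalize complexity and does not name a specific score. Function names are hypothetical.

```python
import math
from collections import Counter

# Complete data from the earlier slides, encoded as (x1, x2, y).
DATA = [(0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]


def train_log_likelihood(data, parent_idx):
    """Log-likelihood of the Y column at the MLE, given the column indices of Y's parents."""
    joint = Counter((tuple(row[i] for i in parent_idx), row[2]) for row in data)
    parents = Counter(tuple(row[i] for i in parent_idx) for row in data)
    # Each (parents, y) combination seen n times contributes n * log(n / count(parents)).
    return sum(n * math.log(n / parents[pa]) for (pa, _), n in joint.items())


def penalized_score(data, parent_idx):
    """Training log-likelihood minus an assumed BIC-style complexity penalty."""
    n_params = 2 ** len(parent_idx)  # one free parameter per parent assignment (binary Y)
    penalty = 0.5 * math.log(len(data)) * n_params
    return train_log_likelihood(data, parent_idx) - penalty


for name, parent_idx in [("Y has no parents", ()), ("Y has parents X1, X2", (0, 1))]:
    print(name,
          "| train log-likelihood:", round(train_log_likelihood(data=DATA, parent_idx=parent_idx), 3),
          "| penalized score:", round(penalized_score(DATA, parent_idx), 3))
```

On these seven rows the richer structure has the higher training log-likelihood, while the penalized score slightly prefers the simpler one. With so little data, the extra parameters are not clearly justified.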

Page 12

Selecting Hyperparameters

• Regularization for overfitting involves hyperparameters:
  – Parameter priors
  – Complexity penalty
• Choice of hyperparameters makes a big difference to performance
• Must be selected on validation set
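Selecting a hyperparameter on a validation set can be sketched with a parameter prior expressed as a pseudo-count `alpha` added to every count (an assumed, simple form of prior; the slide does not specify one). The model is fit on the training rows for each candidate `alpha` and scored by conditional log-likelihood on separate validation rows. The validation rows, the candidate values, and the function names are hypothetical.

```python
import math
from collections import Counter

# Training data from the earlier slides, encoded as (x1, x2, y).
TRAIN = [(0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]
# Hypothetical validation instances, kept separate from the training data.
VALID = [(1, 0, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 0, 1)]


def fit_cpd(data, alpha):
    """Estimate P(Y | X1, X2) with pseudo-count alpha added to every (x1, x2, y) count."""
    joint = Counter(data)
    cpd = {}
    for x1 in (0, 1):
        for x2 in (0, 1):
            total = joint[(x1, x2, 0)] + joint[(x1, x2, 1)] + 2 * alpha
            cpd[(x1, x2)] = {y: (joint[(x1, x2, y)] + alpha) / total for y in (0, 1)}
    return cpd


def conditional_log_likelihood(cpd, data):
    return sum(math.log(cpd[(x1, x2)][y]) for x1, x2, y in data)


def select_alpha(train, valid, candidates):
    """Return the candidate with the highest validation score, plus all scores."""
    scores = {a: conditional_log_likelihood(fit_cpd(train, a), valid) for a in candidates}
    return max(scores, key=scores.get), scores


best_alpha, scores = select_alpha(TRAIN, VALID, [0.01, 0.1, 1.0, 10.0])
print("validation scores:", scores)
print("selected alpha:", best_alpha)
```

Choosing `alpha` by training likelihood instead would always favor the smallest pseudo-count, because the unsmoothed estimate maximizes training likelihood by construction. Held-out data avoids that bias.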

Page 13

Why PGM Learning

• Predictions of structured objects (sequences, graphs, trees)
  – Exploit correlations between several predicted variables
• Can incorporate prior knowledge into model
• Learning single model for multiple tasks
• Framework for knowledge discovery