Probabilistic Graphical Models: Structure learning in Bayesian networks
Siamak Ravanbakhsh, Fall 2019

Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Jun 04, 2020


Learning objectives

why is structure learning hard?
two approaches to structure learning:
  constraint-based methods
  score-based methods
MLE vs. Bayesian score

Structure learning in BayesNets

family of methods:

constraint-based methods: estimate conditional independencies from the data, then find compatible BayesNets

score-based methods: search over the combinatorial space of $2^{O(n^2)}$ structures, maximizing a score

Bayesian model averaging: integrate over all possible structures

Structure learning in BayesNets

the structure is identifiable only up to I-equivalence

constraint-based methods: estimate conditional independencies from the data by hypothesis testing ($X \perp Y \mid Z$?), then find compatible BayesNets

perfect map (P-map): a DAG with the same set of conditional independencies (CI) as the data distribution, $I(G) = I(p_D)$

first attempt: a DAG that is an I-map for $p_D$, i.e., $I(G) \subseteq I(p_D)$

minimal I-map from CI test

minimal I-map: a DAG where removing an edge violates the I-map property

input: CI test oracle; an ordering $X_1, \ldots, X_n$
output: a minimal I-map $G$
for $i = 1 \ldots n$
    find a minimal $U \subseteq \{X_1, \ldots, X_{i-1}\}$ s.t. $(X_i \perp \{X_1, \ldots, X_{i-1}\} - U \mid U)$
    set $Pa_{X_i} \leftarrow U$

the resulting graph satisfies the local independencies $X_i \perp NonDesc_{X_i} \mid Pa_{X_i}$
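The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ci_oracle(x, others, given)` is a hypothetical callable that answers whether `x` is independent of the set `others` given the set `given`, and `chain_oracle` hard-codes the independencies of a toy chain A -> B -> C that is not part of the slides. Candidate sets are tried from smallest to largest, so the first hit has minimum size and is therefore minimal.

```python
from itertools import combinations


def minimal_imap(order, ci_oracle):
    """Return {node: parent set} of a minimal I-map for the given ordering."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        chosen = set(preds)  # conditioning on all predecessors always works
        for size in range(len(preds) + 1):
            # first set U of this size with X_i independent of (preds - U) given U
            hit = next(
                (set(u) for u in combinations(preds, size)
                 if ci_oracle(x, set(preds) - set(u), set(u))),
                None,
            )
            if hit is not None:
                chosen = hit
                break
        parents[x] = chosen
    return parents


def chain_oracle(x, others, given):
    """Toy oracle (assumption): independencies of the chain A -> B -> C."""
    if not others:
        return True  # independence from the empty set holds trivially
    known = {
        ("C", frozenset({"A"}), frozenset({"B"})),
        ("A", frozenset({"C"}), frozenset({"B"})),
    }
    return (x, frozenset(others), frozenset(given)) in known


topological = minimal_imap(["A", "B", "C"], chain_oracle)
other_order = minimal_imap(["A", "C", "B"], chain_oracle)
```

With the topological ordering the chain is recovered, while the ordering A, C, B yields a denser graph in which B has both A and C as parents. The nested loop over subsets also makes the exponential number of CI queries visible.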

minimal I-map from CI test

Problems:
CI tests involve many variables
the number of CI tests is exponential
a minimal I-map may be far from a P-map
different orderings give different graphs

Example (figure: the minimal I-maps obtained from three orderings):
D,I,S,G,L (a topological ordering)
L,S,G,I,D
L,D,S,I,G

Structure learning in BayesNets

first attempt: a DAG that is an I-map for $p_D$, $I(G) \subseteq I(p_D)$

second attempt: a DAG that is a P-map for $p_D$, i.e., a DAG with the same set of conditional independencies (CI), $I(G) = I(p_D)$

can we find a perfect map with fewer CI tests involving fewer variables?

Perfect map from CI test

a perfect map can be recovered only up to I-equivalence (the same set of CIs)
I-equivalent DAGs have the same skeleton and the same immoralities

procedure:
1. find the undirected skeleton using CI tests
2. identify immoralities in the undirected graph

Perfect map from CI test
1. finding the undirected skeleton

observation: if X and Y are not adjacent then $X \perp Y \mid Pa_X$ OR $X \perp Y \mid Pa_Y$
assumption: max number of parents $d$
idea: search over all subsets of size at most $d$, and check the CI above

input: CI oracle; bound on #parents $d$
output: undirected skeleton
initialize $H$ as a complete undirected graph
for all pairs $X_i, X_j$
    for all subsets $U$ of size $\leq d$ (within current neighbors of $X_i, X_j$)
        if $X_i \perp X_j \mid U$ then remove $X_i - X_j$ from $H$
return $H$

number of CI tests: $O(n^2) \times O((n-2)^d) = O(n^{d+2})$
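A minimal Python sketch of the skeleton search follows. It assumes a hypothetical pairwise oracle `ci_oracle(x, y, given)` and, for illustration only, a toy ground truth X -> Z <- Y, Z -> W whose independencies are hard-coded in `toy_oracle`. The separating set found for each removed edge is also returned, since it is cheap to record at this point.

```python
from itertools import combinations


def separating_set(x, y, candidates, ci_oracle, d):
    """First subset U of `candidates` with |U| <= d and x independent of y given U."""
    for size in range(d + 1):
        for u in combinations(candidates, size):
            if ci_oracle(x, y, set(u)):
                return set(u)
    return None


def find_skeleton(nodes, ci_oracle, d):
    adj = {v: set(nodes) - {v} for v in nodes}  # H starts as the complete graph
    sepset = {}
    for x, y in combinations(nodes, 2):
        # restrict the search to the current neighbors of x and y
        candidates = sorted((adj[x] | adj[y]) - {x, y})
        u = separating_set(x, y, candidates, ci_oracle, d)
        if u is not None:
            adj[x].discard(y)
            adj[y].discard(x)
            sepset[frozenset((x, y))] = u
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepset


def toy_oracle(x, y, given):
    """Assumed ground truth: X -> Z <- Y and Z -> W."""
    pair = frozenset((x, y))
    if pair == frozenset(("X", "Y")):
        return not (given & {"Z", "W"})  # collider Z (or its descendant W) must stay unobserved
    if pair in (frozenset(("X", "W")), frozenset(("Y", "W"))):
        return "Z" in given
    return False  # adjacent pairs are never separated


edges, sepset = find_skeleton(["X", "Y", "Z", "W"], toy_oracle, d=2)
```

The double loop over pairs and over subsets of size at most `d` is where the $O(n^{d+2})$ count of CI queries comes from.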

Perfect map from CI test
2. finding the immoralities

potential immorality: $X - Z, Y - Z \in H$ and $X - Y \notin H$

not an immorality only if every separating set contains $Z$: $X \perp Y \mid U \Rightarrow Z \in U$

save the $U$ when removing $X - Y$ during the skeleton search, and see if $Z \in U$
if no, then we have the immorality $X \rightarrow Z \leftarrow Y$
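The check "is Z in the saved U?" can be sketched as below. The skeleton and the saved separating sets are written out by hand for an assumed toy graph X -> Z <- Y, Z -> W; the data layout (edges as `frozenset` pairs, `sepset` keyed by pair) is a choice made for this example, not something the slides prescribe.

```python
from itertools import combinations


def find_immoralities(nodes, edges, sepset):
    """Return triples (x, z, y) meaning x -> z <- y."""
    adj = {v: set() for v in nodes}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    found = set()
    for z in nodes:
        for x, y in combinations(sorted(adj[z]), 2):
            if frozenset((x, y)) in edges:
                continue  # x and y are adjacent: not a potential immorality
            if z not in sepset[frozenset((x, y))]:
                found.add((x, z, y))  # z was not needed to separate x and y
    return found


nodes = ["X", "Y", "Z", "W"]
edges = {frozenset(("X", "Z")), frozenset(("Y", "Z")), frozenset(("Z", "W"))}
sepset = {
    frozenset(("X", "Y")): set(),
    frozenset(("X", "W")): {"Z"},
    frozenset(("Y", "W")): {"Z"},
}
immoralities = find_immoralities(nodes, edges, sepset)
```

Only the triple X, Z, Y qualifies here: W was separated from X and from Y by conditioning on Z, so the paths through Z towards W are not colliders.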

Perfect map from CI test
3. propagate the constraints

at this point: a mix of directed and undirected edges
add directions using the rules R1, R2, R3 (needed to preserve immoralities / DAG structure), until convergence
(figure: orientation rules R1, R2, R3)
for exact CI tests, this guarantees the exact I-equivalence family

Example (figure: ground truth DAG; undirected skeleton + immoralities; result of using rules R1, R2, R3)
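The rules themselves are shown only in a figure that did not survive in this transcript. The sketch below assumes they are the three standard orientation rules used for partially directed graphs: R1 avoids creating a new immorality, R2 avoids a directed cycle, and R3 combines both arguments. The representation (directed edges as tuples, undirected edges as `frozenset` pairs) and the toy input are assumptions of this example.

```python
def propagate(nodes, directed, undirected):
    """Orient undirected edges with rules R1-R3 until nothing changes."""
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(a, b):
        return ((a, b) in directed or (b, a) in directed
                or frozenset((a, b)) in undirected)

    changed = True
    while changed:
        changed = False
        for e in list(undirected):
            t = tuple(e)
            for y, z in (t, t[::-1]):
                # R1: x -> y - z with x, z non-adjacent  =>  y -> z
                r1 = any((x, y) in directed and not adjacent(x, z)
                         for x in nodes if x not in (y, z))
                # R2: y -> w -> z and y - z  =>  y -> z (else a cycle)
                r2 = any((y, w) in directed and (w, z) in directed
                         for w in nodes)
                # R3: y - a -> z, y - b -> z, a, b non-adjacent, y - z  =>  y -> z
                mids = [a for a in nodes
                        if frozenset((y, a)) in undirected and (a, z) in directed]
                r3 = any(not adjacent(a, b)
                         for a in mids for b in mids if a != b)
                if r1 or r2 or r3:
                    undirected.discard(e)
                    directed.add((y, z))
                    changed = True
                    break
    return directed, undirected


nodes = ["X", "Y", "Z", "W"]
directed, undirected = propagate(
    nodes,
    directed={("X", "Z"), ("Y", "Z")},   # the immorality X -> Z <- Y
    undirected={("Z", "W")},
)
```

In this toy case R1 orients Z -> W, because orienting it the other way would create a new immorality at Z with X (and with Y).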

conditional independence (CI) test

how to decide $X \perp Y \mid Z$ from the dataset $D$?

measure the deviance of $p_D(X \mid Z)\,p_D(Y \mid Z)$ from $p_D(X, Y \mid Z)$, using frequencies in the dataset:

conditional mutual information:
$d_I(D) = \mathbb{E}_Z\left[ D_{KL}\big( p_D(X, Y \mid Z) \,\|\, p_D(X \mid Z)\, p_D(Y \mid Z) \big) \right]$

$\chi^2$ statistics:
$d_{\chi^2}(D) = |D| \sum_{x,y,z} \frac{\big( p_D(x, y, z) - p_D(z)\, p_D(x \mid z)\, p_D(y \mid z) \big)^2}{p_D(z)\, p_D(x \mid z)\, p_D(y \mid z)}$

large deviance rejects the null hypothesis (of conditional independence): pick a threshold $t$ and reject if $d(D) > t$

p-value is the probability of false rejection, taken over all possible datasets:
$\mathrm{pvalue}(t) = P(\{D : d(D) > t\} \mid X \perp Y \mid Z)$

it is possible to derive the distribution of deviance measures, e.g., the $\chi^2$ distribution
reject a hypothesis (CI) for small p-values (.05)
(figure: $\chi^2$ density with .95 of the mass below the threshold and .05 in the tail)
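Both deviance measures can be computed directly from counts. The sketch below is a plain-Python illustration on made-up data. The function name `deviances`, the two toy datasets, and the threshold `t = 5.99` are assumptions of this example; 5.99 is the 0.95 quantile of a $\chi^2$ distribution with 2 degrees of freedom, which is the usual choice for binary X and Y and a binary Z.

```python
from collections import Counter
from math import log


def deviances(samples):
    """samples: list of (x, y, z). Returns (conditional MI in nats, chi-square statistic)."""
    n = len(samples)
    n_xyz = Counter(samples)
    n_xz = Counter((x, z) for x, _, z in samples)
    n_yz = Counter((y, z) for _, y, z in samples)
    n_z = Counter(z for _, _, z in samples)
    xs = {x for x, _, _ in samples}
    ys = {y for _, y, _ in samples}
    cmi = chi2 = 0.0
    for z, nz in n_z.items():
        for x in xs:
            for y in ys:
                p_joint = n_xyz[(x, y, z)] / n
                # p(z) p(x|z) p(y|z) written with counts
                p_indep = n_xz[(x, z)] * n_yz[(y, z)] / (nz * n)
                if p_indep == 0:
                    continue
                chi2 += n * (p_joint - p_indep) ** 2 / p_indep
                if p_joint > 0:
                    cmi += p_joint * log(p_joint / p_indep)
    return cmi, chi2


# toy data: X, Y independent given Z (all eight combinations equally often)
independent = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)] * 5
# toy data: Y is a copy of X regardless of Z
dependent = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)] * 5

t = 5.99  # assumed threshold, see the lead-in
cmi_ind, chi2_ind = deviances(independent)
cmi_dep, chi2_dep = deviances(dependent)
reject_ind = chi2_ind > t
reject_dep = chi2_dep > t
```

The test keeps the independence hypothesis for the first dataset and rejects it for the second.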

Structure learning in BayesNets

family of methods (recap): constraint-based methods; score-based methods (search over the combinatorial space, maximizing a score); Bayesian model averaging

Mutual information

how much information does X encode about Y?

reduction in the uncertainty of X after observing Y:
$I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$

conditional entropy: $H(Y \mid X) = \sum_x p(x)\, H(p(y \mid x))$

symmetric: $I(X, Y) = I(Y, X)$

$I(X, Y) = \sum_{x,y} p(x, y) \log\left( \frac{p(x, y)}{p(x)\, p(y)} \right) = D_{KL}(p(x, y) \,\|\, p(x)\, p(y))$

non-negative, and zero exactly when X and Y are independent
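The identities above are easy to check numerically. The sketch below uses a made-up joint table over two binary variables; the helper names are chosen for this example only.

```python
from math import log


def entropy(p):
    return -sum(v * log(v) for v in p.values() if v > 0)


def marginal(joint, axis):
    out = {}
    for key, v in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + v
    return out


def conditional_entropy(joint, given_axis):
    """H(other | given) = sum_g p(g) H(p(other | g))."""
    h = 0.0
    for g, pg in marginal(joint, given_axis).items():
        cond = {k: v / pg for k, v in joint.items() if k[given_axis] == g}
        h += pg * entropy(cond)
    return h


def mutual_information(joint):
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(v * log(v / (px[x] * py[y]))
               for (x, y), v in joint.items() if v > 0)


# assumed toy joint distribution p(x, y)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

mi = mutual_information(joint)
via_x = entropy(marginal(joint, 0)) - conditional_entropy(joint, given_axis=1)  # H(X) - H(X|Y)
via_y = entropy(marginal(joint, 1)) - conditional_entropy(joint, given_axis=0)  # H(Y) - H(Y|X)
```

All three quantities agree up to floating-point error, and the value is positive because this table does not factorize.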

MLE in Bayes-nets: mutual information form

log-likelihood:
$\ell(D; \theta) = \sum_{x \in D} \sum_i \log p(x_i \mid Pa_{x_i}; \theta_{i \mid Pa_i})$

$= \sum_i \sum_{(x_i, Pa_{x_i}) \in D} \log p(x_i \mid Pa_{x_i}; \theta_{i \mid Pa_i})$

using the empirical distribution:
$= N \sum_i \sum_{x_i, Pa_{x_i}} p_D(x_i, Pa_{x_i}) \log p(x_i \mid Pa_{x_i}; \theta_{i \mid Pa_i})$

use MLE estimate:
$\ell(D, \theta^*) = N \sum_i \sum_{x_i, Pa_{x_i}} p_D(x_i, Pa_{x_i}) \log p_D(x_i \mid Pa_{x_i})$

$= N \sum_i \sum_{x_i, Pa_{x_i}} p_D(x_i, Pa_{x_i}) \left( \log \frac{p_D(x_i, Pa_{x_i})}{p_D(x_i)\, p_D(Pa_{x_i})} + \log p_D(x_i) \right)$

using the definition of mutual information:
$= N \sum_i \left( I_D(X_i, Pa_{X_i}) - H_D(X_i) \right)$
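As a sanity check of the derivation, the maximized log-likelihood can be computed in both forms from counts. This is a sketch on made-up data: rows are tuples of discrete values, and `parents` maps each column index to a tuple of parent column indices, both conventions assumed for this example.

```python
from collections import Counter
from math import log


def max_loglik(data, parents):
    """Sum over samples and nodes of log p_D(x_i | pa_i), using MLE parameters."""
    total = 0.0
    for v, pa in parents.items():
        joint = Counter((row[v], tuple(row[p] for p in pa)) for row in data)
        pa_counts = Counter(tuple(row[p] for p in pa) for row in data)
        total += sum(c * log(c / pa_counts[cfg]) for (_, cfg), c in joint.items())
    return total


def mi_form(data, parents):
    """N * sum_i (I_D(X_i, Pa_i) - H_D(X_i)) with empirical distributions."""
    n = len(data)
    total = 0.0
    for v, pa in parents.items():
        joint = Counter((row[v], tuple(row[p] for p in pa)) for row in data)
        pv = Counter(row[v] for row in data)
        ppa = Counter(tuple(row[p] for p in pa) for row in data)
        mi = sum(c / n * log(c * n / (pv[x] * ppa[cfg]))
                 for (x, cfg), c in joint.items())
        h = -sum(c / n * log(c / n) for c in pv.values())
        total += n * (mi - h)
    return total


# assumed toy dataset over three binary variables
data = ([(0, 0, 0)] * 3 + [(0, 1, 1)] * 2 + [(1, 1, 0)] * 4
        + [(1, 0, 1)] * 1 + [(1, 1, 1)] * 2)
parents = {0: (), 1: (0,), 2: (0, 1)}

direct = max_loglik(data, parents)
decomposed = mi_form(data, parents)
```

For a node without parents the mutual-information term is zero, so only the entropy term contributes, which matches the formula.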

Optimal solution for trees

likelihood score: $\ell(D, \theta^*) = N \sum_i \left( I_D(X_i, Pa_{X_i}) - H_D(X_i) \right)$

the entropy term $H_D(X_i)$ does not depend on structure
in a tree each node has at most one parent $X_j$, so the remaining term is $I_D(X_i, X_j)$

structure learning algorithms use mutual information in the structure search:

Chow-Liu algorithm: find the max-spanning tree: edge-weights = mutual information, $I_D(X_j, X_i) = I_D(X_i, X_j)$
add direction to edges later
make sure each node has at most one parent (i.e., no v-structure)
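A compact sketch of the Chow-Liu procedure: compute pairwise empirical mutual information, build a maximum spanning tree, then direct edges away from a root so that every node has at most one parent. The use of Kruskal's algorithm with a union-find, the choice of root, and the toy data (a noisy chain X0 -> X1 -> X2) are assumptions of this example.

```python
from collections import Counter
from itertools import combinations
from math import log


def empirical_mi(data, i, j):
    n = len(data)
    cij = Counter((r[i], r[j]) for r in data)
    ci = Counter(r[i] for r in data)
    cj = Counter(r[j] for r in data)
    return sum(c / n * log(c * n / (ci[a] * cj[b])) for (a, b), c in cij.items())


def chow_liu(data, n_vars, root=0):
    """Return {node: parent} for a maximum-MI spanning tree rooted at `root`."""
    weighted = sorted(((empirical_mi(data, i, j), i, j)
                       for i, j in combinations(range(n_vars), 2)), reverse=True)
    comp = list(range(n_vars))  # union-find forest

    def find(a):
        while comp[a] != a:
            comp[a] = comp[comp[a]]
            a = comp[a]
        return a

    neighbors = {v: [] for v in range(n_vars)}
    for _, i, j in weighted:  # Kruskal: heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            comp[ri] = rj
            neighbors[i].append(j)
            neighbors[j].append(i)

    parent = {root: None}  # direct edges away from the root
    stack = [root]
    while stack:
        v = stack.pop()
        for u in neighbors[v]:
            if u not in parent:
                parent[u] = v
                stack.append(u)
    return parent


# assumed toy data: X1 copies X0 and X2 copies X1, each with 10% flips
counts = {(0, 0, 0): 81, (0, 0, 1): 9, (0, 1, 1): 9, (0, 1, 0): 1,
          (1, 1, 1): 81, (1, 1, 0): 9, (1, 0, 0): 9, (1, 0, 1): 1}
data = [row for row, c in counts.items() for _ in range(c)]
tree = chow_liu(data, n_vars=3)
```

Because mutual information is symmetric, any node can serve as the root; different roots give I-equivalent trees with the same likelihood score.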

Page 56: Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Bayesian about both structure        and parameters

Bayesian Score Bayesian Score for BayesNetsfor BayesNets

P (G∣D) ∝ P (D∣G)P (G)

G θ

Page 57: Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Bayesian about both structure        and parameters

Bayesian Score Bayesian Score for BayesNetsfor BayesNets

P (G∣D) ∝ P (D∣G)P (G)

G θlog

score (G,D) =B log P (D∣G) + log P (G)

Page 58: Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Bayesian about both structure        and parameters

Bayesian Score Bayesian Score for BayesNetsfor BayesNets

P (G∣D) ∝ P (D∣G)P (G)

G θ

P (D∣θ,G)P (θ ∣∫θ∈Θ G

G)dθ marginal likelihood for a structure

logscore (G,D) =B log P (D∣G) + log P (G)

G

Page 59: Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Bayesian about both structure        and parameters

Bayesian Score Bayesian Score for BayesNetsfor BayesNets

P (G∣D) ∝ P (D∣G)P (G)

G θ

P (D∣θ,G)P (θ ∣∫θ∈Θ G

G)dθ marginal likelihood for a structure

assuming local and global parameter independence

factorizes to the marginal likelihood of each node

logscore (G,D) =B log P (D∣G) + log P (G)

G

Page 60: Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Bayesian about both structure        and parameters

Bayesian Score Bayesian Score for BayesNetsfor BayesNets

P (G∣D) ∝ P (D∣G)P (G)

G θ

P (D∣θ,G)P (θ ∣∫θ∈Θ G

G)dθ marginal likelihood for a structure

assuming local and global parameter independence

factorizes to the marginal likelihood of each node

logscore (G,D) =B log P (D∣G) + log P (G)

G

for Dirichlet-multinomial has closed form

Page 61: Probabilistic Graphical Modelssiamak/COMP767/slides/... · 2019-11-19 · Probabilistic Graphical Models Structure learning in Bayesian networks Siamak Ravanbakhsh Fall 2019. Learning

Bayesian about both structure        and parameters

Bayesian Score Bayesian Score for BayesNetsfor BayesNets

P (G∣D) ∝ P (D∣G)P (G)

G θ

P (D∣θ,G)P (θ ∣∫θ∈Θ G

G)dθ marginal likelihood for a structure

assuming local and global parameter independence

factorizes to the marginal likelihood of each node

logscore (G,D) =B log P (D∣G) + log P (G)

G

for Dirichlet-multinomial has closed form

score (G,D) ≈B ℓ(D, θ ) −∗G log(∣D∣)K2

1Bayesian Information Criterion (BIC)

for large sample size

any exp-family member

Bayesian Score for BayesNets

Bayesian about both structure $G$ and parameters $\theta$

$P(G \mid D) \propto P(D \mid G)\, P(G)$

$P(D \mid G) = \int_{\theta \in \Theta_G} P(D \mid \theta, G)\, P(\theta \mid G)\, d\theta$ (marginal likelihood for a structure $G$)

assuming local and global parameter independence, factorizes to the marginal likelihood of each node
for Dirichlet-multinomial has closed form

log-score: $\mathrm{score}_B(G, D) = \log P(D \mid G) + \log P(G)$

Bayesian Information Criterion (BIC), for large sample size and any exp-family member:

$\mathrm{score}_B(G, D) \approx \ell(D, \theta^*_G) - \frac{1}{2} \log(|D|)\, K$, where $K$ = #parameters

Akaike Information Criterion (AIC): $\ell(D, \theta^*_G) - K$ (equivalently, minimize $2K - 2\,\ell(D, \theta^*_G)$)
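The BIC approximation can be sketched for a discrete BayesNet as follows. This is only an illustration under stated assumptions: the data is a list of dicts with integer-coded values, the cardinalities in `card` are known, and the helper names (`family_loglik_and_dim`, `bic_score`) are hypothetical, not taken from the slides.

```python
import math
from collections import Counter

def family_loglik_and_dim(data, child, parents, card):
    """Maximized log-likelihood and free-parameter count of one node given its parents."""
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # MLE plug-in: sum over observed cells of N(u, x) * log(N(u, x) / N(u))
    loglik = sum(n * math.log(n / parent_counts[u]) for (u, _), n in joint.items())
    # (states of child - 1) free parameters per parent configuration
    dim = (card[child] - 1) * math.prod(card[p] for p in parents)
    return loglik, dim

def bic_score(data, graph, card):
    """graph maps each node to a tuple of its parents."""
    loglik, k = 0.0, 0
    for child, parents in graph.items():
        ll, dim = family_loglik_and_dim(data, child, parents, card)
        loglik += ll
        k += dim
    # log-likelihood at the MLE minus (1/2) * log|D| * K
    return loglik - 0.5 * math.log(len(data)) * k

# Toy data (made up for illustration): X and Y always agree.
data = [{"X": 0, "Y": 0}] * 4 + [{"X": 1, "Y": 1}] * 4
card = {"X": 2, "Y": 2}
bic_g1 = bic_score(data, {"X": (), "Y": ()}, card)      # no edge, K = 2
bic_g2 = bic_score(data, {"X": (), "Y": ("X",)}, card)  # X -> Y, K = 3
```

On this toy data the structure with the edge scores higher: the gain in log-likelihood outweighs the penalty for the extra parameter.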

Bayesian Score for BayesNets

Example: the Bayesian score is biased towards simpler structures

[Figure: two candidate structures, $G_1$ and $G_2$; axis label: $|D|$]

Bayesian Score for BayesNets

Example (continued): the Bayesian score is biased towards simpler structures

data sampled from ICU alarm BayesNet

[Figure: Bayesian score of the true model (509 params) and of two simplified models (359 params and 214 params); axis label: $|D|$]
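The preference for simpler structures can be reproduced on a toy problem with the closed-form Dirichlet-multinomial marginal likelihood. The sketch below makes several assumptions that are not in the slides: a uniform structure prior (so $\log P(G)$ is a constant and is dropped), an independent Dirichlet prior with pseudo-count `alpha = 1` for every parent configuration, integer-coded values, and hypothetical function names.

```python
import math
from collections import Counter

def log_marginal_family(data, child, parents, card, alpha=1.0):
    """Log marginal likelihood of one node's CPD under Dirichlet(alpha, ..., alpha) priors."""
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    r = card[child]
    total = 0.0
    # Unobserved parent configurations contribute exactly 0, so they are skipped.
    for u, n_u in parent_counts.items():
        total += math.lgamma(r * alpha) - math.lgamma(r * alpha + n_u)
        for x in range(r):
            total += math.lgamma(alpha + counts[(u, x)]) - math.lgamma(alpha)
    return total

def log_marginal(data, graph, card):
    """Global parameter independence: the score is a sum over families."""
    return sum(log_marginal_family(data, c, ps, card) for c, ps in graph.items())

card = {"X": 2, "Y": 2}
g1 = {"X": (), "Y": ()}      # no edge
g2 = {"X": (), "Y": ("X",)}  # X -> Y

# X and Y empirically independent: both structures reach the same maximum likelihood,
# but the marginal likelihood favours the structure with fewer parameters.
indep = [{"X": x, "Y": y} for x in (0, 1) for y in (0, 1)] * 2
indep_g1 = log_marginal(indep, g1, card)
indep_g2 = log_marginal(indep, g2, card)

# X and Y perfectly dependent: the extra edge is now supported by the data.
dep = [{"X": 0, "Y": 0}] * 4 + [{"X": 1, "Y": 1}] * 4
dep_g1 = log_marginal(dep, g1, card)
dep_g2 = log_marginal(dep, g2, card)
```

With only eight samples the edge is rejected when the data shows no dependence and accepted when the dependence is strong, which is the behaviour the example slide describes.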

Structure search

$\arg\max_G \mathrm{Score}(D, G)$ is NP-hard

use heuristic search algorithms (discussed for MAP inference)

local search using:
    edge addition
    edge deletion
    edge reversal

use the decomposition of the score

$O(N^2)$ possible moves

collect sufficient statistics (frequencies)
estimate the score

example: ICU-alarm network
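The local search can be sketched as greedy hill climbing over the three edge moves. This is a minimal illustration, not the procedure used for the ICU-alarm example: it assumes BIC as the decomposable score, starts from the empty graph, uses integer-coded discrete data, and all function names are hypothetical. Because the score decomposes over families, each move only re-scores the one or two nodes whose parent sets change.

```python
import math
from collections import Counter
from itertools import permutations

def family_bic(data, child, parents, card):
    """BIC contribution of one node given its parents (assumed score)."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    pa = Counter(tuple(r[p] for p in parents) for r in data)
    ll = sum(n * math.log(n / pa[u]) for (u, _), n in counts.items())
    k = (card[child] - 1) * math.prod(card[p] for p in parents)
    return ll - 0.5 * math.log(len(data)) * k

def is_acyclic(parents):
    """Repeatedly strip nodes with no remaining parents; failure means a cycle."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True

def neighbours(parents):
    """Yield every graph one edge addition, deletion, or reversal away: O(N^2) moves."""
    for x, y in permutations(parents, 2):
        new = {v: set(ps) for v, ps in parents.items()}
        if x in parents[y]:                      # edge x -> y exists
            new[y].discard(x)
            rev = {v: set(ps) for v, ps in new.items()}
            rev[x].add(y)
            yield new                            # deletion
            yield rev                            # reversal
        elif y not in parents[x]:                # no edge in either direction
            new[y].add(x)
            yield new                            # addition

def hill_climb(data, card, max_iters=100):
    parents = {v: set() for v in card}
    fam = {v: family_bic(data, v, sorted(parents[v]), card) for v in card}
    for _ in range(max_iters):
        best_gain, best = 1e-9, None
        for cand in neighbours(parents):
            if not is_acyclic(cand):
                continue
            # Decomposition: only families whose parent set changed are re-scored.
            changed = [v for v in card if cand[v] != parents[v]]
            new_fam = {v: family_bic(data, v, sorted(cand[v]), card) for v in changed}
            gain = sum(new_fam[v] - fam[v] for v in changed)
            if gain > best_gain:
                best_gain, best = gain, (cand, new_fam)
        if best is None:                         # local optimum reached
            break
        parents, new_fam = best
        fam.update(new_fam)
    return parents, sum(fam.values())

# Toy data (made up): Y copies X, Z is independent of both.
data = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
card = {"X": 2, "Y": 2, "Z": 2}
learned, score = hill_climb(data, card)
skeleton = {frozenset((p, c)) for c, ps in learned.items() for p in ps}
```

The search stops at a local optimum; the direction of the recovered X, Y edge is arbitrary because both orientations receive the same BIC score.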

Summary

Structure learning is NP-hard
Make assumptions to simplify:

    constraint-based methods:
        limit the max number of parents
        rely on conditional independence (CI) tests
        identify the I-equivalence class

    score-based methods:
        tree structure
        use a Bayesian score + heuristic search
        find a locally optimal structure