
Sum-product and related algorithms for inference

Sep 11, 2021


dariahiddleston
Transcript
Page 1: Sum-product and related algorithms for inference

Factor Graphs Algorithms

Motivation

JPDF

Factorization of JPDF

Graphical models

Sum-Product Algorithm

Sum-product and related algorithms for inference

Manuel Yguel¹

Person in charge: Olivier Aycard²

[email protected]

¹ Institut National Polytechnique de Grenoble
² Université Joseph Fourier, Grenoble

Master II, IVR, 3I, I.C.A.


Outline

1 Motivation
2 JPDF
    Definitions
    Definitions and rules for JPDF
3 Factorization of JPDF
    Product rule
    Independencies
4 Graphical models
    Bayesian networks
    Factor Graphs
5 Sum-Product Algorithm
    Single marginal function
    Marginal for a chain
    Marginal for a tree


Probabilistic modelling

The modelling of phenomena is almost surely uncertain:
• communication between entities is subject to random perturbations,
• sensor records are uncertain: pixels of an image, range measurements of a laser range-finder, etc.
• knowledge is approximate: camera extrinsic and intrinsic parameters, sensor or robot localization, goals of people, etc.
• algorithms are approximate: approximations for real-time, first-order approximations for optimization and control, numerical precision, etc.


Framework

• $x_1, x_2, \dots, x_n$ is a set of variables,
• for all $i$, $x_i$ takes values in some (usually finite) domain (or alphabet) $A_i$,
• let $g(x_1, \dots, x_n)$ be a $[0; 1]$-valued function of $x_1, \dots, x_n$; $g$ is called the joint probability distribution function (JPDF),
• the domain of $g$ is $S = A_1 \times A_2 \times \dots \times A_n$ and is called the configuration space,
• each element of $S$ is a particular configuration of the variables, also called an event.


Example: the robot start problem

The robot does not start. The possible causes are:
1 the battery is down,
2 a wire is disconnected.

Furthermore, an observation can be made of the battery voltage. 4 variables can be defined:

Variable           Alphabet or domain
Start?             {yes, no}
Power State?       {up, down}
Connected?         {connected, disconnected}
Voltage Measure    {[i V; (i + 1) V[ | i = 0, ..., 199}

e = (no, up, disconnected, [24 V; 25 V[) is a configuration of the 4 variables.
g(e) ∈ [0; 1] is defined for every possible event; it is also written P(e), the probability of e.
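The configuration space of this example is small enough to enumerate. The following is a minimal sketch, assuming the voltage bins are stored as (low, high) pairs; the dictionary name `alphabets` and this encoding are illustrative choices, not part of the slides.

```python
from itertools import product

# Alphabets of the four variables of the robot start problem.
alphabets = {
    "Start?": ["yes", "no"],
    "Power State?": ["up", "down"],
    "Connected?": ["connected", "disconnected"],
    # Voltage bins [i V; (i+1) V[ for i = 0..199, encoded as (low, high) pairs
    # (an assumed encoding, chosen for illustration).
    "Voltage Measure": [(i, i + 1) for i in range(200)],
}

# The configuration space S is the Cartesian product of the alphabets.
configurations = list(product(*alphabets.values()))

# One particular configuration (event), as on the slide.
e = ("no", "up", "disconnected", (24, 25))

print(len(configurations))   # number of events: 2 * 2 * 2 * 200
print(e in configurations)   # e is an element of S
```

A full JPDF would assign one value g(e) to each of these events.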


Variable partition

For each problem, the set of variables is partitioned into three subsets:

1 the set of questioned variables Q,
2 the set of known variables K (possibly empty),
3 the set of unknown variables U (possibly empty).


Example: the robot start problem

Power State evaluation:
1 questioned variables Q = {Power State?},
2 known variables K = {Start?, Voltage Measure},
3 unknown variables U = {Connected?}.

Connection evaluation:
1 questioned variables Q = {Connected?},
2 known variables K = {Start?, Voltage Measure},
3 unknown variables U = {Power State?}.


Goal

The goal of a probabilistic model is to calculate the conditional JPDF

$$B := P(x_{q(1)}, \dots, x_{q(p)} \mid x_{k(1)}, \dots, x_{k(q)})$$

where for all $(i, j)$, $x_{q(i)} \in Q$ and $x_{k(j)} \in K$.
It is a set of functions, each one indexed by a different configuration of the known variables:

$$(a_{k(1)}, \dots, a_{k(q)}) \longmapsto \left(\mathrm{pdf} : A_{q(1)} \times \dots \times A_{q(p)} \longrightarrow [0; 1]\right)$$


Example: the robot start problem

Power State evaluation:

$$B_1 := P(PS \mid St, VM)$$

The variables Start?, Power State?, Connected? and Voltage Measure are abbreviated St, PS, C and VM respectively.

For each configuration of Start? and Voltage Measure (∈ {yes, no} × [0.0 V; 200 V]), it defines a probability function over the possible values of Power State?.

[Figure: bar chart of the distribution over Power State? (up, down) associated with the configuration (no, 24 V); vertical axis from 0 to 1.]


Use of probabilistic definitions

              brown hair   blond hair   red hair   light brown hair
brown eyes    22%          5%           3%         15%
blue eyes     8%           11%          9%         7%
green eyes    6%           2%           6%         6%


Marginal probability: calculating the probability of having blond hair.

$$P(\text{blond hair}) = \sum_{\text{eye color}} P(\text{blond hair}, \text{eye color}) = 18\%$$
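The sum rule can be checked directly on the table. This is a small sketch with the percentages copied from the slide; the names `joint_percent` and `hair_marginal` are illustrative.

```python
# Joint distribution P(eye color, hair color), in percent, copied from the table.
hair = ["brown hair", "blond hair", "red hair", "light brown hair"]
joint_percent = {
    "brown eyes": [22, 5, 3, 15],
    "blue eyes": [8, 11, 9, 7],
    "green eyes": [6, 2, 6, 6],
}

def hair_marginal(hair_color):
    """Sum the joint over all eye colors (sum rule)."""
    col = hair.index(hair_color)
    return sum(row[col] for row in joint_percent.values())

print(hair_marginal("blond hair"))  # 5 + 11 + 2 = 18 (percent)
```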


Conditional probability of eye color given blond hair: P(eye color | blond hair).

              blond hair
brown eyes    5% / 18%
blue eyes     11% / 18%
green eyes    2% / 18%
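The conditional distribution is the blond-hair column of the joint table divided by its sum. A minimal sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Column "blond hair" of the joint table, in percent.
blond_column = {"brown eyes": 5, "blue eyes": 11, "green eyes": 2}

# Marginal P(blond hair), in percent.
p_blond = sum(blond_column.values())

# P(eye color | blond hair) = P(eye color, blond hair) / P(blond hair).
conditional = {eye: Fraction(p, p_blond) for eye, p in blond_column.items()}

for eye, p in conditional.items():
    print(eye, p)
```

Dividing by the marginal is what makes the three values sum to one.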


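Probabilistic definitions

Conditional probability:

$$P(x_{q(1)}, \dots, x_{q(p)} \mid x_{k(1)}, \dots, x_{k(q)}) = \frac{P(x_{q(1)}, \dots, x_{q(p)}, x_{k(1)}, \dots, x_{k(q)})}{P(x_{k(1)}, \dots, x_{k(q)})}$$

Marginalization (also called sum rule):

• $$P(x_{q(1)}, \dots, x_{q(p)}, x_{k(1)}, \dots, x_{k(q)}) = \sum_{(a_{u(1)}, \dots, a_{u(r)}) \in A_{u(1)} \times \dots \times A_{u(r)}} g(x_1, \dots, x_n)$$

• $$P(x_{k(1)}, \dots, x_{k(q)}) = \sum_{(a_{q(1)}, \dots, a_{q(p)}) \in A_{q(1)} \times \dots \times A_{q(p)}} \; \sum_{(a_{u(1)}, \dots, a_{u(r)}) \in A_{u(1)} \times \dots \times A_{u(r)}} g(x_1, \dots, x_n)$$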


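Expanded expression of the goal function

Following the conditional and marginal probability definitions, B equals:

$$B = \frac{\displaystyle\sum_{(a_{u(1)}, \dots, a_{u(r)})} g(x_1, \dots, x_n)}{\displaystyle\sum_{(a_{q(1)}, \dots, a_{q(p)})} \; \sum_{(a_{u(1)}, \dots, a_{u(r)})} g(x_1, \dots, x_n)}$$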
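The expanded expression translates into a brute-force procedure: sum the joint over the unknown variables for the numerator, and over the questioned and unknown variables for the denominator. The sketch below assumes the joint is available as a Python function over full configurations; the function `conditional` and the toy joint are hypothetical and only illustrate the formula.

```python
from itertools import product

def conditional(g, alphabets, query, known):
    """Brute-force P(query | known) from a joint g.

    g: function mapping a full configuration (dict name -> value) to a probability.
    alphabets: dict name -> list of values.
    query, known: dict name -> value for the questioned / known variables.
    """
    def total(fixed):
        # Sum g over every variable that is not fixed.
        free = [v for v in alphabets if v not in fixed]
        s = 0.0
        for values in product(*(alphabets[v] for v in free)):
            s += g({**fixed, **dict(zip(free, values))})
        return s

    numerator = total({**query, **known})  # sums over the unknown variables
    denominator = total(known)             # sums over questioned and unknown variables
    return numerator / denominator

# Toy joint over three binary variables (hypothetical values that sum to 1).
alphabets = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}

def toy_joint(cfg):
    return (1 + cfg["a"] + 2 * cfg["b"] + cfg["c"]) / 24

result = conditional(toy_joint, alphabets, query={"a": 0}, known={"b": 1})
print(result)
```

Both sums enumerate every configuration of the free variables, which is the source of the exponential cost.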


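Example: the robot start problem

Power State evaluation:

$$B_1 := P(PS \mid St, VM) = \frac{\displaystyle\sum_{c \in \{\text{connected}, \text{disconnected}\}} g(St, PS, c, VM)}{\displaystyle\sum_{ps \in \{\text{up}, \text{down}\}} \; \sum_{c \in \{\text{connected}, \text{disconnected}\}} g(St, ps, c, VM)}$$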


Worst inference complexity

Each variable takes values in a finite alphabet of size K. With p and r variables in the questioned and unknown sets (resp.), $K^{p+r}$ sums are required.

EXPONENTIAL COMPLEXITY in the number of variables.


Product rule

Let $o(i)$, $i \in \{1, \dots, n\}$, be any permutation of the variable indices. Then

$$g(x_1, \dots, x_n) = P(x_{o(1)}) \prod_{i=2}^{n} P(x_{o(i)} \mid x_{o(1)}, \dots, x_{o(i-1)})$$

(easy to demonstrate by induction: just replace the conditional probabilities by their definition)

Example: the robot start problem

$$g(St, PS, C, VM) = P(PS)\,P(C \mid PS)\,P(VM \mid PS, C)\,P(St \mid PS, C, VM)$$
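As a worked instance of the argument, here is the case n = 3 with the identity permutation: replacing each conditional probability by its definition makes the product telescope.

```latex
\begin{aligned}
P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)
  &= P(x_1)\,\frac{P(x_1, x_2)}{P(x_1)}\,\frac{P(x_1, x_2, x_3)}{P(x_1, x_2)} \\
  &= P(x_1, x_2, x_3) = g(x_1, x_2, x_3)
\end{aligned}
```

The same cancellation applies for any n and any ordering of the variables, provided the conditioning events have non-zero probability.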


Probabilistic independence and conditional independence

Two variables $x_i$ and $x_j$ are said to be independent if and only if:

$$\forall (a_i, a_j) \in A_i \times A_j, \quad p(a_i, a_j) = p(a_i)\,p(a_j)$$

Two variables $x_i$ and $x_j$ are said to be conditionally independent given $x_k$ if and only if:

$$\forall (a_i, a_j, a_k) \in A_i \times A_j \times A_k, \quad p(a_i, a_j \mid a_k) = p(a_i \mid a_k)\,p(a_j \mid a_k)$$


Example: the robot start problem

In most cases, many independencies or conditional independencies arise as reasonable hypotheses in probabilistic modelling.
Reasonable hypotheses:
• Power State? and Connected? are independent,
• Voltage Measure and Connected? are conditionally independent given Power State?,
• Start? and Voltage Measure are conditionally independent given Power State? and Connected?.

$$g(St, PS, C, VM) = P(PS)\,P(C \mid \cancel{PS})\,P(VM \mid PS, \cancel{C})\,P(St \mid PS, C, \cancel{VM})$$

$$= P(PS)\,P(C)\,P(VM \mid PS)\,P(St \mid PS, C)$$
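To make the factorized form concrete, the sketch below builds g from the four factors and evaluates the Power State query by summing out Connected?. Every number is hypothetical and the voltage alphabet is reduced to three coarse bins for brevity (the slides use 200 bins); only the structure P(PS) P(C) P(VM|PS) P(St|PS, C) comes from the slide.

```python
# All probability values below are hypothetical, chosen only for illustration.
P_PS = {"up": 0.9, "down": 0.1}
P_C = {"connected": 0.95, "disconnected": 0.05}
# P(VM | PS) on an assumed coarse 3-bin voltage alphabet.
P_VM_given_PS = {
    "up": {"low": 0.05, "mid": 0.15, "high": 0.80},
    "down": {"low": 0.70, "mid": 0.25, "high": 0.05},
}
# P(St | PS, C): here the robot can only start when powered and connected.
P_St_given_PS_C = {
    ("up", "connected"): {"yes": 0.99, "no": 0.01},
    ("up", "disconnected"): {"yes": 0.0, "no": 1.0},
    ("down", "connected"): {"yes": 0.0, "no": 1.0},
    ("down", "disconnected"): {"yes": 0.0, "no": 1.0},
}

def g(st, ps, c, vm):
    """Joint probability as the product of the four factors."""
    return P_PS[ps] * P_C[c] * P_VM_given_PS[ps][vm] * P_St_given_PS_C[(ps, c)][st]

def power_state_posterior(st, vm):
    """P(PS | St, VM): sum out C, then normalize over PS."""
    unnorm = {ps: sum(g(st, ps, c, vm) for c in P_C) for ps in P_PS}
    z = sum(unnorm.values())
    return {ps: p / z for ps, p in unnorm.items()}

posterior = power_state_posterior("no", "low")
print(posterior)
```

With these made-up tables, a robot that does not start and shows a low voltage is most likely to have its power down, which matches intuition.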


The robot start problem

Simple substitutions give these simplifications from the hypotheses:

1 Power State? and Connected? are independent: P(PS, C) = P(PS)P(C).

$$P(C \mid PS) = \frac{P(C, PS)}{P(PS)} = \frac{\cancel{P(PS)}\,P(C)}{\cancel{P(PS)}} = P(C)$$


2 Voltage Measure and Connected? are conditionally independent given Power State?: P(VM, C|PS) = P(VM|PS)P(C|PS).

Use of the conditional probability definition:

$$P(VM \mid PS, C) = \frac{P(VM, PS, C)}{P(PS, C)}$$

Use of the product rule:

$$P(VM, PS, C) = P(PS)\,P(VM, C \mid PS) = P(PS)\,P(VM \mid PS)\,P(C \mid PS)$$


Substituting the product-rule expansion into the conditional probability definition:

$$P(VM \mid PS, C) = \frac{P(PS)\,P(VM \mid PS)\,P(C \mid PS)}{P(PS, C)}$$

and, since $P(C \mid PS) = P(C, PS) / P(PS)$,

$$P(VM \mid PS, C) = \frac{\cancel{P(PS)}\,P(VM \mid PS)}{\cancel{P(PS, C)}} \cdot \frac{\cancel{P(C, PS)}}{\cancel{P(PS)}}$$


P(VM|PS, C) = P(VM|PS)


3 Start? and Voltage Measure are conditionally independent given Power State? and Connected?: P(St, VM|PS, C) = P(St|PS, C)P(VM|PS, C).
The derivation is the same as for hypothesis (2), replacing VM by St, C by VM and PS by the group (PS, C); it gives P(St|PS, C, VM) = P(St|PS, C).


Independencies immediate utility: memory gain

If each variable takes values in a finite alphabet of size K and no independence assumption is made, $g(x_1, \dots, x_n)$ requires a grid of size $K^n$.
The memory size of $P(x_i \mid x_1, \dots, x_{i-1})$ is $K \times K^{i-1} = K^i$.
If p conditional independence assumptions are made, the memory size is reduced to $K \times K^{i-1-p} = K^{i-p}$.

                 no independencies      with independencies
factorization    g(St, PS, C, VM)       P(PS)P(C)P(VM|PS)P(St|PS, C)
memory size      2^3 × 200 = 1600       2 + 2 + 200 × 2 + 2 × 4 = 412
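The two table sizes can be recomputed from the alphabet sizes alone. A short sketch, assuming Python 3.8+ for `math.prod`; the `factors` list encodes each factor as a (child, parents) pair and is an illustrative representation.

```python
from math import prod

# Alphabet sizes of the four variables of the robot start problem.
sizes = {"St": 2, "PS": 2, "C": 2, "VM": 200}

# Full joint table: one entry per configuration.
full_joint = prod(sizes.values())

# Factorized form P(PS) P(C) P(VM|PS) P(St|PS,C):
# each factor stores one entry per (child, parents) configuration.
factors = [("PS", []), ("C", []), ("VM", ["PS"]), ("St", ["PS", "C"])]
factorized = sum(
    sizes[child] * prod(sizes[p] for p in parents) for child, parents in factors
)

print(full_joint, factorized)
```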


Example: the robot start problem

The Bayesian network over the nodes Power State?, Connected?, Voltage Measure and Start? is built factor by factor from the product rule:

1 $P(PS)\,P(C \mid PS)$
2 $P(PS)\,P(C \mid PS)\,P(VM \mid PS, C)$
3 Full graph: $P(PS)\,P(C \mid PS)\,P(VM \mid PS, C)\,P(St \mid PS, C, VM)$

The independence hypotheses then simplify the factors one at a time:

4 $P(PS)\,P(C \mid \cancel{PS})\,P(VM \mid PS, C)\,P(St \mid PS, C, VM)$
5 $P(PS)\,P(C)\,P(VM \mid PS, \cancel{C})\,P(St \mid PS, C, VM)$
6 $P(PS)\,P(C)\,P(VM \mid PS)\,P(St \mid PS, C, \cancel{VM})$

[Figure: directed graph over the nodes Power State?, Connected?, Voltage Measure and Start?, redrawn at each step.]


Bayesian networks (BN): definition (1)

• BN are Directed Acyclic Graphs (DAGs) that express a certain factorization of a JPDF.
• The graph has a polytree structure: it is possible to define an order o over the nodes such that, if there is a directed path from $x_i$ to $x_j$ in the graph, then $o(j) > o(i)$. Let $x_1, \dots, x_n$ be ordered following o.


Bayesian networks: definition (2)

• o is used in the factorization of the JPDF induced by the product rule:

$$g(x_1, \dots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid x_1, \dots, x_{i-1})$$

If there is no edge from $x_k$ to $x_j$, then $x_k$ can be dropped from the right-hand side of the factor $P(x_j \mid x_1, \dots, x_{j-1})$:

$$P(x_j \mid x_1, \dots, x_k, \dots, x_{j-1}) := P(x_j \mid x_1, \dots, \cancel{x_k}, \dots, x_{j-1})$$


Bayesian networks: definition (3)

• The JPDF is equivalently defined by the BN or by the following factorization:

$$g(x_1, \dots, x_n) := \prod_{j=1}^{n} P(x_j \mid pa_j)$$

where $pa_j$ is the set of parents of $x_j$, i.e. the set of variables $x_i$ such that there exists an edge from $x_i$ to $x_j$ in the BN.
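A Bayesian network can be stored as a map from each variable to its parents, from which both an admissible order o and the list of factors follow. The sketch below uses the robot start network and the standard-library `graphlib` module (Python 3.9+); the dictionary `parents` and the string rendering of the factors are illustrative choices.

```python
from graphlib import TopologicalSorter

# Parents pa_j of each variable in the simplified robot start network.
parents = {"PS": [], "C": [], "VM": ["PS"], "St": ["PS", "C"]}

# An order o such that every parent comes before its children.
order = list(TopologicalSorter(parents).static_order())

# One factor P(x_j | pa_j) per variable.
factors = [
    f"P({x}|{','.join(pa)})" if pa else f"P({x})" for x, pa in parents.items()
]

print(order)
print(" ".join(factors))
```

`TopologicalSorter` raises `CycleError` when the graph has a directed cycle, which is one way to check the acyclicity requirement.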


Interests and drawbacks of Bayesian networks

+ each term of the factorization is a probability distribution:
    1 clear semantics,
    2 normalized;
− does not represent every possible decomposition into probability distributions, e.g.

$$P(A, B, C, D, E) = P(C)\,P(D \mid C)\,P(A, B \mid C, D)\,P(E \mid C, B);$$

− does not represent every possible factorization of the JPDF.


Factor Graphs

Hypothesis: $g(x_1, \dots, x_n)$ factors into a product of several local functions, each having some subset of $\{x_1, \dots, x_n\}$ as arguments:

$$g(x_1, \dots, x_n) = \prod_{j \in J} f_j(X_j)$$

where J is a discrete index set, $X_j$ is a subset of $\{x_1, \dots, x_n\}$ and $f_j(X_j)$ is a function that depends only on the variables in $X_j$.
If $X_j = \{v_1, \dots, v_p\}$, $f_j(X_j) = f_j(v_1, \dots, v_p)$.

Factor graphs represent all possible factorizations of the JPDF.


Factor Graphs

Sometimes factor graphs for JPDF are expressed as follows:

$$g(x_1, \dots, x_n) = \frac{1}{Z} \prod_{j \in J} f_j(X_j)$$

where $Z = \sum_{(x_1, \dots, x_n)} \prod_{j \in J} f_j(X_j)$, so that g is normalized.

It is possible to consider a special factor node $f_0 = \frac{1}{Z}$ with $X_0 = \emptyset$, so that $f_0$ is not linked to any variable node.

In those cases, factors are not necessarily normalized anymore and thus are not necessarily probability distributions.
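The role of Z can be illustrated on a tiny example. The two factors below are hypothetical and deliberately unnormalized; the sketch computes Z by enumeration and checks that the resulting g sums to one.

```python
from itertools import product

# Two hypothetical, unnormalized factors over binary variables x1, x2, x3.
def f1(x1, x2):
    return 2.0 if x1 == x2 else 1.0

def f2(x2, x3):
    return 3.0 if x2 == x3 else 1.0

configs = list(product([0, 1], repeat=3))

# Z sums the product of all factors over the whole configuration space.
Z = sum(f1(x1, x2) * f2(x2, x3) for x1, x2, x3 in configs)

def g(x1, x2, x3):
    """Normalized joint: product of the factors divided by Z."""
    return f1(x1, x2) * f2(x2, x3) / Z

total = sum(g(*cfg) for cfg in configs)
print(Z, total)
```

Computing Z by enumeration has the same exponential cost as brute-force inference, so in practice it is also obtained by exploiting the factorization.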


Factor Graphs

Definition: a factor graph is a bipartite graph that expresses the structure of the factorization hypothesis. A factor graph has a variable node $x_i$ for each variable $x_i$ and a factor node $f_j$ for each local function $f_j$. The edges of the graph only connect a variable node to a factor node (bipartite property). A variable node $x_i$ is edge-connected to a factor node $f_j$ if and only if $x_i$ is an argument of $f_j$, i.e. $x_i \in X_j$.
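The definition maps directly onto a small data structure: store the scope X_j of each factor, and derive the edges from it. A minimal sketch for the robot start factorization P(PS) P(C) P(VM|PS) P(St|PS, C); the names `factor_scopes`, `edges` and `neighbours` are illustrative.

```python
# Scope X_j of each factor node f_j (the variables it takes as arguments).
factor_scopes = {
    "P(PS)": ["PS"],
    "P(C)": ["C"],
    "P(VM|PS)": ["VM", "PS"],
    "P(St|PS,C)": ["St", "PS", "C"],
}

# Bipartite edges: a variable node is linked to a factor node
# if and only if the variable belongs to the factor's scope.
edges = [(var, fac) for fac, scope in factor_scopes.items() for var in scope]

# Neighbouring factor nodes of each variable node.
neighbours = {}
for var, fac in edges:
    neighbours.setdefault(var, []).append(fac)

print(neighbours["PS"])
```

No edge ever joins two variables or two factors, which is the bipartite property.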


Example: the robot start problem

[Figure: factor graph of the robot start problem. Variable nodes: Voltage Measure, Power State?, Start?, Connected?. Factor nodes: P(VM|PS), P(PS), P(St|PS, C), P(C).]


Bayesian inference for factor graphs: the sum-product algorithm

Reminder: the goal of a probabilistic model is to calculate the conditional JPDF

$$B := P(x_{q(1)}, \dots, x_{q(p)} \mid x_{k(1)}, \dots, x_{k(q)})$$

NOW: exploit the factorization of the JPDF to speed up Bayesian inference.


Factorization property

Idea: reorganize sums of products into products of sums following the distributive law:

ab + ac = a(b + c)

2 MULT, 1 ADD become 1 MULT, 1 ADD


With more terms:

ab + ac + ad + ae + · · · + az = a(b + c + · · · + z)

25 MULT, 24 ADD become 1 MULT, 24 ADD
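The saving can be checked numerically. In this sketch, a and the 25 values standing for b, ..., z are arbitrary integers; the operation counts are written out by hand (summing 25 terms takes 24 additions) rather than measured.

```python
a = 3
terms = list(range(1, 26))  # 25 arbitrary values standing for b, c, ..., z

# Left-hand side ab + ac + ... + az: one multiplication per term, then 24 additions.
lhs = sum(a * t for t in terms)
lhs_mults, lhs_adds = len(terms), len(terms) - 1

# Right-hand side a(b + c + ... + z): 24 additions, then a single multiplication.
rhs = a * sum(terms)
rhs_mults, rhs_adds = 1, len(terms) - 1

print(lhs == rhs, (lhs_mults, lhs_adds), (rhs_mults, rhs_adds))
```

The number of additions is unchanged; the gain is entirely in the multiplications.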


Marginal for a chain
Chain definition (1)

A path without cycle, or chain, describes a JPDF in which each variable has at most one parent and at most one child, from a BN point of view. It leads to the following factor graph:

$$g(x_1, \dots, x_n) = P(x_1) \prod_{j=2}^{n} P(x_j \mid x_{j-1}) = f_1(x_1)\,f_2(x_2, x_1) \cdots f_n(x_n, x_{n-1})$$


Marginal for a chain
Chain definition (2)

[Figure: chain factor graph with variable nodes x1, ..., x6 and factor nodes f1, ..., f6.]

Definition: a chain without cycle is a sequence of vertices and edges in a graph:

$$c = v_0, e_1, v_1, e_2, \dots, v_{n-1}, e_n, v_n$$

such that the edge $e_i$ joins the vertices $v_{i-1}$ and $v_i$ and each vertex appears only once.


Marginal for a chain
Sum rearrangements and message definition (1)

Example: $g(x_1, \dots, x_6) = f_1(x_1) \prod_{j=2}^{6} f_j(x_j, x_{j-1})$.

Marginal for $x_4$:

$$P(x_4) = \sum_{x_1, x_2, x_3, x_5, x_6} g(x_1, \dots, x_6) = \sum_{\sim\{x_4\}} g(x_1, \dots, x_6)$$

where the notation $\sim\{x_i\}$ stands for $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$, i.e. all variables except $x_i$.


Marginal for a chainx4 is a pivot for factorization

P∼ {x4}

g(x1, . . . , x6) =

Px1, x2, x3

f1(x1)f2(x2, x1)f3(x3, x2)f4(x4, x3)

8<: Px5, x6

f5(x5, x4)f6(x6, x5)

9=;

x1 x2 x3 x4 x5 x6

f1 f2 f3 f4 f5 f6

41 / 64

Page 59: Sum-product and related algorithms for inference

Factor GraphsAlgorithms

Motivation

JPDF

Factorizationof JPDF

Graphicalmodels

Sum-ProductAlgorithmSingle marginalfunction

Marginal for a chain

Marginal for a tree

Marginal for a chainx4 is a pivot for factorization

P∼ {x4}

g(x1, . . . , x6) =

8<: Px1, x2, x3

f1(x1)f2(x2, x1)f3(x3, x2)f4(x4, x3)

9=;8<: P

x5, x6

f5(x5, x4)f6(x6, x5)

9=;

x1 x2 x3 x4 x5 x6

f1 f2 f3 f4 f5 f6

41 / 64

Page 60: Sum-product and related algorithms for inference

Factor GraphsAlgorithms

Motivation

JPDF

Factorizationof JPDF

Graphicalmodels

Sum-ProductAlgorithmSingle marginalfunction

Marginal for a chain

Marginal for a tree

Marginal for a chainx4 is a pivot for factorization

P∼ {x4}

g(x1, . . . , x6) =8<: Px1, x2, x3

f1(x1)f2(x2, x1)f3(x3, x2)f4(x4, x3)

9=; ×

8<: Px5, x6

f5(x5, x4)f6(x6, x5)

9=;µα(x4) × µβ(x4)

x1 x2 x3 x4 x5 x6

f1 f2 f3 f4 f5 f6

µα(x4) µβ(x4)

41 / 64

Page 61: Sum-product and related algorithms for inference

Factor GraphsAlgorithms

Motivation

JPDF

Factorizationof JPDF

Graphicalmodels

Sum-ProductAlgorithmSingle marginalfunction

Marginal for a chain

Marginal for a tree

Marginal for a chainx4 is a pivot for factorizationP

∼ {x4}g(x1, . . . , x6) =8<: P

x3

f4(x4, x3)

8<: Px1, x2

f3(x3, x2)f1(x1)f2(x2, x1)

9=;9=;

×

8<: Px5, x6

f5(x5, x4)f6(x6, x5)

9=;

x1 x2 x3 x4 x5 x6

f1 f2 f3 f4 f5 f6

µα(x4) µβ(x4)

µα(x3)

41 / 64

Page 62: Sum-product and related algorithms for inference

Factor GraphsAlgorithms

Motivation

JPDF

Factorizationof JPDF

Graphicalmodels

Sum-ProductAlgorithmSingle marginalfunction

Marginal for a chain

Marginal for a tree

Marginal for a chainx4 is a pivot for factorizationP

∼ {x4}g(x1, . . . , x6) =8<: P

x3

f4(x4, x3)

8<: Px1, x2

f3(x3, x2)f1(x1)f2(x2, x1)

9=;9=;

×

8<: Px5

f5(x5, x4)

8<: Px6

f6(x6, x5)

9=;9=;

x1 x2 x3 x4 x5 x6

f1 f2 f3 f4 f5 f6

µα(x4) µβ(x4)

µα(x3) µβ(x5)

41 / 64

Marginal for a chain: x4 is a pivot for factorization

$$\sum_{\sim\{x_4\}} g(x_1,\dots,x_6) = \left\{\sum_{x_3} f_4(x_4,x_3) \left\{\sum_{x_2} f_3(x_3,x_2) \left\{\sum_{x_1} f_1(x_1)\,f_2(x_2,x_1)\right\}\right\}\right\} \times \left\{\sum_{x_5} f_5(x_5,x_4) \left\{\sum_{x_6} f_6(x_6,x_5)\right\}\right\}$$

[Figure: chain factor graph with variable nodes x1, ..., x6 and factor nodes f1, ..., f6; messages µα(x4), µβ(x4), µα(x3), µβ(x5) and µα(x2).]

Marginal for a chain: graphical definition of messages using recursion

$$\sum_{\sim\{x_4\}} g(x_1,\dots,x_6) = \left\{\sum_{x_3} f_4(x_4,x_3) \left\{\sum_{x_2} f_3(x_3,x_2) \left\{\sum_{x_1} f_1(x_1)\,f_2(x_2,x_1)\right\}\right\}\right\} \times \left\{\sum_{x_5} f_5(x_5,x_4) \left\{\sum_{x_6} f_6(x_6,x_5)\right\}\right\}$$

[Figure: chain factor graph with variable nodes x1, ..., x6 and factor nodes f1, ..., f6; each nested sum is drawn as a message on the graph: µα(x2), µα(x3), µα(x4) on one side of x4, and µβ(x5), µβ(x4) on the other.]
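The nested sums above can be checked numerically. The following is a minimal sketch, assuming every variable takes K = 3 values and that the factors are arbitrary positive tables drawn at random; the tables, the seed and the function names are illustrative assumptions, not part of the slides. It compares the brute-force sum with the product of the two messages arriving at x4.

```python
import itertools
import random

random.seed(0)
K = 3  # assumed number of values per variable

# Hypothetical factor tables: f1[x1] and f[i][xi][xi_prev] for i = 2..6.
f1 = [random.random() for _ in range(K)]
f = {i: [[random.random() for _ in range(K)] for _ in range(K)]
     for i in range(2, 7)}


def brute_force_marginal():
    """Sum g over every variable except x4 (K**5 terms per value of x4)."""
    out = [0.0] * K
    for x1, x2, x3, x4, x5, x6 in itertools.product(range(K), repeat=6):
        out[x4] += (f1[x1] * f[2][x2][x1] * f[3][x3][x2] * f[4][x4][x3]
                    * f[5][x5][x4] * f[6][x6][x5])
    return out


def nested_marginal():
    """Same marginal with the sums pushed inside, as in the factorization."""
    # alpha messages, from the x1 side towards x4
    mu_a_x2 = [sum(f1[x1] * f[2][x2][x1] for x1 in range(K)) for x2 in range(K)]
    mu_a_x3 = [sum(f[3][x3][x2] * mu_a_x2[x2] for x2 in range(K)) for x3 in range(K)]
    mu_a_x4 = [sum(f[4][x4][x3] * mu_a_x3[x3] for x3 in range(K)) for x4 in range(K)]
    # beta messages, from the x6 side towards x4
    mu_b_x5 = [sum(f[6][x6][x5] for x6 in range(K)) for x5 in range(K)]
    mu_b_x4 = [sum(f[5][x5][x4] * mu_b_x5[x5] for x5 in range(K)) for x4 in range(K)]
    return [a * b for a, b in zip(mu_a_x4, mu_b_x4)]


brute = brute_force_marginal()
nested = nested_marginal()
```

Both lists should agree up to floating-point rounding; the nested version never enumerates the joint assignments.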

Marginal for a chain: mathematical insight into the message operations (1)

• $\sum_{\sim\{x_4\}} g(x_1,\dots,x_6) = \mu_\alpha(x_4) \times \mu_\beta(x_4)$
• x4 can take K values: $\{a_4^1,\dots,a_4^K\}$
• the product of two messages is a vector:

$$\begin{bmatrix} \mu_\alpha(x_4)^1 \times \mu_\beta(x_4)^1 \\ \vdots \\ \mu_\alpha(x_4)^K \times \mu_\beta(x_4)^K \end{bmatrix}$$
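As a small illustration of the vector product, here is a sketch assuming K = 3 and made-up message values; the function name and the final normalization step are additions for illustration, not something the slides prescribe.

```python
def message_product(mu_alpha, mu_beta):
    """Component-wise product of two messages over the same variable."""
    if len(mu_alpha) != len(mu_beta):
        raise ValueError("both messages must have K entries")
    return [a * b for a, b in zip(mu_alpha, mu_beta)]


# Illustrative values for K = 3 (assumed, not from the slides).
mu_alpha_x4 = [0.5, 2.0, 1.0]
mu_beta_x4 = [4.0, 0.25, 3.0]
marginal_x4 = message_product(mu_alpha_x4, mu_beta_x4)  # unnormalized marginal

# Optional: rescale so that the entries sum to one.
total = sum(marginal_x4)
normalized = [v / total for v in marginal_x4]
```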


Marginal for a chain: mathematical insight into the message operations (2)

• $\mu_\beta(x_4) = \sum_{x_5} f_5(x_5,x_4) \sum_{x_6} f_6(x_6,x_5) = \sum_{x_5} f_5(x_5,x_4)\,\mu_\beta(x_5)$
• x5 can take P values: $\{a_5^1,\dots,a_5^P\}$
• when f5(x5, x4) is discretized, the next message is obtained by a matrix-vector operation:

$$\begin{bmatrix}\mu_\beta(x_4)^1\\ \vdots\\ \mu_\beta(x_4)^K\end{bmatrix} = \begin{bmatrix} f_5(a_5^1,a_4^1) & f_5(a_5^2,a_4^1) & \dots & f_5(a_5^P,a_4^1)\\ f_5(a_5^1,a_4^2) & & \dots & f_5(a_5^P,a_4^2)\\ \vdots & & \ddots & \vdots\\ f_5(a_5^1,a_4^K) & f_5(a_5^2,a_4^K) & \dots & f_5(a_5^P,a_4^K)\end{bmatrix} \begin{bmatrix}\mu_\beta(x_5)^1\\ \mu_\beta(x_5)^2\\ \vdots\\ \mu_\beta(x_5)^P\end{bmatrix}$$
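The matrix-vector operation can be sketched in a few lines. The table and message values below are assumptions chosen for readability (K = 2 values for x4, P = 3 values for x5), and `factor_to_message` is a hypothetical helper name.

```python
def factor_to_message(table, mu_in):
    """Return mu_out[k] = sum over p of table[k][p] * mu_in[p].

    table has K rows (values of x4) and P columns (values of x5).
    """
    return [sum(row[p] * mu_in[p] for p in range(len(mu_in))) for row in table]


# Assumed example with K = 2 values for x4 and P = 3 values for x5.
f5 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]
mu_beta_x5 = [1.0, 0.0, 2.0]
mu_beta_x4 = factor_to_message(f5, mu_beta_x5)
```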


Marginal for a chain: factor node information representation

In a discretized factor node, the matrix f5(x5, x4) is stored, with K rows (values of x4) and P columns (values of x5):

$$\begin{bmatrix} f_5(a_5^1,a_4^1) & f_5(a_5^2,a_4^1) & \dots & f_5(a_5^P,a_4^1)\\ f_5(a_5^1,a_4^2) & & \dots & f_5(a_5^P,a_4^2)\\ \vdots & & \ddots & \vdots\\ f_5(a_5^1,a_4^K) & f_5(a_5^2,a_4^K) & \dots & f_5(a_5^P,a_4^K)\end{bmatrix}$$

Marginal for a chain: inference complexity

For discretized variables (all with K cases):

$$\begin{bmatrix}\mu_\beta(x_4)^1\\ \mu_\beta(x_4)^2\\ \vdots\\ \mu_\beta(x_4)^K\end{bmatrix} = \begin{bmatrix} f_5(a_5^1,a_4^1) & f_5(a_5^2,a_4^1) & \dots & f_5(a_5^K,a_4^1)\\ f_5(a_5^1,a_4^2) & & \dots & f_5(a_5^K,a_4^2)\\ \vdots & & \ddots & \vdots\\ f_5(a_5^1,a_4^K) & f_5(a_5^2,a_4^K) & \dots & f_5(a_5^K,a_4^K)\end{bmatrix} \begin{bmatrix}\mu_\beta(x_5)^1\\ \mu_\beta(x_5)^2\\ \vdots\\ \mu_\beta(x_5)^K\end{bmatrix}$$

• for a message: on the order of K² sums and K² products,
• marginalizing over one variable among N: N − 1 messages.

Inference for a chain: $O((N-1)K^2)$ operations, to be compared with $O(K^{N-1})$ in the general case.
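To get a feeling for the gap between the two bounds, the counts can be tabulated. This is only a sketch: the two functions return the dominant terms quoted on the slide, not exact operation counts, and the (N, K) pairs are arbitrary.

```python
def chain_cost(n_vars, k):
    """Dominant term for N - 1 matrix-vector messages of size K x K."""
    return (n_vars - 1) * k * k


def brute_force_cost(n_vars, k):
    """Number of joint assignments summed for one value of the kept variable."""
    return k ** (n_vars - 1)


for n, k in [(6, 10), (10, 10), (20, 10)]:
    print(n, k, chain_cost(n, k), brute_force_cost(n, k))
```

For the six-variable chain with K = 10, this gives 500 against 100000.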

Marginal for a chain: continuous factor nodes

Where a variable is continuous, sums become integrals.

→ computational overhead

BUT only the definition of the function f5(x5, x4) is needed (instead of matrices).

Example: $f_5(x_5,x_4) = \mathcal{N}(x_4,\sigma_4)(x_5) = \frac{1}{\sigma_4\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_5-x_4}{\sigma_4}\right)^2}$

Warning: at a factor node, storage = function code definition + parameters (here σ4).
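A continuous factor can be stored as code plus parameters, as in the sketch below. The midpoint-rule integration, its bounds and the step count are assumptions made for illustration; the slides do not say how the integral should be evaluated.

```python
import math


def gaussian_factor(x5, x4, sigma4):
    """f5(x5, x4): normal density of x5 with mean x4 and standard deviation sigma4."""
    z = (x5 - x4) / sigma4
    return math.exp(-0.5 * z * z) / (sigma4 * math.sqrt(2.0 * math.pi))


def message_to_x4(x4, mu_in, sigma4, lo=-10.0, hi=10.0, steps=2000):
    """Approximate the integral over x5 of f5(x5, x4) * mu_in(x5) (midpoint rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x5 = lo + (i + 0.5) * h
        total += gaussian_factor(x5, x4, sigma4) * mu_in(x5)
    return total * h


# With a constant incoming message, the result is the integral of the density,
# which should be close to 1 when the grid covers the bulk of the Gaussian.
value = message_to_x4(x4=0.0, mu_in=lambda x5: 1.0, sigma4=1.0)
```

Only `gaussian_factor` and the value of `sigma4` need to be kept at the factor node; the price is one numerical integration per evaluated message value.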

Marginal for a tree: sum rearrangements (1)

Extension of the chain algorithm to a tree is possible.
Let the graph of $g(x_1,\dots,x_n) = \prod_j f_j(X_j)$ be a tree.

For a marginal over $x_i$:

$$P(x_i) = \sum_{\sim\{x_i\}} \prod_j f_j(X_j)$$

Pick a variable node, $x_i$, as the root of the tree (that is always possible with trees).


Marginal for a tree: example, the robot start problem

For a marginal calculation over Start?:

[Figure: factor graph with variable nodes Voltage Measure, Power State?, Start? and Connected?, and factor nodes P(VM | PS), P(PS), P(St | PS, C) and P(C); the same graph is shown a second time in a rearranged layout.]

Marginal for a tree: sum rearrangements (2)

Let $st(x_i)$ be the set of all the subtrees connected to $x_i$. They are disjoint subtrees, and then:

$$P(x_i) = \sum_{\sim\{x_i\}} \prod_j f_j(X_j) = \sum_{\sim\{x_i\}} \prod_{s\in st(x_i)} F_s(x_i, Y_s)$$

• $Y_s$ is the set of all the variables in the subtree s,
• $F_s(x_i, Y_s)$ is the product of all the factors in the subtree s,
• in a tree there is at most one path that links one node to another, so

$$\forall (s_1,s_2)\in st(x_i)^2,\ s_1 \neq s_2:\quad Y_{s_1}\cap Y_{s_2} = \emptyset$$

the factors of different subtrees work on disjoint variables.


Marginal for a tree: example, the robot start problem

[Figure: robot start factor graph with variable nodes Voltage Measure, Power State?, Start? and Connected?, and factor nodes P(VM | PS), P(PS), P(St | PS, C) and P(C).]

$$\sum_{\sim\{St\}} g(St,PS,C,VM) = \sum_{\sim\{St\}} \{P(VM \mid PS)\}\,\{P(C)\,P(St \mid PS,C)\}\,\{P(PS)\}$$

Marginal for a tree: example, the robot start problem

[Figure: robot start factor graph with variable nodes Voltage Measure, Power State?, Start? and Connected?, and factor nodes P(VM | PS), P(PS), P(St | PS, C) and P(C).]

$$\sum_{\sim\{St\}} g(St,PS,C,VM) = \sum_{\sim\{St\}} F_j(VM,PS)\,F_b(St,PS,C)\,F_g(PS)$$

Marginal for a tree: example, the robot start problem

[Figure: robot start factor graph with variable nodes Voltage Measure, Power State?, Start? and Connected?, and factor nodes P(VM | PS), P(PS), P(St | PS, C) and P(C).]

$$\sum_{\sim\{St\}} g(St,PS,C,VM) = \sum_{\sim\{St\}} f_j(VM,PS)\,f_{b2}(C)\,f_{b1}(St,PS,C)\,f_g(PS)$$


Marginal for a tree: sum rearrangements (3)

As the factors of different subtrees work on disjoint variables, it is possible to exchange sums and products locally:

$$\begin{aligned}
P(x_i) &= \sum_{\sim\{x_i\}} \prod_j f_j(X_j) && (1)\\
&= \sum_{\sim\{x_i\}} \prod_{s\in st(x_i)} F_s(x_i,Y_s) && (2)\\
&= \prod_{s\in st(x_i)} \sum_{Y_s} F_s(x_i,Y_s) && (3)\\
&= \prod_{s\in st(x_i)} \mu_{F_s\to x_i} && (4)
\end{aligned}$$

where $\mu_{F_s\to x_i} := \sum_{Y_s} F_s(x_i,Y_s)$.
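The step from (2) to (3) is the distributive law applied to subtrees that share no variable other than x_i. A minimal numerical check, assuming two subtrees that are each summarized by a small made-up table over x_i and one private variable:

```python
import itertools

# Two assumed subtrees attached to x_i, each summarized by a table
# F_s[x_i][y_s] over its own private variable y_s.
F1 = [[1.0, 2.0], [3.0, 4.0]]            # x_i in {0, 1}, y_1 in {0, 1}
F2 = [[0.5, 1.5, 2.0], [1.0, 1.0, 1.0]]  # x_i in {0, 1}, y_2 in {0, 1, 2}


def sum_of_products(xi):
    """Equation (2): sum over (y_1, y_2) of the product of the subtree factors."""
    return sum(F1[xi][y1] * F2[xi][y2]
               for y1, y2 in itertools.product(range(2), range(3)))


def product_of_sums(xi):
    """Equation (3): product over subtrees of each subtree's own sum."""
    return sum(F1[xi]) * sum(F2[xi])


lhs = [sum_of_products(xi) for xi in range(2)]
rhs = [product_of_sums(xi) for xi in range(2)]
```

The exchange would not be valid if `F1` and `F2` shared a summed variable, which is exactly what the tree structure rules out.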

Marginal for a tree: factor to node messages definition (1)

Each subtree s is connected to the variable node through a unique factor node $f_s$, due to the bipartite property of factor graphs, so the message from factor s to node i is defined as:

$$\mu_{f_s\to x_i} := \mu_{F_s\to x_i}.$$

[Figure: robot start factor graph with variable nodes Voltage Measure, Power State?, Start? and Connected?, and factor nodes P(VM | PS), P(PS), P(St | PS, C) and P(C); messages µfj→PS, µfb→PS and µfv→PS.]

Marginal for a tree: node to factor messages definition (1)

For each subtree, the process of pushing the sums deeper is continued:

$$\begin{aligned}
\mu_{f_s\to x_i} &:= \sum_{Y_s} F_s(x_i,Y_s)\\
&= \sum_{Y_s} f_s(x_i,X_s) \prod_{m\in st(f_s)} F_m(Y_m)
\end{aligned}$$

where:
• $st(f_s)$ is the set of all the subtrees connected to the factor node $f_s$;
• each subtree m is connected to $f_s$ through a unique variable node $x_m$;
• $F_m(Y_m)$ is the product of all the factors in the subtree m;
• $Y_m$ is the set of all the variables in the subtree m.


Marginal for a tree: node to factor messages definition (2)

$$\begin{aligned}
\mu_{f_s\to x_i} &:= \sum_{Y_s} F_s(x_i,Y_s)\\
&= \sum_{Y_s} f_s(x_i,X_s) \prod_{m\in st(f_s)} F_m(Y_m)\\
&= \sum_{X_s\setminus\{x_i\}} f_s(x_i,X_s) \prod_{m\in st(f_s)} \sum_{Y_m\setminus X_s} F_m(Y_m)\\
&= \sum_{X_s\setminus\{x_i\}} f_s(x_i,X_s) \prod_{m\in st(f_s)} \mu_{x_m\to f_s}
\end{aligned}$$

where $\mu_{x_m\to f_s} := \sum_{Y_m\setminus X_s} F_m(Y_m)$ is the message from node m to factor s.

Marginal for a tree: node to factor messages definition (3)

$Y_s = \bigcup_{m\in st(f_s)} Y_m$ and $Y_m\setminus X_s = Y_m\setminus\{x_m\}$

[Figure: factor graph fragment with variable nodes xa, xb, xt and factor nodes f1, f2, f3, fi, fj.]

Marginal for a tree: factor to node messages definition (2)

Finally, from the previous expansion, the factor to node message from $f_s$ to $x_i$ can be written recursively from $x_i$, $X_s$ and the messages sent to $f_s$ by the nodes other than $x_i$:

$$\begin{aligned}
\mu_{f_s\to x_i} &:= \sum_{Y_s} F_s(x_i,Y_s)\\
&= \sum_{X_s\setminus\{x_i\}} f_s(x_i,X_s) \prod_{x_m\in X_s\setminus\{x_i\}} \mu_{x_m\to f_s}
\end{aligned}$$

Marginal for a tree: node to factor messages definition (4)

It is possible to expand the messages from node to factor as done previously.

$$\mu_{x_m\to f_s} := \sum_{Y_m\setminus\{x_m\}} F_m(Y_m)$$

As:
• $F_m(Y_m) = \prod_{k\in st(x_m)} F_k(Y_k)$, considering all subtrees attached to the variable node $x_m$,
• $Y_m\setminus\{x_m\} = \bigcup_k Y_k$,
• all the sets of variables of the subtrees are disjoint,

$$\begin{aligned}
\sum_{Y_m\setminus\{x_m\}} F_m(Y_m) &= \prod_{k\in st(x_m)} \sum_{Y_k} F_k(Y_k)\\
&= \prod_{k\in st(x_m)} \mu_{f_k\to x_m}
\end{aligned}$$

Marginal for a tree: node to factor messages definition (4)

As:
• $F_m(Y_m) = \prod_{k\in st(x_m)} F_k(Y_k)$, considering all subtrees attached to the variable node $x_m$,
• $Y_m\setminus\{x_m\} = \bigcup_k Y_k$,
• all the sets of variables of the subtrees are disjoint,

$$\mu_{x_m\to f_s} = \prod_{k\in st(x_m)} \mu_{f_k\to x_m}$$

The recursion is complete: node to factor messages are products of factor to node messages.

Marginal for a tree: node to factor messages definition (5)

It is worth noticing that the factors in the subtrees attached to $x_m$ are all the factors attached to $x_m$ except the preceding one in the path: $f_s$.
In general, we write ne(v) for the set of all the neighbour nodes of the node v in the factor graph, so that:

$$\mu_{x_m\to f_s} = \prod_{f_k\in ne(x_m)\setminus\{f_s\}} \mu_{f_k\to x_m}$$

Marginal for a tree: factor graph message formulae (update rules)

FACTOR TO NODE MESSAGE FORMULA:

$$\mu_{f_s\to x_i} = \sum_{x_m\in ne(f_s)\setminus\{x_i\}} f_s(x_i,X_s) \prod_{x_m\in ne(f_s)\setminus\{x_i\}} \mu_{x_m\to f_s}$$

NODE TO FACTOR MESSAGE FORMULA:

$$\mu_{x_m\to f_s} = \prod_{f_k\in ne(x_m)\setminus\{f_s\}} \mu_{f_k\to x_m}$$
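The two update rules translate almost literally into code for discrete variables. The sketch below is one possible rendering: the function names, the calling convention (a factor given as a Python function plus the ordered list of its variables) and the toy factor are all assumptions for illustration.

```python
import itertools


def node_to_factor(incoming, k):
    """Product of the messages from all neighbouring factors except the target.

    incoming: list of messages (lists of length k) from the other factors.
    With no other factor (leaf variable) the result is a vector of ones.
    """
    out = [1.0] * k
    for msg in incoming:
        out = [o * m for o, m in zip(out, msg)]
    return out


def factor_to_node(factor, scope, sizes, target, incoming):
    """Sum over all variables of the factor except target.

    factor:   function taking one value per variable in scope (same order)
    scope:    list of variable names, the neighbours of the factor
    sizes:    dict name -> number of values
    target:   name of the receiving variable
    incoming: dict name -> message from that variable (for names != target)
    """
    out = [0.0] * sizes[target]
    for values in itertools.product(*(range(sizes[v]) for v in scope)):
        term = factor(*values)
        for name, val in zip(scope, values):
            if name != target:
                term *= incoming[name][val]
        out[values[scope.index(target)]] += term
    return out


# Assumed toy factor f(a, b) = 1 + a + 2*b with binary a and b.
msg = factor_to_node(lambda a, b: 1 + a + 2 * b, ["a", "b"],
                     {"a": 2, "b": 2}, "a", {"b": [1.0, 0.5]})
```

Enumerating every joint assignment of the factor's variables costs a number of terms exponential in the factor's arity, which is acceptable for the small factors used here.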

Factor graph messages definition: end messages

Definition of end messages:

• for a node to factor message (leaf variable node x sending to factor f): a vector of ones,

$$\mu_{x\to f}(x) = 1, \qquad \begin{bmatrix} 1\\ \vdots\\ 1 \end{bmatrix}$$

• for a factor to node message (leaf factor node f sending to variable x): a vector of function values,

$$\mu_{f\to x}(x) = f(x), \qquad \begin{bmatrix} f_i(a_l^1)\\ \vdots\\ f_i(a_l^K) \end{bmatrix}$$

Sum-Product algorithm for trees

1 start from the leaves: broadcast end messages to the respective neighbours,
2 apply the message update rules recursively,
3 at the root: multiply all the incoming messages.
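The three steps can be sketched as a pair of mutually recursive functions, applied here to the robot start graph. All probability values below are made up for illustration, the variables are assumed binary, and the recursion pulls messages from the leaves on demand instead of scheduling them explicitly; none of this is taken from the slides.

```python
import itertools

# Assumed binary variables and made-up probability tables (illustrative only).
sizes = {"VM": 2, "PS": 2, "St": 2, "C": 2}
p_ps = [0.2, 0.8]                      # P(PS)
p_c = [0.1, 0.9]                       # P(C)
p_vm = [[0.9, 0.1], [0.3, 0.7]]        # P(VM | PS), indexed [PS][VM]
p_st1 = {(0, 0): 0.0, (0, 1): 0.05, (1, 0): 0.1, (1, 1): 0.95}  # P(St=1 | PS, C)

factors = {
    "f_vm": (["VM", "PS"], lambda vm, ps: p_vm[ps][vm]),
    "f_ps": (["PS"], lambda ps: p_ps[ps]),
    "f_st": (["St", "PS", "C"],
             lambda st, ps, c: p_st1[(ps, c)] if st else 1.0 - p_st1[(ps, c)]),
    "f_c": (["C"], lambda c: p_c[c]),
}


def var_to_factor(var, target):
    """Node to factor: product of messages from the other neighbouring factors."""
    out = [1.0] * sizes[var]           # end message if var is a leaf
    for name, (scope, _) in factors.items():
        if name != target and var in scope:
            out = [o * m for o, m in zip(out, factor_to_var(name, var))]
    return out


def factor_to_var(name, target):
    """Factor to node: sum over the other variables of factor times messages."""
    scope, fn = factors[name]
    incoming = {v: var_to_factor(v, name) for v in scope if v != target}
    out = [0.0] * sizes[target]
    for values in itertools.product(*(range(sizes[v]) for v in scope)):
        term = fn(*values)
        for v, val in zip(scope, values):
            if v != target:
                term *= incoming[v][val]
        out[values[scope.index(target)]] += term
    return out


def marginal(var):
    """At the root: multiply all incoming factor to node messages."""
    out = [1.0] * sizes[var]
    for name, (scope, _) in factors.items():
        if var in scope:
            out = [o * m for o, m in zip(out, factor_to_var(name, var))]
    return out


def brute_force(var):
    """Reference: sum the full product over every other variable."""
    names = list(sizes)
    out = [0.0] * sizes[var]
    for values in itertools.product(*(range(sizes[n]) for n in names)):
        assign = dict(zip(names, values))
        term = 1.0
        for scope, fn in factors.values():
            term *= fn(*(assign[v] for v in scope))
        out[assign[var]] += term
    return out


p_start = marginal("St")
```

The recursion terminates only because the graph is a tree; on a graph with cycles these two functions would call each other forever.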

Exercise: factor graph message definition for a chain

Chain example:

[Figure: chain factor graph with variable nodes x1, ..., x6 and factor nodes f1, ..., f6. α messages: $\mu^\alpha_{f_1\to x_1}$, $\mu^\alpha_{x_1\to f_2}$, $\mu^\alpha_{f_2\to x_2}$, $\mu^\alpha_{x_2\to f_3}$, $\mu^\alpha_{f_3\to x_3}$, $\mu^\alpha_{x_3\to f_4}$, $\mu^\alpha_{f_4\to x_4}$. β messages: $\mu^\beta_{x_6\to f_6}$, $\mu^\beta_{f_6\to x_5}$, $\mu^\beta_{x_5\to f_5}$, $\mu^\beta_{f_5\to x_4}$.]


Solution: factor graph message definition for a chain

Factor to node message for a chain:

$$\mu^\beta_{f_i\to x_{i-1}} := \sum_{x_i} f_i(x_i,x_{i-1})\,\mu^\beta_{x_i\to f_i}$$

It is a sum of products, with no real simplification, except that there is no product over different children of $f_i$, since $f_i$ has only one child.

Node to factor message for a chain:

$$\mu^\beta_{x_i\to f_i} := \mu^\beta_{f_{i+1}\to x_i}$$

In this case there is a big simplification: the message is exactly the one sent by the unique child. Again, there is no product over different children of $x_i$.

The same remarks hold for α messages.
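The chain solution can be sketched with two short loops, one per direction. The tables below are assumptions: every pairwise factor f_i(x_i, x_{i-1}) is given the same made-up 2 x 2 table, stored as `f[i][xi][xi_prev]`, and `f1` is an arbitrary vector.

```python
K = 2  # assumed number of values per variable

# Assumed tables: f[i][xi][xi_prev] for i = 2..6, and f1[x1].
f1 = [0.6, 0.4]
f = {i: [[0.7, 0.2], [0.3, 0.8]] for i in range(2, 7)}


def beta_messages(pivot, last=6):
    """beta factor to node messages mu_{f_i -> x_{i-1}} for i = last, ..., pivot + 1."""
    mu_x_to_f = [1.0] * K                 # end message from the leaf x_last
    out = {}
    for i in range(last, pivot, -1):
        mu_f_to_x = [sum(f[i][xi][xp] * mu_x_to_f[xi] for xi in range(K))
                     for xp in range(K)]
        out[i] = mu_f_to_x
        mu_x_to_f = mu_f_to_x             # node to factor: forwarded unchanged
    return out


def alpha_messages(pivot):
    """alpha factor to node messages mu_{f_i -> x_i} for i = 1, ..., pivot."""
    out = {1: list(f1)}                   # end message from the leaf factor f1
    for i in range(2, pivot + 1):
        prev = out[i - 1]                 # node to factor: forwarded unchanged
        out[i] = [sum(f[i][xi][xp] * prev[xp] for xp in range(K))
                  for xi in range(K)]
    return out


alpha = alpha_messages(4)
beta = beta_messages(4)
marginal_x4 = [a * b for a, b in zip(alpha[4], beta[5])]
```

Because each column of the assumed table sums to one, every β message in this example is a vector of ones, so the marginal of x4 is carried entirely by the α side.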

Bibliography I

• Kschischang, Frey, Loeliger, Factor Graphs and the Sum-Product Algorithm (2001). http://citeseer.ist.psu.edu/kschischang01factor.html
• Christopher M. Bishop, Pattern Recognition and Machine Learning, chapter 8, Springer (2006). http://research.microsoft.com/~cmbishop/PRML/Bishop-PRML-sample.pdf
• David J.C. MacKay (2003). Message Passing and Exact Marginalization in Graphs. In David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, pp. 241-247, pp. 334-340. Cambridge: Cambridge University Press.