-
Master Thesis

Tarkus Belief Propagation
On Message Passing Algorithms and Computational Commutative Algebra

conducted at the Signal Processing and Speech Communications Laboratory
Graz University of Technology, Austria

by Carlos Eduardo Cancino Chacón

Supervisor: Assoc.Prof. Dipl.-Ing. Dr. Franz Pernkopf
Assessors/Examiners: Assoc.Prof. Dipl.-Ing. Dr. Franz Pernkopf
Dr. Pejman Mowlaee, B.Sc., M.Sc., Ph.D.
Ao.Univ.-Prof. Dipl.-Ing. Dr.techn. Christian Magele

Graz, July 17, 2014

This work was funded by the Mexican National Council for Science and Technology (CONACyT) under scholarship 217746.
-
ABSTRACT
Probabilistic graphical models are used in several areas, including signal processing, artificial intelligence, machine learning and physics. Belief propagation is among the most popular message passing algorithms for dealing with probabilistic inference. This algorithm uses the structure of the graph representing the conditional dependencies of the random variables in the probabilistic model to efficiently calculate marginal probability distributions. On the other hand, computational commutative algebra studies the algorithms for characterizing solutions of systems of multivariate polynomial equations. One of the main concepts in this theory is that of Gröbner bases. These bases allow the description of the algebraic and geometric structures generated by a system of polynomial equations, and can be used to compute the solutions of such equations. In this thesis, it is shown that belief propagation can be alternatively understood as computing marginal probabilities by solving a system of polynomial equations. By doing so, the relationship between belief propagation and computational commutative algebra can be explored. The notion of convergence in belief propagation is analyzed in algebraic terms, which leads to new conditions for convergence that use the properties of Gröbner bases. These concepts are used to derive new proofs for the well-known convergence results of the belief propagation algorithm for graphical models with chain, tree and single-loop structures. Furthermore, using the framework of computational commutative algebra, an alternative formulation of belief propagation is proposed. We denominate this new approach Tarkus belief propagation. This method is experimentally compared with standard belief propagation and exact inference using a 2×2 spin glass. The experimental results show that Tarkus belief propagation is more computationally expensive and less reliable than standard belief propagation. Nevertheless, this new approach suggests an interesting insight into the basic principles of probabilistic inference, by showing some possible applications of methods from computational commutative algebra to probabilistic graphical models.
-
KURZFASSUNG
Probabilistic graphical models are used in various areas such as signal processing, artificial intelligence, machine learning and physics. Belief propagation is one of the most frequently used message passing algorithms for probabilistic inference. This algorithm uses the structure of the graph representing the conditional dependencies of the random variables to efficiently compute the marginal probability distributions. Commutative algebra deals with algorithms for solving systems of multivariate polynomial equations. An important concept in this field is that of Gröbner bases. These bases describe the algebraic and geometric structures generated by a system of polynomial equations, and can also be used to solve such equations. In this thesis, it is shown that belief propagation can also be understood as the computation of marginal probabilities by solving a system of polynomial equations. In doing so, the relationship between belief propagation and commutative algebra is examined. The notion of convergence in belief propagation is analyzed algebraically, which, via Gröbner bases, leads to new conditions for such convergence. These concepts are used to find new proofs for known convergence results of the belief propagation algorithm for graphical models with chain, tree and single-loop structures. Furthermore, using computational commutative algebra, an alternative formulation of belief propagation is proposed. We call this new approach Tarkus belief propagation. This method is experimentally compared with both conventional belief propagation and exact inference on a 2×2 spin glass. The results show that, on the one hand, Tarkus belief propagation comes with a higher computational cost; on the other hand, this method provides interesting insights into the principles of probabilistic inference.
-
RESUMEN
Probabilistic graphical models are used in diverse areas, including signal processing, artificial intelligence, machine learning and physics. Belief propagation is one of the most widely used message passing algorithms in the area of probabilistic inference. This algorithm uses graphs, whose structure represents conditional dependencies of random variables in probabilistic models, to efficiently compute marginal probability distributions. On the other hand, computational commutative algebra studies the algorithms for characterizing the solutions of systems of polynomial equations in multiple variables. One of the most important concepts in this theory is that of Gröbner bases. These bases allow the description of the algebraic and geometric objects described by a system of polynomial equations. In this thesis, it is shown that the belief propagation algorithm can be understood as a method for computing marginal probability distributions by solving a system of polynomial equations. This allows the relationship between belief propagation and computational commutative algebra to be explored. The notion of convergence of this algorithm is analyzed in algebraic terms, which leads to new conditions for convergence that use the properties of Gröbner bases. These conditions are used to derive alternative proofs of the known convergence cases of the belief propagation algorithm for acyclic graphs and single-loop graphs. Using the theoretical framework of computational commutative algebra, an alternative formulation of the belief propagation algorithm is proposed, which is named Tarkus belief propagation. This method is experimentally compared with the traditional belief propagation algorithm and with exact inference using a 2×2 spin glass. The results show that the Tarkus method is computationally more expensive and less accurate than the original algorithm. Nevertheless, this new formulation offers an interesting point of view on the basic principles of probabilistic inference.
-
Acknowledgments
I would like to thank Prof. Franz Pernkopf for introducing me to the area of machine learning and probabilistic graphical models, and for giving me the opportunity to pursue this thesis under his supervision. I also thank him for his constant support and patience through the many discussions we had over the course of this thesis. I would also like to thank Sebastian Tschiatschek and Michael Wohlmayr for their useful insight and interesting commentaries in the preparation of this work.

Living in Graz would not have been such a great experience without my dear friends Linda Lüchtrath and Mario Watanabe. I will always thank them for making me feel at home in this land so far away from Mexico. I thank Marisol Carrillo for her friendship, and for helping me (re)discover Latin American music.

I can't express enough gratitude to Rafael Cruz and José Antonio Maqueda for their unconditional friendship. Syninoforcimno! Also, I would like to thank Myriam Albor, because her time and space always win by a nose to my space-time.

I thank my parents for their unconditional love and encouragement, and for always believing in me. I'm grateful for having the most awesome siblings in the world! I also thank Casilda Rivera and Walter Sottolarz for all their help and support. They have been like a family to me all these years in Austria.

I thank the Mexican National Council for Science and Technology (CONACyT) for funding this work under scholarship 217746.

This thesis is respectfully dedicated to the memory of Mr. Elias Howe, who, in 1846, invented the sewing machine.
-
Statutory Declaration
I declare that I have authored this thesis independently, that I have not used other than the declared sources/resources, and that I have explicitly marked all material which has been quoted either literally or by content from the used sources.
date (signature)
-
Contents

1 Introduction 13
  1.1 Summary of Contributions 14
  1.2 Organization 14
2 Probabilistic Graphical Models 17
  2.1 Probability theory overview 18
  2.2 Graph theory overview 19
  2.3 Markov Networks 21
  2.4 Probabilistic Inference 21
  2.5 Belief Propagation 22
3 Computational Commutative Algebra 29
  3.1 Algebraic Geometry 29
  3.2 Gröbner Basis 32
  3.3 Hilbert's Nullstellensatz 36
  3.4 Elimination Theory 38
4 Algebraic Formulation of BP 41
  4.1 Convergence of the (L)BP algorithm 42
  4.2 Tarkus Belief Propagation 50
5 Experiments 53
  5.1 Experimental Setup 53
  5.2 Spin Glass 53
  5.3 2×2 Spin Glass 54
  5.4 Discussion 57
6 Conclusions 59
  6.1 Conjectures and Future Work 59
A Maple Code of the TBP for the 2×2 spin glass 65
B Formal definitions of mathematical structures 68
  B.1 Probability theory 68
  B.2 Algebraic structures 69
C Dimension of a Variety 71
D Alternative proof of Theorem 8 75
-
List of Algorithms

2.1 LBP(·) (Loopy) Belief Propagation 26
3.1 DivAlg(·) Division Algorithm in K[x1, . . . , xn] 34
3.2 Groebner(·) Buchberger's Algorithm 35
3.3 rGB(·) Reduced Gröbner Basis 36
3.4 PolySolve(·) Solutions of a system of polynomials 39
4.1 TBP(·) Tarkus Belief Propagation 51
-
1 Introduction
Inference in probabilistic graphical models (PGMs) is used in a wide range of applications, including signal processing [1], artificial intelligence [2, 3], and statistical physics [4]. Message Passing Algorithms (MPAs) over PGMs are among the most popular methods when dealing with inference [5], due to their simplicity and computational efficiency [6]. Introduced by Judea Pearl in 1982 [7], the Belief Propagation (BP) algorithm is an MPA frequently applied in the fields of artificial intelligence [2, 3], error correcting codes [8], speech recognition [9] and computer vision [10], among others. This algorithm performs probabilistic inference iteratively, by exploiting the structure of the graph in a PGM, to efficiently compute marginal probability distributions [11]. It was shown by Pearl that BP converges to a solution, and that this solution is equal to the true marginal probability distribution, for problems that can be represented by graphical models without cycles [12].
The Loopy Belief Propagation (LBP) algorithm is an extension of BP to graphs with cycles (also known as loops, hence the name) [5, 6]. It has been empirically shown that LBP provides good approximate results, although not necessarily the true marginal probabilities [5, 13]. However, in general, LBP is not guaranteed to converge to a solution [2, 3]. Yedidia et al. showed a connection between the convergence of the BP and LBP algorithms and the fixed points of the Bethe free energy [14], a concept that originated in thermodynamics, where the free energy represents the available energy of a physical system for performing mechanical work [15]. The investigation of the fixed points of the Bethe free energy has led to the derivation of convergence criteria for the LBP algorithm, such as the ones proposed by Ihler et al. [16] and Mooij et al. [17]. Weiss showed that the LBP algorithm converges for graphs with a single loop [3].
On the other hand, commutative algebra studies systems of polynomial equations, trying to answer the questions whether such systems have finitely or infinitely many solutions, and how to describe them [18]. Although some of the theoretical foundations of commutative algebra date from the end of the 19th century, it is only in recent years that it has regained its prominence, due to the increase of computational power and the development of new algorithms [19]. One of the most important concepts in commutative algebra is that of Gröbner Bases (GBs), introduced by Bruno Buchberger in 1965 [19]. Similar to the basis of a vector space, a GB is a finite set of polynomials that allows us to represent all members of a (possibly infinite) set of polynomials, called a polynomial ideal, by polynomial combinations of the elements of such a GB. These ideas found their way into many applications, ranging from pure mathematics [20] to signal processing [21] and robotics [18].
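The role of a GB can be illustrated with a small computation. The sketch below uses SymPy's `groebner` routine (the thesis' own code uses Maple; SymPy appears here only for illustration) on a toy system, a circle intersected with a line. Under a lexicographic monomial order, the resulting basis contains a univariate polynomial, which is the elimination property that makes polynomial systems solvable by back-substitution.

```python
# A minimal sketch of a Groebner basis computation with SymPy
# (illustrative only; not the thesis' Maple implementation).
import sympy as sp

x, y = sp.symbols('x y')
system = [x**2 + y**2 - 1, x - y]           # a circle intersected with a line

# Lexicographic order with x > y: the basis generates the same ideal
# as `system`, but contains an element involving only y.
G = sp.groebner(system, x, y, order='lex')

poly_y = [p for p in G.exprs if p.free_symbols == {y}][0]
roots_y = sp.solve(poly_y, y)               # solve for y, then back-substitute into x - y
```

Solving `poly_y` first and substituting the roots into the remaining basis elements recovers the two intersection points, without ever manipulating the original coupled system directly.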
In this thesis we explore the relationship between BP and computational commutative algebra by showing that the equations describing the message passing in the BP algorithm can be understood as a system of polynomial equations. This result allows us to use the concept of GBs to express convergence criteria for the BP algorithm and to formulate an alternative method for computing the marginal probabilities.
The methods proposed in this work are only suited for toy examples. However, they give an interesting insight into the basic principles of probabilistic inference, since they show that the ability of PGMs to represent joint probability distributions as products of conditional probability distributions can be exploited, using purely algebraic methods, to answer probabilistic inference queries.
1.1 Summary of Contributions
1. Convergence of the BP algorithm. Using the framework of computational commutative algebra, new conditions for convergence of the BP algorithm can be derived. These conditions make use of the properties of GBs to characterize the solutions of systems of polynomial equations.

2. Convergence for graphs without loops and for graphs with a single loop. Using the aforementioned convergence conditions, we show that the (L)BP algorithm converges for graphs without loops and for graphs with a single loop. While these are well-known results [3, 12], the proofs provided in this thesis present a new and interesting approach, by using the framework of computational commutative algebra.

3. Tarkus Belief Propagation. An alternative formulation of the BP algorithm using GBs is proposed. This algorithm exploits the fact that BP can be understood as a system of polynomial equations, and uses the properties of GBs to find an equivalent system of equations that might be easier to solve. Due to the eclectic nature of this new algorithm, we call it Tarkus Belief Propagation (TBP), as an homage to the 1971 eponymous album by the British progressive rock band Emerson, Lake & Palmer.
1.2 Organization
The organization of this thesis is summarized as follows:

Chapter 2: In this chapter, the concepts of probabilistic graphical models, as well as probabilistic inference using the Belief Propagation algorithm, are briefly reviewed. We emphasize how the equations of message passing can be seen as a system of multivariate polynomial equations.

Chapter 3: In this chapter, an overview of the concepts of algebraic geometry and computational commutative algebra which were applied in this work is presented. The concept of affine varieties, i.e. geometric objects that represent the set of roots of a system of polynomial equations, is briefly reviewed. It is shown how we can characterize such varieties using Hilbert's Nullstellensatz and GBs.

Chapter 4: In this chapter, an algebraic formulation of the BP algorithm is presented. We use the message passing equations (MPEs) described by the BP algorithm to define a system of polynomials, which has an associated affine variety. By computing the GB of such a variety, conditions for convergence can be found. These methods lead to the introduction of the TBP algorithm. This algorithm computes marginal probability distributions by finding solutions of the MPEs.

Chapter 5: In this chapter, the methods proposed in Chapter 4 are empirically compared to LBP and exact inference using a 2×2 spin glass.

Chapter 6: In the last chapter, the conclusions and future work of this thesis are provided. Some conjectures about the efficiency and stability of methods of computational commutative algebra, and their possible applications to PGMs, are presented.
-
2 Probabilistic Graphical Models
Probabilistic Graphical Models (PGMs) have become the method of choice for dealing with uncertainty and distributions in several research areas, including computer vision [10], speech processing [9], signal processing [1, 22], machine learning [13] and the area of artificial intelligence [23]. By merging graphical models and probabilistic inference, such a framework allows the transfer of concepts and ideas among different application areas [5]. One reason for its popularity is that qualitative patterns of commonsense reasoning "are naturally embedded within the syntax of probability calculus" [12, pp. 19].

PGMs describe the way a joint probability distribution over a set of N random variables (RVs) can be factored into a product of conditional probability distributions defined over smaller subsets of RVs [24]. Examples of well-known statistical models that can be represented as PGMs are hidden Markov models, Kalman filters and Boltzmann machines [5, 11]. The structure of the graphical model represents the conditional independence between RVs and alleviates the computational burden for model learning and inference [5, 6].
Among the most popular representations of PGMs are Markov Networks (MNs), or undirected graphical models, Bayesian Networks (BNs), or directed graphical models, and Factor Graphs (FGs) [5]. Each representation captures different aspects of probabilistic models and, therefore, has its specific advantages and disadvantages [11]. For the sake of simplicity, in this thesis we focus only on MNs. Nevertheless, the discussion and methods presented in this chapter can also be extended to BNs and FGs.

If not stated otherwise, the definitions and notation for PGMs used in this thesis are taken from the tutorial by Pernkopf, Peharz and Tschiatschek [5]. For a more extensive treatment of this subject, we refer the reader to the standard text by Koller and Friedman [11] and the above-mentioned tutorial. The rest of this chapter is organized as follows: In Sections 2.1 and 2.2, respectively, a short review of probability theory and graph theory is provided. In Section 2.3, the representation of PGMs using MNs is presented. In Section 2.4, the concept of probabilistic inference is reviewed. We conclude this chapter in Section 2.5, where the BP algorithm for the case of MNs is described in detail.
2.1 Probability theory overview
The probability distribution of an RV¹ can be characterized using its cumulative distribution function, which is related to the probability density function or the probability mass function, respectively. These functions are defined as follows:

Definition 1. (Cumulative Distribution Function, Probability Density Function, Probability Mass Function). The cumulative distribution function (CDF) of an RV $X$, denoted as $F_X(x)$, is defined as the probability of $X$ taking a value less than or equal to $x$, i.e.

$$F_X(x) = P(X \le x). \qquad (2.1)$$

For a set of $N$ RVs $\mathbf{X} = \{X_1, \ldots, X_N\}$, the joint CDF is defined as

$$F_{\mathbf{X}}(\mathbf{x}) = P(X_1 \le x_1 \cap \cdots \cap X_N \le x_N). \qquad (2.2)$$

If $\mathbf{X}$ is a set of continuous RVs, the probability density function (pdf) is defined as

$$p_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^N F_{\mathbf{X}}(\mathbf{x})}{\partial x_1 \cdots \partial x_N}, \qquad (2.3)$$

where $\mathbf{x} = \{x_1, \ldots, x_N\}$ is an ordered set of $N$ values from $\mathbb{R}$. In the case of $\mathbf{X}$ being a set of discrete RVs, the probability mass function (pmf) is given as

$$p_{\mathbf{X}}(\mathbf{x}) = P(X_1 = x_1 \cap \cdots \cap X_N = x_N), \quad \mathbf{x} = \{x_1, \ldots, x_N\} \in \mathrm{val}(\mathbf{X}), \qquad (2.4)$$

where $\mathrm{val}(\mathbf{X})$ denotes the set of values which can be assumed by a set of random variables $\mathbf{X}$.

In this thesis, it is assumed that $p_{\mathbf{X}}(\mathbf{x})$ represents the underlying probability distribution $P$, and with a slight abuse of notation, $p_{\mathbf{X}}(\mathbf{x})$ is itself referred to as a probability distribution. Whenever it is clear, the shorthand notation $p(\mathbf{x}) = p_{\mathbf{X}}(\mathbf{x})$ is used. We focus only on the case of discrete RVs, and we will restrict $p(\mathbf{x})$ to be discrete. The number of possible states of variable $X_i$ is denoted as $\mathrm{sp}(X_i) = |\mathrm{val}(X_i)|$. For the case of a set of discrete RVs $\mathbf{X}$, the number of possible states is given as

$$\mathrm{sp}(\mathbf{X}) = \prod_i \mathrm{sp}(X_i). \qquad (2.5)$$

Definition 2. (Marginal Distribution, Conditional Distribution). Let $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ be sets of RVs, where $\mathbf{Y} \subseteq \mathbf{X}$ and $\mathbf{Z} = \mathbf{X} \setminus \mathbf{Y}$, i.e. $\mathbf{X} = \mathbf{Y} \cup \mathbf{Z}$. The joint distribution over $\mathbf{X}$ is then $p(\mathbf{X}) = p(\mathbf{Y}, \mathbf{Z})$. The marginal distribution $p(\mathbf{Y})$ over $\mathbf{Y}$ is given as

$$p(\mathbf{Y}) = \sum_{\mathbf{z} \in \mathrm{val}(\mathbf{Z})} P(\mathbf{Y}, \mathbf{Z} = \mathbf{z}) = \sum_{\mathbf{z} \in \mathrm{val}(\mathbf{Z})} p(\mathbf{Y}, \mathbf{z}). \qquad (2.6)$$

The conditional distribution $p(\mathbf{Y}|\mathbf{Z})$ over $\mathbf{Y}$ conditioned on $\mathbf{Z}$ is

$$p(\mathbf{Y}|\mathbf{Z}) = \frac{p(\mathbf{Y}, \mathbf{Z})}{p(\mathbf{Z})}. \qquad (2.7)$$

Using these definitions, we can use Bayes' rule to manipulate conditional probability distributions. This rule states that

$$p(\mathbf{Z}|\mathbf{Y}) = \frac{p(\mathbf{Y}|\mathbf{Z})\, p(\mathbf{Z})}{p(\mathbf{Y})}. \qquad (2.8)$$

¹ The formal definitions of probability distributions and RVs can be found in Appendix B.1.
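The operations in Definition 2 and Bayes' rule can be sketched numerically. The fragment below uses a made-up joint pmf over two binary RVs (the values are illustrative only and appear nowhere in this thesis):

```python
# Numerical sketch of Eqs. (2.6)-(2.8) for two binary RVs Y and Z,
# with an illustrative (made-up) joint pmf p(Y, Z).
import numpy as np

p_yz = np.array([[0.3, 0.1],    # rows:    Y = 0, 1
                 [0.2, 0.4]])   # columns: Z = 0, 1

p_y = p_yz.sum(axis=1)          # marginal p(Y), Eq. (2.6): sum out Z
p_z = p_yz.sum(axis=0)          # marginal p(Z): sum out Y

p_y_given_z = p_yz / p_z        # conditional, Eq. (2.7): p(Y|Z) = p(Y,Z)/p(Z)

# Bayes' rule, Eq. (2.8): p(Z|Y) = p(Y|Z) p(Z) / p(Y)
p_z_given_y = p_y_given_z * p_z / p_y[:, None]
```

Each column of `p_y_given_z` and each row of `p_z_given_y` sums to one, as a conditional distribution must.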
Definition 3. (Conditional Statistical Independence). Assume $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ are mutually disjoint sets of RVs. $\mathbf{X}$ and $\mathbf{Y}$ are conditionally statistically independent given $\mathbf{Z}$, denoted as $\mathbf{X} \perp \mathbf{Y} \,|\, \mathbf{Z}$, iff $p(\mathbf{X}, \mathbf{Y}|\mathbf{Z}) = p(\mathbf{X}|\mathbf{Z})\, p(\mathbf{Y}|\mathbf{Z})$. In case that $\mathbf{Z} = \emptyset$, $\mathbf{X}$ and $\mathbf{Y}$ are called statistically independent.

The chain rule of probability uses conditional distributions to factorize an arbitrary distribution as

$$p(\mathbf{X}) = p(X_1) \prod_{i=2}^{N} p(X_i \,|\, X_{i-1}, \ldots, X_1). \qquad (2.9)$$

This rule holds for any permutation of the indexes of the RVs $\mathbf{X}$.
2.2 Graph theory overview
Graph theory refers to the study of mathematical structures which are used to model pairwise relations between objects [25]. The definition of graphs, the fundamental structures of this theory, is presented as follows:
Definition 4. (Graph). A graph $\mathcal{G} = (\mathbf{X}, \mathbf{E})$ is a tuple consisting of a set of vertices $\mathbf{X}$ (also called nodes) and a set of edges $\mathbf{E}$.

A graph is said to be directed if all edges $e \in \mathbf{E}$ are directed. Conversely, if all edges are undirected, the graph is said to be undirected. If the set of edges of a graph contains both directed and undirected edges, the graph is called mixed. In this thesis, only undirected graphical models are considered; therefore, whenever we speak of a graph, we are in fact considering an undirected graph. We introduce the concepts of neighborhood, cliques and paths, which define the relationships between vertices in a graph.
Definition 5. (Neighbor, Degree of a Vertex). Let $\mathcal{G}$ be a graph and $X_i, X_j \in \mathbf{X}$, $i \neq j$. If $(X_i - X_j) \in \mathbf{E}$, then $X_i$ is a neighbor of $X_j$. The set of all neighbors of $X_i$ is

$$\mathrm{Nb}_{\mathcal{G}}(X_i) = \{X_j \mid (X_i - X_j) \in \mathbf{E},\ X_j \in \mathbf{X}\}. \qquad (2.10)$$

The degree of a vertex $X_i$, denoted by $\deg(X_i)$, is the number of edges incident to the vertex [25]. Edges connecting a vertex with itself are counted twice.
Definition 6. (Clique, Maximal Clique) Let $\mathcal{G}$ be a graph and $\mathbf{C} \subseteq \mathbf{X}$ a subset of the nodes of the graph. $\mathbf{C}$ is a clique if there exists an edge between all pairs of nodes in $\mathbf{C}$, i.e.

$$\forall\, C_i, C_j \in \mathbf{C},\ i \neq j:\ (C_i - C_j) \in \mathbf{E}. \qquad (2.11)$$

A clique is called maximal if adding any node $X \in \mathbf{X} \setminus \mathbf{C}$ makes it no longer a clique.
Definition 7. (Path) Let $\mathcal{G}$ be a graph. A sequence of nodes $\mathbf{Q} = (X_1, \ldots, X_n)$ is a path from $X_1$ to $X_n$ if

$$(X_i - X_{i+1}) \in \mathbf{E}, \quad \text{for } 1 \le i \le n-1. \qquad (2.12)$$

Figure 2.1: Graphical representation of different graph structures: (a) a tree with nodes $X_1, \ldots, X_4$; (b) a chain $X_1 - X_2 - X_3$; (c) a single loop $X_1 - X_2 - X_3 - X_4$.
In this thesis, we consider three basic graph structures:

1. Trees are graphs in which any two different nodes are connected by exactly one path. As an example, consider the tree shown in Figure 2.1 (a), given by

$$\mathcal{G}_{\mathrm{BTree}} = \left( \{X_1, X_2, X_3, X_4\},\ \{(X_1 - X_2), (X_2 - X_3), (X_2 - X_4)\} \right). \qquad (2.13)$$

2. Chains are a special case of trees for which the maximal degree is 2 for all vertices $X \in \mathbf{X}$. As an example, consider the chain with three nodes shown in Figure 2.1 (b), given by

$$\mathcal{G}_{\mathrm{Chain3}} = \left( \{X_1, X_2, X_3\},\ \{(X_1 - X_2), (X_2 - X_3)\} \right). \qquad (2.14)$$

3. Loops are graphs in which for at least one vertex $X_i \in \mathbf{X}$ there exists a path from $X_i$ to $X_i$. As an example, consider the single loop with four nodes shown in Figure 2.1 (c), given by

$$\mathcal{G}_{2\times2\mathrm{SG}} = \left( \{X_1, X_2, X_3, X_4\},\ \{(X_1 - X_2), (X_2 - X_3), (X_3 - X_4), (X_4 - X_1)\} \right). \qquad (2.15)$$

It can be seen that every finite acyclic graph, i.e. every graph without loops and with a finite number of nodes, can be formed by concatenating trees. A proof of this result is shown in [25, pp. 311].
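The three example structures can be encoded directly as adjacency lists. The helpers below are hypothetical (not part of the thesis) and implement $\mathrm{Nb}_{\mathcal{G}}(X_i)$ and $\deg(X_i)$ from Definition 5:

```python
# The example graphs of Eqs. (2.13)-(2.15) as adjacency lists,
# with hypothetical helpers for neighborhood and degree.
tree    = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}            # G_BTree
chain3  = {1: [2], 2: [1, 3], 3: [2]}                       # G_Chain3
loop2x2 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}      # G_2x2SG

def neighbors(graph, i):
    """Nb_G(X_i): the set of vertices sharing an edge with X_i."""
    return set(graph[i])

def degree(graph, i):
    """deg(X_i): number of edges incident to X_i (no self-loops here)."""
    return len(graph[i])
```

In the chain every vertex has degree at most 2, and in the single loop every vertex has degree exactly 2, matching the characterizations above.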
2.3 Markov Networks
In this section, we introduce an undirected graphical model known as a Markov Network (MN), or Markov Random Field. Certain factorization and conditional independence properties of a joint probability distribution can be expressed by such a model. MNs can be defined as follows:
Definition 8. (Markov Network) A Markov Network is a tuple $\mathcal{M} = (\mathcal{G}, \Psi)$, where $\mathcal{G}$ is an undirected graph with nodes $\mathbf{X} = \{X_1, \ldots, X_N\}$ representing RVs, and $\mathbf{C}_1, \ldots, \mathbf{C}_L$ are maximal cliques in $\mathcal{G}$. The set $\Psi = \{\psi_{\mathbf{C}_1}, \ldots, \psi_{\mathbf{C}_L}\}$ is called the set of potentials, and $\psi_{\mathbf{C}_i} : \mathrm{val}(\mathbf{C}_i) \mapsto \mathbb{R}_{\ge 0}$ are nonnegative functions.

The joint probability distribution of $\mathbf{X}$ defined by the MN is given by

$$p_{\mathcal{M}}(X_1, \ldots, X_N) = \frac{1}{Z} \prod_{l=1}^{L} \psi_{\mathbf{C}_l}(\mathbf{C}_l), \qquad (2.16)$$

where $Z$ is a normalization constant (also referred to as the partition function) calculated as

$$Z = \sum_{\mathbf{x} \in \mathrm{val}(\mathbf{X})} \prod_{l=1}^{L} \psi_{\mathbf{C}_l}(\mathbf{x}(\mathbf{C}_l)). \qquad (2.17)$$

Using the above definition, we can compute the joint probability distributions for the MNs generated by the example graphs from Section 2.2. These joint probabilities are later used in Chapter 5. For the tree $\mathcal{G}_{\mathrm{BTree}}$, the chain $\mathcal{G}_{\mathrm{Chain3}}$, and the single loop $\mathcal{G}_{2\times2\mathrm{SG}}$, respectively, they are given by

$$p_{\mathrm{BTree}}(X_1, X_2, X_3, X_4) = \frac{1}{Z}\, \psi_{X_1,X_2}(X_1, X_2)\, \psi_{X_2,X_3}(X_2, X_3)\, \psi_{X_2,X_4}(X_2, X_4), \qquad (2.18)$$

$$p_{\mathrm{Chain3}}(X_1, X_2, X_3) = \frac{1}{Z}\, \psi_{X_1,X_2}(X_1, X_2)\, \psi_{X_2,X_3}(X_2, X_3), \quad \text{and} \qquad (2.19)$$

$$p_{2\times2\mathrm{SG}}(X_1, X_2, X_3, X_4) = \frac{1}{Z}\, \psi_{X_1,X_2}(X_1, X_2)\, \psi_{X_2,X_3}(X_2, X_3)\, \psi_{X_3,X_4}(X_3, X_4)\, \psi_{X_4,X_1}(X_4, X_1). \qquad (2.20)$$
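Eqs. (2.16), (2.17) and (2.19) can be sketched numerically for the three-node chain. The binary potential values below are made up purely for illustration and appear nowhere in the thesis:

```python
# Joint distribution of the three-node chain MN, Eq. (2.19),
# assembled from two made-up binary potentials.
import numpy as np

psi_12 = np.array([[1.0, 2.0], [3.0, 1.0]])   # psi_{X1,X2}(x1, x2)
psi_23 = np.array([[2.0, 1.0], [1.0, 2.0]])   # psi_{X2,X3}(x2, x3)

# Unnormalized joint: p~(x1, x2, x3) = psi_12(x1, x2) * psi_23(x2, x3)
unnorm = psi_12[:, :, None] * psi_23[None, :, :]

Z = unnorm.sum()        # partition function, Eq. (2.17)
p = unnorm / Z          # normalized joint, Eq. (2.16)
```

The resulting array `p[x1, x2, x3]` sums to one over all $2^3$ joint states, as Eq. (2.16) requires.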
2.4 Probabilistic Inference
In this section, the concept of probabilistic inference is briefly reviewed. Let us assume that $\mathbf{X}$, the set of RVs in an MN $\mathcal{M}$, is now partitioned into the mutually disjoint sets $\mathbf{O}$ and $\mathbf{Q}$, i.e. $\mathbf{X} = \mathbf{O} \cup \mathbf{Q}$ and $\mathbf{O} \cap \mathbf{Q} = \emptyset$. The variables $\mathbf{O}$ are called observed nodes, referring to the observed evidence variables, and $\mathbf{Q}$ denotes the set of query variables. Given some observations (or evidence), the task of probabilistic inference using PGMs consists in assessing the marginal or the most likely configuration of variables [5]. There are two kinds of inference queries:

1. Marginalization: This query tries to infer the marginal distribution of the query variables $\mathbf{Q}$ conditioned on the observation $\mathbf{O}$. Using Eq. (2.7), the conditional probability $p(\mathbf{Q}|\mathbf{O} = \mathbf{o})$ is given by

$$p(\mathbf{Q}|\mathbf{O} = \mathbf{o}) = \frac{p(\mathbf{Q}, \mathbf{O} = \mathbf{o})}{p(\mathbf{O} = \mathbf{o})}. \qquad (2.21)$$

Using Eq. (2.6), the term $p(\mathbf{O} = \mathbf{o})$ can be computed as

$$p(\mathbf{O} = \mathbf{o}) = \sum_{\mathbf{q} \in \mathrm{val}(\mathbf{Q})} p(\mathbf{Q} = \mathbf{q}, \mathbf{O} = \mathbf{o}). \qquad (2.22)$$
2. Maximum a-posteriori (MAP): This query tries to infer the most likely instantiation of the query variables $\mathbf{Q}$ given the observations $\mathbf{O}$, i.e.

$$\mathbf{q}^{*} = \underset{\mathbf{q} \in \mathrm{val}(\mathbf{Q})}{\arg\max}\ p(\mathbf{Q} = \mathbf{q} \,|\, \mathbf{O} = \mathbf{o}). \qquad (2.23)$$
Both of these queries can be answered by directly evaluating the sums in their corresponding equations. Nevertheless, this approach becomes intractable for models with many variables [3, 26]. As an example, let us consider the marginal $p(\mathbf{O} = \mathbf{o})$ from Eq. (2.22). Using Eq. (2.5), it can be seen that the computation of the marginal involves $\prod_{i=1}^{|\mathbf{Q}|} \mathrm{sp}(Q_i)$ summations. If we assume that $\mathrm{sp}(Q_i) = k$ for all $Q_i$'s, the number of summations required to evaluate the marginal $p(\mathbf{O} = \mathbf{o})$ simplifies to $k^{|\mathbf{Q}|}$. This implies that the complexity of the computation of the marginal $p(\mathbf{O} = \mathbf{o})$ is $O(k^{|\mathbf{Q}|})$, i.e. it grows exponentially with the number of variables.
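The exponential cost can be made concrete with a brute-force sketch: for the three-node chain with made-up binary potentials (illustrative values only), computing the marginal of $X_1$ enumerates all $k^2 = 4$ states of the remaining variables, and each additional summed-out variable multiplies the work by $k$:

```python
# Brute-force marginalization for a three-node chain with k = 2,
# using made-up potentials; the sums enumerate all k^|Q| states.
from itertools import product

k = 2
psi_12 = {(a, b): v for (a, b), v in zip(product(range(k), repeat=2), [1.0, 2.0, 3.0, 1.0])}
psi_23 = {(a, b): v for (a, b), v in zip(product(range(k), repeat=2), [2.0, 1.0, 1.0, 2.0])}

def unnorm_joint(x1, x2, x3):
    # Unnormalized joint of the chain, Eq. (2.19)
    return psi_12[(x1, x2)] * psi_23[(x2, x3)]

# Partition function: k^3 terms
Z = sum(unnorm_joint(*x) for x in product(range(k), repeat=3))

# Marginal p(X1): for each x1, sum over the k^2 states of (X2, X3)
p_x1 = [sum(unnorm_joint(x1, x2, x3) for x2, x3 in product(range(k), repeat=2)) / Z
        for x1 in range(k)]
```

For this toy chain the enumeration is trivial, but with $|\mathbf{Q}|$ summed-out binary variables the inner loop would run over $2^{|\mathbf{Q}|}$ states, which is exactly the $O(k^{|\mathbf{Q}|})$ blow-up discussed above.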
MPAs are efficient methods developed for probabilistic inference that exploit the factorization of the joint probability induced by the graph structure [26]. A solution for the marginalization query can be found using the BP algorithm, also known as the sum-product algorithm [3, 16]. For solving the MAP query, the corresponding MPAs are the max-product algorithm, or its alternative formulation in the log-domain, the max-sum algorithm [5, 11]. The focus of this thesis is on marginalization queries, i.e. on the sum-product algorithm.
2.5 Belief Propagation
BP is a procedure which calculates the marginal distribution for each unobserved node, conditioned on the observed nodes. BP is an iterative process which can be seen as neighboring variables passing messages to each other, like: "I, variable $X_i$, think on how likely it is that you, variable $X_j$, are in state $x_j$." This series of conversations is likely to converge to a consensus after enough iterations, which determines the marginal probabilities of all variables. These estimated marginals are called beliefs [7].
Formally, let $\mathcal{M} = (\mathcal{G}, \Psi)$ be an MN with variable nodes $\mathbf{X} = \{X_1, \ldots, X_N\}$. A message $\mu_{X_i \to X_j}(x_j)$ from (variable) node $X_i$ to (variable) node $X_j$ represents how much $X_i$ believes that $X_j$ will be in the state $x_j \in \mathrm{val}(X_j)$. The belief $b_{X_j}(x_j)$ of a variable node $X_j$ to be in state $x_j$ is proportional to the product of all messages from the neighboring nodes, i.e. $\mu_{X_i \to X_j}(x_j)$ for all $X_i \in \mathrm{Nb}(X_j)$, i.e.

$$b_{X_j}(x_j) = \frac{1}{Z} \prod_{X_i \in \mathrm{Nb}(X_j)} \mu_{X_i \to X_j}(x_j), \qquad (2.24)$$

where $Z$ is a normalization constant such that $\sum_{x_j} b_{X_j}(x_j) = 1$. This message passing update is graphically represented in Figure 2.2.

Figure 2.2: Representation of the message passing from $X_i$ to $X_j$: node $X_i$ collects the messages $\mu_{X_k \to X_i}(x_i)$ from all $X_k \in \mathrm{Nb}_{\mathcal{G}}(X_i) \setminus \{X_j\}$ and sends the message $\mu_{X_i \to X_j}(x_j)$ to $X_j$.

We now prove that the BP algorithm converges to the true marginal probabilities for the case of graphs without loops. While this result was first introduced in [7], in this thesis we present a proof that shows the polynomial nature of the BP algorithm.
Theorem 1. (Belief Propagation) Let $\mathcal{M} = (\mathcal{G}, \Psi)$ be an acyclic pairwise MN, i.e. the maximal cliques consist of only two variables, with variable nodes $\mathbf{X} = \{X_1, \ldots, X_N\}$. Then the belief of $X_j$ calculated as in Eq. (2.24) is equal to the marginal probability of $X_j$, i.e.

$$p_{\mathcal{M}}(X_j) = b_{X_j}(x_j), \qquad (2.25)$$

if the messages $\mu_{X_i \to X_j}$ are computed as

$$\mu_{X_i \to X_j}(x_j) = \sum_{x_i \in \mathrm{val}(X_i)} \psi_{X_i, X_j}(x_i, x_j) \prod_{X_k \in \mathrm{Nb}(X_i) \setminus \{X_j\}} \mu_{X_k \to X_i}(x_i). \qquad (2.26)$$
Proof. While the core of this Theorem is the BP algorithm proposed in [7], in this proof we emphasize the polynomial nature of the MPE described by Eq. (2.26). Using the conditional independence relations described by the MN allows us to algebraically manipulate the beliefs. Without loss of generality, we compute $b_{X_1}(x_1)$. The variables in $\mathbf{X}$ can be relabeled such that $G$ has the tree structure shown in Figure 2.3, where $X_1$ is the root. We use $I_1 = \mathrm{Nb}(X_1) = \{X_{I_{11}}, \dots, X_{I_{1m}}\}$ to denote the neighbor variable nodes of $X_1$, i.e. the nodes located in level $I_1$. The set $I^{k+1}_{I_{ki}} = \mathrm{Nb}(X_{I_{ki}}) \setminus \{X_{I_{k-1,j}}\}$ denotes the neighbors of variable node $X_{I_{ki}}$, i.e. the $i$-th variable node in level $I_k$, that lie entirely in level $I_{k+1}$. We use $x_{I_{ki}}$ as a shorthand notation for $x_{I_{ki}} \in \mathrm{val}(X_{I_{ki}})$. Substituting the messages from Eq. (2.26) in Eq. (2.24), the beliefs can be
Figure 2.3: Topology of a general tree used for the proof of Theorem 1, with root $X_1$, first-level neighbors $X_{I_{11}}, \dots, X_{I_{1m}}$, and deeper levels $I_2, \dots, I_Y$.
computed as

$$b_{X_1}(x_1) = \frac{1}{Z} \prod_{X_i \in I_1} \mu_{X_i \to X_1}(x_1) = \frac{1}{Z} \left( \sum_{x_{I_{11}}} \Psi_{X_{I_{11}}, X_1}(x_{I_{11}}, x_1) \prod_{X_k \in I^2_{I_{11}}} \mu_{X_k \to X_{I_{11}}}(x_{I_{11}}) \right) \times \dots \times \left( \sum_{x_{I_{1m}}} \Psi_{X_{I_{1m}}, X_1}(x_{I_{1m}}, x_1) \prod_{X_k \in I^2_{I_{1m}}} \mu_{X_k \to X_{I_{1m}}}(x_{I_{1m}}) \right). \qquad (2.27)$$
Expanding every product of the form $\prod_k \mu_{k \to i}(x_i)$ results in

$$\prod_{X_k \in I^2_{I_{1i}}} \mu_{X_k \to X_{I_{1i}}}(x_{I_{1i}}) = \left( \sum_{x_{I_{2j}}} \Psi_{X_{I_{2j}}, X_{I_{1i}}}(x_{I_{2j}}, x_{I_{1i}}) \left( \sum_{x_{I_{3l}}} \Psi_{X_{I_{3l}}, X_{I_{2j}}}(x_{I_{3l}}, x_{I_{2j}}) \dots \left( \sum_{x_{I_{Yz}}} \Psi_{X_{I_{Yz}}, X_{I_{Y-1,y}}}(x_{I_{Yz}}, x_{I_{Y-1,y}}) \right) \dots \right) \right). \qquad (2.28)$$
Using that the sum is distributive over products (see Section 3.1), and that every potential is pairwise defined, the above equation can be written as

$$\prod_{X_k \in I^2_{I_{1i}}} \mu_{X_k \to X_{I_{1i}}}(x_{I_{1i}}) = \sum_{x_{I_{2j}}} \cdots \sum_{x_{I_{Yz}}} \Psi_{X_{I_{2j}}, X_{I_{1i}}}(x_{I_{2j}}, x_{I_{1i}}) \dots \Psi_{X_{I_{Yz}}, X_{I_{Y-1,y}}}(x_{I_{Yz}}, x_{I_{Y-1,y}}). \qquad (2.29)$$
Here we recognize that the above equation is the sum over all states $x_{I_{ki}} \in \mathrm{val}(X_{I_{ki}})$ of the variable nodes $X_{I_{ki}}$ in levels $I_2, \dots, I_Y$ along the path that started in $X_{I_{1i}}$. Because of the tree structure of the graph, there is no variable $X_{I_{kq}}$ that belongs to the neighborhoods of both $X_{I_{k-1,i}}$ and $X_{I_{k-1,j}}$ for all $i \neq j$. Hence, every pairwise potential appears only once, and there is only one summation in every variable (i.e. the maximal degree of every pairwise potential is 1). It can be seen that this statement is not true in loopy graphs. Substituting Eq. (2.29) in Eq. (2.27) results in
$$b_{X_1}(x_1) = \frac{1}{Z} \underbrace{\left( \sum_{x_{I_{11}}} \Psi_{X_{I_{11}}, X_1}(x_{I_{11}}, x_1) \sum_{x_{I_{2j}}} \cdots \sum_{x_{I_{Yz}}} \Psi_{X_{I_{2j}}, X_{I_{1i}}}(x_{I_{2j}}, x_{I_{1i}}) \dots \Psi_{X_{I_{Yz}}, X_{I_{Y-1,y}}}(x_{I_{Yz}}, x_{I_{Y-1,y}}) \right)}_{\mu_{X_{I_{11}} \to X_1}(x_1)} \times \dots \times \underbrace{\left( \sum_{x_{I_{1m}}} \Psi_{X_{I_{1m}}, X_1}(x_{I_{1m}}, x_1) \sum_{x_{I_{2j'}}} \cdots \sum_{x_{I_{Yz'}}} \Psi_{X_{I_{2j'}}, X_{I_{1i'}}}(x_{I_{2j'}}, x_{I_{1i'}}) \dots \Psi_{X_{I_{Yz'}}, X_{I_{Y-1,y'}}}(x_{I_{Yz'}}, x_{I_{Y-1,y'}}) \right)}_{\mu_{X_{I_{1m}} \to X_1}(x_1)}$$
$$= \frac{1}{Z} \sum_{x_{I_{11}}} \cdots \sum_{x_{I_{Yn}}} \Psi_{X_{I_{11}}, X_1}(x_{I_{11}}, x_1) \dots \Psi_{X_{I_{Yn}}, X_{I_{Y-1,z}}}(x_{I_{Yn}}, x_{I_{Y-1,z}}). \qquad (2.30)$$
Comparing the right-hand side of the above equation with Eq. (2.16), the belief $b_{X_1}(x_1)$ simplifies to

$$b_{X_1}(x_1) = \sum_{x_{I_1}} \cdots \sum_{x_{Y_n}} p_{\mathcal{M}}(X_1, \dots, X_{Y_n}), \qquad (2.31)$$

which, according to Eq. (2.6), is the marginal probability $p_{\mathcal{M}}(X_1)$. $\square$
It should be noted that the messages per se do not necessarily represent probabilities and need to be normalized. Nevertheless, to avoid numerical instabilities [2,3,16], a common formulation of the MPE defined by Eq. (2.26) includes a normalization as follows:

$$\mu_{X_i \to X_j}(x_j) = Z_{i \to j} \sum_{x_i \in \mathrm{val}(X_i)} \Psi_{X_i, X_j}(x_i, x_j) \prod_{X_k \in \mathrm{Nb}(X_i) \setminus \{X_j\}} \mu_{X_k \to X_i}(x_i), \qquad (2.32)$$

where $Z_{i \to j}$ is a normalization constant² such that

$$\sum_{x_j \in \mathrm{val}(X_j)} \mu_{X_i \to X_j}(x_j) = 1. \qquad (2.33)$$
The above equations can be understood as a system of polynomial equations, where the messages $\mu_{X_i \to X_j}(x_j)$ and the normalization constants $Z_{i \to j}$ are the variables. Theorem 1 guarantees that, as long as the graph has a tree structure, the beliefs obtained from message passing converge to the true marginals. The reason behind this exactness is that in a tree-like graph, the messages received by a node from its neighbors are independent. This does not always hold in the presence of loops, which make the neighbors of a node correlated, so that the messages are no longer independent [26]. Nevertheless, convergence of BP in graphs with cycles has been experimentally confirmed for many applications [27]. This Loopy Belief Propagation for general graphs has been successfully applied in many areas [5,14]. In this case, the (L)BP algorithm, shown in Algorithm 2.1, is said to converge to a solution [5,11,16], although not necessarily to the true marginals, if there is a finite $k > 0$ such that

$$\mu^{(k)}_{X_i \to X_j}(x_j) = \mu_{X_i \to X_j}(x_j). \qquad (2.34)$$

²Since the potentials are strictly positive functions, it is easy to see that $Z_{i \to j} > 0$. Contrary to references [3,5,11,14,16], in this thesis we define this constant as a multiplicative factor, in order to use the theoretical framework of commutative algebra from Chapter 3. A similar formulation can be found in [17].
Algorithm 2.1: LBP(·) (Loopy) Belief Propagation (Adapted from [26])
input : a MN $\mathcal{M} = (G, \{\Psi_{C_1}, \dots, \Psi_{C_L}\})$; maximum number of iterations $k_{\max}$; requested precision $\epsilon$
output: the set of beliefs $\{b_{X_j}(x_j) \mid X_j \in \mathbf{X}\}$
Initialize messages $m^{(0)}_{X_i \to X_j}(x_j) = 1$ for all pairs of variable nodes $X_i, X_j$ for which $(X_i - X_j) \in E$.
for $k = 1 : k_{\max}$ do
    $\forall (X_i - X_j) \in E$:
    1. Update:
       $$m^{(k)}_{X_i \to X_j}(x_j) := \sum_{x_i \in \mathrm{val}(X_i)} \Psi_{X_i, X_j}(x_i, x_j) \prod_{X_l \in \mathrm{Nb}(X_i) \setminus \{X_j\}} \mu^{(k-1)}_{X_l \to X_i}(x_i). \qquad (2.35)$$
    2. Compute normalization constant:
       $$Z^{(k)}_{i \to j} := \frac{1}{\sum_{x_j \in \mathrm{val}(X_j)} m^{(k)}_{X_i \to X_j}(x_j)} \qquad (2.36)$$
    3. Normalize messages:
       $$\mu^{(k)}_{X_i \to X_j}(x_j) := Z^{(k)}_{i \to j}\, m^{(k)}_{X_i \to X_j}(x_j) \qquad (2.37)$$
    if $|\mu^{(k)}_{X_i \to X_j}(x_j) - \mu^{(k-1)}_{X_i \to X_j}(x_j)| < \epsilon \ \forall (X_i - X_j) \in E$ then break;
if $k = k_{\max}$ then return UNCONVERGED;
else for $X_i \in \mathbf{X}$ do
    $b_{X_i}(x_i) := \prod_{X_j \in \mathrm{Nb}(X_i)} \mu_{X_j \to X_i}(x_i)$
return $\{b_{X_j}(x_j) \mid X_j \in \mathbf{X}\}$;
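To make the procedure above concrete, the following is a minimal Python sketch of Algorithm 2.1 for a binary (or generally discrete) pairwise MN. The function name `lbp` and the edge-dictionary input format are our own illustration, not part of the algorithm as stated in the thesis:

```python
import numpy as np

def lbp(psi, n_states, k_max=100, eps=1e-8):
    """Sketch of Algorithm 2.1 for a pairwise MN.

    psi maps an undirected edge (i, j), i < j, to a strictly positive
    matrix psi[(i, j)][x_i, x_j]. Returns beliefs, or None if unconverged.
    """
    directed = [(i, j) for (i, j) in psi] + [(j, i) for (i, j) in psi]
    nodes = sorted({v for e in psi for v in e})
    nb = {v: [b for (a, b) in directed if a == v] for v in nodes}
    # Initialize all messages mu_{i->j}(x_j) to 1.
    mu = {e: np.ones(n_states) for e in directed}
    converged = False
    for _ in range(k_max):
        new = {}
        for (a, b) in directed:
            # Potential as a matrix indexed [x_a, x_b].
            p = psi[(a, b)] if (a, b) in psi else psi[(b, a)].T
            # Product of incoming messages from Nb(a) \ {b} (Eq. 2.35).
            prod = np.ones(n_states)
            for c in nb[a]:
                if c != b:
                    prod = prod * mu[(c, a)]
            m = p.T @ prod             # sum over x_a
            new[(a, b)] = m / m.sum()  # Eqs. (2.36)-(2.37)
        delta = max(np.abs(new[e] - mu[e]).max() for e in directed)
        mu = new
        if delta < eps:
            converged = True
            break
    if not converged:
        return None  # UNCONVERGED
    # Beliefs: normalized product of incoming messages (Eq. 2.24).
    beliefs = {}
    for v in nodes:
        b = np.ones(n_states)
        for j in nb[v]:
            b = b * mu[(j, v)]
        beliefs[v] = b / b.sum()
    return beliefs
```

On a tree-structured input the returned beliefs match the exact marginals, as Theorem 1 predicts; on loopy graphs the same code performs the (L)BP iteration and may return None.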
The work of Yedidia et al. represents perhaps the most significant breakthrough in the study of convergence of the (L)BP algorithm. Their work came with the insight that (L)BP converges to stationary points of an approximate free energy, known as the Bethe free energy in statistical physics [14,27]. The investigation of the fixed points of the Bethe free energy has led to the derivation of convergence criteria for the (L)BP algorithm. Following this work, several convergence criteria for BP have been proposed, including the work of Ihler et al. [16] and Mooij et al. [17]. Using results from linear algebra, Weiss showed that the LBP algorithm converges for graphs with a single loop [3].
In Chapter 4, a reformulation of this convergence condition in terms of computational commutative algebra is given. In order to do so, we use the fact that the equations describing the messages involved in BP can be seen as a system of multivariate polynomials, and thus solving the convergence problem is analogous to the problem of finding the solutions of a system of polynomial equations.
3 Computational Commutative Algebra

As discussed in Chapter 2, the Belief Propagation algorithm can be interpreted as finding the roots of a system of polynomial equations. This system is explicitly shown in Eq. (2.32) and Eq. (2.33). Therefore, it is worth investigating the theoretical background that allows us to manipulate and solve such a system. In this chapter, the framework of computational commutative algebra used for this purpose is provided. If not stated otherwise, the definitions and notation are taken from the book by Cox, Little and O'Shea [18]. We refer the reader to this standard text for a more extensive treatment of this subject.

The rest of this chapter is structured as follows: In Section 3.1, a short overview of the basic concepts of algebraic geometry and commutative algebra is provided. In Section 3.2, the concept of Gröbner basis is reviewed. In Section 3.3, Hilbert's Nullstellensatz and its connection to the existence and number of solutions of a system of polynomial equations are discussed. We close this chapter with the basics of elimination theory as well as some applications to solving systems of polynomial equations.
3.1 Algebraic Geometry
Algebraic geometry is the study of systems of polynomial equations and their relation to geometrical objects. The solutions of a system of polynomial equations form a geometric object called a variety, whose corresponding algebraic object is called an ideal. These concepts allow us to answer questions such as whether a system of polynomial equations has finitely or infinitely many solutions, and how to characterize them [18,19]. We begin our discussion about solutions of polynomials by defining the basic algebraic structures called rings and fields. A more formal definition of these structures is provided in Appendix B.2.
Definition 9. (Ring) A ring is a triple $(A, +, \cdot)$, where $A$ is a set, and $\cdot$ and $+$ are binary operations defined on $A$ for which the following conditions are satisfied:

1. (associative) $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ $\forall\, a, b, c \in A$.
2. (commutative) $a + b = b + a$ and $a \cdot b = b \cdot a$ $\forall\, a, b \in A$.
3. (distributive) $a \cdot (b + c) = a \cdot b + a \cdot c$ $\forall\, a, b, c \in A$.
4. (identities) There are $0, 1 \in A$ such that $a + 0 = a$ and $a \cdot 1 = a$ $\forall\, a \in A$.
5. (additive inverse) Given $a \in A$, there is $b \in A$ such that $a + b = 0$.

A common example of a ring is the set of all integer numbers $\mathbb{Z}$. As a notation remark, in this work $\mathbb{Z}_{\geq 0}$ denotes the set of all integers greater than or equal to zero.
Definition 10. (Field) A field is a triple $(K, +, \cdot)$, where $K$ is a set, and $\cdot$ and $+$ are binary operations defined on $K$ for which the following conditions are satisfied:

1. (associative) $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ $\forall\, a, b, c \in K$,
2. (commutative) $a + b = b + a$ and $a \cdot b = b \cdot a$ $\forall\, a, b \in K$,
3. (distributive) $a \cdot (b + c) = a \cdot b + a \cdot c$ $\forall\, a, b, c \in K$,
4. (identities) There are $0, 1 \in K$ such that $a + 0 = a$ and $a \cdot 1 = a$ $\forall\, a \in K$,
5. (additive inverse) Given $a \in K$, there is $b \in K$ such that $a + b = 0$,
6. (multiplicative inverse) Given $a \in K$ with $a \neq 0$, there is $c \in K$ such that $a \cdot c = 1$.

Common examples of fields are the set of all rational numbers $\mathbb{Q}$, the set of real numbers $\mathbb{R}$ and the set of complex numbers $\mathbb{C}$. With these definitions, we can study the most important algebraic structure used in this thesis, a polynomial ring.
Definition 11. (Monomial, Total degree, Polynomial, Polynomial Ring) Given a field $K$; $x_1, \dots, x_n$; $a_1, \dots, a_n \in K$ and $\alpha_1, \dots, \alpha_n \in \mathbb{Z}_{\geq 0}$, a monomial in $x_1, \dots, x_n$ is a product of the form

$$x_1^{\alpha_1} \cdots x_n^{\alpha_n}. \qquad (3.1)$$

The total degree of a monomial is given as

$$|\alpha| = \sum_{i=1}^{n} \alpha_i. \qquad (3.2)$$

As a shorthand notation, we will write $x_1^{\alpha_1} \cdots x_n^{\alpha_n} = x^\alpha$. A polynomial $f$ in $x_1, \dots, x_n$ with coefficients in $K$ is a finite linear combination of monomials, which can be written as

$$f = \sum_{\alpha} a_\alpha x^\alpha. \qquad (3.3)$$

The set of all polynomials in $x_1, \dots, x_n$ with coefficients in $K$ is called a polynomial ring, denoted by $K[x_1, \dots, x_n]$, which satisfies all the conditions of a commutative ring for the sum and product of polynomials.
A field $K$ is said to be algebraically closed if it contains the roots of every non-constant polynomial $f \in K[x_1, \dots, x_n]$. The field of real numbers $\mathbb{R}$ is not algebraically closed, because $f(x) = x^2 + 1$ has no root in $\mathbb{R}$. On the other hand, $\mathbb{C}$ is an example of an algebraically closed field. Using these definitions, it is possible to introduce the basic geometric objects called varieties.
Definition 12. (Affine Space, Affine Variety) Let $K$ be a field and $n \in \mathbb{Z}_{\geq 0}$. The set

$$K^n = \{(a_1, \dots, a_n) : a_1, \dots, a_n \in K\} \qquad (3.4)$$

is called the affine space over $K$. Furthermore, let $f_1, \dots, f_s$ be polynomials in $K[x_1, \dots, x_n]$; then the set

$$\mathbf{V}(f_1, \dots, f_s) = \{(a_1, \dots, a_n) \in K^n : f_i(a_1, \dots, a_n) = 0 \ \forall\, 1 \leq i \leq s\} \qquad (3.5)$$

is called the affine variety defined by $f_1, \dots, f_s$ over the affine space $K^n$.
Figure 3.1: Example of a variety in $\mathbb{R}^3$: the paraboloid $\frac{1}{2}x^2 + 2y^2 - z = 0$.

As an example, Figure 3.1 shows the variety $\mathbf{V}(\frac{1}{2}x^2 + 2y^2 - z) \subset \mathbb{R}^3$, i.e. the paraboloid given by the set of all points that satisfy $\frac{1}{2}x^2 + 2y^2 - z = 0$.
The analog algebraic objects that allow us to characterize varieties are ideals. These objects can be defined as follows:

Definition 13. (Ideal, Finitely Generated Ideal, Basis of an Ideal) A subset of a polynomial ring $I \subset K[x_1, \dots, x_n]$ is an ideal if it satisfies the following conditions:

1. $0 \in I$,
2. $a, b \in I \Rightarrow a + b \in I$, and
3. if $a \in I$ and $b \in K[x_1, \dots, x_n]$, then $ab \in I$.

A finitely generated ideal is one that can be generated by a finite set of polynomials $\{f_1, \dots, f_s\}$, called a basis, defined as

$$\langle f_1, \dots, f_s \rangle = \{f \mid f = g_1 f_1 + \cdots + g_s f_s, \ g_i \in K[x_1, \dots, x_n]\}. \qquad (3.6)$$
The concept of an ideal is similar to that of a vector subspace in linear algebra, while the notion of a basis of an ideal is analogous to that of a basis of a vector space [18]. The relationship between finitely generated ideals and varieties can be stated in the following lemma:

Lemma 1. Let $I \subset K[x_1, \dots, x_n]$ be an ideal and $\mathbf{V}(I) \subset K^n$ be an affine variety given by

$$\mathbf{V}(I) = \{(a_1, \dots, a_n) \in K^n : f(a_1, \dots, a_n) = 0 \ \forall f \in I\}. \qquad (3.7)$$

If $I = \langle f_1, \dots, f_s \rangle$, then $\mathbf{V}(I) = \mathbf{V}(f_1, \dots, f_s)$.
The proof of this lemma can be found in [18, pp. 79]. This result guarantees that the affine variety of a system of polynomial equations is exactly the same as the variety of the ideal generated by the polynomials of such a system. The following lemma further explores the relationship between varieties and ideals, by connecting the concepts of bases and varieties of an ideal:

Lemma 2. If $f_1, \dots, f_s$ and $g_1, \dots, g_t$ are bases of the same ideal $I \subset K[x_1, \dots, x_n]$, i.e. $\langle f_1, \dots, f_s \rangle = \langle g_1, \dots, g_t \rangle$, it follows that $\mathbf{V}(f_1, \dots, f_s) = \mathbf{V}(g_1, \dots, g_t)$.

The proof of this lemma can be found in [18, pp. 33]. This result is interesting because it states that different bases of the same ideal are associated with the same variety. Using this lemma, the problem of finding the roots of a system of polynomial equations can be simplified by finding an appropriate basis which generates the same ideal (and thus, the same variety).
3.2 Gröbner Basis
Next, we introduce the concept of ordering, which proves to be very useful for characterizing monomials, polynomials, ideals and varieties [18,20].

Definition 14. (Monomial ordering) A monomial ordering is any relation $>$ on a set of monomials $\{x^\alpha \mid \alpha \in \mathbb{Z}^n_{\geq 0}\}$ which satisfies the following conditions:

1. $>$ is a total ordering on $\mathbb{Z}^n_{\geq 0}$, which means that for $\alpha, \beta \in \mathbb{Z}^n_{\geq 0}$ exactly one of the three possibilities

$$x^\alpha > x^\beta, \quad x^\alpha = x^\beta, \quad x^\beta > x^\alpha \qquad (3.8)$$

holds,
2. if $\alpha > \beta$ and $\gamma \in \mathbb{Z}^n_{\geq 0}$, then $\alpha + \gamma > \beta + \gamma$, and
3. every nonempty subset of $\mathbb{Z}^n_{\geq 0}$ has a smallest element under $>$.

Some important monomial orderings are the lexicographic ordering, or lex order, the graded lexicographic ordering, or grlex order, and the graded reverse lexicographic order, or grevlex order [20,28]. These orderings are defined as follows:
Definition 15. (Lex order, Grlex order, Grevlex order) Let $\alpha, \beta \in \mathbb{Z}^n_{\geq 0}$. We denote:

1. Lexicographic ordering: $\alpha >_{\mathrm{lex}} \beta$ if in the vector difference $\alpha - \beta = [\alpha_1 - \beta_1, \dots, \alpha_n - \beta_n]$ the leftmost nonzero entry is positive.
2. Graded lexicographic ordering: $\alpha >_{\mathrm{grlex}} \beta$ if $|\alpha| = \sum_{i=1}^n \alpha_i > |\beta| = \sum_{i=1}^n \beta_i$, or $|\alpha| = |\beta|$ and $\alpha >_{\mathrm{lex}} \beta$.
3. Graded reverse lexicographic ordering: $\alpha >_{\mathrm{grevlex}} \beta$ if $|\alpha| > |\beta|$, or $|\alpha| = |\beta|$ and the rightmost nonzero entry of $\alpha - \beta$ is negative.
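These orderings are implemented in computer algebra systems. As a small illustration (our own example, comparing the exponent vectors of $x y^2 z$ and $x^2 z$ in the variables $(x, y, z)$), a SymPy sketch shows how lex and the graded orders can disagree:

```python
from sympy.polys.orderings import lex, grlex, grevlex

# Exponent vectors of the monomials x*y**2*z (total degree 4)
# and x**2*z (total degree 3) in the variables (x, y, z):
a = (1, 2, 1)
b = (2, 0, 1)

# lex looks at the leftmost nonzero entry of a - b = (-1, 2, 0):
print(lex(a) > lex(b))          # False: x**2*z wins under lex
# the graded orders compare total degrees first (4 > 3):
print(grlex(a) > grlex(b))      # True
print(grevlex(a) > grevlex(b))  # True
```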
With the concept of ordering, we can use the following terminology to describe the structure of a polynomial:

Definition 16. (Multidegree, Leading Coefficient, Leading Monomial, Leading Term) Let $f = \sum_\alpha a_\alpha x^\alpha$ be a nonzero polynomial in $K[x_1, \dots, x_n]$, and let $>$ be a monomial ordering; then

1. Multidegree:
$$\mathrm{multideg}(f) = \max(\alpha \in \mathbb{Z}^n_{\geq 0} : a_\alpha \neq 0), \qquad (3.9)$$
where the maximum is taken with respect to the ordering $>$.
2. Leading coefficient:
$$\mathrm{LC}(f) = a_{\mathrm{multideg}(f)} \in K. \qquad (3.10)$$
3. Leading monomial:
$$\mathrm{LM}(f) = x^{\mathrm{multideg}(f)} \in K[x_1, \dots, x_n]. \qquad (3.11)$$
4. Leading term:
$$\mathrm{LT}(f) = \mathrm{LC}(f) \cdot \mathrm{LM}(f) \in K[x_1, \dots, x_n]. \qquad (3.12)$$
Definition 17. (Least Common Multiple, S-polynomial) Let $f, g \in K[x_1, \dots, x_n]$ be two polynomials with $\mathrm{multideg}(f) = \alpha$ and $\mathrm{multideg}(g) = \beta$, and let $\gamma = (\gamma_1, \dots, \gamma_n)$ with $\gamma_i = \max(\alpha_i, \beta_i)$. The least common multiple of $\mathrm{LM}(f)$ and $\mathrm{LM}(g)$ is given by

$$\mathrm{l.c.m.}(\mathrm{LM}(f), \mathrm{LM}(g)) = x^\gamma. \qquad (3.13)$$

The S-polynomial of $f$ and $g$ is defined as

$$S(f, g) = \frac{\mathrm{l.c.m.}(\mathrm{LM}(f), \mathrm{LM}(g))}{\mathrm{LT}(f)} f - \frac{\mathrm{l.c.m.}(\mathrm{LM}(f), \mathrm{LM}(g))}{\mathrm{LT}(g)} g. \qquad (3.14)$$
Using these concepts, it is possible to generalize the algorithm for dividing a polynomial by another polynomial in one variable to an algorithm for dividing a polynomial by a set of polynomials in several variables. The main idea is that given $F = (f_1, \dots, f_s)$, an $s$-tuple of polynomials in $K[x_1, \dots, x_n]$, and a monomial ordering $>$, a polynomial $f \in K[x_1, \dots, x_n]$ can be written as

$$f = a_1 f_1 + \cdots + a_s f_s + \bar{f}^F, \qquad (3.15)$$

where $a_1, \dots, a_s, \bar{f}^F \in K[x_1, \dots, x_n]$, and either $\bar{f}^F = 0$ or $\bar{f}^F$ is a linear combination of monomials, none of which is divisible by any of $\mathrm{LT}(f_1), \dots, \mathrm{LT}(f_s)$. In this context, $\bar{f}^F$, the remainder of $f$ on division by $F$, is called the normal form of $f$. Algorithm 3.1 shows how to compute $a_1, \dots, a_s, \bar{f}^F$. A proof for this algorithm can be found in [18, pp. 64].
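The generalized division of Eq. (3.15) is available in most computer algebra systems. The following SymPy sketch reproduces a standard division example (dividing $x^2y + xy^2 + y^2$ by the pair $(xy - 1,\ y^2 - 1)$ under the lex order), where `reduced` returns the quotients $a_i$ and the normal form $\bar{f}^F$:

```python
from sympy import symbols, reduced, expand

x, y = symbols('x y')

f = x**2*y + x*y**2 + y**2
F = [x*y - 1, y**2 - 1]

# Divide f by the ordered tuple F under the lex order with x > y:
# f = a1*(x*y - 1) + a2*(y**2 - 1) + r, where no monomial of r is
# divisible by LT(x*y - 1) = x*y or LT(y**2 - 1) = y**2.
quotients, r = reduced(f, F, x, y, order='lex')
print(quotients, r)  # [x + y, 1] x + y + 1

# Check Eq. (3.15):
print(expand(sum(q*g for q, g in zip(quotients, F)) + r - f) == 0)  # True
```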
We are interested in using these concepts to test if a polynomial $f \in K[x_1, \dots, x_n]$ shares zeros with a set of polynomials $\{f_1, \dots, f_s\} \subset K[x_1, \dots, x_n]$. This problem can be formulated as determining if $f \in I$, with $I = \langle f_1, \dots, f_s \rangle$. Geometrically speaking, using Lemma 1, this is equivalent to deciding whether $\mathbf{V}(I)$ lies on the variety $\mathbf{V}(f)$. This is known in the literature as the Ideal Membership problem [18]. We can intuitively understand this problem with an example. Let us assume that we have a system of polynomial equations $F = 0$, and a polynomial equation $f = 0$ such that $f \in F$. We are interested in determining if $f = 0$ has solutions in common with $F \setminus \{f\} = 0$. Finding a systematic way of knowing if such solutions exist would be helpful to determine if a system of polynomial equations has a solution. The Ideal Membership problem can be solved using a special kind of bases called Gröbner bases. They can be characterized
Algorithm 3.1: DivAlg(·) Division Algorithm in $K[x_1, \dots, x_n]$ (Taken from [18])
input : $f_1, \dots, f_s, f$
output: $a_1, \dots, a_s, \bar{f}^F$
Initialize $a_1 := 0; \dots; a_s := 0$; $\bar{f}^F := 0$ and $p := f$
while $p \neq 0$ do
    $i := 1$
    divocc := false
    while $i \leq s$ and divocc = false do
        if $\mathrm{LT}(f_i)$ divides $\mathrm{LT}(p)$ then
            $a_i := a_i + \mathrm{LT}(p)/\mathrm{LT}(f_i)$
            $p := p - (\mathrm{LT}(p)/\mathrm{LT}(f_i))\, f_i$
            divocc := true
        else
            $i := i + 1$
    if divocc = false then
        $\bar{f}^F := \bar{f}^F + \mathrm{LT}(p)$
        $p := p - \mathrm{LT}(p)$
using the following theorem:

Theorem 2. (Gröbner Basis) Given an ideal $I \subset K[x_1, \dots, x_n]$ and a fixed monomial order, the following statements are equivalent:

1. $G = \{g_1, \dots, g_t\}$ is a GB for $I$.
2. Let $\mathrm{LT}(I)$ be the set of leading terms of elements of $I$ and $\langle \mathrm{LT}(I) \rangle$ the ideal generated by those elements; then
$$\langle \mathrm{LT}(I) \rangle = \langle \mathrm{LT}(g_1), \dots, \mathrm{LT}(g_t) \rangle. \qquad (3.16)$$
3. (Ideal Membership) For $f \in K[x_1, \dots, x_n]$, $f \in I$ iff $\bar{f}^G = 0$, i.e. the remainder of $f$ on division by $G$ is zero.
4. (Buchberger's Criterion) $\overline{S(g_i, g_j)}^G = 0$ for all pairs of polynomials $g_i, g_j \in G$.

Proof. (Taken from [18])

$1 \Leftrightarrow 2$. Statement 2 is the definition of a GB. To prove that this is a basis for $I$, i.e.

$$I = \langle g_1, \dots, g_t \rangle = \{h \mid h = h_1 g_1 + \cdots + h_t g_t, \ h_i \in K[x_1, \dots, x_n]\}, \qquad (3.17)$$

Dickson's Lemma and the Hilbert Basis Theorem are required. These results can be found in [18, pp. 71 and pp. 76, respectively].

$1 \Leftrightarrow 3$. Suppose that $f \in I$. Since $G$ is a basis for $I$, $f$ can be written as $f = h = h_1 g_1 + \cdots + h_t g_t$. Comparing with Eq. (3.15), it follows that $\bar{f}^G = 0$. Conversely, if $\bar{f}^G = 0$ and $G$ is a GB, it is trivial to see that $f$ lies in $I$.
$3 \Leftrightarrow 4$. Assume that $G$ is a GB of $I$. Using Eq. (3.14), we can write the S-polynomial as

$$S(g_i, g_j) = \frac{\mathrm{l.c.m.}(\mathrm{LM}(g_i), \mathrm{LM}(g_j))}{\mathrm{LT}(g_i)} g_i - \frac{\mathrm{l.c.m.}(\mathrm{LM}(g_i), \mathrm{LM}(g_j))}{\mathrm{LT}(g_j)} g_j = h_i g_i + h_j g_j, \qquad (3.18)$$

which means that $S(g_i, g_j) \in I$. By Statement 3, this means that $\overline{S(g_i, g_j)}^G = 0$. For the sake of brevity, the proof of the converse statement, i.e. starting from $\overline{S(g_i, g_j)}^G = 0$, showing that $G$ is a GB of $I$, is omitted here, but it can be found in [18, pp. 85]. $\square$
The third statement of this theorem suggests a method for transforming an arbitrary basis $F$ of an ideal $I$ into a Gröbner basis. Such a method was first proposed by Bruno Buchberger in 1965 [19], and is shown in Algorithm 3.2. Nevertheless, this algorithm is computationally expensive, since it involves the computation of all pairs of S-polynomials, and its performance therefore heavily depends on a good selection of the monomial ordering [29,30]. A weak criterion to avoid the computation of some useless S-polynomials, i.e. a criterion to determine beforehand if an S-polynomial reduces to zero, and thus to avoid computing the remainder using the generalized division algorithm, is presented in the following proposition:

Proposition 1. Given a finite set $G \subset K[x_1, \dots, x_n]$ and polynomials $f, g \in G$ such that $\mathrm{l.c.m.}(\mathrm{LM}(f), \mathrm{LM}(g)) = \mathrm{LM}(f) \cdot \mathrm{LM}(g)$, i.e. the leading monomials of $f$ and $g$ are relatively prime, then $\overline{S(f, g)}^G = 0$.
Algorithm 3.2: Groebner(·) Buchberger's Algorithm (Taken from [18])
input : $F = \{f_1, \dots, f_s\}$, a basis for $I$, and $>$, a monomial ordering
output: $G = \{g_1, \dots, g_t\}$, a GB for $I$
Initialize $G := F$; geq := false
while geq = false do
    $G' := G$
    for each pair $\{p, q\}$, $p \neq q$ in $G'$ do
        if $\mathrm{l.c.m.}(\mathrm{LM}(p), \mathrm{LM}(q)) \neq \mathrm{LM}(p) \cdot \mathrm{LM}(q)$ then
            $S := \overline{S(p, q)}^{G'}$, using DivAlg$(S(p, q), G')$
            if $S \neq 0$ then $G := G \cup \{S\}$
    if $G' = G$ then geq := true
The proof for this proposition can be found in [18, pp. 104]. More recent alternatives to Buchberger's algorithm, like the F4 algorithm by Faugère [30], include more sophisticated criteria to avoid the computation of useless S-polynomials, as well as linear algebra techniques to improve the performance of the generalized division algorithm. Such methods are included in most commercial computer algebra systems. However, it should be noted that the performance of these algorithms heavily depends on $K$, i.e. the field in which the coefficients of the polynomials lie, and on a good selection of a monomial ordering. It has been experimentally shown that computing a GB with respect to the grevlex order is usually faster than computing a GB with respect to the lex order or the grlex order [31]. The Gröbner bases described by Theorem 2 are not unique,
Algorithm 3.3: rGB(·) Reduced Gröbner Basis (Taken from [18])
input : $G = \{g_1, \dots, g_t\}$, a GB for $I$
output: $G' = \{p_1, \dots, p_s\}$, a reduced GB for $I$
for $g_i \in G$ do
    $g_i := \frac{1}{\mathrm{LC}(g_i)} g_i$
for $g_i \in G$ do
    if $\mathrm{LT}(g_i) \in \langle \mathrm{LT}(G \setminus \{g_i\}) \rangle$ then
        $G := G \setminus \{g_i\}$
for $g_i \in G$ do
    $p_i := \overline{g_i}^{G \setminus \{g_i\}}$, using DivAlg$(g_i, G \setminus \{g_i\})$
return $G' = \{p_1, \dots, p_s\}$
i.e. for an ideal $I$, several GBs can be found. This leads to the definition of reduced Gröbner bases. In addition to the conditions above, further requirements for a GB to be a reduced GB are that $\mathrm{LC}(g) = 1$ and that no monomial of $g$ lies in $\langle \mathrm{LT}(G \setminus \{g\}) \rangle$, for all $g \in G$. A reduced GB is unique for a given monomial order. A full proof of these conditions lies beyond the scope of this thesis, but it can be found in [18, pp. 92]. An algorithm for computing a reduced GB is shown in Algorithm 3.3.
3.3 Hilbert’s Nullstellensatz
One of the most important applications of GBs is determining if a system of polynomial equations has a solution. This can be interpreted geometrically as asking whether the variety generated by such a system is non-empty. Hilbert's Nullstellensatz answers this question.

Theorem 3. ((Weak) Hilbert's Nullstellensatz) Let $K$ be an algebraically closed field and let $I \subset K[x_1, \dots, x_n]$ be an ideal satisfying $\mathbf{V}(I) = \emptyset$. Then $I = K[x_1, \dots, x_n]$.

A proof by induction can be found in [18, pp. 170]. A criterion for determining the existence of common zeros of polynomial equations can be expressed as follows:

Corollary 1. Let $K$ be an algebraically closed field, and $\{f_1, \dots, f_s\} \subset K[x_1, \dots, x_n]$ be a set of polynomials. Then, if the reduced Gröbner basis of the ideal generated by these polynomials with respect to any ordering is $G = \{1\}$, they do not have a common zero.
Proof. (Taken from [18]) The ideal generated by $G$ is $I = \langle 1 \rangle$, which can be written as

$$I = \{h \mid h = h_i \cdot 1, \ h_i \in K[x_1, \dots, x_n]\}, \qquad (3.19)$$

which means that every polynomial $h_i \in K[x_1, \dots, x_n]$ is also an element of $I$, and therefore $K[x_1, \dots, x_n] \subset I$. Since by definition $I \subset K[x_1, \dots, x_n]$, this implies that $I = K[x_1, \dots, x_n]$, and therefore, by the weak Nullstellensatz, $\mathbf{V}(I) = \emptyset$. From Lemma 1, it follows that $\mathbf{V}(I) = \mathbf{V}(1)$. Since $G$ is the basis of the ideal generated by $\{f_1, \dots, f_s\} \subset K[x_1, \dots, x_n]$, i.e. $I = \langle f_1, \dots, f_s \rangle = \langle 1 \rangle$, using Lemma 2 we obtain $\mathbf{V}(f_1, \dots, f_s) = \mathbf{V}(1) = \emptyset$, and therefore the polynomials $\{f_1, \dots, f_s\}$ do not have common zeros in $K$. $\square$
The following proposition allows us to generalize the above result to the case of non-reduced GBs:

Proposition 2. Let $G$ be a GB for the ideal $I$ with respect to a fixed monomial ordering $>$, and let $c \in K[x_1, \dots, x_n]$ be a constant polynomial. The reduced GB $G'$ of $I$ with respect to the monomial ordering $>$ is given by $G' = \{1\}$ iff $c \in G$.
Proof. First, we show that if $c \in G$, it follows that $G' = \{1\}$. We construct $G'$ using Algorithm 3.3. Without loss of generality, the GB can be written as $G = \{f_1, \dots, c, \dots, f_s\}$. Dividing all polynomials in $G$ by their leading coefficients results in

$$G' = \left\{ \frac{f_1}{\mathrm{LC}(f_1)}, \dots, 1, \dots, \frac{f_s}{\mathrm{LC}(f_s)} \right\}. \qquad (3.20)$$

It is easy to see that $\langle 1 \rangle = K[x_1, \dots, x_n]$. Since $1 \in G'$ and all polynomials $\frac{f_i}{\mathrm{LC}(f_i)} \in K[x_1, \dots, x_n]$, it follows that $\frac{f_i}{\mathrm{LC}(f_i)} \in \left\langle \mathrm{LT}\!\left(G' \setminus \left\{ \frac{f_i}{\mathrm{LC}(f_i)} \right\}\right) \right\rangle$ for all $f_i$'s in $G$ not equal to $c$, and hence all of these polynomials are discarded from $G'$. Finally, it follows that the only polynomial remaining in $G'$ is $1$, and therefore $G' = \{1\}$.
Now we show that if $G' = \{1\}$, it follows that $c \in G$. Assuming that $G' = \{1\}$ implies that every polynomial $\frac{f_i}{\mathrm{LC}(f_i)}$, with $f_i \neq 1$ in the original non-reduced GB $G$, lies in the ideal generated by the leading monomials of the polynomials in $G$ minus the leading monomial of $f_i$, i.e. $\frac{f_i}{\mathrm{LC}(f_i)} \in \left\langle \mathrm{LT}\!\left(G' \setminus \left\{ \frac{f_i}{\mathrm{LC}(f_i)} \right\}\right) \right\rangle$. This is true for all polynomials $f_i \neq 1$ in $G$. Adding these polynomials to $G'$ results in

$$G' = \left\{ 1, \frac{f_1}{\mathrm{LC}(f_1)}, \dots, \frac{f_s}{\mathrm{LC}(f_s)} \right\}. \qquad (3.21)$$

Finally, multiplying every polynomial in $G'$ by its leading coefficient, which is a constant in $K$, results in $G = \{c, f_1, \dots, f_s\}$, with $c$ a constant. This means that $c \in G$. $\square$

It follows from this proposition that if a constant polynomial is an element of the GB of the ideal generated by a system of polynomial equations, the affine variety of such a system is empty, and therefore the system has no solution.
We are now interested in determining whether an affine variety $\mathbf{V}(I)$ is a finite set. This question can be answered by knowing the dimension of this variety, but a more thorough discussion of the concept of dimension of a variety lies beyond the scope of this chapter. A more formal treatment of this concept is provided in Appendix C.

Definition 18. (Zero-dimensional Ideal) An ideal $I \subset K[x_1, \dots, x_n]$ is called zero-dimensional if the affine variety $\mathbf{V}(I)$ is a finite set.

It follows from the previous definition that a system of polynomial equations has only a finite number of solutions if the ideal generated by those polynomials is zero-dimensional. The following theorem provides a method for determining if a system of equations has finitely many solutions:
Theorem 4. Let $\mathbf{V}(I) \subset K^n$ be an affine variety and $I \subset K[x_1, \dots, x_n]$ be an ideal, $>$ be a graded monomial ordering in $K[x_1, \dots, x_n]$, and $G$ a GB for $I$ with respect to such an ordering. $I$ is zero-dimensional iff for each $i$, $1 \leq i \leq n$, there is some $m_i \geq 0$ such that $x_i^{m_i} = \mathrm{LT}(g)$ for some $g \in G$.
The proof of this theorem is omitted here, since it uses results from the formal definition of the dimension of a variety, but it is included in Appendix C. The following result provides a quantitative estimate of the number of solutions of a system of polynomial equations:

Proposition 3. Let $I \subset K[x_1, \dots, x_n]$ be an ideal in an algebraically closed field with GB $G = \{g_1, \dots, g_t\}$ such that $\mathrm{LT}(g_i) = x_i^{m_i}$. Then it follows that the variety $\mathbf{V}(I)$ contains at most $m_1 \times m_2 \times \cdots \times m_n$ points.

The proof of this proposition can be found in Appendix C.
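The criterion of Theorem 4 can be checked mechanically by inspecting the leading monomials of a GB in a graded order. A SymPy sketch (intersecting a circle with a parabola, our own illustration) follows:

```python
from sympy import symbols, groebner, LM

x, y = symbols('x y')

# Intersect the circle x**2 + y**2 = 1 with the parabola y = x**2.
G = groebner([x**2 + y**2 - 1, y - x**2], x, y, order='grevlex')

# Theorem 4: the ideal is zero-dimensional iff, for every variable,
# some leading monomial in the GB is a pure power of that variable.
lead = [LM(g, x, y, order='grevlex') for g in G]
print(lead)  # pure powers of x and y, so V(I) is finite;
             # Proposition 3 then bounds |V(I)| by m1 * m2 points.
```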
3.4 Elimination Theory
Given a system of polynomial equations $F \subset K[x_1, \dots, x_n]$, elimination theory allows us to use the theoretical framework of commutative algebra to find the solutions to $F = 0$ in two steps:

1. (Elimination Step) Find a consequence $g_t(x_n) = 0 \in K[x_1, \dots, x_n]$ of the original equations in $F$ which involves only $x_n$, i.e. eliminates all other variables $x_1, \dots, x_{n-1}$ from the system.
2. (Extension Step) Once $g_t = 0$ is solved, determine the values of the remaining variables that extend these solutions to solutions of the original system $F = 0$.

In order to generalize these ideas, the following definition is required:

Definition 19. (Elimination Ideal) Let $I = \langle f_1, \dots, f_s \rangle \subset K[x_1, \dots, x_n]$ be an ideal. The $l$-th elimination ideal $I_l$ is the ideal given by

$$I_l = I \cap K[x_{l+1}, \dots, x_n]. \qquad (3.22)$$

The $l$-th elimination ideal consists of all consequences of $f_1 = \cdots = f_s = 0$ which eliminate the variables $x_1, \dots, x_l$, and is an ideal of $K[x_{l+1}, \dots, x_n]$. Therefore, the elimination of the variables $x_1, \dots, x_l$ consists of finding nonzero polynomials in $I_l$. The following theorem allows us to do that systematically:
Theorem 5. (Elimination Theorem) Let $I \subset K[x_1, \dots, x_n]$ be an ideal, and $G$ its GB with respect to the lex order. Then for every $0 \leq l \leq n$ the set

$$G_l = G \cap K[x_{l+1}, \dots, x_n] \qquad (3.23)$$

is a GB of the $l$-th elimination ideal $I_l$.
The proof of this theorem can be found in [18, pp. 117]. Using this result, a strategy for solving systems of polynomial equations $f_1 = \cdots = f_s = 0$ using GBs can be found in Algorithm 3.4. In this way, solving systems of polynomial equations using the formalism of GBs represents a natural generalization of the Gaussian algorithm for solving systems of linear equations [18].
Compared to the grlex, the grevlex and other monomial orderings, the computation of GBs with respect to the lex order is often more difficult. Therefore, a usual strategy for finding the zeros of a system of polynomial equations that generates a zero-dimensional ideal is to compute the GB of such an ideal with respect to a more efficient monomial ordering, and then to convert this GB into a lexicographic GB [29,33]. Among the most popular ordering-conversion methods for GBs
Algorithm 3.4: PolySolve(·) Solutions of a system of polynomials (Taken from [18])
input : $F = \{f_1 = 0, \dots, f_s = 0\}$, a system of polynomial equations in $K[x_1, \dots, x_n]$
output: Sols, the set of solutions of $F$
Assume: $I = \langle f_1, \dots, f_s \rangle$ is zero-dimensional.
Sols := $\emptyset$
$G$ := Groebner$(F, >_{\mathrm{lex}})$, the GB of $\langle f_1, \dots, f_s \rangle$ with respect to the lex order
$G$ := rGB$(G)$, the reduced GB of $\langle f_1, \dots, f_s \rangle$
$r$ := Roots$(g_n)$, the roots of the generator in $x_n$, obtained by applying one-variable techniques (including numerical methods such as Newton-Raphson [18,32])
Sols := Sols $\cup \{r\}$
for $i := 1 : n-1$ do
    Compute $r$ := Roots$(g_{n-i})$, the roots of the generator in $x_{n-i}$, applying back substitution.
    Sols := Sols $\cup \{r\}$
return Sols
are the Gröbner Walk [33] and the FGLM [31] algorithms. These methods are built into most commercial computer algebra systems. Worth mentioning is the work of Jean-Charles Faugère, who has not only contributed the F4 and FGLM algorithms, but also wrote the standard Groebner package in MAPLE.
4 Algebraic Formulation of BP
Using the framework of computational commutative algebra presented in Chapter 3, it is possible to express an alternative formulation of the Belief Propagation algorithm. In order to do so, we need to formalize the notion of the system of polynomials that represents the message passing equations:
Definition 20. (Associated set of polynomials of a MN) Given a MN M and its system of MPE defined by the BP algorithm, i.e. the message updates from Eq. (2.32) and the normalization constraints from Eq. (2.33) (Section 2.5), its associated set of polynomials (ASP) F_M ⊂ K[µ_{1→2}(x_1), . . . , µ_{2→1}(x_n), Z_{1→2}, . . . , Z_{2→1}] is given as

\[
F_{\mathcal{M}} = \left\{\; \mu_{i\to j}(x_j) - Z_{i\to j}\sum_{x_i}\psi_{i,j}(x_i,x_j)\prod_{k}\mu_{k\to i}(x_i),\;\;
\sum_{x_j}\mu_{i\to j}(x_j) - 1 \;:\; \forall\,\{i,j,k\}\in G,\; x_i\in\mathrm{val}(i) \right\} \qquad (4.1)
\]
It is important to remark that the variables of the polynomials in F_M are the messages µ_{i→j}(x_j) and the normalization constants Z_{i→j}. To avoid cluttered notation, the shorthand F = F_M will be used whenever it is clear that the set is associated with the MN M. As an example, the ASP of the MN of the tree shown in Figure 2.1, whose graph G_BTree is given in
Eq. (2.13), with binary variables val(X) = {x, x̄} for all X_i ∈ X, is given as
\[
F_{\mathrm{BTree}} = \left\{
\begin{aligned}
&\mu_{1\to2}(x) - Z_{1\to2}\big(\psi_{X_1,X_2}(x,x) + \psi_{X_1,X_2}(\bar x,x)\big)\\
&\mu_{1\to2}(\bar x) - Z_{1\to2}\big(\psi_{X_1,X_2}(x,\bar x) + \psi_{X_1,X_2}(\bar x,\bar x)\big)\\
&\mu_{2\to1}(x) - Z_{2\to1}\big(\psi_{X_1,X_2}(x,x)\,\mu_{3\to2}(x)\,\mu_{4\to2}(x) + \psi_{X_1,X_2}(x,\bar x)\,\mu_{3\to2}(\bar x)\,\mu_{4\to2}(\bar x)\big)\\
&\mu_{2\to1}(\bar x) - Z_{2\to1}\big(\psi_{X_1,X_2}(\bar x,x)\,\mu_{3\to2}(x)\,\mu_{4\to2}(x) + \psi_{X_1,X_2}(\bar x,\bar x)\,\mu_{3\to2}(\bar x)\,\mu_{4\to2}(\bar x)\big)\\
&\mu_{2\to3}(x) - Z_{2\to3}\big(\phi_{X_2}(x)\,\psi_{X_2,X_3}(x,x)\,\mu_{1\to2}(x)\,\mu_{4\to2}(x) + \phi_{X_2}(\bar x)\,\psi_{X_2,X_3}(\bar x,x)\,\mu_{1\to2}(\bar x)\,\mu_{4\to2}(\bar x)\big)\\
&\mu_{2\to3}(\bar x) - Z_{2\to3}\big(\phi_{X_2}(x)\,\psi_{X_2,X_3}(x,\bar x)\,\mu_{1\to2}(x)\,\mu_{4\to2}(x) + \phi_{X_2}(\bar x)\,\psi_{X_2,X_3}(\bar x,\bar x)\,\mu_{1\to2}(\bar x)\,\mu_{4\to2}(\bar x)\big)\\
&\mu_{2\to4}(x) - Z_{2\to4}\big(\phi_{X_2}(x)\,\psi_{X_2,X_4}(x,x)\,\mu_{1\to2}(x)\,\mu_{3\to2}(x) + \phi_{X_2}(\bar x)\,\psi_{X_2,X_4}(\bar x,x)\,\mu_{1\to2}(\bar x)\,\mu_{3\to2}(\bar x)\big)\\
&\mu_{2\to4}(\bar x) - Z_{2\to4}\big(\phi_{X_2}(x)\,\psi_{X_2,X_4}(x,\bar x)\,\mu_{1\to2}(x)\,\mu_{3\to2}(x) + \phi_{X_2}(\bar x)\,\psi_{X_2,X_4}(\bar x,\bar x)\,\mu_{1\to2}(\bar x)\,\mu_{3\to2}(\bar x)\big)\\
&\mu_{3\to2}(x) - Z_{3\to2}\big(\psi_{X_2,X_3}(x,x) + \psi_{X_2,X_3}(x,\bar x)\big)\\
&\mu_{3\to2}(\bar x) - Z_{3\to2}\big(\psi_{X_2,X_3}(\bar x,x) + \psi_{X_2,X_3}(\bar x,\bar x)\big)\\
&\mu_{4\to2}(x) - Z_{4\to2}\big(\psi_{X_2,X_4}(x,x) + \psi_{X_2,X_4}(x,\bar x)\big)\\
&\mu_{4\to2}(\bar x) - Z_{4\to2}\big(\psi_{X_2,X_4}(\bar x,x) + \psi_{X_2,X_4}(\bar x,\bar x)\big)\\
&\mu_{1\to2}(x) + \mu_{1\to2}(\bar x) - 1\\
&\mu_{2\to1}(x) + \mu_{2\to1}(\bar x) - 1\\
&\mu_{2\to3}(x) + \mu_{2\to3}(\bar x) - 1\\
&\mu_{2\to4}(x) + \mu_{2\to4}(\bar x) - 1\\
&\mu_{3\to2}(x) + \mu_{3\to2}(\bar x) - 1\\
&\mu_{4\to2}(x) + \mu_{4\to2}(\bar x) - 1
\end{aligned}
\right\}. \qquad (4.2)
\]
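Definition 20 is mechanical enough to implement directly. The sketch below builds the ASP of a small pairwise MN in SymPy; the helper name `asp`, the edge-list encoding, and the use of states {0, 1} in place of {x, x̄} are choices made here for illustration, and node potentials are omitted for brevity:

```python
from itertools import product
from sympy import Symbol, Add, Mul

def asp(edges, states=(0, 1)):
    """Build the ASP of Definition 20 for a pairwise MN given as an edge list."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    dir_edges = edges + [(j, i) for i, j in edges]
    mu = {(i, j, s): Symbol(f'mu_{i}_{j}_{s}') for i, j in dir_edges for s in states}
    Z = {(i, j): Symbol(f'Z_{i}_{j}') for i, j in dir_edges}
    psi = {e: {ab: Symbol(f'psi_{e[0]}{e[1]}_{ab[0]}{ab[1]}')
               for ab in product(states, repeat=2)} for e in edges}

    def pot(i, j, xi, xj):  # one symbolic potential per undirected edge
        return psi[(i, j)][(xi, xj)] if (i, j) in psi else psi[(j, i)][(xj, xi)]

    F = []
    for i, j in dir_edges:
        # message-update polynomials (first family in Eq. (4.1))
        for xj in states:
            F.append(mu[(i, j, xj)] - Z[(i, j)] * Add(*[
                pot(i, j, xi, xj) * Mul(*[mu[(k, i, xi)] for k in nbrs[i] - {j}])
                for xi in states]))
        # normalization polynomial (second family in Eq. (4.1))
        F.append(Add(*[mu[(i, j, s)] for s in states]) - 1)
    return F, mu, Z

# Binary tree of Figure 2.1: node 2 connected to nodes 1, 3, 4.
# 6 directed edges x (2 message polynomials + 1 normalization) = 18 polynomials,
# matching the shape of Eq. (4.2).
F, mu, Z = asp([(1, 2), (2, 3), (2, 4)])
```

The returned list can then be fed to a Gröbner-basis routine, with the potentials treated either as symbolic parameters or as concrete numbers.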
The rest of this chapter is structured as follows: In Section 4.1, we revisit the condition of convergence of BP in terms of determining the dimension and cardinality of V_M, the affine variety generated by the associated set of polynomials of a MN. In Section 4.2, an equivalent alternative to the BP algorithm using Gröbner bases is proposed.
4.1 Convergence of the (L)BP algorithm
In Chapter 2, it was shown that convergence of the iterative (L)BP algorithm requires the message updates calculated with Eq. (2.37) to converge to the normalized messages after enough iterations, i.e. µ^{(k)}_{X_i→X_j}(x_j) = µ_{X_i→X_j}(x_j). This is equivalent to the following proposition:
Proposition 4. Let M be a MN and let F be the ASP of M. The (L)BP algorithm converges to a solution (not necessarily the true marginals) if the system of equations defined by the set F as a vector (i.e. each polynomial f_i ∈ F represents the i-th component of the vector F) vanishes.
Proof. From the message update rule from Eq. (2.37), we have that

\[
\mu^{(k)}_{X_i\to X_j}(x_j) - Z^{(k)}_{i\to j}\sum_{x_i\in\mathrm{val}(X_i)}\psi_{X_i,X_j}(x_i,x_j)\prod_{X_l\in\mathrm{Nb}(X_i)\setminus\{X_j\}} m^{(k-1)}_{X_l\to X_i}(x_i) = 0. \qquad (4.3)
\]
Using Eq. (2.36), we have that \(m^{(k-1)}_{X_l\to X_i}(x_i) = \frac{1}{Z^{(k-1)}_{l\to i}}\,\mu^{(k-1)}_{X_l\to X_i}(x_i)\), which, substituted into the above equation, results in

\[
\mu^{(k)}_{X_i\to X_j}(x_j) - Z^{(k)}_{i\to j}\prod_{l}\frac{1}{Z^{(k-1)}_{l\to i}}\sum_{x_i\in\mathrm{val}(X_i)}\psi_{X_i,X_j}(x_i,x_j)\prod_{X_l\in\mathrm{Nb}(X_i)\setminus\{X_j\}}\mu^{(k-1)}_{X_l\to X_i}(x_i) = 0. \qquad (4.4)
\]
By construction, the messages computed with Algorithm 2.1 satisfy

\[
\sum_{x_j\in\mathrm{val}(X_j)}\mu^{(k)}_{X_i\to X_j}(x_j) - 1 = 0. \qquad (4.5)
\]
Using the convergence criterion µ^{(k)}_{X_i→X_j}(x_j) = µ_{X_i→X_j}(x_j), we can rewrite Eq. (4.4) and Eq. (4.5) as

\[
\begin{aligned}
&\mu_{X_i\to X_j}(x_j) - Z_{i\to j}\sum_{x_i\in\mathrm{val}(X_i)}\psi_{X_i,X_j}(x_i,x_j)\prod_{X_k\in\mathrm{Nb}(X_i)\setminus\{X_j\}}\mu_{X_k\to X_i}(x_i) = 0\\
&\sum_{x_j\in\mathrm{val}(X_j)}\mu_{X_i\to X_j}(x_j) - 1 = 0, \qquad (4.6)
\end{aligned}
\]

where \(Z_{i\to j} = Z^{(k)}_{i\to j}\prod_{l}\frac{1}{Z^{(k-1)}_{l\to i}}\). This condition is valid for every message. Comparing Eq. (4.6) with Eq. (4.1) implies that convergence of the (L)BP algorithm is ensured if F = 0 has a solution.
It is easy to see that, using the convergence condition from Proposition 4, we can use computational commutative algebra to analyze conditions for convergence of the (L)BP algorithm. A criterion for convergence can be stated as follows:
Lemma 3. Let F be the ASP of a MN M, and let G_F be the reduced Gröbner basis of I, the ideal generated by F, with respect to an arbitrary monomial ordering. The BP algorithm does not converge to a solution if:
1. G_F = {1}, i.e. there is no solution to F = 0, or
2. I is not zero-dimensional, i.e. there are infinitely many solutions to F = 0.
Proof. Let V(I) be the affine variety of I. Since I = ⟨f_1, . . . , f_D⟩ by construction, then, by Lemma 1, V(I) = V(f_1, . . . , f_D) for f_i ∈ F, i.e. the set of solutions of the system F = 0. Assuming that G_F = {1} means that V(I) = ∅, which implies that the system F = 0 does not have a solution, and thus the convergence condition from Proposition 4 is not satisfied. The result follows directly from Corollary 1 and is a consequence of the weak Nullstellensatz (Theorem 3).

If I is not zero-dimensional then, by definition, the variety V(I) is not a finite set, and thus F = 0 has infinitely many solutions. This implies that the BP algorithm does not converge to a fixed point.
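Both failure conditions of Lemma 3 can be tested mechanically with a computer algebra system. The sketch below uses SymPy on toy systems rather than actual ASPs; the zero-dimensionality test implements the pure-power criterion on leading monomials that Theorem 4 provides:

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# Condition 1: an inconsistent system. Its reduced GB collapses to {1},
# so V(I) is empty and F = 0 has no solution.
G1 = groebner([x + y - 1, x + y - 2], x, y, order='lex')
assert list(G1.exprs) == [1]

# Condition 2: I is not zero-dimensional. Criterion (Theorem 4): I is
# zero-dimensional iff every variable occurs as a pure power among the
# leading monomials of the reduced GB.
def is_zero_dimensional(G):
    pure = set()
    for p in G.polys:
        exps = p.monoms(order='lex')[0]  # leading exponent vector (lex GB assumed)
        nz = [g for g, e in zip(G.gens, exps) if e > 0]
        if len(nz) == 1:                 # leading term is a pure power
            pure.add(nz[0])
    return pure == set(G.gens)

G2 = groebner([x * y], x, y, order='lex')            # V(I) = two lines: infinite
G3 = groebner([x**2 - 1, y - x], x, y, order='lex')  # two isolated points
assert not is_zero_dimensional(G2) and is_zero_dimensional(G3)
```

A non-zero-dimensional ASP would mean a continuum of BP fixed points, so this check is a natural preprocessing step before attempting to solve F = 0.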
Using the above lemma and properties of the GB, a method for determining the convergence of the BP algorithm can be expressed in the following theorem:
Theorem 6. (Convergence of BP) Let F ⊂ K[µ_1, . . . , µ_D] be the ASP of an MN M, and let G_F be the reduced GB of I, the ideal generated by the polynomials in F, with respect to some monomial ordering >. The BP algorithm converges to a solution if for every variable µ_i there exists an element g_j ∈ G_F whose leading term is LT(g_j) = µ_i^{m_i}, and the number of fixed points is bounded by m_1 × · · · × m_D.

Proof. If there exists an ordering > such that the leading term of some g_j ∈ G_F can be written as LT(g_j) = µ_i^{m_i}, this means that µ_i^{m_i} ∈ ⟨LT(I)⟩. From Theorem 4, it follows that I is zero-dimensional. Furthermore, it also follows that G_F ≠ {1}, since G_F is a reduced GB. Hence, by Lemma 3, the conditions for the existence of solutions of F = 0 are satisfied. From Proposition 3,
it follows that the variety V(I) has at most n_sols solutions, with 1 ≤ n_sols ≤ m_1 × · · · × m_D. Hence, BP converges to a solution.
Let us illustrate this result with a binary Markov chain with 3 nodes, M_MChain3, whose graph is given by Eq. (2.14) (Fig. 2.1). The ASP of M_MChain3 is given by
\[
F = \left\{
\begin{aligned}
&\mu_{1\to2}(x) - Z_{1\to2}\big(\psi_{X_1,X_2}(x,x) + \psi_{X_1,X_2}(\bar x,x)\big)\\
&\mu_{1\to2}(\bar x) - Z_{1\to2}\big(\psi_{X_1,X_2}(x,\bar x) + \psi_{X_1,X_2}(\bar x,\bar x)\big)\\
&\mu_{2\to1}(x) - Z_{2\to1}\big(\psi_{X_1,X_2}(x,x)\,\mu_{3\to2}(x) + \psi_{X_1,X_2}(x,\bar x)\,\mu_{3\to2}(\bar x)\big)\\
&\mu_{2\to1}(\bar x) - Z_{2\to1}\big(\psi_{X_1,X_2}(\bar x,x)\,\mu_{3\to2}(x) + \psi_{X_1,X_2}(\bar x,\bar x)\,\mu_{3\to2}(\bar x)\big)\\
&\mu_{2\to3}(x) - Z_{2\to3}\big(\psi_{X_2,X_3}(x,x)\,\mu_{1\to2}(x) + \psi_{X_2,X_3}(\bar x,x)\,\mu_{1\to2}(\bar x)\big)\\
&\mu_{2\to3}(\bar x) - Z_{2\to3}\big(\psi_{X_2,X_3}(x,\bar x)\,\mu_{1\to2}(x) + \psi_{X_2,X_3}(\bar x,\bar x)\,\mu_{1\to2}(\bar x)\big)\\
&\mu_{3\to2}(x) - Z_{3\to2}\big(\psi_{X_2,X_3}(x,x) + \psi_{X_2,X_3}(x,\bar x)\big)\\
&\mu_{3\to2}(\bar x) - Z_{3\to2}\big(\psi_{X_2,X_3}(\bar x,x) + \psi_{X_2,X_3}(\bar x,\bar x)\big)\\
&\mu_{1\to2}(x) + \mu_{1\to2}(\bar x) - 1\\
&\mu_{2\to1}(x) + \mu_{2\to1}(\bar x) - 1\\
&\mu_{2\to3}(x) + \mu_{2\to3}(\bar x) - 1\\
&\mu_{3\to2}(x) + \mu_{3\to2}(\bar x) - 1
\end{aligned}
\right\}. \qquad (4.7)
\]
If we let the variables be lexicographically ordered as

\[
\mu_{2\to3}(\bar x) >_{\mathrm{lex}} \mu_{2\to3}(x) >_{\mathrm{lex}} \mu_{2\to1}(\bar x) >_{\mathrm{lex}} \mu_{2\to1}(x) >_{\mathrm{lex}} Z_{2\to3} >_{\mathrm{lex}} Z_{3\to2} >_{\mathrm{lex}} Z_{2\to1} >_{\mathrm{lex}} Z_{1\to2} >_{\mathrm{lex}} \mu_{3\to2}(\bar x) >_{\mathrm{lex}} \mu_{3\to2}(x) >_{\mathrm{lex}} \mu_{1\to2}(\bar x) >_{\mathrm{lex}} \mu_{1\to2}(x), \qquad (4.8)
\]
the GB of the ideal generated by F with respect to the lex order is given by

\[
G_F = \left\{
\begin{aligned}
&\mu_{1\to2}(x) - \tfrac{q_1}{q_1+q_2}, \qquad \mu_{1\to2}(\bar x) - \tfrac{q_2}{q_1+q_2},\\
&\mu_{3\to2}(x) - \tfrac{q_{11}}{q_{11}+q_{12}}, \qquad \mu_{3\to2}(\bar x) - \tfrac{q_{12}}{q_{11}+q_{12}},\\
&Z_{1\to2} - \tfrac{1}{q_1+q_2}, \qquad Z_{3\to2} - \tfrac{1}{q_{11}+q_{12}},\\
&Z_{2\to1} - \tfrac{q_{11}+q_{12}}{q_{11}q_3+q_{11}q_5+q_{12}q_4+q_{12}q_6}, \qquad Z_{2\to3} - \tfrac{q_1+q_2}{q_1q_7+q_2q_9+q_1q_8+q_2q_{10}},\\
&\mu_{2\to1}(x) - \tfrac{q_{11}q_3+q_{12}q_4}{q_{11}q_3+q_{11}q_5+q_{12}q_4+q_{12}q_6}, \qquad \mu_{2\to1}(\bar x) - \tfrac{q_{11}q_5+q_{12}q_6}{q_{11}q_3+q_{11}q_5+q_{12}q_4+q_{12}q_6},\\
&\mu_{2\to3}(x) - \tfrac{q_1q_7+q_2q_9}{q_1q_7+q_2q_9+q_1q_8+q_2q_{10}}, \qquad \mu_{2\to3}(\bar x) - \tfrac{q_1q_8+q_2q_{10}}{q_1q_7+q_2q_9+q_1q_8+q_2q_{10}}
\end{aligned}
\right\}, \qquad (4.9)
\]
where the factors q_1, . . . , q_12 are calculated as

\[
\begin{aligned}
q_1 &= \psi_{X_1,X_2}(x,x) + \psi_{X_1,X_2}(\bar x,x) & q_7 &= \psi_{X_2,X_3}(x,x)\\
q_2 &= \psi_{X_1,X_2}(x,\bar x) + \psi_{X_1,X_2}(\bar x,\bar x) & q_8 &= \psi_{X_2,X_3}(x,\bar x)\\
q_3 &= \psi_{X_1,X_2}(x,x) & q_9 &= \psi_{X_2,X_3}(\bar x,x)\\
q_4 &= \psi_{X_1,X_2}(x,\bar x) & q_{10} &= \psi_{X_2,X_3}(\bar x,\bar x)\\
q_5 &= \psi_{X_1,X_2}(\bar x,x) & q_{11} &= \psi_{X_2,X_3}(x,x) + \psi_{X_2,X_3}(x,\bar x)\\
q_6 &= \psi_{X_1,X_2}(\bar x,\bar x) & q_{12} &= \psi_{X_2,X_3}(\bar x,x) + \psi_{X_2,X_3}(\bar x,\bar x).
\end{aligned} \qquad (4.10)
\]
We can see that there is a polynomial g ∈ G_F for each variable µ_{i→j}(k), whose leading term can be written in the form LT(g) = (µ_{i→j}(k))^m, with m = 1 for all variables. It follows from Theorem 6 that the BP algorithm for a MN with three nodes converges to a single solution. We can extend this result to a general finite Markov chain as follows:
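The computation behind Eqs. (4.7)–(4.9) can be reproduced for a concrete choice of potentials. In the sketch below the integer potential values are arbitrary, and names such as `m12_0` abbreviate µ1→2(x) (state 0 playing the role of x, state 1 that of x̄); SymPy then returns a reduced lex GB whose leading terms are exactly the twelve variables, as Theorem 6 requires:

```python
from sympy import symbols, groebner

# Arbitrary positive potentials psi_{X1,X2} and psi_{X2,X3} (illustrative)
p12 = {(0, 0): 2, (0, 1): 1, (1, 0): 3, (1, 1): 5}
p23 = {(0, 0): 1, (0, 1): 4, (1, 0): 2, (1, 1): 3}

m12_0, m12_1, m21_0, m21_1, m23_0, m23_1, m32_0, m32_1 = symbols(
    'm12_0 m12_1 m21_0 m21_1 m23_0 m23_1 m32_0 m32_1')
Z12, Z21, Z23, Z32 = symbols('Z12 Z21 Z23 Z32')

# The ASP of Eq. (4.7) with the potentials plugged in
F = [
    m12_0 - Z12 * (p12[0, 0] + p12[1, 0]),
    m12_1 - Z12 * (p12[0, 1] + p12[1, 1]),
    m21_0 - Z21 * (p12[0, 0] * m32_0 + p12[0, 1] * m32_1),
    m21_1 - Z21 * (p12[1, 0] * m32_0 + p12[1, 1] * m32_1),
    m23_0 - Z23 * (p23[0, 0] * m12_0 + p23[1, 0] * m12_1),
    m23_1 - Z23 * (p23[0, 1] * m12_0 + p23[1, 1] * m12_1),
    m32_0 - Z32 * (p23[0, 0] + p23[0, 1]),
    m32_1 - Z32 * (p23[1, 0] + p23[1, 1]),
    m12_0 + m12_1 - 1, m21_0 + m21_1 - 1,
    m23_0 + m23_1 - 1, m32_0 + m32_1 - 1,
]

# Variable ordering of Eq. (4.8)
gens = (m23_1, m23_0, m21_1, m21_0, Z23, Z32, Z21, Z12,
        m32_1, m32_0, m12_1, m12_0)
G = groebner(F, *gens, order='lex')

# Theorem 6: every variable appears as a (here linear) pure power among
# the leading terms, so BP converges to a single fixed point.
leading = [p.monoms(order='lex')[0] for p in G.polys]
assert len(G.polys) == len(gens)          # one generator per variable
assert all(sum(e) == 1 for e in leading)  # all leading terms linear
```

Each generator has the form `variable - rational constant`, i.e. the GB directly reads off the unique fixed point of BP for this chain.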
Theorem 7. The BP algorithm converges to a solution for any MN M that is a Markov chain.
Figure 4.1: Graph of a Markov chain with N nodes: X_1 — X_2 — · · · — X_{N−1} — X_N.
Proof. Let M be the MN of a Markov chain with N nodes, shown in Figure 4.1. We consider F, the ASP of M with RVs that have m states, to be a set of polynomials that depends only on the message variables. The normalization constants are absorbed by the potentials ψ_{j,l}(·, ·), since the normalization constants were introduced only to avoid numerical instability and are therefore not strictly necessary. F can be written as
\[
F = \left\{
\begin{aligned}
&\mu_{1\to2}(x_1) - \sum_{i=1}^{m}\psi_{1,2}(x_i, x_1)\\
&\quad\vdots\\
&\mu_{1\to2}(x_m) - \sum_{i=1}^{m}\psi_{1,2}(x_i, x_m)\\
&\quad\vdots\\
&\mu_{i\to i+1}(x_j) - \sum_{l=1}^{m}\psi_{i,i+1}(x_l, x_j)\,\mu_{i-1\to i}(x_l)\\
&\quad\vdots\\
&\mu_{N\to N-1}(x_j) - \sum_{i=1}^{m}\psi_{N,N-1}(x_i, x_j)\\
&\quad\vdots\\
&\mu_{i+1\to i}(x_j) - \sum_{l=1}^{m}\psi_{i+1,i}(x_l, x_j)\,\mu_{i+2\to i+1}(x_l)\\
&\quad\vdots
\end{aligned}
\right\}. \qquad (4.11)
\]
Let the message variables be lexicographically ordered as

\[
\mu_{N-1\to N}(x_m) >_{\mathrm{lex}} \cdots >_{\mathrm{lex}} \mu_{N-1\to N}(x_1) >_{\mathrm{lex}} \cdots >_{\mathrm{lex}} \mu_{2\to1}(x_j) >_{\mathrm{lex}} \mu_{N\to N-1}(x_m) >_{\mathrm{lex}} \cdots >_{\mathrm{lex}} \mu_{1\to2}(x_1). \qquad (4.12)
\]

Given the above ordering of the message variables, it can be seen that we can relabel each message variable and potential by assigning them a unique integer ID. By doing so, in the rest of this proof we use the shorthand notation µ_i to represent a message variable µ_{j→l}(k) and ψ_i to represent a potential ψ_{j,l}(·, ·). In the ASP described in Eq. (4.11), we identify two different forms of polynomials, namely,
I. µi