
Version 1. History: WOGAS 2, U. Torino, CREST-SBM 2

Algebra of reversible Markov chains

Giovanni Pistone · Maria Piera Rogantin

November 20, 2019

Abstract We prove that Kolmogorov's conditions for reversibility define a toric ideal. We derive new parameterizations for reversible Markov chains.

1 Introduction

A transition matrix $P_{v\to w}$, $v,w\in V$, satisfies the detailed balance conditions if there exist $\kappa(v) > 0$, $v\in V$, such that

$$\kappa(v)P_{v\to w} = \kappa(w)P_{w\to v},\qquad v,w\in V.$$

It follows that $\pi(v)\propto\kappa(v)$ is an invariant probability and the Markov chain $X_n$, $n = 0,1,\ldots$, with invariant probability $\pi$ and transition matrix $P_{v\to w}$, $v,w\in V$, has a reversible two-step joint distribution:

$$P(X_n = v, X_{n+1} = w) = P(X_n = w, X_{n+1} = v),\qquad v,w\in V,\ n\ge0.$$

Reversible Markov chains (MCs) are important in statistical physics, e.g. in the theory of entropy production, and in applied probability, e.g. in the Markov chain Monte Carlo (MCMC) simulation method.

In Section 2 we recall some basic notions from Dobrushin et al (1988), (Strook 2005, Ch 5), Diaconis and Rolles (2006), Hastings (1970), Peskun (1973), Liu (2008).

In Section 3 we discuss the algebraic theory that results from the detailed balance conditions. The exposition is intended for both algebraists and probabilists, because the results pertain to the area of Algebraic Statistics, see e.g. Pistone et al (2001), Drton et al (2009), Gibilisco et al (2009). This explains the abundance of elementary examples and tutorial sections.

2 Background

2.1 Reversible process and the graph of 2-distributions

Let V be a finite state space with elements v,w . . . and #V = N. The stochastic process (Xn)n≥0 with state space V is 2-reversible if

$$P(X_n = v, X_{n+1} = w) = P(X_n = w, X_{n+1} = v),\qquad v,w\in V,\ n\ge0.$$

In particular, the process is 1-stationary: by summing over w ∈V , we have:

P(Xn = v) = P(Xn+1 = v) = π(v), v ∈V,n≥ 0.

Supporting institutions. G. Pistone: DIMAT Politecnico di Torino, Collegio Carlo Alberto

G. Pistone, Collegio Carlo Alberto, Via Real Collegio 30, 10024 Moncalieri, Italy. E-mail: [email protected]

M.P. Rogantin, DIMA Università di Genova, Via Dodecaneso 35, 16146 Genova, Italy. E-mail: [email protected]

arXiv:1007.4282v1 [math.ST] 24 Jul 2010

2 G. Pistone, M.P. Rogantin

Let $V_1$ be the set of all singletons of $V$ and $V_2$ the set of all subsets of $V$ of cardinality 2. The following parametrization of the 2-dimensional distributions has been used in Diaconis and Rolles (2006):

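$$\theta_{\{v\}} = P(X_n = v, X_{n+1} = v),\qquad \{v\}\in V_1,$$

$$\theta_{\{v,w\}} = P(X_n = v, X_{n+1} = w) + P(X_n = w, X_{n+1} = v) = 2P(X_n = v, X_{n+1} = w),\qquad \{v,w\}\in V_2.$$

The number of parameters is $N + \binom{N}{2}$, i.e. $\binom{N+1}{2}$; moreover

$$1 = \sum_{v,w\in V}P(X_n = v, X_{n+1} = w) = \sum_{\{v\}\in V_1}\theta_{\{v\}} + \sum_{\{v,w\}\in V_2}\theta_{\{v,w\}},$$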

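hence $\theta = (\theta_{\{v\}} : \{v\}\in V_1,\ \theta_{\{v,w\}} : \{v,w\}\in V_2)$ belongs to the simplex $\Delta(V_1\cup V_2)$.

We now assume we are given an undirected connected graph $\mathcal G = (V,\mathcal E)$ such that $P(X_n = v, X_{n+1} = w) = 0$ if $\{v,w\}\notin\mathcal E$. The relevant parameter $\theta = (\theta_{\{v\}} : \{v\}\in V_1,\ \theta_{\{v,w\}} : \{v,w\}\in\mathcal E)$ now belongs to the simplex $\Delta(V_1\cup\mathcal E)$. The probability $\pi$ can be written in terms of the $\theta$ parameters:

$$\pi(v) = \sum_{w\in V}P(X_n = v, X_{n+1} = w) = \theta_{\{v\}} + \frac12\sum_{w\,:\,\{v,w\}\in\mathcal E}\theta_{\{v,w\}}$$

or, in matrix form,

$$\pi = \theta_V + \tfrac12\,\Gamma\,\theta_{\mathcal E},$$

where $\Gamma$ is the incidence matrix of the graph $\mathcal G$.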

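Example 1 (Running example) Consider the graph $\mathcal G = (V,\mathcal E)$ with $V = \{1,2,3,4\}$ and $\mathcal E = \{\{1,2\},\{2,3\},\{3,4\},\{1,4\},\{2,4\}\}$, see the left side of Figure 1. Here

$$\Gamma = \begin{array}{c|ccccc} & \{1,2\} & \{2,3\} & \{3,4\} & \{1,4\} & \{2,4\}\\ \hline 1 & 1 & 0 & 0 & 1 & 0\\ 2 & 1 & 1 & 0 & 0 & 1\\ 3 & 0 & 1 & 1 & 0 & 0\\ 4 & 0 & 0 & 1 & 1 & 1\end{array}$$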
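As a sketch of the matrix form $\pi = \theta_V + \tfrac12\Gamma\theta_{\mathcal E}$ on the running example, the Python snippet below builds $\Gamma$ from the edge list and evaluates $\pi$ with exact fractions. The point $\theta$ is hypothetical, chosen only so that its entries are non-negative and sum to 1. The names `gamma_map`, `theta_V` and `theta_E` are ours.

```python
from fractions import Fraction as F

V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 4)]  # edge order of Example 1

# Incidence matrix of the graph: Gamma[i][j] = 1 if vertex V[i] is an endpoint of edge E[j]
Gamma = [[1 if v in e else 0 for e in E] for v in V]

def gamma_map(theta_V, theta_E):
    """pi = theta_V + (1/2) Gamma theta_E."""
    return [theta_V[i] + F(1, 2) * sum(Gamma[i][j] * theta_E[j] for j in range(len(E)))
            for i in range(len(V))]

# Every column of [I, Gamma/2] is non-negative and sums to 1, so the map is stochastic
assert all(sum(F(Gamma[i][j], 2) for i in range(len(V))) == 1 for j in range(len(E)))

# A hypothetical point of the simplex Delta(V1 u E)
theta_V = [F(1, 10)] * 4
theta_E = [F(1, 10), F(1, 10), F(2, 10), F(1, 10), F(1, 10)]
assert sum(theta_V) + sum(theta_E) == 1

pi = gamma_map(theta_V, theta_E)
print(pi)  # the marginal probability of X_n
```

For this $\theta$ the snippet gives $\pi = (1/5, 1/4, 1/4, 3/10)$.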

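Proposition 1
1. The map

$$\gamma : \Delta(V_1\cup\mathcal E)\ni\theta = \begin{bmatrix}\theta_V\\ \theta_{\mathcal E}\end{bmatrix}\longmapsto\pi = \begin{bmatrix}I_{V_1} & \tfrac12\Gamma\end{bmatrix}\begin{bmatrix}\theta_V\\ \theta_{\mathcal E}\end{bmatrix}\in\Delta(V_1)$$

is a surjective Markov map.
2. The image of $(0,\theta_{\mathcal E})$, $\theta_{\mathcal E}\in\Delta(\mathcal E)$, is the convex hull of the midpoints of those edges of $\Delta(V)$ that belong to $\mathcal E$.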

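Proof 1. The convex set $\Delta(V_1\cup\mathcal E)$ is the convex hull of its extreme points $e_v$, $v\in V$, and $e_{\{x,y\}}$, $\{x,y\}\in\mathcal E$. Hence $\gamma(\Delta(V_1\cup\mathcal E))$ is the convex hull of the columns of the matrix $[I_{V_1}\ \tfrac12\Gamma]$. Each column is non-negative and sums to 1, so $\gamma$ is a Markov map; the columns of $I_{V_1}$ are the vertices of $\Delta(V_1)$, so $\gamma$ is surjective.
2. By inspection of the columns of $(1/2)\Gamma$: the column of $\{x,y\}$ is $(e_x+e_y)/2$, the midpoint of the edge of $\Delta(V)$ joining $e_x$ and $e_y$. □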

2.2 From the 1-margin to the 2-margin

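Given $\pi$, the fiber $\gamma^{-1}(\pi)$ is contained in an affine space parallel to the subspace $\{\theta : \theta_V + (1/2)\Gamma\theta_{\mathcal E} = 0\}$. Each fiber contains special solutions. One is the constant case $(\pi,0_{\mathcal E})$, in which the process never leaves its initial state. If the graph is complete, $\mathcal G = (V,V_2)$, there is the independence solution $\theta_{\{v\}} = \pi(v)^2$, $\theta_{\{v,w\}} = 2\pi(v)\pi(w)$.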

If $\pi(v) > 0$, $v\in V$, a strictly positive solution is obtained as follows. Let $d(v) = \#\{w : \{v,w\}\in\mathcal E\}$ be the degree of the vertex $v$ and define a transition probability by $A(v,w) = 1/(2d(v))$ if $\{v,w\}\in\mathcal E$, $A(v,v) = 1/2$, and $A(v,w) = 0$ otherwise. $A$ is the transition matrix of the lazy random walk on the graph $\mathcal G$, which stays at its current vertex with probability 1/2. Define a probability on $V\times V$ by $Q(v,w) = \pi(v)A(v,w)$. If $Q(v,w) = Q(w,v)$ we are done: we have a 2-reversible probability with marginal $\pi$. Otherwise, we use the following Hastings-Metropolis construction.

Proposition 2 Let $Q$ be a strictly positive probability on $V\times V$ and let $\pi(v) = \sum_w Q(v,w)$. If $f : \,]0,1[\,\times\,]0,1[\,\to\,]0,1[$ is a symmetric function such that $f(x,y)\le x\wedge y$, then

$$P(v,w) = \begin{cases} f(Q(v,w),Q(w,v)) & \text{if } v\ne w,\\ \pi(v) - \sum_{w'\ne v}P(v,w') & \text{if } v = w,\end{cases}$$

is a 2-reversible strictly positive probability on $V\times V$ such that $\pi(v) = \sum_w P(v,w)$.

ARMC 3

Proof For $v\ne w$ we have $P(v,w) = P(w,v) > 0$ by the symmetry of $f$. As $P(v,w)\le Q(v,w)$ for $v\ne w$, it follows that

$$P(v,v) = \pi(v) - \sum_{w\,:\,w\ne v}P(v,w)\ \ge\ \sum_w Q(v,w) - \sum_{w\,:\,w\ne v}Q(v,w) = Q(v,v) > 0.$$

We have $\sum_w P(v,w) = \pi(v)$ by construction and, in particular, $P$ is a probability on $V\times V$. □

Remark 1
1. The proposition applies to:
(a) $f(x,y) = x\wedge y$. This is the standard Hastings choice.
(b) $f(x,y) = xy/(x+y)$. This was suggested by Barker.
(c) $f(x,y) = xy$. In fact, as $x,y < 1$, we have $xy < x\wedge y$. It is an algebraic function which is included in Hastings's proposals.
2. Given $P$, the corresponding parameters $\theta_{\{v,w\}} = 2P(v,w)$ and $\theta_{\{v\}} = P(v,v)$ are strictly positive. We have shown the existence of a mapping from the interior of $\Delta(V)$ to the interior of $\Delta(V_1\cup V_2)$.
3. The mapping $\theta\mapsto(\pi,P_{v\to w})$ is a rational mapping from $\Delta(V_1\cup V_2)$ into $\Delta(V)\otimes\Delta(V)^{\otimes V}$.
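The construction of Proposition 2 with the three choices of $f$ in Remark 1 can be sketched in Python as below. This is an illustration under our own assumptions: $Q$ is a hypothetical strictly positive $3\times3$ probability stored as nested lists of floats, and `symmetrize` and `CHOICES` are names we introduce.

```python
import random

def symmetrize(Q, f):
    """P(v,w) = f(Q(v,w), Q(w,v)) for v != w; diagonal chosen so that row sums equal those of Q."""
    n = len(Q)
    P = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for w in range(n):
            if v != w:
                P[v][w] = f(Q[v][w], Q[w][v])
        P[v][v] = sum(Q[v]) - sum(P[v][w] for w in range(n) if w != v)
    return P

# The three symmetric choices of Remark 1, all satisfying f(x, y) <= min(x, y) on ]0,1[^2
CHOICES = {
    "hastings": lambda x, y: min(x, y),
    "barker": lambda x, y: x * y / (x + y),
    "product": lambda x, y: x * y,
}

rng = random.Random(1)
raw = [[rng.random() + 0.1 for _ in range(3)] for _ in range(3)]
total = sum(map(sum, raw))
Q = [[x / total for x in row] for row in raw]   # hypothetical strictly positive probability on V x V

results = {name: symmetrize(Q, f) for name, f in CHOICES.items()}
```

One can check that each resulting matrix is symmetric, strictly positive, and has the same row sums $\pi$ as $Q$; the diagonal stays positive because $f\le x\wedge y$.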

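Example 2 Let $V = \{0,1,2\}$ and let $\pi$ be a binomial distribution on $V$, $\pi(i) = \binom{2}{i}p^i(1-p)^{2-i}$. Choose

$$q_{ij} = \begin{cases}1/2 & \text{if } i\ne j,\\ 0 & \text{if } i = j.\end{cases}$$

Then

$$a_{ij} = \frac{\pi(j)q_{ji}}{\pi(i)q_{ij}} = \frac{\binom{2}{j}}{\binom{2}{i}}\left(\frac{p}{1-p}\right)^{j-i}\quad\text{and}\quad p_{ij} = \frac12\,(a_{ij}\wedge1),\qquad i\ne j.$$

We have:

$$a_{0,1} = \frac{2p}{1-p},\qquad a_{0,2} = \left(\frac{p}{1-p}\right)^2,\qquad a_{1,2} = \frac12\,\frac{p}{1-p}.$$

If $p < 1/3$ then $a_{0,1}$, $a_{0,2}$ and $a_{1,2}$ are all smaller than 1, while $a_{j,i} = 1/a_{i,j} > 1$, and the transition probability matrix is:

$$P = \begin{bmatrix} 1 - \frac{p}{1-p} - \frac12\left(\frac{p}{1-p}\right)^2 & \frac{p}{1-p} & \frac12\left(\frac{p}{1-p}\right)^2\\[4pt] \frac12 & \frac12 - \frac14\,\frac{p}{1-p} & \frac14\,\frac{p}{1-p}\\[4pt] \frac12 & \frac12 & 0\end{bmatrix}$$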
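A quick exact check of Example 2, as a Python sketch: the hypothetical value $p = 1/5$ (any $p < 1/3$ works) is plugged into the Metropolis rule $p_{ij} = \tfrac12(a_{ij}\wedge1)$, and the result is compared with the closed form of $P$ above. `metropolis` and `closed_form` are illustrative names.

```python
from fractions import Fraction as F
from math import comb

p = F(1, 5)                                                        # any value p < 1/3
pi = [comb(2, i) * p**i * (1 - p) ** (2 - i) for i in range(3)]   # binomial(2, p) target

def metropolis(pi):
    """Symmetric proposal q_ij = 1/2 (i != j); off-diagonal p_ij = q_ij * min(1, a_ij)."""
    n = len(pi)
    P = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a_ij = pi[j] / pi[i]          # q is symmetric, so a_ij = pi(j) / pi(i)
                P[i][j] = F(1, 2) * min(F(1), a_ij)
        P[i][i] = 1 - sum(P[i])               # P[i][i] is still 0 at this point
    return P

P = metropolis(pi)

r = p / (1 - p)
closed_form = [
    [1 - r - r**2 / 2, r, r**2 / 2],
    [F(1, 2), F(1, 2) - r / 4, r / 4],
    [F(1, 2), F(1, 2), F(0)],
]
```

With $p = 1/5$ one has $p/(1-p) = 1/4$, and both constructions give the first row $(23/32, 1/4, 1/32)$.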

2.3 Reversible Markov Chain

Now consider a special case. Assume the reversible process $(X_n)_{n\in\mathbb N}$ is a Markov chain and consider the undirected graph $\mathcal G = (V,\mathcal E)$ such that $\{v,w\}\in\mathcal E$ if, and only if, $\theta_{\{v,w\}} > 0$.

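The transition probabilities are:

$$P_{v\to w} = \frac{P(X_n = v, X_{n+1} = w)}{P(X_n = v)} = \frac{\theta_{\{v,w\}}}{\sum_w\theta_{\{v,w\}}},\qquad P_{w\to v} = \frac{P(X_n = w, X_{n+1} = v)}{P(X_n = w)} = \frac{\theta_{\{v,w\}}}{\sum_v\theta_{\{v,w\}}},$$

so that, denoting $\sum_w\theta_{\{v,w\}}$ by $\kappa(v)$ and $\sum_v\theta_{\{v,w\}}$ by $\kappa(w)$:

$$\kappa(v)P_{v\to w} = \kappa(w)P_{w\to v} \tag{1}$$

and $\sum_v\kappa(v) = 1$.

Conversely, if there exist positive constants $\kappa(v)$, $v\in V$, satisfying (1), then by summing over $w$ we obtain that $\kappa$ is an unnormalized invariant probability: $\kappa(v) = \sum_w\kappa(w)P_{w\to v}$.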

Note that if $P$ is reversible, then the backward transition is $\hat P_{v\to w} = \kappa(w)P_{w\to v}/\kappa(v) = P_{v\to w}$.

Let $\mathcal G = (V,\mathcal E)$ be a connected graph. We denote by $\omega$ a closed path, that is, a path on the graph such that the last vertex coincides with the first one, $\omega = v_0v_1\ldots v_nv_0$, and by $r(\omega)$ the reversed path $r(\omega) = v_0v_n\ldots v_1v_0$.


Fig. 1 From the undirected graph to the directed graph of transitions

Theorem 1 (Kolmogorov's theorem) Let $(X_n)_{n\in\mathbb N}$ be a Markov chain on $V$ whose transitions are supported by $\mathcal G$. The chain is reversible if and only if, for every closed path $\omega = v_1v_2\cdots v_nv_1$,

$$P(\omega\mid X_0 = v_1) = P(r\omega\mid X_0 = v_1). \tag{2}$$

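Proof Assume that the process is reversible. Then we have:

$$\kappa(v_1)P_{v_1\to v_2} = \kappa(v_2)P_{v_2\to v_1},\quad \kappa(v_2)P_{v_2\to v_3} = \kappa(v_3)P_{v_3\to v_2},\quad\ldots,\quad \kappa(v_n)P_{v_n\to v_1} = \kappa(v_1)P_{v_1\to v_n}.$$

By multiplying together all these equalities and simplifying the $\kappa$'s we obtain

$$P_{v_1\to v_2}P_{v_2\to v_3}\cdots P_{v_n\to v_1} = P_{v_1\to v_n}\cdots P_{v_3\to v_2}P_{v_2\to v_1}.$$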

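Conversely, assume that all closed paths have property (2). We denote by $v = v_1$ and $w = v_n$ the first and the next-to-last vertices, respectively. By summing over the intermediate vertices $v_2,\ldots,v_{n-1}$ of all closed paths with these boundary vertices, we obtain:

$$\sum_{v_2,\ldots,v_{n-1}}P_{v\to v_2}P_{v_2\to v_3}\cdots P_{v_{n-1}\to w}P_{w\to v} = \sum_{v_2,\ldots,v_{n-1}}P_{v\to w}P_{w\to v_{n-1}}\cdots P_{v_3\to v_2}P_{v_2\to v}$$

and

$$P^{(n-1)}_{v\to w}P_{w\to v} = P_{v\to w}P^{(n-1)}_{w\to v},$$

where $P^{(n-1)}_{v\to w}$ denotes the transition probability in $n-1$ steps. Letting $n\to\infty$ and using the ergodic theorem (with Cesàro averages if the chain is periodic), we obtain

$$\pi(w)P_{w\to v} = P_{v\to w}\pi(v)$$

and the chain is reversible. □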
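The criterion of Theorem 1 can be illustrated numerically on the running example. In this Python sketch, which is ours, a reversible chain is built from hypothetical symmetric edge weights $W$ and a hypothetical $\kappa$ via $P_{v\to w} = W(\{v,w\})/(c\,\kappa(v))$, so that $\kappa(v)P_{v\to w}$ is symmetric; a second chain moves a little mass onto the single transition $1\to2$. Products along the three cycles of Figure 3 are compared with the products along the reversed cycles (Proposition 4 below shows that cycles suffice).

```python
from fractions import Fraction as F

EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 4)]
CYCLES = [[1, 2, 4, 1], [2, 3, 4, 2], [1, 2, 3, 4, 1]]   # omega_A, omega_B, omega_C

def reversible_chain(weights, kappa):
    """P[v][w] = W({v,w}) / (c kappa(v)), so kappa(v) P[v][w] = W / c is symmetric."""
    c = 1 + max(sum(wt for e, wt in weights.items() if v in e) / kappa[v] for v in kappa)
    P = {v: {} for v in kappa}
    for (v, w), wt in weights.items():
        P[v][w] = wt / (c * kappa[v])
        P[w][v] = wt / (c * kappa[w])
    for v in P:
        P[v][v] = 1 - sum(P[v].values())   # holding probability, positive by the choice of c
    return P

def path_product(P, path):
    """P(omega | X_0 = v_0): product of the transition probabilities along the path."""
    prod = F(1)
    for v, w in zip(path, path[1:]):
        prod *= P[v][w]
    return prod

def kolmogorov_holds(P):
    return all(path_product(P, c) == path_product(P, c[::-1]) for c in CYCLES)

weights = dict(zip(EDGES, [F(1), F(2), F(3), F(1), F(2)]))   # hypothetical symmetric edge weights
kappa = {1: F(1), 2: F(2), 3: F(1), 4: F(3)}                  # hypothetical unnormalized measure
P = reversible_chain(weights, kappa)

# Move a little mass from P[1][1] to P[1][2]: still stochastic, no longer reversible
Q = {v: dict(row) for v, row in P.items()}
Q[1][2] += F(1, 100)
Q[1][1] -= F(1, 100)
```

The first chain passes the test, and the perturbed one fails it on $\omega_A$ and $\omega_C$, the two cycles through the arc $1\to2$.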

Suomela (1979) gives a proof that does not use the ergodic theorem. This proof is related to our algebraic discussion below.

3 Algebraic theory

The present section is devoted to the algebraic structure implied by Kolmogorov's theorem for reversible Markov chains. We refer mainly to the textbooks Berge (1985) and Bollobas (1998) for graph theory, and to the textbooks Cox et al (1997) and Kreuzer and Robbiano (2000) for computational commutative algebra. The theory of toric ideals is treated in detail in Sturmfels (1996) and Bigatti and Robbiano (2001). General references for algebraic methods in Stochastics are e.g. Drton et al (2009), Gibilisco et al (2009). The relevance of Graver bases, see Sturmfels (1996), was pointed out to us by Shmuel Onn in view of the applications discussed in De Loera et al (2008) and Onn (to appear).

3.1 Kolmogorov’s ideal

We denote by $\mathcal G = (V,\mathcal E)$ an (undirected simple) graph. We split each edge into two opposite arcs to get a connected directed graph (without loops) denoted by $\mathcal O = (V,\mathcal A)$. The arc going from vertex $v$ to vertex $w$ is denoted $(v\to w)$. The graph $\mathcal O$ is such that $(v\to v)\notin\mathcal A$ and $(v\to w)\in\mathcal A$ if, and only if, $(w\to v)\in\mathcal A$. Because of our application to Markov chains, we want two arcs on each edge, as in Figure 1. Because of that, some of our statements about graphs differ from the standard presentation, where each edge is given one single orientation. For example, the set of arcs leaving a vertex $v$, denoted $\mathrm{out}(v)$, and the set of arcs entering the same vertex, denoted $\mathrm{in}(v)$, correspond to the same set of edges.

The reversed arc is given by the 1-to-1 function $r : \mathcal A\to\mathcal A$ defined by $r(v\to w) = (w\to v)$. A path is a sequence of vertices $\omega = v_0v_1\cdots v_n$ such that $(v_{k-1}\to v_k)\in\mathcal A$, $k = 1,\ldots,n$. The reversed path is denoted by $r\omega = v_nv_{n-1}\cdots v_0$. Equivalently, a path is a sequence of inter-connected arcs $\omega = a_1\ldots a_n$, $a_k = (v_{k-1}\to v_k)$, and $r\omega = r(a_n)\ldots r(a_1)$.


Fig. 2 Illustration of a path, its traversal counts, its monomial term

Fig. 3 The 6 cycles of a graph: $\omega_A = 1\to2\ 2\to4\ 4\to1$, $\omega_B = 2\to3\ 3\to4\ 4\to2$, $\omega_C = 1\to2\ 2\to3\ 3\to4\ 4\to1$, $\omega_D = r\omega_A$, $\omega_E = r\omega_B$, $\omega_F = r\omega_C$.

A closed path $\omega = v_0v_1\cdots v_{n-1}v_0$ is any path going from a vertex $v_0$ to itself; $r\omega = v_0v_{n-1}\cdots v_1v_0$ is the reversed closed path. In a closed path any vertex can be taken as the initial and final vertex. If we do not distinguish any initial vertex, the equivalence class of closed paths is called a circuit. A closed path is elementary if it has no proper closed sub-path, i.e. if it does not visit the same vertex twice, except for the initial vertex $v_0$. The circuit of an elementary closed path is a cycle. We denote by $\mathcal C$ the set of cycles of $\mathcal O$.

Consider the indeterminates $P = [P_{v\to w}]$, $(v\to w)\in\mathcal A$, and the polynomial ring $k[P_{v\to w} : (v\to w)\in\mathcal A]$. For each path $\omega = a_1\cdots a_n$, $a_k\in\mathcal A$, $k = 1,\ldots,n$, we define the monomial term

$$\omega = a_1\cdots a_n\longmapsto P^\omega = \prod_{k=1}^n P_{a_k}.$$

For each $a\in\mathcal A$, let $N_a(\omega)$ be the number of traversals of the arc $a$ by the path $\omega$. Hence,

$$P^\omega = \prod_{a\in\mathcal A}P_a^{N_a(\omega)}.$$

See the illustration in Figure 2. Note that $\omega\mapsto P^\omega$ maps the non-commutative concatenation of arcs to the commutative product of indeterminates.

Two closed paths associated to the same circuit are mapped to the same monomial term because they have the same traversal counts. The monomial term of a cycle is square-free. Figure 3 presents the 6 cycles of a square with one diagonal.
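The map $\omega\mapsto P^\omega$ only needs the traversal counts $N_a(\omega)$. A minimal Python sketch using `collections.Counter`; the function names and the string rendering of monomials are our own choices, for display only:

```python
from collections import Counter

def traversal_counts(path):
    """N_a(omega): how many times the path (a list of vertices) traverses each arc a = (v, w)."""
    return Counter(zip(path, path[1:]))

def monomial(path):
    """Render P^omega = prod_a P_a^{N_a(omega)} as a string, with arcs in sorted order."""
    return " ".join(
        f"P[{v}->{w}]" + (f"^{n}" if n > 1 else "")
        for (v, w), n in sorted(traversal_counts(path).items())
    )

omega = [1, 2, 3, 4, 1, 2, 4, 1]      # a closed path that is not elementary
rotated = [2, 3, 4, 1, 2, 4, 1, 2]    # the same circuit read from vertex 2
print(monomial(omega))
```

Here `monomial(omega)` is `P[1->2]^2 P[2->3] P[2->4] P[3->4] P[4->1]^2`. The two rotations of the same closed path have equal counts, hence the same monomial, while the cycle `[1, 2, 4, 1]` gives the square-free monomial `P[1->2] P[2->4] P[4->1]`.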

Definition 1 (K-ideal) The Kolmogorov ideal, or K-ideal, of the graph $\mathcal G$ is the ideal of the ring $k[P_{v\to w} : (v\to w)\in\mathcal A]$ generated by the binomials $P^\omega - P^{r\omega}$, where $\omega$ is any circuit. The K-variety is the $k$-affine variety of the K-ideal.

Our main application concerns the real case $k = \mathbb R$, but the combinatorial structure of the K-ideal does not depend on the choice of a specific field. An interesting choice for computations could be the Galois field $k = \mathbb Z_2$.

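For a given connected graph $\mathcal G$, we say that a transition matrix $P = [P_{v\to w}]$, $v,w\in V$, is compatible with $\mathcal G$ if $P_{v\to w} = 0$ whenever $(v\to w)\notin\mathcal A$ and $v\ne w$. Let $\mathrm{out}(v)$ be the set of arcs leaving $v$, and define the simplex

$$\Delta(v) = \Big\{P_{v\to\cdot}\in\mathbb R_+^{\mathrm{out}(v)} : \sum_{(v\to w)\in\mathrm{out}(v)}P_{v\to w}\le1\Big\}.$$

A transition matrix $P$ compatible with $\mathcal G$ is a point in the product of simplices $\Delta(\mathcal O) = \times_{v\in V}\Delta(v)$.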

Proposition 3 (Examples of K-ideals) Let $P$ be compatible with $\mathcal G$ and reversible.

1. The transition matrix $P_{v\to w}$, $(v\to w)\in\mathcal A$, is a point of the intersection of the K-variety with $\Delta(\mathcal O)$.

2. Let $(X_n)_{n\ge0}$ be the stationary Markov chain with transition matrix $P$. Then the joint probabilities $p(v,w) = P(X_n = v, X_{n+1} = w)$, $(v\to w)\in\mathcal A$, form a point in the intersection of the K-variety and the simplex $\Delta(\mathcal A) = \{p\in\mathbb R_+^{\mathcal A} : \sum_{a\in\mathcal A}p(a)\le1\}$.

Proof 1. It is the first part of Kolmogorov's theorem.

2. Let $\omega = v_0\ldots v_nv_0$ be a closed path. If $\pi$ is the stationary probability, by multiplying both sides of Kolmogorov's equation (2) by the product of the probabilities of the initial vertices of the transitions, we obtain

$$\pi(v_0)\pi(v_1)\cdots\pi(v_n)P_{v_0\to v_1}\cdots P_{v_n\to v_0} = \pi(v_0)\pi(v_n)\cdots\pi(v_1)P_{v_0\to v_n}\cdots P_{v_1\to v_0},$$

hence

$$p(v_0,v_1)p(v_1,v_2)\cdots p(v_n,v_0) = p(v_0,v_n)p(v_n,v_{n-1})\cdots p(v_1,v_0).$$

□

The K-ideal has a finite basis by Hilbert's basis theorem. More precisely, a finite basis is obtained by restricting to cycles, which are finite in number.

Proposition 4 (Finite basis of the K-ideal) The K-ideal is generated by the set of binomials $P^\omega - P^{r\omega}$, where $\omega$ is a cycle.

Proof Let $\omega = v_0v_1\cdots v_0$ be a closed path which is not elementary and consider the least $k\ge1$ such that $v_k = v_{k'}$ for some $k' < k$. Then the sub-path $\omega_1$ between the $k'$-th vertex and the $k$-th vertex is an elementary closed path and the residual path $\omega_2 = v_0\cdots v_{k'}v_{k+1}\cdots v_0$ is closed and shorter than the original one. The arcs of $\omega$ are in 1-to-1 correspondence with the arcs of $\omega_1$ and $\omega_2$. The procedure can be iterated and stops in a finite number of steps. Hence, given any closed path $\omega$, there exists a finite sequence of cycles $\omega_1,\ldots,\omega_l$ such that the list of arcs of $\omega$ is partitioned into the lists of arcs of the $\omega_i$'s. Since $P^{\omega_i}\equiv P^{r\omega_i}$, $i = 1,\ldots,l$, modulo the ideal generated by the cycle binomials, it follows that

$$P^\omega = \prod_{i=1}^l P^{\omega_i}\equiv\prod_{i=1}^l P^{r\omega_i} = P^{r\omega}.$$

□

The K-ideal is generated by a finite set of binomials, one for each unordered pair $\{\omega, r\omega\}$ of opposite cycles, and this set is in fact a Gröbner basis. We refer to Cox et al (1997) and Kreuzer and Robbiano (2000) for a detailed discussion.

We review below the basic definitions of this theory, which is based on the existence of a monomial order $\succ$, i.e. a total order on monomial terms which is compatible with the product and for which 1 is the smallest term. Given such an order, the leading term $\mathrm{LT}(f)$ of a polynomial $f$ is defined. A generating set of an ideal is a Gröbner basis if the ideal generated by the leading terms of all polynomials in the ideal is generated by the leading terms of the elements of the generating set. A Gröbner basis is reduced if the coefficient of the leading term of each element of the basis is 1 and no monomial in any element of the basis is in the ideal generated by the leading terms of the other elements of the basis. The Gröbner basis property depends on the monomial order. A generating set is said to be a universal Gröbner basis if it is a Gröbner basis for all monomial orders.

The finite algorithm for testing the Gröbner basis property is based on the syzygy (S-polynomial) of two polynomials. Given two polynomials $f$ and $g$ in the polynomial ring, their syzygy is the polynomial

$$S(f,g) = \frac{\mathrm{LT}(g)}{\gcd(\mathrm{LT}(f),\mathrm{LT}(g))}\,f - \frac{\mathrm{LT}(f)}{\gcd(\mathrm{LT}(f),\mathrm{LT}(g))}\,g.$$

A generating set of an ideal is a Gröbner basis if, and only if, for every pair $f$, $g$ in the set the syzygy $S(f,g)$ reduces to zero by division by the elements of the set (Buchberger's criterion), see (Cox et al 1997, Ch 6) or (Kreuzer and Robbiano 2000, Th. 2.4.1 p. 111).

Proposition 5 (Universal G-basis) The binomials $P^\omega - P^{r\omega}$, where $\omega$ is any cycle, form a reduced universal Gröbner basis of the K-ideal.

Proof Choose any monomial order $\succ$ and let $\omega_1$ and $\omega_2$ be two cycles with $P^{\omega_i}\succ P^{r\omega_i}$, $i = 1,2$. Assume first that they do not have any arc in common. Then $\gcd(P^{\omega_1},P^{\omega_2}) = 1$ and the syzygy is

$$S(P^{\omega_1}-P^{r\omega_1},P^{\omega_2}-P^{r\omega_2}) = P^{\omega_2}(P^{\omega_1}-P^{r\omega_1}) - P^{\omega_1}(P^{\omega_2}-P^{r\omega_2}) = P^{\omega_1}P^{r\omega_2} - P^{r\omega_1}P^{\omega_2},$$

which belongs to the K-ideal. Let now $\alpha$ be the common part. The syzygy of $P^{\omega_1}-P^{r\omega_1}$ and $P^{\omega_2}-P^{r\omega_2}$ is

$$P^{\omega_1-\alpha}P^{r\omega_2} - P^{\omega_2-\alpha}P^{r\omega_1} = P^{r\alpha}\big(P^{\omega_1-\alpha}P^{r\omega_2-r\alpha} - P^{\omega_2-\alpha}P^{r\omega_1-r\alpha}\big),$$

which again belongs to the K-ideal because $\omega_1-\alpha+r(\omega_2-\alpha)$ is a union of cycles. In fact $\omega_1-\alpha$ and $\omega_2-\alpha$ have the same extreme vertices, namely the extreme vertices of $\alpha$. Notice that $\alpha$ is the common part of $\omega_1$ and $\omega_2$ only if it is traversed in the same direction by both cycles. The previous proof does not depend on the choice of the leading terms of the binomials, therefore the Gröbner basis is universal. The Gröbner basis is reduced because no monomial of a cycle can divide a monomial of a different cycle. □
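Proposition 5 can be tested on the running example with Buchberger's criterion. The following pure-Python sketch is ours, not the authors' CoCoA computation: binomials $P^a - P^b$ are stored as pairs of exponent vectors, a monomial order is represented by a hypothetical positive weight vector with lexicographic tie-breaking, and every syzygy is reduced by rewriting leading terms. It checks the three cycle binomials under 200 random weight orders.

```python
import itertools
import random

# Arcs of the running example (square 1-2-3-4 with diagonal 2-4), one variable P_a per arc
ARCS = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 1), (1, 4), (2, 4), (4, 2)]
IDX = {a: i for i, a in enumerate(ARCS)}

def exponent(path):
    """Exponent vector N(omega) of the monomial P^omega of a path given by its vertices."""
    e = [0] * len(ARCS)
    for a in zip(path, path[1:]):
        e[IDX[a]] += 1
    return tuple(e)

CYCLES = [[1, 2, 4, 1], [2, 3, 4, 2], [1, 2, 3, 4, 1]]          # omega_A, omega_B, omega_C
BINOMIALS = [(exponent(c), exponent(c[::-1])) for c in CYCLES]  # P^omega - P^{r omega}

def weight_order(weights):
    """Sort key of a monomial order: positive weights, ties broken lexicographically."""
    return lambda m: (sum(w * x for w, x in zip(weights, m)), m)

def divides(a, b):
    return all(x <= y for x, y in zip(a, b))

def normal_form(m, rules):
    """Rewrite the monomial m with rules lead -> tail until no leading term divides it."""
    reduced = False
    while not reduced:
        reduced = True
        for lead, tail in rules:
            if divides(lead, m):
                m = tuple(x - l + t for x, l, t in zip(m, lead, tail))
                reduced = False
                break
    return m

def is_groebner(binomials, key):
    """Buchberger's criterion for binomials P^a - P^b with unit coefficients."""
    rules = [(a, b) if key(a) > key(b) else (b, a) for a, b in binomials]
    for (l1, t1), (l2, t2) in itertools.combinations(rules, 2):
        lcm = tuple(max(x, y) for x, y in zip(l1, l2))
        # the syzygy is m2 - m1, with m_i = (lcm / lead_i) * tail_i
        m1 = tuple(c - l + t for c, l, t in zip(lcm, l1, t1))
        m2 = tuple(c - l + t for c, l, t in zip(lcm, l2, t2))
        if normal_form(m1, rules) != normal_form(m2, rules):
            return False
    return True

rng = random.Random(0)
random_orders_ok = all(
    is_groebner(BINOMIALS, weight_order([rng.randint(1, 9) for _ in ARCS]))
    for _ in range(200)
)
# Dropping omega_C breaks the property for an order in which P_{2->4} is heavy
without_C_ok = is_groebner(BINOMIALS[:2], weight_order([1] * 8 + [10, 1]))
```

With only $\omega_A$ and $\omega_B$ the criterion fails for that order: the syzygy is $P_{4\to2}(P^{\omega_C}-P^{r\omega_C})$ up to sign, which cannot be reduced without the binomial of $\omega_C$.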

Example 3 (Running example, continued) An illustration of the previous proof is in Figure 4.

Remark 2 (Monomial basis) A monomial order is obtained by first introducing a total order on arcs. For example, one could give a total order on vertices, then order the arcs lexicographically. We do not see any order with a special meaning in this problem. The issue is related to the monomial basis, i.e. the linear basis of the quotient ring formed by the monomials not divisible by any leading term.


Fig. 4 $\omega_A$ is the green cycle and $\omega_C$ is the red cycle. In blue we have represented the common part.

3.2 Cycle and cocycle spaces

We adapt to our context some standard notions from algebraic graph theory, namely the cycle and cocycle spaces, see e.g. (Berge 1985, Ch 2) and (Bollobas 1998, II.3).

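Let $\mathcal C$ be the set of cycles. For each cycle $\omega\in\mathcal C$ we define the cycle vector of $\omega$ to be $z(\omega) = (z_a(\omega) : a\in\mathcal A)$, where

$$z_a(\omega) = \begin{cases}+1 & \text{if } a \text{ is an arc of } \omega,\\ -1 & \text{if } r(a) \text{ is an arc of } \omega,\\ 0 & \text{otherwise.}\end{cases}$$

Note that $z_{r(a)}(\omega) = -z_a(\omega)$. If $z^+$, $z^-$ are the positive and the negative parts of $z$, respectively, then $z^+_a(\omega) = N_a(\omega)$ and $z^-_a(\omega) = N_a(r\omega)$. It follows that $P^\omega = P^{N(\omega)} = P^{z^+(\omega)} = \prod_{a\in\mathcal A}P_a^{z^+_a(\omega)}$ and the binomial $P^\omega - P^{r\omega}$ is written as $P^{z^+(\omega)} - P^{z^-(\omega)}$. More generally, the definition can be extended to any circuit $\omega$ by defining

$$z_a(\omega) = N_a(\omega) - N_{r(a)}(\omega).$$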

Let $Z(\mathcal O)$ be the cycle space, i.e. the vector space generated in $k^{\mathcal A}$ by the cycle vectors. For each proper non-empty subset $S$ of the set of vertices, $\emptyset\ne S\subsetneq V$, we define the cocycle vector of $S$ to be $u(S) = (u_a(S) : a\in\mathcal A)$, where

$$u_a(S) = \begin{cases}+1 & \text{if } a \text{ exits from } S,\\ -1 & \text{if } a \text{ enters into } S,\\ 0 & \text{otherwise,}\end{cases}\qquad a\in\mathcal A.$$

Note that $u_{r(a)}(S) = -u_a(S)$. Let $U(\mathcal O)$ be the cocycle space, i.e. the vector space generated in $k^{\mathcal A}$ by the cocycle vectors. Let $U$ be the matrix whose rows are the cocycle vectors $u(S)$, $\emptyset\ne S\subsetneq V$. We call the matrix $U = [u_a(S)]_{\emptyset\ne S\subsetneq V,\,a\in\mathcal A}$ the cocycle matrix. The cycle space and the cocycle space are orthogonal in $k^{\mathcal A}$. In fact, for each cycle vector $z(\omega)$ and cocycle vector $u(S)$, we have

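$$z_{r(a)}(\omega)u_{r(a)}(S) = (-z_a(\omega))(-u_a(S)) = z_a(\omega)u_a(S),\qquad a\in\mathcal A,$$

so that

$$z(\omega)\cdot u(S) = \sum_{a\in\mathcal A}z_a(\omega)u_a(S) = \sum_{a\in\omega}z_a(\omega)u_a(S) + \sum_{r(a)\in\omega}z_a(\omega)u_a(S) = 2\sum_{a\in\omega}z_a(\omega)u_a(S) = 2\Big[\sum_{a\in\omega,\,u_a(S)=+1}1 - \sum_{a\in\omega,\,u_a(S)=-1}1\Big] = 0,$$

because a closed path exits from $S$ as many times as it enters into $S$.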
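The orthogonality of cycle and cocycle vectors is easy to confirm exhaustively on the running example. A minimal Python sketch (the names are ours): cycle vectors are computed from the vertex lists of the six cycles of Figure 3, cocycle vectors from the 14 proper non-empty subsets of $V$, and all 84 dot products are checked to vanish.

```python
from itertools import combinations

V = [1, 2, 3, 4]
EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 4)]
ARCS = EDGES + [(w, v) for v, w in EDGES]             # both orientations of every edge

def cycle_vector(path):
    """z_a = +1 if the closed path uses a, -1 if it uses r(a), 0 otherwise."""
    z = dict.fromkeys(ARCS, 0)
    for v, w in zip(path, path[1:]):
        z[(v, w)] += 1
        z[(w, v)] -= 1
    return z

def cocycle_vector(S):
    """u_a(S) = +1 if a exits from S, -1 if a enters into S, 0 otherwise."""
    return {(v, w): int(v in S) - int(w in S) for v, w in ARCS}

def dot(x, y):
    return sum(x[a] * y[a] for a in ARCS)

CYCLES = [[1, 2, 4, 1], [2, 3, 4, 2], [1, 2, 3, 4, 1]]
ALL_CYCLES = CYCLES + [c[::-1] for c in CYCLES]       # the six cycles of Figure 3
SUBSETS = [set(S) for k in range(1, len(V)) for S in combinations(V, k)]  # 14 proper subsets

products = [dot(cycle_vector(c), cocycle_vector(S)) for c in ALL_CYCLES for S in SUBSETS]
```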

It is shown e.g. in the previous references that the cycle space is the orthogonal complement of the cocycle space for undirected graphs. In our setting it is the orthogonal complement relative to the subspace of vectors $x$ such that $x_{r(a)} = -x_a$. Consider the $\mathcal E\times\mathcal A$ matrix $E$ whose element in position $(e,a)$ is 1 if the arc $a$ belongs to the edge $e$, and 0 otherwise. If we form the block matrix

$$A = \begin{bmatrix}E\\ U\end{bmatrix}$$


Table 1 An example of cycle and cocycle spaces. The top matrix is the E matrix; the bottom matrix is the U matrix, where three linearly independent rows are highlighted. The two row vectors are a lattice basis of $\ker_{\mathbb Z}(A)$.

          1→2  1→4  2→3  2→4  3→4  2→1  4→1  3→2  4→2  4→3
1,2         1    0    0    0    0    1    0    0    0    0
1,4         0    1    0    0    0    0    1    0    0    0
2,3         0    0    1    0    0    0    0    1    0    0
2,4         0    0    0    1    0    0    0    0    1    0
3,4         0    0    0    0    1    0    0    0    0    1
{1}         1    1    0    0    0   −1   −1    0    0    0
{2}        −1    0    1    1    0    1    0   −1   −1    0
{3}         0    0   −1    0    1    0    0    1    0   −1
{4}         0   −1    0   −1   −1    0    1    0    1    1
{12}        0    1    1    1    0    0   −1   −1   −1    0
{13}        1    1   −1    0    1   −1   −1    1    0   −1
{14}        1    0    0   −1   −1   −1    0    0    1    1
{23}       −1    0    0    1    1    1    0    0   −1   −1
{24}       −1   −1    1    0   −1    1    1   −1    0    1
{34}        0   −1   −1   −1    0    0    1    1    1    0
{123}       0    1    0    1    1    0   −1    0   −1   −1
{124}       0    0    1    0   −1    0    0   −1    0    1
{134}       1    0   −1   −1    0   −1    0    1    1    0
{234}      −1   −1    0    0    0    1    1    0    0    0

z(ωA)       1   −1    0    1    0   −1    1    0   −1    0
z(ωB)       0    0    1   −1    1    0    0   −1    1   −1

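then $Z(\mathcal O) = \ker_{\mathbb Z}A$. The matrix $A$ has rank $\#\mathcal E + \#V - 1$. In fact, the rows of $E$ are orthogonal to the rows of $U$, $E$ can be re-arranged as $[I_{\#\mathcal E}\,|\,I_{\#\mathcal E}]$, with $I_{\#\mathcal E}$ the identity matrix, and $U$ has $\#V - 1$ linearly independent rows, the dimension of the cocycle space. Recall that a basis of the cocycle space is obtained from a spanning tree: removing any one of its edges separates the vertices into two sets, and one cocycle vector is taken for each tree edge.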

Example 4 (Running example, continued) Table 1 shows the matrix $A$ and a lattice basis of $\ker_{\mathbb Z}A$, computed with CoCoA. Three linearly independent rows of $U$ are highlighted.
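The rank statement can be rechecked without CoCoA. The sketch below, in plain Python with exact fractions, is our own illustration: it builds $E$ and $U$ in the column order of Table 1, computes ranks by Gauss-Jordan elimination, and verifies that the two vectors $z(\omega_A)$, $z(\omega_B)$ of Table 1 lie in $\ker A$.

```python
from fractions import Fraction
from itertools import combinations

V = [1, 2, 3, 4]
EDGES = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]   # edge order of Table 1
ARCS = EDGES + [(w, v) for v, w in EDGES]          # columns 1->2 ... 3->4, then 2->1 ... 4->3

# E: indicator of the two arcs of each edge; U: cocycle vectors of proper non-empty subsets
E_rows = [[1 if {v, w} == set(e) else 0 for v, w in ARCS] for e in EDGES]
SUBSETS = [set(S) for k in range(1, len(V)) for S in combinations(V, k)]
U_rows = [[int(v in S) - int(w in S) for v, w in ARCS] for S in SUBSETS]
A = E_rows + U_rows

def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][col] != 0:
                factor = M[i][col] / M[rk][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

# The lattice basis of ker_Z A listed in Table 1
z_A = [1, -1, 0, 1, 0, -1, 1, 0, -1, 0]
z_B = [0, 0, 1, -1, 1, 0, 0, -1, 1, -1]
```

Since $\operatorname{rank}A = 8$ and there are 10 arcs, the kernel has dimension 2, so the independent vectors $z(\omega_A)$ and $z(\omega_B)$ span it over $\mathbb Q$.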

We recall from Onn (to appear) the definition of Graver basis. Let $z(\omega_1)$ and $z(\omega_2)$ be two elements of the cycle space $Z(\mathcal O)$. We introduce a partial order and its set of minimal elements as follows.

Definition 2 (Graver basis)

1. $z(\omega_1)$ is conformal to $z(\omega_2)$, written $z(\omega_1)\sqsubseteq z(\omega_2)$, if the component-wise product is non-negative and $|z(\omega_1)|\le|z(\omega_2)|$ component-wise, i.e. $z_a(\omega_1)z_a(\omega_2)\ge0$ and $|z_a(\omega_1)|\le|z_a(\omega_2)|$ for all $a\in\mathcal A$.

2. A Graver basis of $Z(\mathcal O)$ is the set of the minimal non-zero elements with respect to the conformity partial order $\sqsubseteq$.

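Proposition 6 1. For each integer vector $z\in Z(\mathcal O)$, $z = \sum_{\omega\in\mathcal C}\lambda(\omega)z(\omega)$, there exist cycles $\omega_1,\ldots,\omega_n\in\mathcal C$ and positive integers $\alpha(\omega_1),\ldots,\alpha(\omega_n)$ such that $z^+\ge z^+(\omega_i)$, $z^-\ge z^-(\omega_i)$, $i = 1,\ldots,n$, and

$$z = \sum_{i=1}^n\alpha(\omega_i)z(\omega_i).$$

2. The set $\{z(\omega) : \omega\in\mathcal C\}$ is a Graver basis of $Z(\mathcal O)$.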

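Proof 1. For all $\omega\in\mathcal C$ we have $-z(\omega) = z(r\omega)$, so that we can assume all the $\lambda(\omega)$'s to be non-negative. Notice also that we can arrange things in such a way that at most one of the two directions of each cycle has a positive $\lambda(\omega)$. We define

$$\mathcal A^+(z) = \{a\in\mathcal A : z_a > 0\},\qquad \mathcal A^-(z) = \{a\in\mathcal A : z_a < 0\},$$

and consider two subgraphs of $\mathcal O$ with a restricted set of arcs, $\mathcal O^+(z) = (V,\mathcal A^+(z))$ and $\mathcal O^-(z) = (V,\mathcal A^-(z))$. From now on we drop the dependence on $z$ for ease of notation. We note that $r\mathcal A^+ = \mathcal A^-$ and $r\mathcal A^- = \mathcal A^+$.

We show first that $\mathcal A^+$ must contain a cycle when $z\ne0$. If $\mathcal O^+$ were acyclic, there would exist a vertex $v$ such that $\mathrm{out}(v)\cap\mathcal A^+ = \emptyset$ and $\mathrm{in}(v)\cap\mathcal A^+\ne\emptyset$. Let $u(v)$ be the cocycle vector of $\{v\}$; we derive a contradiction with the orthogonality relation $z\cdot u(v) = 0$. In fact,

$$z\cdot u(v) = \sum_{a\in\mathcal A^+}z_au_a(v) + \sum_{a\in\mathcal A^-}z_au_a(v) = 2\sum_{a\in\mathcal A^+}z_au_a(v) = 2\sum_{a\in\mathcal A^+\cap\,\mathrm{in}(v)}z_au_a(v)\le-1.$$

Let $\omega_1$ be a cycle in $\mathcal A^+$ and define the integer $\alpha(\omega_1)\ge1$ such that $z^+ - \alpha(\omega_1)z^+(\omega_1)\ge0$ and it is zero for at least one $a$. The vector $z_1 = z - \alpha(\omega_1)z(\omega_1)$ belongs to the cycle space $Z(\mathcal O)$ and, moreover, $\mathcal A^+(z_1)\subsetneq\mathcal A^+(z)$.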


Fig. 5 An element of the cycle space

Fig. 6 Computation of the conformal representation of the z of Figure 5

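By repeating the same step a finite number of times we obtain a new representation of $z$ in the form $z = \sum_{i=1}^n\alpha(\omega_i)z(\omega_i)$, where the support of each $\alpha(\omega_i)z^+(\omega_i)$ is contained in $\mathcal A^+$. It follows that

$$z^+ = \sum_{i=1}^n\alpha(\omega_i)z^+(\omega_i)\quad\text{and}\quad z^- = \sum_{i=1}^n\alpha(\omega_i)z^-(\omega_i). \tag{3}$$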

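2. In the previous decomposition each $z(\omega_i)$, $i = 1,\ldots,n$, is conformal to $z$. In fact, from $z^+\ge z^+(\omega_i)$ and $z^-\ge z^-(\omega_i)$ it follows that $z_az_a(\omega_i) = z^+_az^+_a(\omega_i) + z^-_az^-_a(\omega_i)\ge0$ and $|z_a(\omega_i)| = z^+_a(\omega_i) + z^-_a(\omega_i)\le z^+_a + z^-_a = |z_a|$. Moreover $z(\omega_i)\sqsubseteq z$, and $z(\omega_1)\ne z$ unless $z$ is itself a cycle vector; hence a non-zero element of $Z(\mathcal O)$ which is minimal with respect to $\sqsubseteq$ must be a cycle vector.

□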

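Example 5 (Running example, continued) We give an illustration of the previous proof. Consider the cycle vectors, with components indexed by the arcs $1\to2$, $2\to1$, $2\to3$, $3\to2$, $3\to4$, $4\to3$, $4\to1$, $1\to4$, $2\to4$, $4\to2$:

$$\begin{aligned} z(\omega_A) &= (1,-1,0,0,0,0,1,-1,1,-1),\\ z(\omega_B) &= (0,0,1,-1,1,-1,0,0,-1,1),\\ z(\omega_C) &= (1,-1,1,-1,1,-1,1,-1,0,0),\end{aligned}$$

and the element of the cycle space $z = z(\omega_A)+2z(\omega_B)+2z(\omega_C)$, see Figure 5. We have

$$\begin{aligned} z &= z(\omega_A)+2z(\omega_B)+2z(\omega_C) = (3,-3,4,-4,4,-4,3,-3,-1,1),\\ z^+ &= z^+(\omega_B)+3z^+(\omega_C) = (3,0,4,0,4,0,3,0,0,1),\end{aligned}$$

so that $z = z(\omega_B)+3z(\omega_C)$ is a conformal representation of $z$, as illustrated in Figure 6.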
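The proof of Proposition 6 is constructive, and it can be sketched as a greedy algorithm in Python: walk along arcs with $z_a > 0$ until a vertex repeats, subtract the largest admissible multiple of that cycle vector, and iterate. The function names and the tie-breaking rule (smallest vertex first) are our own assumptions; the data are those of Example 5.

```python
ORDER = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 1), (1, 4), (2, 4), (4, 2)]

def cycle_vector(path):
    """z_a = +1 on the arcs of the closed path, -1 on their reverses."""
    z = dict.fromkeys(ORDER, 0)
    for v, w in zip(path, path[1:]):
        z[(v, w)] += 1
        z[(w, v)] -= 1
    return z

def combine(*terms):
    """Integer combination sum_i c_i z_i of vectors given as (c_i, z_i) pairs."""
    return {a: sum(c * z[a] for c, z in terms) for a in ORDER}

def positive_cycle(z):
    """Follow arcs with z_a > 0, smallest vertex first, until a vertex repeats.

    The walk cannot get stuck: z . u({v}) = 0 forces a positive outgoing arc at every
    vertex that is entered by a positive arc.
    """
    pos = sorted(a for a, x in z.items() if x > 0)
    v = pos[0][0]
    walk, seen = [v], {v: 0}
    while True:
        w = min(b for a, b in pos if a == v)
        if w in seen:
            loop = walk[seen[w]:] + [w]
            return list(zip(loop, loop[1:]))
        seen[w] = len(walk)
        walk.append(w)
        v = w

def conformal_decomposition(z):
    """Greedy version of the proof of Proposition 6: z = sum_i alpha_i z(omega_i)."""
    z = dict(z)
    parts = []
    while any(z.values()):
        arcs = positive_cycle(z)
        alpha = min(z[a] for a in arcs)      # largest multiple keeping z+ - alpha z+(omega) >= 0
        for v, w in arcs:
            z[(v, w)] -= alpha
            z[(w, v)] += alpha
        parts.append((arcs, alpha))
    return parts

zA, zB, zC = (cycle_vector(p) for p in ([1, 2, 4, 1], [2, 3, 4, 2], [1, 2, 3, 4, 1]))
z = combine((1, zA), (2, zB), (2, zC))
parts = conformal_decomposition(z)
```

On the running example it returns $3z(\omega_C) + z(\omega_B)$, the conformal representation of Figure 6; every cycle it uses lies in $\mathcal A^+(z)$.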

3.3 Toric ideal

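We now show that the K-ideal is a toric ideal, see (Sturmfels 1996, Ch 4) and Bigatti and Robbiano (2001). Consider the ring $k[P_a : a\in\mathcal A]$ and the Laurent ring $k[s(e)^{\pm1}, t_S^{\pm1} : e\in\mathcal E,\ \emptyset\ne S\subsetneq V]$, together with the homomorphism $h$ defined by

$$h : P_a\longmapsto\prod_e s(e)^{E_{e,a}}\prod_S t_S^{u_a(S)} = t^{A(a)},$$

where $A(a)$ denotes the column of $A$ indexed by $a$, i.e.

$$h : P_{v\to w}\longmapsto s(v,w)\prod_S t_S^{u_{v\to w}(S)}. \tag{4}$$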

The kernel $I(A)$ of $h$ is called the toric ideal of $A$,

$$I(A) = \{f\in k[P_a : a\in\mathcal A] : f(t^{A(a)} : a\in\mathcal A) = 0\}.$$

The toric ideal $I(A)$ is a prime ideal and the binomials

$$P^{z^+} - P^{z^-},\qquad z\in\mathbb Z^{\mathcal A},\ Az = 0,$$

are a generating set of $I(A)$ as a $k$-vector space. In particular, a finite generating set of the ideal is formed by selecting a finite subset of such binomials.


Proposition 7 (The K-ideal is toric) The K-ideal is the toric ideal of the matrix A.

Proof For each cycle $\omega$ the cycle vector $z(\omega)$ belongs to $\ker_{\mathbb Z}A = \{z\in\mathbb Z^{\mathcal A} : Az = 0\}$. Moreover, $P^{z^+(\omega)} = P^\omega$ and $P^{z^-(\omega)} = P^{r\omega}$, therefore the K-ideal is contained in the toric ideal $I(A)$.

To prove the equality we must show that each binomial $P^{z^+} - P^{z^-}$, $z\in\ker_{\mathbb Z}A$, belongs to the K-ideal. From Equation (3) of Proposition 6 it follows that

$$P^{z^+} - P^{z^-} = \prod_{i=1}^n\big(P^{z^+(\omega_i)}\big)^{\alpha(\omega_i)} - \prod_{i=1}^n\big(P^{z^-(\omega_i)}\big)^{\alpha(\omega_i)}$$

belongs to the K-ideal. □

The Graver basis of a toric ideal is the set of binomials whose exponents are the positive and negative parts of the elements of the Graver basis of $\ker_{\mathbb Z}A$. From Propositions 6 and 7 it follows that

Proposition 8 The binomials of the cycles form a Graver basis of the K-ideal.

Remark 3 We mention important consequences of the properties of the K-ideal we do not discuss further here.

1. It follows from general properties of toric ideals that a Graver basis is a universal Gröbner basis and that a universal Gröbner basis is a Markov basis, Sturmfels (1996). The Markov basis property is related to the connectedness of random walks on the fibers of $A$, see the seminal Diaconis and Sturmfels (1998) and many other papers.

2. The knowledge of a Graver basis for the K-ideal provides efficient algorithms for discrete optimization, see De Loera et al (2008) and Onn (to appear).

3.4 Positive K-ideal

The knowledge that the K-ideal is toric is relevant, because it provides a parametric representation of the strictly positive points on the variety, i.e. the strictly positive transition probabilities on $\mathcal O$ are given by

$$P_{v\to w} = s(v,w)\prod_S t_S^{u_{v\to w}(S)} = s(v,w)\prod_{S\,:\,v\in S,\,w\notin S}t_S\prod_{S\,:\,w\in S,\,v\notin S}t_S^{-1},\qquad s(v,w) > 0,\ t_S > 0. \tag{5}$$

We observe that the first set of parameters, $s(v,w) = s(w,v)$, is a function of the edge. As the rows of $E$ are linearly independent, such parameters carry $\#\mathcal E$ degrees of freedom to represent symmetric transition matrices. The second set of parameters, $t_S$, represents the deviation from symmetry. The second set of parameters is not identifiable because the rows of the $U$ matrix are not linearly independent. The parametrization (5) can be used to derive an explicit form of the invariant probability, in particular a parametric form of the theorem in Suomela (1979). The properties of the parametrization are collected in the following proposition.

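Proposition 9 Consider the strictly positive points of the K-variety.

1. The symmetric parameters $s(e)$, $e\in\mathcal E$, are uniquely determined in (5). The parameters $t_S$, $\emptyset\ne S\subsetneq V$, are confounded: the vector $(\log t_S)$ is determined only modulo $\ker U^t = \{\tau : U^t\tau = 0\}$.

2. An identifiable parametrization is obtained by taking the subset of parameters corresponding to a maximal family $\mathcal S$ of linearly independent rows of $U$:

$$P_{v\to w} = s(v,w)\prod_{S\in\mathcal S\,:\,v\in S,\,w\notin S}t_S\prod_{S\in\mathcal S\,:\,w\in S,\,v\notin S}t_S^{-1}. \tag{6}$$

3. The detailed balance equations, $\kappa(v)P_{v\to w} = \kappa(w)P_{w\to v}$, are satisfied if, and only if,

$$\kappa(v)\propto\prod_{S\,:\,v\in S}t_S^{-2}. \tag{7}$$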

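Proof 1. We have

$$\log P = E^t\log s + U^t\log t,\qquad P = (P_{v\to w} : (v\to w)\in\mathcal A),\ s = (s(e) : e\in\mathcal E),\ t = (t_S : \emptyset\ne S\subsetneq V).$$

If $E^t\log s_1 + U^t\log t_1 = E^t\log s_2 + U^t\log t_2$, then $E^t(\log s_1 - \log s_2) = U^t(\log t_2 - \log t_1)$, and both sides vanish because the rows of $E$ are orthogonal to the rows of $U$. Hence $s_1 = s_2$ because $E$ has full row rank, and $U^t\log t_1 = U^t\log t_2$.

2. The sub-matrix of $A$ formed by $E$ and by the rows of $U$ in $\mathcal S$ has full row rank.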


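3. Using Equation (5), the detailed balance equations read

$$\kappa(v)\,s(v,w)\prod_{S:v\in S,w\notin S}t_S\prod_{S:w\in S,v\notin S}t_S^{-1} = \kappa(w)\,s(v,w)\prod_{S:w\in S,v\notin S}t_S\prod_{S:v\in S,w\notin S}t_S^{-1},$$

which is equivalent to

$$\kappa(v)\prod_{S:v\in S,w\notin S}t_S^2 = \kappa(w)\prod_{S:w\in S,v\notin S}t_S^2.$$

By multiplying both sides of the equality by $\prod_{S:v\in S,w\in S}t_S^2$, we obtain

$$\kappa(v)\prod_{S:v\in S}t_S^2 = \kappa(w)\prod_{S:w\in S}t_S^2$$

for every edge $\{v,w\}$. As $\mathcal G$ is connected, $\kappa(v)\prod_{S:v\in S}t_S^2$ does not depend on $v$, i.e. $\kappa(v)\propto\prod_{S:v\in S}t_S^{-2}$; conversely, this choice satisfies the detailed balance conditions.

□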
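Proposition 9 can be checked numerically. In this Python sketch, with hypothetical random positive values of the five $s(e)$ and of all fourteen $t_S$, transition weights are computed from (5) and $\kappa$ from (7); no row normalization is imposed, so the $P$ below are points of the positive K-variety rather than transition matrices.

```python
import math
import random
from itertools import combinations

V = [1, 2, 3, 4]
EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 4)]
SUBSETS = [frozenset(S) for k in range(1, len(V)) for S in combinations(V, k)]

rng = random.Random(7)
s = {frozenset(e): rng.uniform(0.5, 2.0) for e in EDGES}   # hypothetical symmetric parameters s(e)
t = {S: rng.uniform(0.5, 2.0) for S in SUBSETS}            # hypothetical parameters t_S

def P(v, w):
    """Equation (5): s(v,w) times t_S for each S left by v->w, divided by t_S for each S entered."""
    val = s[frozenset((v, w))]
    for S, tS in t.items():
        if v in S and w not in S:
            val *= tS
        elif w in S and v not in S:
            val /= tS
    return val

def kappa(v):
    """Equation (7), with proportionality constant 1."""
    return math.prod(tS ** -2 for S, tS in t.items() if v in S)
```

Besides detailed balance, one can check that $P_{v\to w}P_{w\to v} = s(v,w)^2$, which is why the symmetric parameters are identifiable.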

We are now in a position to state an algebraic version of Kolmogorov's theorem.

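Definition 3 The detailed balance ideal is the ideal of $k[\kappa(v) : v\in V,\ P_{v\to w} : (v\to w)\in\mathcal A]$

$$\Big\langle\prod_{v\in V}\kappa(v) - 1,\ \kappa(v)P_{v\to w} - \kappa(w)P_{w\to v},\ (v\to w)\in\mathcal A\Big\rangle.$$

The first polynomial forces all the $\kappa(v)$ to be non-zero; it is the algebraic counterpart of the positivity of the $\kappa$ parameters.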

Proposition 10 1. The matrix $[P_{v\to w}]_{(v\to w)\in\mathcal A}$ is a point of the variety of the K-ideal if and only if there exists $\kappa = (\kappa(v) : v\in V)$ such that $(\kappa,P)$ belongs to the variety of the detailed balance ideal.

2. The detailed balance ideal is a toric ideal.

3. The K-ideal is the $\kappa$-elimination ideal of the detailed balance ideal.

Proof 1. It is a rephrasing of Item 3 of Proposition 9.

2. This ideal is the kernel of the homomorphism defined by (4), i.e. $P_{v\to w}\mapsto s(v,w)\prod_S t_S^{u_{v\to w}(S)}$, together with $\kappa(v)\mapsto\prod_{S:v\in S}t_S^{-2}$.

3. The elimination ideal is generated by dropping the parametric equations of the indeterminates to be eliminated. □

3.5 Parametrization of reversible transitions

Another parametrization follows from Proposition 9. It is not rational, since it involves square roots, but it is derived from the natural toric parametrization and, in the case of transition probabilities, it involves explicitly the unnormalized invariant probability.

Proposition 11 1. There exists a (non-rational) parametrization of the non-zero K-variety of the form

$$P_{v\to w} = s(v,w)\,\kappa(w)^{1/2}\kappa(v)^{-1/2}. \tag{8}$$

2. $P$ is a reversible transition probability which is strictly positive on the graph $\mathcal G$ and has invariant probability proportional to $\kappa$ if, and only if, it is of the form (8) and moreover $\kappa(v)^{1/2}\ge\sum_{w\ne v}s(v,w)\kappa(w)^{1/2}$.

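Proof 1. Follows from (6) and (7):

$$\kappa(w)^{1/2}\kappa(v)^{-1/2} = \frac{\prod_{S\in\mathcal S:v\in S}t_S}{\prod_{S\in\mathcal S:w\in S}t_S} = \prod_{S\in\mathcal S:v\in S,w\notin S}t_S\prod_{S\in\mathcal S:w\in S,v\notin S}t_S^{-1}.$$

2. If $P$ is a reversible positive transition probability, then

$$1\ge\sum_{w\,:\,(v\to w)\in\mathcal A}s(v,w)\kappa(w)^{1/2}\kappa(v)^{-1/2}.$$

□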
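Item 2 of Proposition 11 suggests a direct recipe for reversible transition matrices. A Python sketch under our own assumptions (hypothetical values of $s(e)$ and $\kappa(v)$ on the running example; `reversible_from_s_kappa` is our name): off-diagonal entries come from (8), each row is checked against the admissibility condition, and the diagonal is the holding probability.

```python
import math

EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 4)]
kappa = {1: 1.0, 2: 2.0, 3: 1.0, 4: 3.0}             # hypothetical unnormalized invariant measure
s = dict(zip(EDGES, [0.2, 0.1, 0.15, 0.25, 0.1]))     # hypothetical symmetric parameters s(e)

def reversible_from_s_kappa(s, kappa):
    """Off-diagonal entries from (8); raise ValueError if a row violates admissibility."""
    P = {v: {} for v in kappa}
    for (v, w), svw in s.items():
        P[v][w] = svw * math.sqrt(kappa[w] / kappa[v])
        P[w][v] = svw * math.sqrt(kappa[v] / kappa[w])
    for v, row in P.items():
        # sum_w P[v][w] <= 1  is equivalent to  kappa(v)^(1/2) >= sum_w s(v,w) kappa(w)^(1/2)
        if sum(row.values()) > 1:
            raise ValueError(f"row {v} is not admissible")
        row[v] = 1 - sum(row.values())                # holding probability
    return P

P = reversible_from_s_kappa(s, kappa)
```

For these values all rows are admissible, and $\kappa$ is invariant: $\sum_v\kappa(v)P_{v\to w} = \kappa(w)$.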

The monomial parametrization of the positive K-ideal leads to an alternative presentation of the statistical model, cf. Diaconis and Rolles (2006), and possibly to a variation of the methods used in that paper.


Example 6 (Running example, continued) An over-parametrization of two transition probabilities of the K-variety is:

$$P_{3\to4} = s(3,4)\,t_{\{3\}}t_{\{1,3\}}t_{\{2,3\}}t_{\{1,2,3\}}\,t_{\{4\}}^{-1}t_{\{1,4\}}^{-1}t_{\{2,4\}}^{-1}t_{\{1,2,4\}}^{-1},$$

$$P_{4\to3} = s(3,4)\,t_{\{4\}}t_{\{1,4\}}t_{\{2,4\}}t_{\{1,2,4\}}\,t_{\{3\}}^{-1}t_{\{1,3\}}^{-1}t_{\{2,3\}}^{-1}t_{\{1,2,3\}}^{-1}.$$

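Choosing $\mathcal S = \{\{1\},\{3\},\{1,2\}\}$, we have

$$\kappa(1) = t_{\{1\}}^{-2}t_{\{1,2\}}^{-2},\quad \kappa(2) = t_{\{1,2\}}^{-2},\quad \kappa(3) = t_{\{3\}}^{-2},\quad \kappa(4) = 1$$

and

$$t_{\{1\}} = \kappa(1)^{-1/2}\kappa(2)^{1/2},\quad t_{\{3\}} = \kappa(3)^{-1/2},\quad t_{\{1,2\}} = \kappa(2)^{-1/2}.$$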

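The transition matrix parameterized by $s(e)$, $e\in\mathcal E$, and $t_S$, $S\in\mathcal S$, is

$$\begin{array}{c|cccc} & 1 & 2 & 3 & 4\\ \hline 1 & ? & s(1,2)\,t_{\{1\}} & 0 & s(1,4)\,t_{\{1\}}t_{\{1,2\}}\\ 2 & s(1,2)\,t_{\{1\}}^{-1} & ? & s(2,3)\,t_{\{1,2\}}t_{\{3\}}^{-1} & s(2,4)\,t_{\{1,2\}}\\ 3 & 0 & s(2,3)\,t_{\{3\}}t_{\{1,2\}}^{-1} & ? & s(3,4)\,t_{\{3\}}\\ 4 & s(1,4)\,t_{\{1\}}^{-1}t_{\{1,2\}}^{-1} & s(2,4)\,t_{\{1,2\}}^{-1} & s(3,4)\,t_{\{3\}}^{-1} & ?\end{array}$$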

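and parameterized by $s(e)$, $e\in\mathcal E$, and $\kappa(v)$, $v\in V$, is

$$\begin{array}{c|cccc} & 1 & 2 & 3 & 4\\ \hline 1 & ? & s(1,2)\kappa(1)^{-1/2}\kappa(2)^{1/2} & 0 & s(1,4)\kappa(1)^{-1/2}\\ 2 & s(1,2)\kappa(1)^{1/2}\kappa(2)^{-1/2} & ? & s(2,3)\kappa(2)^{-1/2}\kappa(3)^{1/2} & s(2,4)\kappa(2)^{-1/2}\\ 3 & 0 & s(2,3)\kappa(2)^{1/2}\kappa(3)^{-1/2} & ? & s(3,4)\kappa(3)^{-1/2}\\ 4 & s(1,4)\kappa(1)^{1/2} & s(2,4)\kappa(2)^{1/2} & s(3,4)\kappa(3)^{1/2} & ?\end{array}$$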
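The two matrices above can be compared numerically. In this Python sketch, with hypothetical positive values of $s(e)$ and of $t_{\{1\}}$, $t_{\{3\}}$, $t_{\{1,2\}}$, every off-diagonal entry computed from (6) with $\mathcal S = \{\{1\},\{3\},\{1,2\}\}$ is matched against the entry computed from (8) with the $\kappa$ given above. The helper names are ours.

```python
import math

EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 4)]
s = dict(zip(EDGES, [0.3, 0.5, 0.7, 1.1, 1.3]))                         # hypothetical s(e)
t = {frozenset({1}): 1.5, frozenset({3}): 0.8, frozenset({1, 2}): 1.2}   # hypothetical t_S, S in the family

def s_of(v, w):
    return s[(v, w)] if (v, w) in s else s[(w, v)]

def P_t(v, w):
    """Equation (6) with the family {{1}, {3}, {1, 2}}."""
    val = s_of(v, w)
    for S, tS in t.items():
        if v in S and w not in S:
            val *= tS
        elif w in S and v not in S:
            val /= tS
    return val

# kappa as computed in Example 6 from (7)
kappa = {
    1: t[frozenset({1})] ** -2 * t[frozenset({1, 2})] ** -2,
    2: t[frozenset({1, 2})] ** -2,
    3: t[frozenset({3})] ** -2,
    4: 1.0,
}

def P_kappa(v, w):
    """Equation (8): s(v,w) kappa(w)^(1/2) kappa(v)^(-1/2)."""
    return s_of(v, w) * math.sqrt(kappa[w] / kappa[v])
```

For instance $P_{1\to2} = s(1,2)\,t_{\{1\}}$ and $P_{1\to4} = s(1,4)\,t_{\{1\}}t_{\{1,2\}}$ in both forms.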

Remark 4 Given a reversible P with $P_a \ne 0$ for every $a \in \mathcal{A}$, there exist unique $s(e)$ and unnormalized $\kappa(v)$ representing $P_a$. In the Hastings-Metropolis algorithm, we are given an unnormalized positive probability κ and a transition Q with $Q_{v\to w} > 0$ if, and only if, $(v\to w) \in \mathcal{A}$. We are required to produce a new transition $P_{v\to w} = Q_{v\to w}\,\alpha(v,w)$ such that P is reversible with invariant probability proportional to κ and $0 < \alpha(v,w) \le 1$. This problem has been discussed in Proposition 2. We now derive the Hastings solution again via our parametrization. We have

$$Q_{v\to w}\,\alpha(v,w) = s(v,w)\,\kappa(w)^{1/2}\kappa(v)^{-1/2}$$

and moreover we want

$$\alpha(v,w) = \frac{s(v,w)\,\kappa(w)^{1/2}}{Q_{v\to w}\,\kappa(v)^{1/2}} \le 1,$$

that is, the symmetric $s(v,w)$ must satisfy

$$s(v,w) \le Q_{v\to w}\,\kappa(v)^{1/2}\kappa(w)^{-1/2}.$$

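Hastings’s choice corresponds to the largest possible value of $s(v,w)$; see also the discussion in Peskun (1973). Indeed, since s is symmetric, the constraint must hold for both $(v,w)$ and $(w,v)$, so the largest choice is

$$s(v,w) = Q_{v\to w}\,\kappa(v)^{1/2}\kappa(w)^{-1/2} \wedge Q_{w\to v}\,\kappa(w)^{1/2}\kappa(v)^{-1/2},$$

where ∧ denotes the minimum. This, in turn, leads to

$$\alpha(v,w) = 1 \wedge \frac{Q_{w\to v}\,\kappa(w)}{Q_{v\to w}\,\kappa(v)}.$$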
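A minimal NumPy sketch of Remark 4 follows, assuming Q is a transition matrix whose off-diagonal support is symmetric. `hastings_matrix` applies the acceptance probability $\alpha(v,w) = 1 \wedge Q_{w\to v}\kappa(w)/(Q_{v\to w}\kappa(v))$ and puts the rejected mass on the diagonal. `recover_s_and_ratio` illustrates the first sentence of the remark. From $P_{v\to w} = s(v,w)\,\kappa(w)^{1/2}\kappa(v)^{-1/2}$ one gets $P_{v\to w}P_{w\to v} = s(v,w)^2$ and $P_{v\to w}/P_{w\to v} = \kappa(w)/\kappa(v)$. Both function names and the numbers are ours, chosen for illustration.

```python
import numpy as np


def hastings_matrix(Q, kappa):
    """Sketch of Remark 4: P[v, w] = Q[v, w] * alpha(v, w) for v != w, with
    alpha(v, w) = min(1, Q[w, v] kappa[w] / (Q[v, w] kappa[v])).

    Assumes Q is a transition matrix whose off-diagonal support is symmetric
    (Q[v, w] > 0 iff Q[w, v] > 0) and kappa is positive. The rejected mass
    goes to the diagonal.
    """
    Q = np.asarray(Q, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    n = len(kappa)
    P = np.zeros((n, n))
    for v in range(n):
        for w in range(n):
            if v != w and Q[v, w] > 0:
                alpha = min(1.0, Q[w, v] * kappa[w] / (Q[v, w] * kappa[v]))
                P[v, w] = Q[v, w] * alpha
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P


def recover_s_and_ratio(P, v, w):
    """For reversible P with P[v, w] > 0, using P[v->w] = s(v,w) kappa(w)^{1/2} kappa(v)^{-1/2}:
    s(v, w) = sqrt(P[v, w] P[w, v]) and kappa(w) / kappa(v) = P[v, w] / P[w, v]."""
    return np.sqrt(P[v, w] * P[w, v]), P[v, w] / P[w, v]


# Illustrative proposal and target weights (not from the paper).
Q = np.array([[0.5, 0.25, 0.25], [0.5, 0.0, 0.5], [0.25, 0.25, 0.5]])
kappa = np.array([1.0, 2.0, 4.0])
P = hastings_matrix(Q, kappa)
s01, ratio01 = recover_s_and_ratio(P, 0, 1)
# Hastings's choice: the largest s allowed by both constraints (the minimum).
s_max = min(Q[0, 1] * np.sqrt(kappa[0] / kappa[1]), Q[1, 0] * np.sqrt(kappa[1] / kappa[0]))
print(np.isclose(s01, s_max), np.isclose(ratio01, kappa[1] / kappa[0]))
```

In this instance the s recovered from the Hastings matrix coincides with the minimum in the displayed formula for $s(v,w)$, and the ratio of the two off-diagonal entries returns $\kappa(1)/\kappa(0)$.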

Acknowledgment

During this research we discussed various items with S. Sullivant, B. Sturmfels and G. Casnati, and we thank them for their encouragement and advice. Preliminary versions of this paper were presented at WOGAS2, Warwick U. 2010, and CREST-SBM2, Osaka 2010. The authors wish to thank G. Letac for bringing reference Suomela (1979) to their attention and F. Rigat for the reference Peskun (1973). They also thank S. Onn for pointing out the relevance of his own work on Graver bases.


References

Berge C (1985) Graphs, North-Holland Mathematical Library, vol 6. North-Holland Publishing Co., Amsterdam, second revised edition of part 1 of the 1973 English version

Bigatti A, Robbiano L (2001) Toric ideals. Matematica Contemporanea 21:1–25

Bollobás B (1998) Modern graph theory, Graduate Texts in Mathematics, vol 184. Springer-Verlag, New York

Cox D, Little J, O’Shea D (1997) Ideals, varieties, and algorithms: An introduction to computational algebraic geometry and commutative algebra, 2nd edn. Undergraduate Texts in Mathematics, Springer-Verlag, New York

De Loera JA, Hemmecke R, Onn S, Weismantel R (2008) n-fold integer programming. Discrete Optim 5(2):231–241, DOI 10.1016/j.disopt.2006.06.006, URL http://dx.doi.org/10.1016/j.disopt.2006.06.006

Diaconis P, Rolles SWW (2006) Bayesian analysis for reversible Markov chains. Ann Statist 34(3):1270–1292, DOI 10.1214/009053606000000290, URL http://dx.doi.org/10.1214/009053606000000290

Diaconis P, Sturmfels B (1998) Algebraic algorithms for sampling from conditional distributions. Ann Statist 26(1):363–397

Dobrushin RL, Sukhov YM, Fritts I (1988) A. N. Kolmogorov—founder of the theory of reversible Markov processes. Uspekhi Mat Nauk 43(6(264)):167–188, DOI 10.1070/RM1988v043n06ABEH001985, URL http://dx.doi.org/10.1070/RM1988v043n06ABEH001985

Drton M, Sturmfels B, Sullivant S (2009) Lectures on Algebraic Statistics. No. 39 in Oberwolfach Seminars, Birkhäuser

Gibilisco P, Riccomagno E, Rogantin M, Wynn HP (eds) (2009) Algebraic and Geometric Methods in Statistics. Cambridge University Press

Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109, URL http://dx.doi.org/10.1093/biomet/57.1.97

Kreuzer M, Robbiano L (2000) Computational commutative algebra. 1. Springer-Verlag, Berlin

Liu JS (2008) Monte Carlo strategies in scientific computing. Springer Series in Statistics, Springer, New York

Onn S (to appear) Theory and applications of n-fold integer programming. In: Mixed integer non-linear programming, Frontier Science, IMA, pp 1–35

Peskun PH (1973) Optimum Monte-Carlo sampling using Markov chains. Biometrika 60:607–612

Pistone G, Riccomagno E, Wynn HP (2001) Algebraic statistics. Computational commutative algebra in statistics, Monographs on Statistics and Applied Probability, vol 89. Chapman & Hall/CRC, Boca Raton, FL

Strook DW (2005) An Introduction to Markov Processes. No. 230 in Graduate Texts in Mathematics, Springer-Verlag, Berlin

Sturmfels B (1996) Gröbner bases and convex polytopes. American Mathematical Society, Providence, RI

Suomela P (1979) Invariant measures of time-reversible Markov chains. J Appl Probab 16(1):226–229
