Binary Contingency Tables in Theory and Practice
Ivona Bezáková (Rochester Institute of Technology)
Based on joint work with Nayantara Bhatnagar, Alistair Sinclair, Daniel Štefankovič, and Eric Vigoda.
DIMACS Workshop on Markov Chain Monte Carlo: Synthesizing Theory and Practice, June 5, 2007
Binary Contingency Tables

Input: marginals (row sums r1, r2, ..., rm; column sums c1, c2, ..., cn)
Sample space: 0/1 tables satisfying the marginals
Goal: count / sample

[Figure: an example 0/1 table together with its row and column sums]
Different Approaches

Theory (Markov chain Monte Carlo with simulated annealing)
• Jerrum-Sinclair-Vigoda '01: approximates the permanent in O*(n^10) time; yields an O*((mn)^10) algorithm for m x n binary contingency tables
• Bezáková-Bhatnagar-Vigoda '06: O*((mn)^3 (m+n)^5)

Practice (sequential importance sampling, Chen-Diaconis-Holmes-Liu '05)
• Bezáková-Sinclair-Štefankovič-Vigoda '06: negative example
• Blanchet '06: SIS works if the marginals are O(n^{1/4})
• Bayati-Kim-Saberi '07: alternative importance sampling method; works if the marginals are O(n^{1/4})

Practice (the switching Markov chain, Diaconis-Gangolli '94)
• Kannan-Tetali-Vempala '97, Cooper-Dyer-Greenhill '05: works for regular marginals
Permanent: Broder Chain [Broder '88]

Perfect matching: a subset of vertex-disjoint edges covering all vertices
Permanent: the number of all perfect matchings

What for: uniform sampling of perfect matchings
How: Markov chain on perfect + near-perfect matchings

At a perfect matching:
- remove a random edge
At a near-perfect matching:
- choose a random vertex
  • if unmatched: match it to the other hole
  • if matched: slide the adjacent edge
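To make these transitions concrete, here is a minimal Python sketch of one step of the chain. The representation (left/right vertex sets, an adjacency dict, the matching stored in both directions) and the function name are illustrative assumptions, and the analyzed chain additionally uses lazy/self-loop moves that are omitted here.

    import random

    def broder_step(left, right, adj, M):
        """One step of the Broder chain (sketch). adj maps each left vertex
        to its set of right neighbours; M is a perfect or near-perfect
        matching, stored in both directions (M[u] = v and M[v] = u)."""
        if len(M) == 2 * len(left):             # perfect matching:
            u = random.choice(list(left))       # remove a uniformly random edge
            N = dict(M)
            N.pop(N.pop(u))
            return N
        hole_l = next(v for v in left if v not in M)
        hole_r = next(v for v in right if v not in M)
        v = random.choice(list(left) + list(right))   # near: random vertex
        if v in (hole_l, hole_r):               # unmatched: match the two
            if hole_r in adj[hole_l]:           # holes, if that edge exists
                N = dict(M)
                N[hole_l], N[hole_r] = hole_r, hole_l
                return N
            return M
        z = M[v]                                # matched: slide v's edge so v
        N = dict(M)                             # gets matched to the opposite hole
        N.pop(v)
        N.pop(z)
        if v in left and hole_r in adj[v]:
            N[v], N[hole_r] = hole_r, v         # z becomes the new right hole
            return N
        if v in right and v in adj[hole_l]:
            N[hole_l], N[v] = v, hole_l         # z becomes the new left hole
            return N
        return M                                # slide not possible: stay put

    # A few steps on K_{2,2}:
    left, right = {0, 1}, {"a", "b"}
    adj = {0: {"a", "b"}, 1: {"a", "b"}}
    M = {0: "a", "a": 0, 1: "b", "b": 1}
    for _ in range(5):
        M = broder_step(left, right, adj, M)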
Broder Chain

Does it mix in polynomial time? Even if it did...
A graph can have 1 perfect matching but ≥ 2^{n/4} near-perfect matchings.

Thm [Jerrum-Sinclair '89]: Rapid mixing if the number of perfect matchings is polynomially related to the number of near-perfect matchings.

Jerrum-Sinclair-Vigoda '01: Change the weights so that perfect matchings take a polynomial fraction.
Simulated Annealing for Permanent

Jerrum-Sinclair-Vigoda '01: Change the weights so that perfect matchings take a polynomial fraction.

Originally: n^2 + 1 regions (one per hole pair (u,v), plus the perfect matchings) of very different sizes.
Want: all regions the same size.

Ideal weights (for a near-perfect matching with holes u,v):
    w(u,v) = (# perfects) / (# nears with holes u,v)

Good: a perfect matching is sampled with probability 1/(n^2 + 1).
Bad: computing the ideal weights is as hard as the original problem.
Solution: Approximate
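A quick check of the "Good" claim: with the ideal weights, each of the n^2 hole classes carries total weight w(u,v) · (# nears with holes u,v) = (# perfects), the same as the class of perfect matchings itself, so

    Pr[perfect] = (# perfects) / ( (# perfects) + Σ_{u,v} w(u,v) · (# nears with holes u,v) )
                = (# perfects) / ( (n^2 + 1) · (# perfects) )
                = 1/(n^2 + 1).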
Simulated Annealing for Permanent

Ideal weights (for a matching with holes u,v):
    w(u,v) = (# perfects) / (# nears with holes u,v)

Solution: Approximate, by annealing from the unordered state (EASY: the complete bipartite graph K_{n,n}) to the ordered state (DIFFICULT: the input graph G), with λ = 1 ... ~0.
Simulated Annealing for Permanent

λ = 1 ... ~0

The ideal weights (# perfects) / (# nears with holes u,v) need to be λ-weighted:
    w(u,v) = λ(P) / λ(N(u,v)),
where λ(M) = λ^(# λ-edges in M) and λ(S) = Σ_{M in S} λ(M).
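A short sketch making the weighting concrete; the representation (a matching as a set of (left, right) pairs, real_edges as the edge set of G) is an assumption for illustration. The λ-edges are the edges of K_{n,n} that are not real edges of G.

    def lam_weight(M, lam, real_edges):
        """lambda(M) = lam ** (# lambda-edges in M)."""
        return lam ** sum(1 for e in M if e not in real_edges)

    def lam_total(S, lam, real_edges):
        """lambda(S) = sum of lambda(M) over M in S."""
        return sum(lam_weight(M, lam, real_edges) for M in S)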
Simulated Annealing for Permanent

λ = 1, then 0.7, 0.6, ... ~0

    w(u,v) = λ(P) / λ(N(u,v))

Initially λ = 1, thus w(u,v) = n!/(n-1)! = n.

Algorithm (sketch):
Start at λ = 1 with the exact weights. Later, given an approximation of the w(u,v), run the chain to improve the approximation, then decrease λ (until ~0).
(The improved approximation at the old λ serves as the starting approximation at the new λ: since λ changes only slightly, e.g. a 2-approximation at the old λ is still a 4-approximation at the new one.)

Thm [Jerrum-Sinclair-Vigoda '01]: The weighted Broder chain mixes if each w(u,v) is approximated within a constant factor.
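A schematic driver for this bootstrap, as a Python sketch only; the refine subroutine is hypothetical (in the real algorithm it runs the weighted Broder chain at activity lam and re-estimates each w(u,v) from samples).

    def anneal(n, lambdas, refine):
        """Bootstrap the weights along a slowly decreasing lambda schedule."""
        w = {(u, v): float(n) for u in range(n) for v in range(n)}  # exact at lambda = 1
        for lam in lambdas:        # schedule 1 -> ~0
            w = refine(lam, w)     # improved estimates at the old lambda serve
        return w                   # as starting estimates at the nearby new one

    # e.g. anneal(3, [1.0, 0.9, 0.8], lambda lam, w: w) exercises the loop
    # with a no-op refiner.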
BCT: Bipartite Graphs with Given Degrees

             columns
             2 3 1 2
    rows  2  0 1 0 1
          2  1 1 0 0
          2  1 0 1 0
          2  0 1 0 1

A 0/1 table is the adjacency matrix of a bipartite graph: rows and columns are the two vertex sets, the 1-entries are edges, and the marginals are the degrees.

"Sliding" Markov chain on perfect and near-tables:
- Perfect table: remove a random edge (a random 1-entry)
- Near-table: slide edges or match, as in the Broder chain

[Animation: a sequence of slide moves on the example table, moving the holes until the marginals are restored]
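A minimal sanity check of the table/degree correspondence on the example above (plain Python, just a testing aid):

    T = [[0, 1, 0, 1],
         [1, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1]]

    row_sums = [sum(r) for r in T]          # degrees of the row vertices
    col_sums = [sum(c) for c in zip(*T)]    # degrees of the column vertices
    edges = [(i, j) for i, r in enumerate(T) for j, x in enumerate(r) if x]

    assert row_sums == [2, 2, 2, 2] and col_sums == [2, 3, 1, 2]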
Simulated Annealing for BCT? [Bezáková-Bhatnagar-Vigoda '06]

Ideal weights: (# perfects) / (# nears with holes u,v)

As for the permanent, anneal from an unordered state (EASY) to the ordered state (DIFFICULT). Here the target is the given marginals on K_{n,n}; the easy endpoint is the same marginals on a specific realization G* of the degree sequence.

Recall that
    w(u,v) = λ(P) / λ(N(u,v)),
where λ(T) = λ^(# λ-edges in T) and λ(S) = Σ_{T in S} λ(T).

λ runs from ~0 (on G*) to 1 (on K_{n,n}).
Simulated Annealing for BCT?

Recall that
    w(u,v) = λ(P) / λ(N(u,v)),
where λ(T) = λ^(# λ-edges in T) and λ(S) = Σ_{T in S} λ(T).

The catch: what if, for some u,v, there is no near-table that uses only real edges? Then λ(N(u,v)) = 0 for λ = 0.
Simulated Annealing for BCT?

Thm [Bezáková-Bhatnagar-Vigoda '06]:
There exists a graph G* with the given degree sequence such that between any two vertices there is an "alternating" path of length ≤ 5.

Corollary: For every pair u,v there exists a (u,v)-near-table similar to G* (differing from it only along such a path); hence λ(N(u,v)) is "easy" to compute for λ ~ 0.
Simulated Annealing for BCT?

Thm [Bezáková-Bhatnagar-Vigoda '06]:
It is possible to sample/count bipartite graphs with a given degree sequence (which are subgraphs of a given graph H, generalizing H = K_{n,n}) in time O*((nm)^3 (n+m)^5).
Different Approaches

Theory (Markov chain Monte Carlo with simulated annealing)
• Jerrum-Sinclair-Vigoda '01: approximates the permanent in O*(n^10) time; yields an O*((mn)^10) algorithm for m x n binary contingency tables
• Bezáková-Bhatnagar-Vigoda '06: O*((mn)^3 (m+n)^5)

Practice (sequential importance sampling, Chen-Diaconis-Holmes-Liu '05)
• Bezáková-Sinclair-Štefankovič-Vigoda '06: negative example
• Blanchet '06: SIS works if the marginals are O(n^{1/4})
• Bayati-Kim-Saberi '07: alternative importance sampling method; works if the marginals are O(n^{1/4})

Practice (the switching Markov chain, Diaconis-Gangolli '94)
• Kannan-Tetali-Vempala '97, Cooper-Dyer-Greenhill '05: works for regular marginals
Importance Sampling for Counting Problems

Probability distribution σ on the points of the set plus a failure symbol ♦, with σ(x) > 0 for every point x in the set.

Random variable:
    η(s) = 1/σ(s)  if s is in the set
    η(s) = 0       if s = ♦

Unbiased estimator:
    E[η] = Σ_x σ(x) · 1/σ(x) = size of the set
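A minimal runnable sketch of this estimator in Python. The concrete set (0/1 vectors with exactly k ones), the coin-flip proposal, and all parameter values are illustrative assumptions, not from the talk; the exact count math.comb(n, k) is used only to check the output.

    import math
    import random

    def sample_once(n=10, k=4, p=0.4):
        """Propose a 0/1 vector by independent biased coin flips; points
        of the set are vectors with exactly k ones, anything else is the
        failure symbol (None)."""
        s = tuple(random.random() < p for _ in range(n))
        if sum(s) != k:
            return None                      # failure symbol, eta = 0
        sigma = p**k * (1 - p)**(n - k)      # proposal probability of s
        return s, sigma

    def estimate_set_size(trials=200_000):
        total = 0.0
        for _ in range(trials):
            draw = sample_once()
            if draw is not None:
                total += 1.0 / draw[1]       # eta = 1/sigma(s)
        return total / trials

    # E[eta] = number of 0/1 vectors of length 10 with exactly 4 ones
    print(estimate_set_size(), "vs exact", math.comb(10, 4))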
Sequential Importance Sampling for BCT
[Chen-Diaconis-Holmes-Liu '05]

A specific σ:
• fill the table column by column
• assign each column ignoring the other column sums

Assign a column with probability proportional to
    ∏ r_i / (n - r_i),
where the product ranges over the rows i that receive a 1 in this column and r_i is the current (residual) row sum.

[Worked example: the columns are assigned one by one, and the residual row sums decrease accordingly until all reach 0]
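A runnable Python sketch in the spirit of this procedure, not the CDHL implementation: within each column, the set of rows receiving a 1 is drawn with probability proportional to ∏ r_i/(n - r_i) by explicitly enumerating all candidate subsets (CDHL use conditional-Bernoulli sampling instead), so it is suitable for small tables only. Rows whose residual sum equals the number of remaining columns are forced, and their common factor is dropped from the weights.

    import itertools
    import random

    def sis_estimate(row_sums, col_sums, trials=20_000):
        """SIS estimate of the number of 0/1 tables with the given marginals."""
        m, ncols = len(row_sums), len(col_sums)
        total = 0.0
        for _ in range(trials):
            r, sigma, failed = list(row_sums), 1.0, False
            for j, c in enumerate(col_sums):
                left = ncols - j                    # columns still unfilled
                subsets, weights = [], []
                for S in itertools.combinations(range(m), c):
                    if any(r[i] == 0 for i in S):
                        continue
                    if any(r[i] >= left and i not in S for i in range(m)):
                        continue                    # a forced row was skipped
                    w = 1.0
                    for i in S:
                        if r[i] < left:             # forced rows contribute a
                            w *= r[i] / (left - r[i])   # common factor; drop it
                    subsets.append(S)
                    weights.append(w)
                if not subsets:
                    failed = True                   # the failure symbol
                    break
                idx = random.choices(range(len(subsets)), weights)[0]
                sigma *= weights[idx] / sum(weights)
                for i in subsets[idx]:
                    r[i] -= 1
            if not failed and all(x == 0 for x in r):
                total += 1.0 / sigma                # eta = 1/sigma(table)
        return total / trials

    # Illustrative check on the marginals of the earlier small example:
    # print(sis_estimate([2, 2, 2, 2], [2, 3, 1, 2]))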
A Counterexample for SIS

[Figure: an m x m instance with first row sum γm, first column sum βm, and all other marginals 1]

Thm [Bezáková-Sinclair-Štefankovič-Vigoda '06]:
For any β ≠ γ, the SIS output after any subexponential number of trials is off by an exponential factor (with high probability).

Simpler example: first column sum βm, all other marginals 1; here the same holds for any β.

Intuition: In a uniformly random table, the βm ones of the first column fall into a random set of rows, so a fixed set of αm rows is expected to contain ~αβm of them; SIS places asymptotically fewer ones there. Hence the tables seen by SIS (whp) lie outside the set of tables with ~αβm such ones, and that set carries almost all of the uniform measure.

[Figure: all tables ⊇ tables with ~αβm ones; the tables seen by SIS whp lie outside]

Result holds for any order of rows/columns.
Alternating rows and columns?
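For experiments like the ones below, a brute-force exact counter is useful for validating the samplers on tiny instances. This helper is a testing aid, not part of the talk, and takes exponential time.

    import itertools

    def exact_count(row_sums, col_sums):
        """Exact number of 0/1 tables with the given marginals."""
        m = len(row_sums)
        def rec(j, r):
            if j == len(col_sums):
                return 1 if all(x == 0 for x in r) else 0
            total = 0
            for S in itertools.combinations(range(m), col_sums[j]):
                if all(r[i] > 0 for i in S):
                    r2 = list(r)
                    for i in S:
                        r2[i] -= 1
                    total += rec(j + 1, r2)
            return total
        return rec(0, list(row_sums))

    # e.g. exact_count([2, 2, 2, 2], [2, 3, 1, 2]) gives the true table
    # count to compare against sis_estimate on the same marginals.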
SIS – Experimental Results

Bad example, m = 300, β = 0.6, γ = 0.7

[Plot: log-scale of the SIS estimate (y-axis) vs. the number of SIS steps (x-axis), with the correct value marked]