-
Survey Propagation Revisited
Lukas Kroc Ashish Sabharwal Bart Selman
Department of Computer Science, Cornell University, Ithaca, NY
14853-7501, U.S.A.∗
{kroc,sabhar,selman}@cs.cornell.edu
Abstract
Survey propagation (SP) is an exciting newtechnique that has
been remarkably success-ful at solving very large hard
combinatorialproblems, such as determining the satisfia-bility of
Boolean formulas. In a promisingattempt at understanding the
success of SP,it was recently shown that SP can be viewedas a form
of belief propagation, computingmarginal probabilities over certain
objectscalled covers of a formula. This explana-tion was, however,
shortly dismissed by ex-periments suggesting that non-trivial
coverssimply do not exist for large formulas. Inthis paper, we show
that these experimentswere misleading: not only do covers exist
forlarge hard random formulas, SP is surpris-ingly accurate at
computing marginals overthese covers despite the existence of
manycycles in the formulas. This re-opens a po-tentially simpler
line of reasoning for under-standing SP, in contrast to some
alternativelines of explanation that have been proposedassuming
covers do not exist.
1 INTRODUCTION
Survey Propagation (SP) is a new exciting algorithmfor solving
hard combinatorial problems. It was dis-covered by Mezard, Parisi,
and Zecchina (2002), andis so far the only known method successful
at solvingrandom Boolean satisfiability (SAT) problems with
1million variables and beyond in near-linear time in thehardest
region. The SP method is quite radical in thatit tries to
approximate certain marginal probabilitiesrelated to the set of
satisfying assignments. It theniteratively assigns values to
variables with the most
∗Research supported by Intelligent Info. Systems Instt.(IISI),
Cornell Univ., AFOSR grant FA9550-04-1-0151.
extreme probabilities. In effect, the algorithm be-haves like
the usual backtrack search methods for SAT(DPLL-based), which also
assign variable values incre-mentally in an attempt to find a
satisfying assignment.However, quite surprisingly, SP almost never
has tobacktrack. In other words, the “heuristic guidance”from SP is
almost always correct. Note that, interest-ingly, computing
marginals on satisfying assignmentsis actually believed to be much
harder than findinga single satisfying assignment (#P-complete vs.
NP-complete). Nonetheless, SP is able to efficiently ap-proximate
certain marginals and uses this informationto successfully find a
satisfying assignment.
SP was derived from rather complex statistical physicsmethods,
specifically, the so-called cavity method de-veloped for the study
of spin glasses. Close connectionsto belief propagation (BP)
methods were subsequentlydiscovered. In particular, it was
discovered by Braun-stein and Zecchina (2004) (later extended by
Maneva,Mossel, and Wainwright (2005)) that SP equations
areequivalent to BP equations for obtaining marginalsover a special
class of combinatorial objects, calledcovers. Intuitively, a cover
provides a representativegeneralization of a cluster of satisfying
assignments.The discovery of a close connection between SP andBP
via the use of covers laid an exciting foundationfor explaining the
success of SP. Unfortunately, subse-quent experimental evidence
suggested that hard ran-dom 3-SAT formulas have, with high
probability, onlyone (trivial) cover (Maneva et al., 2005). This
wouldleave all variables effectively in an undecided state,
andwould mean that marginals on covers cannot provideany useful
information on how to set variables. SinceSP clearly sets variables
in a non-trivial manner, itwas conjectured that there must be
another explana-tion for the good behavior of SP; in particular,
onethat is not based on the use of marginal probabilitiesof
variables in the covers.
In this paper, we revisit the claim that hard random 3-SAT
formulas do not have interesting non-trivial cov-
-
ers. In fact, we show that such formulas have largenumbers of
non-trivial covers. The main contributionof the paper is the first
clear empirical evidence show-ing that in random 3-SAT problems
near the satisfi-ability and hardness threshold, (1) a significant
num-ber of non-trivial covers exist; (2) SP is remarkablygood at
computing variable marginals based on cov-ers; and (3) these cover
marginals closely relate to so-lution marginals at least in the
extreme values, whereit matters the most for survey inspired
decimation. Asa consequence, we strongly suspect that explaining
SPin terms of covers may be the correct path after all.
Note that (2) above is quite surprising for random3-SAT formulas
because such formulas have manyloops. The known formal proof that
SP computescover marginals only applies to tree-structured
formu-las, which in fact have only a single (trivial)
cover.Further, it’s amazing that while SP computes suchmarginals in
a fraction of a second, the next best meth-ods of computing these
marginals that we know of (viaexact enumeration, or sampling
followed by “peeling”)require over 100 CPU hours.
Our experiments also indicate that cover marginalsare more
“conservative” than solution marginals in thesense that variables
that are extreme with respect tocover marginals are almost
certainly also extreme withrespect to solution marginals, but not
vice versa. Thissheds light on why it is safe to set variables with
ex-treme cover marginals in an iterative manner, as isdone in the
survey inspired decimation process for find-ing a solution using
the marginals computed by SP.
In addition to these empirical results, we also revisitthe
derivation of the SP equations themselves, with thegoal of
presenting the derivation in an insightful formpurely within the
realm of combinatorial constraintsatisfaction problems (CSPs). We
describe how onecan reformulate in a natural step-by-step manner
theproblem of finding a satisfying assignment into one offinding a
cover, by considering related factor graphson larger state spaces.
The BP equations for this re-formulated problem are exactly the SP
equations forthe original problem, as shown in the Appendix.
2 COVERS OF CNF FORMULAS
We start by introducing the notation and the ba-sic concepts
that we use throughout the paper. Weare concerned with Boolean
formulas in ConjunctiveNormal Form or CNF, that is, formulas of the
formF ≡ (l11 ∨ . . . ∨ l1k1) ∧ . . . ∧ (lm1 ∨ . . . ∨ lmkm),
whereeach lik (called a literal) is a Boolean variable xj orits
negation ¬xj . Each conjunct of F , which itself isa disjunction of
literals, is called a clause. In 3-CNFor 3-SAT formulas, every
clause has 3 literals. Ran-
dom 3-SAT formulas over n variables are generated byuniformly
randomly choosing a pre-specified numberof clauses over these n
variables. The Boolean satis-fiability problem is the following:
Given a CNF for-mula F over n variables, find a truth assignment σ
forthe variables such that every clause in F evaluates totrue; σ is
called a satisfying assignment or a solutionof F . We identify true
with 1 and false with 0.
A truth assignment to n variables can be viewed as astring of
length n over the alphabet {0, 1}, and extend-ing this alphabet to
include a third letter “∗” leads toa generalized assignment. A
variable with the value ∗can be interpreted as being “undecided,”
while vari-ables with values 0 or 1 can be interpreted as
being“decided” on what they want to be. We will be inter-ested in
certain generalized assignments called covers.Our formal definition
of covers follows the one given byAchlioptas and Ricci-Tersenghi
(2006). Let variable xbe called a supported variable under a
generalized as-signment σ if there is a clause C such that x is
theonly variable that satisfies C and all other literals ofC are
false. Otherwise, x is called unsupported.
Definition 1. A generalized assignment σ ∈ {0, 1, ∗}n
is a cover of a CNF formula F iff
1. every clause of F has at least one satisfying literalor at
least two literals with value ∗ under σ, and
2. σ has no unsupported variables assigned 0 or 1.
The first condition ensures that each clause of F iseither
already satisfied by σ or has enough undecidedvariables to not
cause any undecided variable to beforced to decide on a value (no
“unit propagation”).The second condition says that each variable
that isassigned 0 or 1 is set that way for a reason: thereexists a
clause that relies on this setting in order tobe satisfied. For
example, consider the formula F ≡(x ∨ ¬y ∨ ¬z) ∧ (¬x ∨ y ∨ ¬z) ∧
(¬x ∨ ¬y ∨ z). F hasexactly two covers: 111 and ∗ ∗ ∗. This can be
verifiedby observing that whenever some variable is 0 or ∗,then all
non-∗ variables are unsupported. Notice thatthe string of all ∗’s
always satisfies the conditions inDefinition 1; we refer to this
string as the trivial cover.
Covers were introduced by Maneva et al. (2005) asa useful
concept to analyze the behavior of SP, buttheir combinatorial
properties are much less knownthan those of solutions. A cover can
be thought of asa partial assignment to variables, where the
variablesassigned ∗ are considered unspecified. In this sense,each
cover is a representative of a potentially large setof complete
truth assignments, satisfying as well as notsatisfying. This
motivates further differentiation:
Definition 2. A cover σ ∈ {0, 1, ∗}n of F is a truecover iff
there exists a satisfying assignment τ ∈{0, 1}n of F such that σ
and τ agree on all values where
-
σ is not a ∗, i.e., ∀i ∈ {1, . . . , n}(σi 6= ∗ =⇒ σi =
τi).Otherwise, σ is a false cover.
A true cover thus generalizes at least one satisfyingassignment.
True covers are interesting to study whentrying to satisfy a
formula, because if there exists atrue cover with variable x
assigned 0 or 1, then theremust also exist a satisfying assignment
with the samesetting of x.
One can construct a true cover σ ∈ {0, 1, ∗}n of F bystarting
with any satisfying assignment τ ∈ {0, 1}n ofF and generalizing it
using a simple procedure called∗-propagation.1 The procedure starts
by initiallysetting σ = τ . It then repeatedly chooses an
arbitraryvariable unsupported under σ and turns it into a ∗,until
there are no more unsupported variables. The re-sulting string σ is
a true cover, which can be verified asfollows. The satisfying
assignment τ already satisfiesthe first condition in Definition 1,
and ∗-propagationdoes not destroy this property. In particular, a
vari-able on which some clause relies is never turned into a∗. The
second condition in Definition 1 is also clearlysatisfied when
∗-propagation halts, so that σ must bea cover. Moreover, since σ
generalizes τ , it is a truecover. Note that ∗-propagation can, in
principle, beapplied to an arbitrary generalized assignment.
How-ever, unless we start with one that satisfies the
firstcondition in the cover definition, ∗-propagation maynot lead
to a cover.
We end with a discussion of two insightful propertiesof covers.
The first relates to “self-reducibility” andthe second to covers
for tree-structured formulas.
No self-reducibility. Consider the relation betweenthe decision
and search versions of the problem of find-ing a solution of a CNF
formula F . In the decision ver-sion, one needs an algorithm that
determines whetheror not F has a solution, while in the search
version,one needs an algorithm that explicitly finds a solu-tion.
The problem of finding a solution for F is self-reducible, i.e.,
given an oracle for the decision version,one can efficiently solve
the search version by itera-tively fixing variables to 1 or 0,
testing whether thereis still a solution, and continuing in this
way. Some-what surprisingly, this strategy does not work for
theproblem of finding a cover. In other words, an oraclefor the
decision version of this problem does not im-mediately provide an
efficient algorithm for finding acover. (The lack of
self-reducibility makes it very hardto find covers as we will see
below.) As a concreteexample, consider the formula F described
right afterDefinition 1. To construct a cover of F , we could
ask
1This was introduced under different names as the peel-ing
procedure or coarsening, e.g., by Maneva et al. (2005).
whether there exists a cover with x set to 1. Since111 is a
cover (yet unknown to us), the decision oraclewould say yes. We
could then fix x to 1, simplify theformula to (y∨¬z)∧(¬y∧z), and
ask whether there isa cover with y set to 0. This residual formula
indeedhas 00 as a cover, and the oracle would say yes. Withone more
query, we will end up with 100 as the valuesof x, y, z, which is in
fact not a cover of F .
Tree-structured formulas. For tree-structuredformulas without
unit clauses, i.e., formulas whose fac-tor graph does not have a
cycle, the only cover is thetrivial all-∗ cover. We argue this
using the connec-tion between covers and SP shown by Braunstein
andZecchina (2004), which says that when generalized as-signments
have a uniform prior, SP on a tree formula Fprovably computes
probability marginals of variablesbeing 0, 1, and ∗ in covers of F
. Moreover, it can beverified from the iterative equations for SP
that withno unit clauses, zero marginals for any variable being0 or
1, and full marginals for any variable being a ∗ isa fixed point of
SP. Since SP provably has exactly onefixed point on tree formulas,
it follows that the onlycover of such formulas is the trivial all-∗
cover.
3 PROBLEM REFORMULATION:
FROM SOLUTIONS TO COVERS
We now show that the concept of covers can be quitenaturally
arrived at when trying to find solutions ofa CNF formula, thus
motivating the study of coversfrom a purely generative perspective.
Starting witha CNF formula F , we describe how F is
transformedstep-by-step into the problem of finding covers of F
,motivating each step.
Although our discussion applies to any CNF formulaF , we will be
using the following example formula with3 variables and 4 clauses
to illustrate the steps:
(x ∨ y ∨ ¬z)︸ ︷︷ ︸
a
∧ (¬x ∨ y)︸ ︷︷ ︸
b
∧ (¬y ∨ z)︸ ︷︷ ︸
c
∧ (x ∨ ¬z)︸ ︷︷ ︸
d
Let N denote the number of variables, M the numberof clauses,
and L the number of literals of F .
Original problem. The problem is to find an as-signment in the
space {0, 1}
Nthat satisfies F . The fac-
tor graph for F has N variable nodes and M functionnodes,
corresponding directly to the variables x, y, . . .and clauses a,
b, . . . in F (see e.g. Kschischang et al.(2001)). The factor graph
for the example formulais depicted below. Here factors Fa, Fb, . .
. representpredicates ensuring that the corresponding clause hasat
least one satisfying literal.
-
FdFcFbFa
x y z
Variable occurrences. The first step in the trans-formation is
to start treating every variable occurrencexa, xb, ya, yb, . . . in
F as a separate unit that can be ei-ther 0 or 1. This allows for
more flexibility in the pro-cess of finding a solution, since a
variable can decidewhat value to assume in each clause separately.
Ofcourse, we need to add constraints to ensure that theoccurrence
values are eventually consistent: for everyvariable x in F , we add
a constraint Fx that all occur-rences of x have the same value. Now
the search spaceis {0, 1}L, and the corresponding factor graph
containsL variable nodes and M +N function nodes (the orig-inal
clause factors Fa, Fb, . . . and the new constraintsFx, Fy, . .
.).
Fx Fa Fb Fy Fc Fd Fz
zdzczaycybyaxdxbxa
At this point, we have not relaxed solutions to theoriginal
problem F : solutions to the modified problemcorrespond precisely
to the original solutions, becausevariable occurrences are forced
to be consistent. How-ever, we moved this consistency check from
the syn-tactic level (variables could not be inconsistent simplyby
the problem definition) to the semantic level (wehave special
constraints to guarantee consistency).
Relaxing assignments. The next step is to relaxthe problem by
allowing variable nodes to assume thespecial value “∗”. The
semantics of ∗ is “undecided,”meaning that the variable node is set
neither to 0nor to 1. The new search space is {0, 1, ∗}
L, and we
must specify how our constraints handle the value ∗.Variable
constraints Fx, . . . have the same meaning asbefore, namely, all
variable nodes xa, xb, . . . have thesame value for every variable
x. Clause constraintsFa, . . . now have a modified meaning: a
clause is sat-isfied if it contains at least one satisfying literal
or atleast two literals with the value ∗. The motivation hereis to
either satisfy a clause or leave enough “freedom”in the form of at
least two undecided variables. (Asingle undecided variable would be
forced to take on aparticular value if all other literals in the
clause werefalsified.) With this transformation, the factor
graphremains structurally the same, while the set of possible
values for variable nodes changes.
The solutions to this modified problem do not neces-sarily
correspond directly to solutions of the originalone. In particular,
if there are no unit clauses and allvariables are set to ∗, the
problem is already “solved”without providing any useful
information.
Reducing freedom of choice. To distinguish vari-ables that could
assume the value ∗ from those thattruly need to be fixed to either
0 or 1, we require thatevery non-∗ variable has a clause that needs
the vari-able to be 0 or 1 in order to be satisfied. The
searchspace does not change, but we need to add constraintsto
implement the reduction in the freedom of choice.
Notice that this requirement is equivalent to “no un-supported
variables” in the definition of a cover, andthat the first
requirement in that definition is ful-filled by the clause
constraints. Therefore, we are nowsearching for covers of F . A
natural way to representthe “no unsupported variable” constraint in
the fac-tor graph is to add for each variable x a new functionnode
F ′x, connected to the variable nodes for x as wellas for all other
variables sharing a clause with x. This,of course, creates many new
links and introduces ad-ditional short cycles, even if the original
factor graphwas acyclic. The following transformation step
allevi-ates this issue.
Reinterpreting variable nodes. As the final step,we change the
semantics of the variable nodes’ val-ues and of the constraints so
that the “no unsup-ported variable” condition can be enforced
without ad-ditional function nodes. The reasoning is that the
sim-ple {0, 1, ∗} domain creates a bottleneck for how
muchinformation can be communicated between nodes inthe factor
graph. By altering the semantics of thevariable nodes’ values, we
can improve on this.
The new value of a variable node xa will be a pair(ra→x, wx→a) ∈
{(0, 0), (0, 1), (1, 0)}, so that the sizeof the search space is
still 3L. We interpret the valuera→x as a request from clause a to
variable x with themeaning that a relies on x to satisfy it, and
the valuewx→a as a warning from variable x to clause a that x isset
such that it does not satisfy a. The values 1 and 0indicate
presence and absence, resp., of the request orwarning. We can
recover the original {0, 1, ∗} valuesfrom these new values as
follows: if ra→x = 1 for somea, then x is set to satisfy clause a;
if there is no requestfrom any clause where x appears, then x is
undecided(a value of ∗ in the previous interpretation). The
vari-able constraints Fx, . . . not only ensure consistency ofthe
values of xa, xb, . . . as before, but also ensure thesecond cover
condition as described below. The clauseconstraints Fa, . . .
remain unchanged.
-
The variable constraint Fx is a predicate ensuring thatthe
following two conditions are met:
1. if ra→x = 1 for any clause a where x appears,then wx→b = 0
for all clauses b where x appearswith the same sign as in a, and
wx→b = 1 for allb where x appears with the opposite sign. Since
xmust be set to satisfy a, this ensures that clausesthat are
unsatisfied by x do receive a warning.
2. if ra→x = 0 for all clauses a where x appears, thenwx→a = 0
for all of them, i.e., no clause receivesa warning from x.
To evaluate Fx, values (ra→x, wx→a) are needed onlyfor clauses a
in which x appears, which is exactly theset of variable nodes the
factor Fx is connected to. No-tice that the case (ra→x, wx→a) = (1,
1) cannot happendue to condition 1 above. The conditions also
implythat the variable occurrences of x are consistent, and
inparticular that two clauses where x appears with oppo-site signs
(say a and b) cannot simultaneously requestto be satisfied by x.
This is because either ra→x = 0or rb→x = 0 must hold due to
condition 1.
The clause constraint Fa is a predicate stating thatclause a
issues a request to its variable x if and only if itreceives
warnings from all its other variables: ra→x = 1iff wy→a = 1 for all
variables y 6= x in a. Again, Facan be evaluated using exactly
values from the variablenodes it is connected to.
When clause a issues a request to variable x (i.e.,ra→x = 1), x
must be set to satisfy a, thus providing asatisfying literal for a.
If a does not issue any request,then according to the condition of
Fa, at least two ofa’s variables, say x and y, must not have sent a
warn-ing. In this case, Fx and Fy state that each of x andy is
either undecided or satisfies a. Thus the first con-dition in the
cover definition holds in any solution ofthis new constraint
satisfaction problem. The secondcondition also holds, because every
variable x that isnot undecided must have received a request from
someclause a, so that x is the only literal in a that is notfalse.
Therefore x is supported.
Let us denote this final constraint satisfaction problemby P (F
). (It is a function of the original formula F .)Notice that the
factor graph of P (F ) has the sametopology as the factor graph of
F . In particular, ifF has a tree factor graph, so does P (F ).
Further, bythe construction of P (F ) described above, its
solutionscorrespond precisely to the covers of F .
3.1 INFERENCE OVER COVERS
This section discusses an approach for solving theproblem P (F )
with probabilistic inference using beliefpropagation (BP). It
arrives at the survey propagation
equations for F by applying BP equations to P (F ).
Since the factor graph of P (F ) can be easily viewedas a
Bayesian Network (cf. Pearl, 1988), one can com-pute marginal
probabilities over the set of satisfyingassignments of the problem,
defined as
Pr[xa = v | all constraints of P (F ) are satisfied]
for each variable node xa and v ∈ {(0, 0), (0, 1), (1, 0)}.The
probability space here is over all assignments tovariable nodes
with uniform prior.
Once these solution marginals are known, we knowwhich variables
are most likely to assume a particularvalue, and setting these
variables simplifies the prob-lem. A new set of marginals can be
computed on thissimplified formula, and the whole process
repeated.This method of searching for a satisfying assignmentis
called the decimation procedure. The problem,of course, is to
compute the marginals (which, in gen-eral, is much harder than
finding a satisfying assign-ment). One possibility for computing
marginals is touse the belief propagation algorithm (cf. Pearl,
1988).Although provably correct essentially only for formulaswith a
tree factor graph, BP provides a good approxi-mation of the true
marginals in many problem domainsin practice (Murphy et al., 1999).
Moreover, as shownby Maneva et al. (2005), applying the BP
algorithm tothe problem of searching for covers of F results in
theSP algorithm. Thus, on formulas with a tree factorgraph, the SP
algorithm provably computes marginalprobabilities over covers of F
, which are equivalent tomarginals over satisfying assignments of P
(F ). Whenthe formula contains loops, SP computes a loopy
ap-proximation to the cover marginals. Specific details ofthe
derivation of SP equations from the problem P (F )are deferred to
the Appendix.
4 EXPERIMENTAL RESULTS
This section presents our main contributions. We be-gin by
demonstrating that non-trivial covers do ex-ist in large numbers in
random 3-SAT formula, andthen explore connections between SP, BP,
and vari-able marginals computed from covers as well as so-lutions,
showing in particular that SP approximatescover marginals
surprisingly well.
4.1 EXISTENCE OF COVERS
Motivated by theoretical results connecting SP to cov-ers of
formulas, Maneva et al. (2005) suggested anexperimental study to
test whether non-trivial coverseven exist in random 3-SAT formulas.
They proposeda seemingly good way to do this (the “peeling
experi-ment”), namely, start with a uniformly random satisfy-
-
ing assignment of a formula F and, while it has unsup-ported
variables, ∗-propagate the assignment. Whenthe process terminates,
one obtains a (true) cover ofF . Unfortunately, what they observed
is that this pro-cess repeatedly hits the trivial all-∗ cover, from
whichthey concluded that non-trivial covers most likely donot exist
for such formulas. However, it is known thatnear-uniformly sampling
solutions of such formulas tostart with is a hard problem in itself
and that mostsampling methods obtain solutions in a highly
non-uniform manner (Wei et al., 2004). Consequently, onemust be
careful in drawing conclusions from relativelyfew and possibly
biased samples.
0 1000 2000 3000 4000 5000
020
040
060
080
010
00
Number of stars
Num
ber o
f uns
uppo
rted
varia
bles Solutions leading to the trivial cover
Solutions leading to non−trivial covers
Figure 1: The peeling experiment, showing the evolu-tion of the
number of stars as ∗-propagation proceeds.
To understand this issue better, we ran the same peel-ing
experiment on a 5000 variable random 3-SAT for-mula at
clause-to-variable ratio 4.2 (which is close tothe hardness
threshold for random 3-SAT problems),but used SampleSat (Wei et
al., 2004) to obtain sam-ples, which is expected to produce fairly
uniform sam-ples. Figure 1 shows the evolution of the number
ofunsupported variables at each stage as ∗-propagationis performed
starting from a solution. Here, thex-axis shows the number of
stars, which monotoni-cally increases by ∗-propagation. The y-axis
showsthe number of unsupported variables present at eachstage. As
one moves from left to right following the∗-propagation process,
one hits a cover if the num-ber of unsupported variables drops to
zero (so that∗-propagation terminates). The two curves in the
plotcorrespond to solutions that ∗-propagated to the triv-ial cover
and those that did not. In our experiment,out of 500 satisfying
assignments used, nearly 74% ledto the trivial cover; their average
is represented by thetop curve. The remaining 26% of the sampled
solu-tions actually led to non-trivial covers; their averageis
represented by the bottom curve. Thus, when so-lutions are sampled
near-uniformly, a substantial frac-tion of them lead to non-trivial
covers.2
2 That this was not observed by Maneva et al. (2005)can be
attributed to the fact that SP was used to findsatisfying
assignments (Mossel, 2007), resulting in highlynon-uniform
samples.
2.0 2.5 3.0 3.5 4.0 4.5
0.0
0.2
0.4
0.6
0.8
1.0
Clause−to−variable ratio
P[no
n−tri
vial
cov
er]
90 vars70 vars50 vars
2.0 2.5 3.0 3.5 4.0 4.5
020
4060
8010
0
Clause−to−variable ratio
Num
ber o
f non
−triv
ial c
over
s
90 vars70 vars50 vars
Figure 2: Non-trivial covers in random formulas. Left:existence
probability. Right: average number.
An alternative method of finding covers is to create anew
Boolean formula G whose solutions correspond gothe covers of F . It
turned out to be extremely hardto solve G to find any non-trivial
cover using state-of-the-art SAT solvers for number of variables as
low as150. So we confined our experiments to small formu-las, with
50, 70 and 90 variables. We found all coversfor such formulas with
varying clause-to-variable ra-tios α. The results are shown in
Figure 2, where eachdata point corresponds to statistics obtained
from 500formulas. The left pane shows the probability that arandom
formula, for a given clause-to-variable ratio,has at least one
non-trivial cover (either true or false).The figure shows a nice
phase transition where cov-ers appear, at around α = 2.5, which is
surprisinglysharp given the small formula sizes. Also, the
regionwhere covers surely exist is widening on both sidesas the
number of variables increases, supporting theclaim that non-trivial
covers exist even in large for-mulas. The right pane of Figure 2
shows the actualnumber of non-trivial covers, with a clear trend
thatthe number increases with the size of the formula, forall
values of the clause-to-variable ratio. It is worthnoting that the
number of covers is very small com-pared to the number of
satisfying assignments; e.g.for 90 variables and α = 4.2, the
expected number ofsatisfying assignments is 150, 000, while there
are only8 covers on average. Somewhat surprisingly, the num-ber of
false covers is almost negligible, around 2 at thepeak, and does
not seem to be growing nearly as fastas the total number of covers.
This might explain whySP, although approximating marginals over all
covers,is successful in finding satisfying assignments.
We also consider how the number of solutions that leadto
non-trivial covers changes for larger formulas, asthe number of
variables N increases from 200 to 4000.The left pane of Figure 3
shows that an estimate ofthis number, in fact, grows exponentially
with N . Foreach N , the estimate is obtained by averaging over200
formulas at ratio 4.2 the following quantity: thefraction p(N) of
20,000 sampled solutions that lead toa non-trivial cover, scaled up
by the expected number
-
of solutions for N -variable formulas at this ratio, whichis (2
× (7/8)4.2)N ≈ 1.1414N .3 The resulting number,p(N) × 1.1414N , is
plotted on the y-axis of the leftpane, with N on the x-axis. The
right pane of Figure 3shows the data used to estimate p(N) along
with itsfit on the y-axis, with N on the x-axis again.
200 500 1000 2000
1e+1
21e
+56
1e+1
44
Number of Vars. (log scale)
E[#s
ols w
/ non
tr. c
over
](lo
g sc
ale)
1000 2000 3000 4000
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Number of Variables
P[no
n tri
vial
cov
er]
Figure 3: Left: Expected number of solutions leadingto
non-trivial covers (log-log scale). Right: Probabilityof a solution
leading to a non-trivial cover.
Notice that the left pane is in log-scale for both axes,and
clearly increases faster than a linear function.Thus, the estimated
number of solutions that lead tonon-trivial covers grows
super-polynomially. In fact,performing a best fit for this curve
suggests that thisnumber grows exponentially, roughly as 1.1407N .
Thisnumber is indeed a vanishingly small fraction of the ex-pected
number of solutions (1.1414N ) as observed byManeva et al. (2005),
but nonetheless exponentiallyincreasing. The existence of covers
for random 3-SATalso aligns with what Achlioptas and
Ricci-Tersenghi(2006) recently proved for k-SAT with k ≥ 9.
4.2 SP, BP, AND MARGINALS
We now study the behavior of SP and BP on a ran-dom formula in
relation to solutions and covers ofthat formula. While theoretical
work has shown thatSP, viewed as BP on a related combinatorial
problem,provably computes cover marginals on
tree-structuredformulas, we demonstrate that even on random
3-SATinstances, which are far from tree-like, SP approxi-mates
cover marginals surprisingly well. We also showthat cover
marginals, especially in the extreme range,are closely related to
solution marginals in an intrigu-ing “conservative” fashion. The
combination of thesetwo effects, we believe, plays a crucial role
in the suc-
cess of SP. Our experiments also reveal that BP per-forms poorly
at computing any marginals of interest.
Given marginal probabilities, we define the magneti-zation of a
variable to be the difference between the
3 The version of the paper published in UAI-07 incor-rectly
stated, as pointed out by Lenka Zdeborova, that thenumber of
solutions of such formulas is known to be highlyconcentrated around
its expectation.
marginals of the variable being positive and it beingnegative.
For the rest of our experiments, we startwith a random 3-SAT
formula F with 5000 variablesand 21000 clauses (clause-to-variable
ratio of 4.2), andplot the magnetization of the variables of F in
therange [−1,+1].4 The marginals for magnetization areobtained from
four different sources, which are com-pared and contrasted against
each other: (1) by run-ning SP on F till the iterations converge;
(2) by run-ning BP on F but terminating it after 10,000 itera-tions
because the equations do not converge; (3) bysampling solutions of
F using SampleSat and comput-ing an estimate of the positive and
negative marginalsfrom the sampled solutions (the solution
marginals);and (4) by sampling solutions of F using
SampleSat,∗-propagating them to covers, and computing an esti-mate
of the positive and negative marginals from thesecovers (the cover
marginals). Note that in (4), we aresampling true covers and
obtaining an estimate. Analternative approach is to use SP itself
on F to try tosample covers of F , but the issue here is that the
prob-lem of finding (non-trivial) covers is not self-reducibleto
the decision problem of whether covers exist, asshown in Section 2.
Therefore, it is not clear whetherSP can be used to actually find a
cover, despite it ap-proximating the cover marginals very well.
Recall that the SP-based decimation process works byidentifying
variables with extreme magnetization, fix-ing them, and iterating.
We will therefore be inter-ested mostly in what happens in the
extreme magne-tization regions in these plots, namely, the lower
leftcorner (−1,−1) and the upper right corner (+1,+1).
In the left pane of Figure 4 we plot the magnetizationcomputed
by SP on the x-axis and the magnetizationobtained from cover
marginals on the y-axis. The scat-ter plot has exactly 5000 data
points, with one pointfor each variable of the formula F . If the
magneti-zations on the two axes matched perfectly, all pointswould
fall on a single diagonal line from the bottom-left corner to the
top-right corner. The plot shows thatSP is highly accurate at
computing cover marginals, es-
pecially in the extreme regions at the bottom-left
andtop-right.
The middle pane of Figure 4 compares the magneti-zation based on
cover marginals with the magnetiza-tion based on solutions
marginals. This will providean intuition for why it might be better
to follow covermarginals rather than solution marginals when
lookingfor a satisfying assignment.5 We see an interesting “s-
4 For clarity, the plots show magnetizations for one
suchformula, although the trend is generic.
5 Of course, if solution marginals could be computedperfectly,
this would not be an issue. In practice, how-ever, the best we can
hope is to approximately estimate
-
−1.0 −0.5 0.0 0.5 1.0
−1.0
−0.5
0.0
0.5
1.0
SP Magnetization
Cove
r Mag
netiz
atio
n
−1.0 −0.5 0.0 0.5 1.0
−1.0
−0.5
0.0
0.5
1.0
Cover Magnetization
Solu
tion
Mag
netiz
atio
n
−1.0 −0.5 0.0 0.5 1.0
−1.0
−0.5
0.0
0.5
1.0
BP Magnetization
Solu
tion
Mag
netiz
atio
n
Figure 4: Magnetization plots. Left: SP vs. covers. Middle:
covers vs. solutions. Right: BP vs. solutions.
shape” in this plot, which can be interpreted as follows:fixing
variables with extreme cover magnetizations ismore conservative
compared to fixing variables withextreme solution magnetizations.
Which means thatvariables that are extreme w.r.t. cover-based
magne-tization are also extreme w.r.t. solution-based
magne-tization (but not necessarily vice-versa). Recall thatthe
extreme region is exactly where decimation-basedalgorithms, that
often fix a small set of extreme vari-ables per iteration, need to
be correct. Thus, etimatesof cover marginals provide a safer
heuristic for fixingvariables than estimates of solution
marginals.
As a comparison with BP, the right pane of Figure 4shows BP
magnetization vs. magnetization based onsolution marginals for the
same 5000 variable, 21000clause formula. Since BP almost never
converges onsuch formulas, we terminated BP after 10,000
itera-tions (SP took roughly 50 iterations to converge) andused the
partially converged marginals obtained so farfor computing
magnetization. The plot shows that BPprovides very poor estimates
for the magnetizationsbased on solution marginals. (The points are
equallyscattered when BP magnetization is plotted againstcover
magnetization.) In fact, BP appears to identifyas extreme many
variables that have the opposite so-lution magnetization. Thus,
when magnetization ob-tained from BP is used as a heuristic for
identifyingvariables to fix, mistakes are often made that
eventu-ally lead to a contradiction, i.e. unsatisfiable
reducedformula.
5 DISCUSSION
A comparison between left and right panes of Figure 4suggests
that approximating statistics over covers (asdone by SP) is much
more accurate than approximat-ing statistics over solutions (as
done by BP). Thisappears to be because covers are much more
coarse
marginals.
grained than solutions; indeed, even an exponentiallylarge
cluster of solutions will have only a single coveras its
representative. This cover still captures criticalproperties of the
cluster necessary for finding solutions,such as backbone variables,
which is what SP appearsto exploit.
We also saw that the extreme magnetization based oncover
marginals is more conservative than that basedon solution marginals
(as seen in the “s-shape” ofthe plot in the middle pane of Figure
4). This sug-gests that while SP, based on approximating
covermarginals, may miss some variables with extreme
mag-netization, when it does find a variable to have ex-treme
magnetization, it is quite likely to be correct.This provides an
intuitive explanation of why the dec-imation process based on
extreme SP magnetizationsucceeds with high probability on random
3-SAT prob-lems without having to backtrack, while the decima-tion
process based on BP magnetizations more oftenfails to find a
satisfying assignment in practice.
We also note that BP and SP have been proven to com-pute exact
marginals on solutions and covers, respec-tively, only for
tree-structured formulas (with somesimple exceptional cases like
formulas with a single cy-cle). For BP, solution marginals on tree
formulas arealready non-trivial, and it is reasonable to expect it
tocompute a fair approximation of marginals on loopynetworks
(formulas). However, for SP, cover marginalson tree formulas are
trivial: the only cover here is theall-∗ cover. Cover marginals
become interesting onlywhen one goes to loopy formulas, such as
random 3-SAT. In this case, as seen in our experiments, it is
re-markable that the SP computes a good approximation
of non-trivial cover marginals for non-tree formulas.
We hope that our results have convincingly demon-strated that
the study of the covers of formulas is veryfruitful and may well
lead to a correct explanation ofthe success of SP.
-
References
D. Achlioptas and F. Ricci-Tersenghi. On the solution-space
geometry of random constraint satisfaction prob-lems. In 38th STOC,
pages 130–139, Seattle, WA, 2006.
A. Braunstein and R. Zecchina. Survey propagation as lo-cal
equilibrium equations. J. Stat. Mech., P06007, 2004.URL
http://lanl.arXiv.org/cond-mat/0312483.
A. Braunstein, M. Mezard, and R. Zecchina. Survey prop-agation:
an algorithm for satisfiability. Random Struc-tures and Algorithms,
27:201, 2005.
F. R. Kschischang, B. J. Frey, and H. A. Loeliger. Fac-tor
graphs and the sum-product algorithm. InformationTheory, IEEE
Transactions on, 47(2):498–519, 2001.
E. N. Maneva, E. Mossel, and M. J. Wainwright. A newlook at
survey propagation and its generalizations. In16th SODA, pages
1089–1098, Vancouver, Canada, 2005.
M. Mezard, G. Parisi, and R. Zecchina. Analytic and Al-gorithmic
Solution of Random Satisfiability Problems.Science,
297(5582):812–815, 2002. doi: 10.1126/science.1073287.
E. Mossel. Personal communication, April 2007.
K. Murphy, Y. Weiss, and M. Jordan. Loopy belief prop-agation
for approximate inference: An empirical study.In 15th UAI, pages
467–475, Sweden, July 1999.
R. E. Neapolitan. Learning Bayesian Networks. Prentice-Hall,
Inc., Upper Saddle River, NJ, USA, 2004. ISBN0130125342.
J. Pearl. Probabilistic Reasoning in Intelligent
Systems:Networks of Plausible Inference. Morgan Kauf., 1988.
R Development Core Team. R: A language and envi-ronment for
statistical computing. R Foundation forStatistical Computing,
Vienna, Austria, 2005. URLhttp://www.R-project.org. ISBN
3-900051-07-0.
W. Wei, J. Erenrich, and B. Selman. Towards efficientsampling:
Exploiting random walk strategies. In 19thAAAI, pages 670–676, San
Jose, CA, July 2004.
APPENDIX: DERIVATION OF THESP EQUATIONS
Section 3 shows how to formulate a constraint satisfac-tion
problem P (F ) such that its solutions are exactlythe covers of a
formula F . Here we proceed to showhow the belief propagation
formalism applied to P (F )(as described in Section 3.1) results in
the survey prop-agation equations.
Review of BP. We assume familiarity with thegeneral form of BP
equations, as used for exampleby Neapolitan (2004) in Theorem 3.2.
In short, BPuses messages to communicate information betweennodes
of the factor graph (between variable nodesxa, . . . and function
nodes Fx, Fa, . . .). Each messageis a function of one argument,
which takes on thesame values as the variable node end-point of the
mes-sage. There are two kinds of messages: from vari-able nodes to
function nodes (denoted by πx→F (.)),and from function nodes to
variable nodes (denotedby λF→x(.)). In a two-level Bayesian
Network, π
messages are computed by (piecewise) multiplying to-gether the λ
messages received on all other links.The λ messages are more
complicated: they are sumsacross all possible worlds (values for
arguments of re-ceived π messages) of products of all-but-one π
mes-sages with the chosen arguments. In case of a deter-ministic
system (which is our case: every world hasprobability of either 1
or 0), this is equivalent to sumof products of π messages with
arguments that arecompatible with each other as judged by the
corre-sponding function node, Fx or Fa. Moreover, if a vari-able
node only has two neighboring function nodes,then it merely passes
received λ messages from oneneighbor to the other. Since all
variable nodes inP (F ) have degree two, we can safely ignore the
ex-istence of π messages and only focus on λ messages.Thus, every
Fx node receives messages from Fa nodes(which we will denote by
λa→x) and and every Fanode receives messages from Fx nodes (denoted
byλx→a), both of which are functions of one argument,(ra→x, wx→a) ∈
{(0, 0), (0, 1), (1, 0)}. The BP equa-tions are constructed by
considering the set of com-patible variable node values given the
one fixed valuein the argument.
Let C(x) be the set of all clauses containing variablex, and V
(a) the set of all variables appearing in clausea. Further, let
Csa(x) be the set of all clauses otherthan a where x occurs with
the same sign as in a.Similarly define Cua (x) to be the set of
clauses wherex occurs with the opposite sign as in a. Note
thatCsa(x) ∪ C
ua (x) ∪ {a} = C(x).
Equations for Fx. The equations for messages sentfrom a factor
node Fx are given in Figure 5. For theargument value of (1, 0) the
set of compatible values,as judged by Fx when xa = (1, 0), is one
where therecan be requests from clauses where x appears with
thesame sign as in a (and no warnings sent to them), butthere must
be no requests from opposite clauses (andwarning must be sent).
Similarly for (0, 1), but herethe roles of Csa(x) and C
ua (x) are exchanged, plus the
fact that x sends a warning to a means that it must bereceiving
a request from some opposite clause (which isaccounted for by the
“−” term). Finally, for the value(0, 0), there are two
possibilities: either at least onerequest is received from Csa(x)
(the first term in thesum, analogous to the expression for (0, 1)),
or thereare no requests at all (the second term in the sum).
Equations for Fa. Figure 6 shows equations formessages sent from
a factor node Fa. The argumentvalue of (1, 0) is the easiest: a
sends out a request ifand only if all other variables send it a
warning, so thatthe only compatible values are all (0, 1). The case
of(0, 0) is the complement: a cannot receive a warning
http://lanl.arXiv.org/cond-mat/0312483http://www.R-project.org
-
λx→a(1, 0) =Y
b∈Csa(x)
(λb→x(0, 0) + λb→x(1, 0))Y
b∈Cua
(x)
λb→x(0, 1)
λx→a(0, 1) =Y
b∈Csa(x)
λb→x(0, 1)
2
4
Y
b∈Cua
(x)
(λb→x(0, 0) + λb→x(1, 0)) −Y
b∈Cua
(x)
λb→x(0, 0)
3
5
λx→a(0, 0) =
2
4
Y
b∈Csa(x)
(λb→x(0, 0) + λb→x(1, 0)) −Y
b∈Csa(x)
λb→x(0, 0)
3
5
Y
b∈Cua
(x)
λb→x(0, 1) +Y
b∈C(x)\a
λb→x(0, 0)
Figure 5: BP equations for Fx
λa→x(1, 0) =Y
y∈V (a)\x
λy→a(0, 1)
λa→x(0, 0) =Y
y∈V (a)\x
(λy→a(0, 0) + λy→a(0, 1)) −Y
y∈V (a)\x
λy→a(0, 1)
λa→x(0, 1) =Y
y∈V (a)\x
(λy→a(0, 0) + λy→a(0, 1)) −Y
y∈V (a)\x
λy→a(0, 1)
+X
y∈V (a)\x
(λy→a(1, 0) − λy→a(0, 0))Y
y′∈V (a)\{x,y}
λy′→a(0, 1)
Figure 6: BP equations for Fa
from all other variables, and since x does not send awarning, a
does not send a request anywhere. The lastcase, (0, 1), is a little
more complicated. The first partis the same as before, but there
needs to be a correc-tion term to account for two extra
possibilities: firstit is now possible that a issues a request to
some yif all other variables also send a warning (the positiveterm
in the sum), and second it is not possible thatall-but-one variable
send a a warning and yet a doesnot issue a request to the last one
(the negative termin the sum).
Deriving the SP equations. The expressionfor λa→x(0, 1) can be
simplified by assuming thatλy→a(1, 0) = λy→a(0, 0) for all y, in
which case itreduces to the expression for λa→x(0, 0). This
as-sumption is crucial, but not very restrictive. Noticethat it
then follows that λx→a(1, 0) = λx→a(0, 0) (byinspecting the
appropriate expressions in Figure 5),and therefore the assumption
keeps holding when iter-atively solving the BP equations, provided
it was trueat the beginning.
The last step in the derivation is to rename and nor-malize the
terms appropriately so as to “recognize” theSP equations in Figures
5 and 6. Let us define
ηa→x4=
Y
y∈V (a)\x
λy→a(0, 1)
λy→a(0, 0) + λy→a(0, 1)
and
Πux→a4= λx→a(0, 1)
Π0x→a4=
Y
b∈C(x)\a
λb→x(0, 0)
Πsx→a4= λx→a(0, 0) − Π
0x→a
Notice that ηa→x is just a rescaled λa→x(1, 0), andthat the
scaling factor λy→a(0, 0) + λy→a(0, 1) equalsΠuy→a + Π
sy→a + Π
0
y→a. Rescaling λa→x(0, 0) andλa→x(0, 1) in the same way (and
using the assumptionthat they are equal) yields λa→x(0, 0) =
λa→x(0, 1) =1 − ηa→x. Finally, writing down the BP equations
forλx→a(0, 1) and λx→a(0, 0) in terms of these new vari-ables
results in the familiar SP equations establishedin Braunstein et
al. (2005):
ηa→x =Y
y∈V (a)\x
Πuy→aΠuy→a + Πsy→a + Π0y→a
Πux→a =Y
b∈Csa(x)
(1 − ηb→x)
2
41 −Y
b∈Cua
(x)
(1 − ηb→x)
3
5
Πsx→a =Y
b∈Cua
(x)
(1 − ηb→x)
2
41 −Y
b∈Csa(x)
(1 − ηb→x)
3
5
Π0x→a =Y
b∈C(x)\a
(1 − ηb→x)
In addition, the expressions for marginal probabilitiescomputed
by BP from a fixed point of the above equa-tions can be shown, in a
similar way, to be equivalentto the SP “bias” expressions.
INTRODUCTIONCOVERS OF CNF FORMULASPROBLEM REFORMULATION: FROM
SOLUTIONS TO COVERSINFERENCE OVER COVERS
EXPERIMENTAL RESULTSEXISTENCE OF COVERSSP, BP, AND MARGINALS
DISCUSSION