Exploring the border between P and NP Uriel Feige Weizmann Institute 1.

1

Exploring the border between P and NP

Uriel Feige, Weizmann Institute

2

Computational Intractability

• Convention: tractable = polynomial time.

• When a computational problem appears to be intractable, can we pinpoint what makes it intractable?

3

Distant History

• Many natural problems had no known polynomial time algorithm: traveling salesperson (TSP), satisfiability of Boolean formulas (SAT), maximum independent set (MIS), etc.

• At that time: no unifying theory explaining why.

4

NP-hardness [Cook, Levin, Karp]

• 3SAT is hard because it encodes every nondeterministic computation (NP).

• 3SAT is polynomial time reducible to many other problems (TSP, MIS, …).

• Same obstacle for multiple problems: they all encode 3SAT.

5

What makes a problem intractable?

Almost a dichotomy: Either a problem is in P, or it encodes 3SAT.

A computational problem is intractable if (and “only if”) it encodes 3SAT.

6

What next?

• What about problems not known to be NP-hard (factoring, graph isomorphism, Nash equilibrium)? Not addressed in this talk.

• Look more closely at 3SAT, trying to understand what exactly is computationally difficult about it.

7

Approximation

• Max 3SAT – given a 3CNF formula, satisfy as many clauses as possible.

• Every 3CNF has an assignment that satisfies a 7/8 fraction of the clauses.

• Given a satisfiable 3CNF, on the road to advancing from 7/8 to 1, where do we get stuck?
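The 7/8 figure follows from averaging: a clause on three distinct variables is falsified by exactly one of the 8 settings of its variables, so a uniformly random assignment satisfies each clause with probability 7/8, and some assignment does at least as well as the average. A minimal sketch that checks this by brute force on a toy formula; the literal encoding (+i for x_i, -i for its negation) and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def satisfied_count(clauses, assignment):
    """Number of clauses with at least one true literal.

    Literal +i stands for x_i and -i for NOT x_i; assignment maps i to a bool.
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def average_and_best(clauses, n):
    """Brute force over all 2^n assignments (toy sizes only)."""
    total, best = 0, 0
    for bits in product([False, True], repeat=n):
        assignment = dict(enumerate(bits, start=1))
        count = satisfied_count(clauses, assignment)
        total += count
        best = max(best, count)
    m = len(clauses)
    return Fraction(total, m * 2**n), Fraction(best, m)

# Toy formula: every clause uses three distinct variables.
clauses = [(1, 2, 3), (-1, 3, 4), (2, -3, 4), (-2, -3, -4)]
average, best = average_and_best(clauses, n=4)
print(average, best)
```

The average is exactly 7/8 whenever every clause has three distinct variables; the best assignment can only be at least that good.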

8

Hardness of approximation

• Approximating Max 3SAT within a ratio better than 7/8 is NP-hard [Hastad].

• Given a satisfiable 3CNF, on the road to advancing from 7/8 to 1, we get stuck already at 7/8 – we do not know how to get started. If we could start, we could also finish.

9

What makes 3SAT intractable?

3SAT is intractable because we do not have a handle on how to start looking for a satisfying assignment. We cannot even find an assignment better than random (with respect to fraction of clauses satisfied).

10

What more can we hope to know?

11

The 7/8 boundary

• NP-hardness concerns worst case instances. What do 3CNF formulas that are hard to approximate look like? Does the worst case only manifest itself in rare cases?

12

Random 3CNF

• n variables, m clauses chosen independently at random. Density d = m/n.

• If m is large (say, m = n log n), every assignment satisfies roughly a 7/8 fraction of the clauses.

• A better than 7/8 approximation algorithm would at least be able to tell that such formulas are not satisfiable.
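The concentration around 7/8 can be observed empirically. The sketch below assumes one concrete random model (three distinct variables per clause, independent fair polarities) and only samples a few assignments, so it illustrates the claim rather than verifying it for every assignment. The function names and parameters are hypothetical.

```python
import random

def random_3cnf(n, m, rng):
    """m independent clauses, each on 3 distinct variables with fair polarities."""
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses

def fraction_satisfied(clauses, assignment):
    hits = sum(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)
    return hits / len(clauses)

rng = random.Random(0)
n, m = 50, 20000  # density d = m/n = 400
formula = random_3cnf(n, m, rng)

fractions = []
for _ in range(20):
    assignment = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    fractions.append(fraction_satisfied(formula, assignment))
print(min(fractions), max(fractions))
```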

13

Framework for study

• Randomized algorithms that, for every 3CNF formula, correctly decide satisfiability. On some inputs, the running time is likely to be exponential.

• Desirable goal: polynomial time on most inputs within a certain range of densities.

High density – refutation heuristics.

• How do we know when to stop? What does a witness for non-satisfiability look like?

14

Refutation of random 3SAT - density

Approach of [Feige, Kim and Ofek 2006].

n

15

Algebra of refutation

Given an assignment, denote:

n0 – number of clauses with no satisfied literal

n1 – number of clauses with one satisfied literal

n2 – number of clauses with two satisfied literals

n3 – number of clauses with three satisfied literals

n0 + n1 + n2 + n3 = m

If the assignment is satisfying, then n0 = 0.
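These counts are straightforward to compute for a concrete formula and assignment. A small sketch, with an assumed literal encoding (+i for x_i, -i for its negation) and a hypothetical function name:

```python
def satisfied_literal_histogram(clauses, assignment):
    """Return [n0, n1, n2, n3]: clauses grouped by number of true literals."""
    counts = [0, 0, 0, 0]
    for clause in clauses:
        k = sum(assignment[abs(l)] == (l > 0) for l in clause)
        counts[k] += 1
    return counts

clauses = [(1, 2, 3), (-1, 3, 4), (2, -3, 4)]
assignment = {1: True, 2: False, 3: True, 4: False}
n0, n1, n2, n3 = satisfied_literal_histogram(clauses, assignment)
print(n0, n1, n2, n3)
```

Here the third clause has no true literal, so n0 is nonzero and the assignment is not satisfying.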

16

The 3LIN principle

Given a random 3CNF, one can certify in polynomial time that if there is a satisfying assignment, it must satisfy the inequality:

n1 + n3 ≥ m − O(n√d)

Replace the 3CNF by a system of linear equations: each clause (x1 v x2 v x3) is replaced by x1 + x2 + x3 = 1 modulo 2.

Only O(n√d) clauses are not satisfied as 3LIN.
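A clause is satisfied as 3LIN exactly when an odd number of its literals is true, which is why the clauses counted by n1 and n3 are the ones that survive the replacement. The sketch below makes the translation explicit, including negated literals (NOT x equals 1 + x modulo 2, so each negation flips the right-hand side), and checks the equivalence exhaustively. The encoding and function names are assumptions for illustration.

```python
from itertools import product

def clause_to_3lin(clause):
    """Return (variables, rhs) for the equation sum(variables) = rhs (mod 2)."""
    variables = tuple(abs(l) for l in clause)
    negations = sum(l < 0 for l in clause)
    return variables, (1 + negations) % 2

def satisfied_as_3lin(clause, assignment):
    variables, rhs = clause_to_3lin(clause)
    return sum(assignment[v] for v in variables) % 2 == rhs

def true_literals(clause, assignment):
    return sum(assignment[abs(l)] == (l > 0) for l in clause)

# Exhaustive check over all polarities and all assignments of one clause.
equivalence_holds = all(
    satisfied_as_3lin(clause, assignment) == (true_literals(clause, assignment) % 2 == 1)
    for signs in product([1, -1], repeat=3)
    for bits in product([False, True], repeat=3)
    for clause in [tuple(s * v for s, v in zip(signs, (1, 2, 3)))]
    for assignment in [dict(zip((1, 2, 3), bits))]
)
print(equivalence_holds)
```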

17

Certifying that there is no good 3LIN assignment

• Refuting 3LIN is easy - Gaussian elimination.

• In random 3CNF, there are small subformulas of size O(n/d²) that are not satisfiable as 3LIN. Moreover, there are roughly d³ disjoint subformulas, each not 3LIN-satisfiable.

• Refute each of them using Gaussian elimination. Implies n2 ≥ d³.

Refutes 3SAT when d³ > n√d, or d > n^(2/5).
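Refuting a 3LIN system by Gaussian elimination can be sketched as follows. This is a generic elimination over GF(2) using integer bitmasks, not the specific procedure of the cited paper; the equation format `(variables, rhs)` and the function name are assumptions.

```python
def refute_3lin(equations, n):
    """Return True if the system over GF(2) has no solution.

    equations: list of (variables, rhs), meaning sum(variables) = rhs (mod 2),
    with variables numbered 1..n.
    """
    # Bit v of a row marks variable v; bit 0 holds the right-hand side.
    rows = []
    for variables, rhs in equations:
        row = rhs & 1
        for v in variables:
            row ^= 1 << v  # XOR, so a repeated variable cancels mod 2
        rows.append(row)

    for v in range(1, n + 1):
        pivot_index = next((i for i, r in enumerate(rows) if (r >> v) & 1), None)
        if pivot_index is None:
            continue
        pivot = rows.pop(pivot_index)
        # Eliminate variable v from every remaining row.
        rows = [r ^ pivot if (r >> v) & 1 else r for r in rows]

    # Remaining rows contain no variables; a row equal to 1 reads "0 = 1".
    return any(r == 1 for r in rows)

# Four equations in which every variable appears exactly twice.
consistent = [((1, 2, 3), 1), ((1, 2, 4), 1), ((3, 5, 6), 1), ((4, 5, 6), 1)]
inconsistent = [((1, 2, 3), 0), ((1, 2, 4), 1), ((3, 5, 6), 1), ((4, 5, 6), 1)]
print(refute_3lin(consistent, 6), refute_3lin(inconsistent, 6))
```

In the second system the left-hand sides add up to 0 modulo 2 while the right-hand sides add up to 1, so elimination ends with the contradiction 0 = 1.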

18

Factor graphs

(X1 v X2 v X3), (¬X1 v X3 v X4), (X2 v ¬X3 v X4)
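The factor graph of this formula is the bipartite graph that connects each clause to the variables it contains, ignoring polarities. A minimal sketch of building it; the encoding (+i for Xi, -i for ¬Xi) and the function name are assumptions.

```python
from collections import defaultdict

def factor_graph(clauses):
    """Return (clause -> variables, variable -> clause indices); polarities are discarded."""
    clause_to_vars = [tuple(abs(l) for l in c) for c in clauses]
    var_to_clauses = defaultdict(list)
    for j, variables in enumerate(clause_to_vars):
        for v in variables:
            var_to_clauses[v].append(j)
    return clause_to_vars, dict(var_to_clauses)

# (X1 v X2 v X3), (¬X1 v X3 v X4), (X2 v ¬X3 v X4)
clauses = [(1, 2, 3), (-1, 3, 4), (2, -3, 4)]
clause_to_vars, var_to_clauses = factor_graph(clauses)
print(clause_to_vars)
print(var_to_clauses)
```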

19

More on the 3LIN partition

• Consider the bipartite factor graph of the formula.

• A set S of left hand side vertices (clauses) is an even cover if each right hand side vertex (variable) has even degree into S (every variable appears in an even number of clauses of S, possibly 0).

• If the clauses of S have an odd number of negated literals (happens w.p. ½) they are not satisfiable as 3LIN (add up all clauses mod 2).
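The parity argument can be checked directly. In an even cover, adding all equations modulo 2 cancels every variable; the right-hand sides add up to |S| plus the number of negated literals, and |S| is even because 3|S| is a sum of even degrees. So the system is contradictory exactly when the number of negations is odd. A sketch with hypothetical function names and the same assumed literal encoding (+i, -i):

```python
from collections import Counter

def is_even_cover(subset):
    """Every variable occurs in an even number of clauses of the subset."""
    degree = Counter(abs(l) for clause in subset for l in clause)
    return len(subset) > 0 and all(d % 2 == 0 for d in degree.values())

def even_cover_refutes_3lin(subset):
    """True if the even cover cannot be satisfied as 3LIN."""
    if not is_even_cover(subset):
        raise ValueError("subset is not an even cover")
    negations = sum(l < 0 for clause in subset for l in clause)
    return negations % 2 == 1

cover = [(1, 2, 3), (1, 2, 4), (3, 5, 6), (4, 5, 6)]
odd_cover = [(-1, 2, 3), (1, 2, 4), (3, 5, 6), (4, 5, 6)]
print(even_cover_refutes_3lin(cover), even_cover_refutes_3lin(odd_cover))
```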

20

Remarks on running time

When d > √n, one can find the even covers in polynomial time (because each one is of constant size). The whole refutation algorithm is polynomial.

At lower densities (with d > n^(2/5)), we do not know how to find the even covers in polynomial time. Still interesting that at these densities witnesses for 3SAT non-satisfiability exist.

21

Is dense random 3SAT easy to refute?

Current obstacle: given a random bipartite graph (with left hand side degree 3), find a small even cover.

This is a computational problem regarding the structure of the factor graph – it does not refer to Boolean assignments to variables.

22

A general phenomenon?

Could it be that algorithms for 3SAT spend most of their time making sense of the factor graph, and then once they succeed, assigning truth values to the variables (or refuting the formula) becomes easy (or easier)?

Anecdotal evidence: some common heuristics try to first decompose the factor graph, or to find a favorable order on the variables.

23

Preprocessing

Reveal the 3SAT instances in two stages.

First, reveal only the factor graph.

Allow the algorithm to preprocess the factor graph for arbitrary time and record arbitrary polynomial size advice (e.g., an optimal tree decomposition).

Then reveal the polarities.

Now the algorithm may use the advice, and needs to decide satisfiability in polynomial time.
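The two-stage reveal amounts to splitting an instance into two pieces of data. The sketch below shows only that split and its inverse; it does not implement any preprocessing or decision procedure, and the function names and encoding (+i for x_i, -i for its negation) are assumptions.

```python
def split_instance(clauses):
    """Stage one sees `graph` only; stage two additionally sees `polarities`."""
    graph = [tuple(abs(l) for l in c) for c in clauses]
    polarities = [tuple(l > 0 for l in c) for c in clauses]
    return graph, polarities

def combine(graph, polarities):
    """Rebuild the 3CNF formula from the factor graph and the polarities."""
    return [
        tuple(v if positive else -v for v, positive in zip(variables, signs))
        for variables, signs in zip(graph, polarities)
    ]

clauses = [(1, 2, 3), (-1, 3, 4), (2, -3, 4)]
graph, polarities = split_instance(clauses)
print(graph)
print(polarities)
```

Any advice computed in the first stage may depend on `graph` alone, so it has to be useful for every one of the 8^m possible polarity patterns on m clauses.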

24

Informal interpretation

If preprocessing helps, then the difficulty of 3SAT is due to the complexity of analyzing the factor graph.

If preprocessing does not help, then the difficulty of 3SAT is due to the combinatorial richness of the polarities.

Intermediate possibilities: after preprocessing, the second stage gives a good approximation, or takes sub-exponential time. In that case, the difficulty of 3SAT is partly due to the complexity of analyzing the factor graph.

25

Research challenge

Find at least some aspect in which preprocessing the factor graph would help.

Essentially nothing is known (to us).

Alternatively, provide evidence that preprocessing cannot help.

What would this evidence look like?

26

Universal Factor Graphs

A factor graph is polytime-universal for 3SAT if a polytime algorithm for 3SAT instances with this factor graph (and arbitrary polarities of variables) implies a polytime algorithm for all factor graphs.

Preprocessing would not help on a universal factor graph, unless NP is in P/poly. (Proof: give the advice without doing the preprocessing.)

27

Some technicalities

Family of universal factor graphs, parameterized by n and m.

The universal graph for given (n,m) may have somewhat more than n variables and somewhat more than m clauses (but polynomially related to m and n).

28

Proving that a factor graph is universal

A sufficient condition for a factor graph G with N variables and M clauses to be polytime-universal for (n,m): Design a polytime algorithm that reduces any arbitrary 3CNF instance with n variables and m clauses to a 3CNF instance with factor graph G, while preserving satisfiability.

29

Some results (with Shlomo Jozeph)

Theorem: there are subexp-universal factor graphs for 3SAT.

Theorem: there are 77/80-universal factor graphs for max-3SAT.

30

77/80-universal factor graphs

Our proof has three parts:

• NP-universal factor graphs (easy).

• APX-universal factor graphs (via gap amplification [Dinur]).

• Amplification to 77/80-universal factor graphs (via long code tests [Bellare, Goldreich, Sudan]).

31

The technical content of second part

Given two 3CNF formulas with the same factor graph G (but different variable polarities), a modified version of Dinur’s gap amplification technique (proof of the PCP theorem) produces two new CNF formulas with the same factor graph G’.

32

77/80-universal factor graphs

Tight hardness of approximation results are based on the long code of [Bellare, Goldreich and Sudan].

They use the idea that only properties of the long code need to be tested explicitly. The requirement that the underlying predicate (e.g., a collection of 3CNF clauses) is satisfied is implicitly enforced on the long code via a mechanism called folding [in BGS], and extended to conditioning [in Hastad].

33

A problem

Folding and conditioning depend on the polarities of variables in the underlying clauses. Changing the polarities changes the locations in which the long code is queried. As a consequence, the resulting factor graph changes.

34

So is folding hopeless?

Not quite. We identify variants of folding that can be performed while still maintaining the pattern of queries despite change of polarities. We call these oblivious foldings.

35

77/80-universal factor graphs - proof

By modification of the proof of [Bellare, Goldreich and Sudan], which also had the same ratio. The difference is that the folding in [BGS] was not oblivious, and we show how to replace it by oblivious folding.

In Hastad’s tight proof, conditioning rather than folding is used, and we do not know how to make it oblivious.

36

Open question

• Must (exact or approximate) algorithms for 3SAT spend most of their time processing the factor graph?

Can preprocessing of the factor graph lead to a substantial saving in the running time?

37

What makes 3SAT intractable?

3SAT is intractable because we do not have a handle on how to start looking for a satisfying assignment. We cannot even find an assignment better than random (with respect to fraction of clauses satisfied).

38

Returning to worst case versus average case

39

Per instance insights

• If in the worst case we have a nontrivial approximation, then in the worst case we can solve exactly.

• If in the average case we have a nontrivial approximation, then in the average case can we solve exactly?

• If per instance (from a random distribution) we have a nontrivial approximation, then on that same instance can we solve exactly?

• Is the problem just to get started?

40

A test case – the planted model

• Proxy for random dense satisfiable formulas.

Generate a random dense 3CNF formula.

Pick a random assignment 1010010…

Drop all clauses not satisfied by the assignment.

Give the resulting formula as input to a satisfiability algorithm.
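The generation procedure can be sketched in a few lines. The random model (three distinct variables per clause, fair polarities), the parameter values, and the function name are assumptions for illustration.

```python
import random

def planted_3cnf(n, m, rng):
    """Draw m random clauses and keep those satisfied by a hidden random assignment."""
    planted = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    kept = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if any(planted[abs(l)] == (l > 0) for l in clause):
            kept.append(clause)
    return kept, planted

rng = random.Random(1)
formula, planted = planted_3cnf(n=100, m=8000, rng=rng)
print(len(formula))
```

About one clause in eight is dropped, and the resulting formula is satisfiable by construction, since the planted assignment satisfies every kept clause.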

41

Known results [Alon and Kahale; Flaxman]

Majority assignment (setting each variable to the polarity that agrees with most of its literals) is highly correlated with planted assignment (because all dropped literals were false).

• Gives a very good start.

Finish off in two steps:

• Identify sure variables (core).

• Residual formula – simple enough.
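The correlation is easy to observe on a generated instance. The sketch below covers only the majority vote, not the two finishing steps; the random model, the tie-breaking rule, the parameter values, and the function names are assumptions.

```python
import random
from collections import Counter

def planted_3cnf(n, m, rng):
    """Random clauses, keeping only those satisfied by a hidden assignment."""
    planted = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    kept = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if any(planted[abs(l)] == (l > 0) for l in clause):
            kept.append(clause)
    return kept, planted

def majority_assignment(clauses, n):
    """Set each variable to the polarity of most of its occurrences."""
    score = Counter()
    for clause in clauses:
        for l in clause:
            score[abs(l)] += 1 if l > 0 else -1
    # Ties are broken towards True, an arbitrary choice.
    return {v: score[v] >= 0 for v in range(1, n + 1)}

rng = random.Random(3)
n = 200
formula, planted = planted_3cnf(n, 100 * n, rng)
majority = majority_assignment(formula, n)
agreement = sum(majority[v] == planted[v] for v in planted) / n
print(agreement)
```

In a kept clause each literal is true under the planted assignment with probability 4/7 rather than 1/2, and at high density this bias accumulates over the many occurrences of each variable.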

42

Alternatives to majority assignment

May modify the planted model to drop additional clauses so as to destroy correlation of majority assignment with planted assignment.

• Drop 3-clauses (~3NAE). Spectral techniques give a good start, that can be amplified and be followed by the final two steps.

• Drop most 2-clauses (~3LIN). No known good start.

43

Different variant of the planted model (joint work with Alina Arbitman)

• Drop most of the 1-clauses.

• Majority assignment even more strongly correlated with planted assignment. An excellent start.

What about the final two steps?

• For some range of parameters, they still work (with some modifications).

• In some other range of parameters, still open.

44

Open question

• Choose large constants d > 1 and D.

• Generate a 3CNF formula with Dn random clauses.

• Pick a random assignment. Relative to it, drop all unsatisfied clauses.

• Drop all but dn random 1-clauses.

Is there an algorithm that can satisfy such formulas? Majority vote gives a fantastic start.
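A sketch of a generator for this distribution, reading a 1-clause as a clause with exactly one literal satisfied by the planted assignment. The random model, the integer parameters `d` and `D`, the chosen values, and the function names are assumptions; the sketch produces instances only and says nothing about how to solve them.

```python
import random

def true_literals(clause, assignment):
    return sum(assignment[abs(l)] == (l > 0) for l in clause)

def sparse_one_clause_instance(n, d, D, rng):
    """Generate D*n random clauses, drop unsatisfied ones, keep only d*n 1-clauses."""
    planted = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    one_clauses, richer_clauses = [], []
    for _ in range(D * n):
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        k = true_literals(clause, planted)
        if k == 1:
            one_clauses.append(clause)
        elif k >= 2:
            richer_clauses.append(clause)
        # k == 0: not satisfied by the planted assignment, so dropped.
    kept = rng.sample(one_clauses, min(d * n, len(one_clauses)))
    formula = richer_clauses + kept
    rng.shuffle(formula)
    return formula, planted

rng = random.Random(2)
formula, planted = sparse_one_clause_instance(n=100, d=2, D=40, rng=rng)
print(len(formula))
```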

45

Summary: Some open questions for 3SAT

Are dense random instances also worst case instances?

Is it true that for most instances, finding an approximate solution is a sufficient step for finding an exact solution?

Can preprocessing of the factor graph lead to faster (exact or approximate) satisfiability algorithms?

46

Homework

Given a random 3CNF formula with clauses, how would you set the polarities of the variables so as to be absolutely sure that the formula has no satisfying assignments?
