Greedy MaxCut Algorithms and their Information Content

Yatao Bian, Alexey Gronskiy and Joachim M. Buhmann

Machine Learning Institute, ETH Zurich

April 27, 2015

1 / 19
Contents
Greedy MaxCut Algorithms
Approximation Set Coding (ASC)
Applying ASC: Count the Approximation Sets
Applying ASC: Experiments and Analysis
2 / 19
MaxCut

MaxCut: classical NP-hard problem
• G = (V, E), vertex set V, edge set E, weights w_ij ≥ 0
• Cut c := (S, V\S), cut space C (|C| = 2^(n−1) − 1)
• Cut value: cut(c, G) := ∑_{i∈S, j∈V\S} w_ij

[Figure: triangle graph on x, y, z with edge weights w_xy = 1, w_xz = 2, w_yz = 3; the cut ({x}, {y, z}) has value 3 = 1 + 2, while the maximum cut ({z}, {x, y}) has value 5 = 2 + 3]

3 / 19
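The definitions above can be sketched directly in code. This is a minimal illustration, not from the paper: the helper names (`cut_value`, `brute_force_maxcut`) are mine, and the triangle instance matches the slide's figure. It evaluates cut values and brute-forces the 2^(n−1) − 1 nontrivial cuts of a small graph.

```python
from itertools import combinations

def cut_value(weights, S):
    """cut(c, G): total weight of edges with exactly one endpoint in S."""
    return sum(w for (i, j), w in weights.items() if (i in S) != (j in S))

def brute_force_maxcut(weights, V):
    """Enumerate the 2^(n-1) - 1 nontrivial cuts; fixing one vertex in S
    avoids counting each cut (S, V \\ S) twice."""
    v0, *rest = sorted(V)
    best_S, best_val = None, float("-inf")
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            S = {v0, *extra}
            if S == set(V):
                continue  # skip the trivial cut (V, emptyset)
            val = cut_value(weights, S)
            if val > best_val:
                best_S, best_val = S, val
    return best_S, best_val

# triangle from the slide: w_xy = 1, w_xz = 2, w_yz = 3
w = {("x", "y"): 1, ("x", "z"): 2, ("y", "z"): 3}
S, val = brute_force_maxcut(w, {"x", "y", "z"})
print(val)  # 5
```

The enumeration is exponential, of course; it only serves to make the cut-space size |C| = 2^(n−1) − 1 concrete.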
Greedy Algorithms for MaxCut

Name                          Greedy Heuristic   Sorting   Init. Vertices
Deterministic Double Greedy   Double
SG (Sahni & Gonzalez)         Double                       X
SG3 (variant of SG)           Double             X         X
Edge Contraction (EC)         Backward           X
4 / 19
Double Greedy Taxonomy

Deterministic Double Greedy (D2Greedy)

Require: graph G = (V, E)
Ensure: cut and the cut value
 1: init. 2 solutions S := ∅, T := V
 2: for each vertex v_i ∈ V do   // in random order
 3:   a_i := gain of adding v_i to S
 4:   b_i := gain of removing v_i from T
 5:   if a_i ≥ b_i then
 6:     add v_i to S
 7:   else
 8:     remove v_i from T
 9:   end if
10: end for
11: return cut (S, V\S), cut value

• works on 2 solutions simultaneously
• for each vertex, decides whether it should be added to S or removed from T

Differences between the double greedy algorithms:
D2Greedy → select the first 2 vertices → SG
SG → sort the candidates → SG3

5 / 19
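A runnable sketch of the D2Greedy loop above, with the gains a_i and b_i computed as marginal changes of the cut function. The function names and the explicit `order` parameter are my additions; the slides specify a random processing order, which is the default here.

```python
import random

def cut_value(weights, A):
    """Cut value of vertex set A: weight of edges with exactly one endpoint in A."""
    return sum(w for e, w in weights.items() if len(e & A) == 1)

def d2greedy(vertices, weights, order=None, seed=0):
    """Deterministic double greedy (D2Greedy) sketch for MaxCut.
    weights: dict frozenset({u, v}) -> w_uv >= 0. Maintains S (growing
    from the empty set) and T (shrinking from V); on termination S == T
    and defines the cut."""
    if order is None:
        order = list(vertices)
        random.Random(seed).shuffle(order)  # process vertices in random order
    S, T = set(), set(vertices)
    for v in order:
        a = cut_value(weights, S | {v}) - cut_value(weights, S)  # gain of adding v to S
        b = cut_value(weights, T - {v}) - cut_value(weights, T)  # gain of removing v from T
        if a >= b:
            S.add(v)
        else:
            T.discard(v)
    return S, cut_value(weights, S)

# triangle example: depending on the order, D2Greedy finds a cut of value 4 or 5
w = {frozenset(e): wt for e, wt in [(("x", "y"), 1), (("x", "z"), 2), (("y", "z"), 3)]}
print(d2greedy({"x", "y", "z"}, w, order=["z", "x", "y"]))
```

Note the order dependence: this sensitivity of the committed decisions to the input is exactly what the information-theoretic analysis later quantifies.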
Backward Greedy – Edge Contraction Algorithm

Edge Contraction (EC)

Require: graph G = (V, E)
Ensure: cut, cut value
1: repeat
2:   find the lightest edge (x, y) in G
3:   contract x, y into a super vertex v
4:   set the edge weights connecting v
5: until 2 "super" vertices are left
6: return the 2 super vertices

• contracts the lightest edge in each step

[Figure: contracting the lightest edge (x, y) of weight 1 in the triangle merges x and y into a super vertex v; the edge (v, z) gets weight 2 + 3 = 5]

Backward greedy: EC tries to remove the lightest edge from the cut set in each step

6 / 19
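The EC pseudocode above can be sketched as follows. Representing super vertices as frozensets of original vertices makes the contraction and weight-merging steps explicit; this is a minimal sketch assuming a connected graph, and the names are mine.

```python
def edge_contraction(weights):
    """Backward greedy Edge Contraction (EC) sketch for MaxCut.
    weights: dict frozenset({u, v}) -> w_uv > 0 on a connected graph.
    Contracts the lightest edge each step until 2 super vertices remain."""
    # lift original vertices to super vertices (frozensets of originals)
    w = {frozenset(frozenset({u}) for u in e): wt for e, wt in weights.items()}
    while len({s for e in w for s in e}) > 2:
        e_min = min(w, key=w.get)            # lightest edge in the current graph
        a, b = tuple(e_min)
        merged = a | b                       # contract endpoints into one super vertex
        new_w = {}
        for e, wt in w.items():
            if e == e_min:
                continue                     # the contracted edge disappears
            e2 = frozenset(merged if s in (a, b) else s for s in e)
            new_w[e2] = new_w.get(e2, 0) + wt  # parallel edges merge, weights add
        w = new_w
    (edge, val), = w.items()                 # single edge between the 2 super vertices
    S, T = tuple(edge)
    return set(S), set(T), val

# triangle: contracting (x, y) of weight 1 leaves the cut ({x, y}, {z}) of value 5
w = {frozenset(e): wt for e, wt in [(("x", "y"), 1), (("x", "z"), 2), (("y", "z"), 3)]}
print(edge_contraction(w)[2])  # 5
```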
Glance of Approximation Set Coding (ASC)

How to measure the robustness of these algorithms in the face of noise?

• ASC: an analogy to Shannon's communication theory:
  learning procedure ⇔ communication process [Buhmann 2010]
• 2-instance scenario: training G′ and test G′′ are noisy instances of a "master" graph G
• Models/algorithms should generalize well from G′ to G′′

7 / 19
Approximate Solving and Algorithmic Approx. Set

• Empirical risk minimizer c⊥(G) := arg min_c R(c, G); under noise, c⊥(G′) ≠ c⊥(G′′) in general
• γ-approximation set (solutions at most γ distant from c⊥):
  C_γ(G) := { c ∈ C | R(c, G) − R(c⊥, G) ≤ γ },   γ: resolution
• Flow of a contractive algorithm A: the sequence of available solution sets in each step t
  Algorithmic t-approximation set [Gronskiy and Buhmann 2014]: C_t^A(G)
  ↗ step t ⇔ ↘ resolution γ

8 / 19
Analogy of Communication System

(Not going into detail here)

Analogical mutual information in step t:

  I_t^A := E_{G′,G′′} [ log( |C| · |ΔC_t^A(G′, G′′)| / ( |C_t^A(G′)| · |C_t^A(G′′)| ) ) ],
  where ΔC_t^A(G′, G′′) := C_t^A(G′) ∩ C_t^A(G′′)

Information content of A: channel capacity I^A := max_t I_t^A

9 / 19
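The quantity inside the expectation is straightforward to evaluate once the three set sizes are counted. A plug-in sketch (the function name and the toy numbers are mine, not from the slides), in bits:

```python
from math import log2

def stepwise_score(n_cuts, size1, size2, size_overlap):
    """log( |C| * |delta C_t| / (|C_t(G')| * |C_t(G'')|) ) in bits for one
    pair (G', G''); I_t^A is the expectation of this score over pairs."""
    if size_overlap == 0:
        return float("-inf")   # disjoint approximation sets: no common code
    return log2(n_cuts * size_overlap / (size1 * size2))

# toy numbers: n = 10 vertices give |C| = 2^9 - 1 cuts; at step t each run
# keeps 8 candidate cuts, 4 of which coincide
print(round(stepwise_score(2**9 - 1, 8, 8, 4), 2))  # 5.0
```

Large overlap relative to the individual set sizes pushes the score up; fully disjoint sets drive it to −∞, matching the intuition that the two runs then share no usable information.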
Information Content of an Algorithm A

[Figure: inputs G′, G′′ drawn from P(G) enter the algorithm A, which outputs A(G′) and A(G′′), compared against the optimal c⊥(G)]

stepwise mutual information:

  I_t^A := E [ log( |C| · |C_t^A(G′) ∩ C_t^A(G′′)| / ( |C_t^A(G′)| · |C_t^A(G′′)| ) ) ]

↗ step t ⇔ ↘ resolution γ: less informative but more robust

Information content of A: channel capacity I^A := max_t I_t^A

10 / 19
Counting – Double Greedy Algorithms

Counting methods are similar for the double greedy algorithms (D2Greedy, SG, SG3)

• SG3: assume k vertices are unlabeled in step t; then |C_t^A(G′)| = |C_t^A(G′′)| = 2^k
• |C_t^A(G′) ∩ C_t^A(G′′)|: we propose a polynomial-time counting algorithm and prove its correctness (not going into detail here)

11 / 19
Counting – Edge Contraction Algorithm

• In step t there are k "super" vertices, so |C_t^A(G′)| = |C_t^A(G′′)| = 2^(k−1) − 1
• We propose a polynomial-time algorithm (and prove its correctness) to exactly count |C_t^A(G′) ∩ C_t^A(G′′)|
• Involves calculating the max. number of common super vertices between 2 super vertex sets (details in the paper)

12 / 19
Noise Model: Gaussian Edge Weights

Master Graph G
Gaussian distributed edge weights: W_ij ∼ N(µ, σ_m²), µ = 600, σ_m = 50
Negative edge weights are set to µ.

Noisy Graphs G′, G′′
G′, G′′ are obtained by adding Gaussian distributed noise.
Negative edge weights are set to 0.

13 / 19
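A sketch of this noise model; the parameters mirror the slide, but the function names (`master_graph`, `noisy_instance`) and the dict-of-edges representation are my choices.

```python
import random

def master_graph(n, mu=600.0, sigma_m=50.0, seed=0):
    """Master graph G on n vertices: i.i.d. Gaussian edge weights
    W_ij ~ N(mu, sigma_m^2); negative draws are reset to mu."""
    rng = random.Random(seed)
    W = {}
    for i in range(n):
        for j in range(i + 1, n):
            wij = rng.gauss(mu, sigma_m)
            W[frozenset((i, j))] = wij if wij >= 0 else mu
    return W

def noisy_instance(W, sigma, rng):
    """Noisy instance (G' or G''): add N(0, sigma^2) noise to every edge;
    negative results are reset to 0."""
    return {e: max(wt + rng.gauss(0.0, sigma), 0.0) for e, wt in W.items()}

rng = random.Random(1)
G = master_graph(20)
G1, G2 = noisy_instance(G, 125.0, rng), noisy_instance(G, 125.0, rng)
```

The pair (G1, G2) plays the role of (G′, G′′) when estimating I_t^A empirically.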
Noise Model: Edge Reversal

Master Graph G
1. start from an approximately bipartite graph G_b′ with light edges and heavy edges
2. randomly flip edges in G_b′ ⇒ G; flipping turns a heavy (light) edge into a light (heavy) one, (flip e_ij) ∼ Ber(p_m), p_m = 0.2

Noisy Graphs G′, G′′
• Flip edges of G ⇒ G′ and G′′.
Probability of flipping an edge: Bernoulli distribution with p, (flip e_ij) ∼ Ber(p)
p: noise level

14 / 19
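This model reduces to a per-edge Bernoulli flip. In the sketch below, `flip_edges` and the concrete light/heavy weight values are illustrative assumptions, not taken from the paper:

```python
import random

LIGHT, HEAVY = 1.0, 10.0   # illustrative edge weights

def flip_edges(W, p, rng):
    """Flip each edge independently with probability p:
    heavy -> light and light -> heavy, i.e. (flip e_ij) ~ Ber(p)."""
    out = {}
    for e, wt in W.items():
        if rng.random() < p:
            out[e] = LIGHT if wt == HEAVY else HEAVY
        else:
            out[e] = wt
    return out

rng = random.Random(0)
# approximately bipartite master: start from a bipartite G_b', flip with p_m = 0.2
G_b = {frozenset((i, j)): (HEAVY if (i < 3) != (j < 3) else LIGHT)
       for i in range(6) for j in range(i + 1, 6)}
G = flip_edges(G_b, 0.2, rng)
# two noisy instances at noise level p
G1, G2 = flip_edges(G, 0.4, rng), flip_edges(G, 0.4, rng)
```

Note that p = 0 and p = 1 are both deterministic (identity and full reversal), which is why both appear as noise-free limits in the results that follow.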
Stepwise Information I_t^A

I_t^A := E_{G′,G′′} [ log( |C| · |ΔC_t^A(G′, G′′)| / ( |C_t^A(G′)| · |C_t^A(G′′)| ) ) ]

[Plots: stepwise information over steps t; Gaussian model, σ = 125, and edge reversal, p = 0.65]

• I_t^A behavior: increases initially ⇒ reaches the optimal step t∗ ⇒ decreases ⇒ vanishes
• consistent with the analysis: ↗ t ⇒ tradeoff of robustness and informativeness

15 / 19
Information Content I^A

I^A := max_t I_t^A (channel capacity)

[Plots: information content vs. noise level; Gaussian edge weights model and edge reversal model]

• All algorithms reach the max. information content in the noise-free limit (G′ = G′′): p = 0 or 1 in the edge reversal model, σ = 0 in the Gaussian model
• 1 node transmits about 1 bit of information

16 / 19
Effect of Greedy Heuristics

Backward greedy < double greedy

[Plots: Gaussian edge weights model and edge reversal model]

• Delayed decision making of backward greedy
• EC preserves consistent solutions by contracting the lightest edge (which has low probability of being included in the cut)

17 / 19
Effect of Greedy Techniques

[Plots: Gaussian edge weights model and edge reversal model]

• Initializing the first 2 vertices (D2Greedy ⇒ SG): ↘ information content, due to early decision making
• Sorting candidates (SG ⇒ SG3): ↘ information content, due to early decision making

18 / 19
Discussion

• Observation: different greedy heuristics (backward, double) and different processing techniques (sorting candidates, initializing the first 2 vertices) sensitively influence the information content of A.
• Conjecture: backward greedy (with its delayed decision making) < double greedy, for different noise models and noise levels.

19 / 19
Thank you!
Qs?
19 / 19
Supplement: Analogy of Communication System

Imaginary communication system:
• message: permutations σ_s ∈ Σ on the data space
• encoder: encodes σ_s using C_t^A(σ_s ∘ G′) (codebook vector)
• channel: noisy instances G′, G′′
• decoder: max. overlap of approximation sets: σ̂ := arg max_{σ∈Σ} |C_t^A(σ ∘ G′′) ∩ C_t^A(σ_s ∘ G′)|

Analogical mutual information in step t:

  I_t^A(σ_s; σ̂) := E_{G′,G′′} [ log( |C| · |C_t^A(G′) ∩ C_t^A(G′′)| / ( |C_t^A(G′)| · |C_t^A(G′′)| ) ) ]

channel capacity I^A := max_t I_t^A (information content of A)

19 / 19