Randomization in Graph Optimization Problems
David Karger, MIT
Randomized Algorithms
• Flip coins to decide what to do next
• Avoid the hard work of making the "right" choice
• Often faster and simpler than deterministic algorithms
• Different from average-case analysis
  » input is worst case
  » the algorithm adds the randomness
Methods
• Random selection
  » if most candidate choices are "good", then a random choice is probably good
• Monte Carlo simulation
  » simulations estimate event likelihoods
• Random sampling
  » generate a small random subproblem
  » solve it, extrapolate to the whole problem
• Randomized rounding for approximation
Cuts in Graphs
• Focus on undirected graphs
• A cut is a vertex partition
• Its value is the number (or total weight) of crossing edges
Optimization with Cuts
• Cut values determine the solution of many graph optimization problems:
  » min-cut / max-flow
  » multicommodity flow (sort of)
  » bisection / separator
  » network reliability
  » network design
• Randomization helps solve these problems
Presentation Assumption
• For the entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
• All results apply unchanged to arbitrarily weighted graphs
  » integer weights = parallel edges
  » rational weights scale to integers
  » the analysis is unaffected
  » only some implementation details differ
Basic Probability
• Conditional probability
  » Pr[A ∩ B] = Pr[A] · Pr[B | A]
• Independent events multiply:
  » Pr[A ∩ B] = Pr[A] · Pr[B]
• Linearity of expectation:
  » E[X + Y] = E[X] + E[Y]
• Union bound:
  » Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]
Minimum Cut
• The smallest cut of the graph: the cheapest way to separate it into 2 parts
• Various applications:
  » network reliability (small cuts are the weakest points)
  » subtour elimination constraints for the TSP
  » separation oracle for network design
• Not the s-t min-cut
Max-flow / Min-cut
• s-t flow: an edge-disjoint packing of s-t paths
• s-t cut: a cut separating s and t
• [FF]: s-t max-flow = s-t min-cut
  » a max-flow saturates every s-t min-cut
  » the most efficient way to find s-t min-cuts
• [GH]: the (global) min-cut is the "all-pairs" s-t min-cut
  » find it using n flow computations
Flow Algorithms
• Push-relabel [GT]:
  » push "excess" around the graph till it's gone
  » max-flow in O*(mn) (note: O* hides log factors)
  » recently O*(m^{3/2}) [GR]
  » min-cut in O*(mn²): "harder" than flow
• Pipelining [HO]:
  » save push/relabel data between flows
  » min-cut in O*(mn): "as easy" as flow
Contraction Algorithm
• Repeat n - 2 times:
  » find a non-min-cut edge
  » contract it (keep parallel edges)
• Each contraction decrements the number of vertices; at the end, 2 vertices are left
  » they define a unique cut
  » it corresponds to a min-cut of the starting graph
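To make the randomized variant concrete (the one analyzed on the following slides, which contracts a uniformly random edge), here is a minimal Python sketch of one trial; the edge-list representation and function name are illustrative, not from the talk, and a connected input graph is assumed:

```python
import random

def contraction_min_cut(n, edges):
    """One trial: contract random edges (keeping parallel edges) until
    two meta-vertices remain; return the value of the resulting cut.
    Assumes the graph on vertices 0..n-1 is connected."""
    parent = list(range(n))          # union-find over original vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    remaining = n
    while remaining > 2:
        # a uniformly random non-self-loop original edge is exactly a
        # uniformly random surviving edge, since parallel edges are kept
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv          # contract the edge
            remaining -= 1
    # the cut: original edges whose endpoints ended up on opposite sides
    return sum(1 for u, v in edges if find(u) != find(v))
```

For example, on a 4-cycle (edges (0,1),(1,2),(2,3),(3,0)) every trial returns 2, the min-cut value, because contractions always leave two arcs of the cycle as the two sides.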
Picking an Edge
• Must contract non-min-cut edges
• [NI]: an O(m)-time algorithm to pick such an edge
  » n contractions: O(mn) time for min-cut
  » slightly faster than flows
• If only we could find an edge faster…
• Idea: min-cut edges are few
Analysis I
• The min-cut is small, so it has few edges
  » suppose the graph has min-cut c
  » then the minimum degree is at least c
  » thus there are at least nc/2 edges
• A random edge is probably safe:
  Pr[picking a min-cut edge] ≤ c / (nc/2) = 2/n
• (easy generalization to the capacitated case)
Analysis II
• The algorithm succeeds if it never accidentally contracts a min-cut edge
• It contracts the number of vertices from n down to 2
• When k vertices remain, the chance of error is 2/k
  » thus, the chance of being right is 1 - 2/k
• Pr[always right] is the product of the probabilities of being right each time
Analysis III

Pr[success] ≥ ∏ Pr[k-th contraction safe]
            = (1 - 2/n)(1 - 2/(n-1))(1 - 2/(n-2)) ... (1 - 2/3)
            = ((n-2)/n)((n-3)/(n-1))((n-4)/(n-2)) ... (2/4)(1/3)
            = 2 / n(n-1)
            ≈ 2/n²

…not too good!
Repetition
• Repetition amplifies the success probability
  » basic failure probability 1 - 2/n²
  » so repeat 7n² times
Pr[complete failure] = Pr[fail 7n² times]
                     = (Pr[fail once])^{7n²}
                     = (1 - 2/n²)^{7n²}
                     < e⁻¹⁴ < 10⁻⁶
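As a sketch, reusing the hypothetical contraction_min_cut above, amplification is just a loop that keeps the best of 7n² independent trials:

```python
def amplified_min_cut(n, edges):
    """Run 7*n^2 independent trials; the probability that all of them
    miss the min-cut is at most (1 - 2/n^2)^(7 n^2) < 10^-6."""
    return min(contraction_min_cut(n, edges) for _ in range(7 * n * n))
```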
How fast?
• Easy to perform one trial in O(m) time
  » just use an array of edges, no fancy data structures
• But we need n² trials: O(mn²) time
• Simpler than flows, but slower
An Improvement [KS]
• When k vertices remain, the error probability is 2/k
  » big when k is small
• Idea: once k is small, change the algorithm
  » the algorithm needs to be safer
  » but it can afford to be slower
• Amplify by repetition!
  » repeat the base algorithm many times
Recursive Algorithm

Algorithm RCA(G, n)    {G has n vertices}
  repeat twice:
    randomly contract G down to n/√2 vertices
    RCA(G, n/√2)

(each contraction phase has a 50-50 chance of avoiding the min-cut)
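A compact Python sketch of RCA under the same hypothetical edge-list representation; the published algorithm contracts to roughly n/√2 + 1 vertices, and the simple floor used here is a shortcut, not the paper's exact threshold:

```python
import math
import random

def contract_to(vertices, edges, target):
    """Randomly contract edges until `target` supervertices remain;
    return the relabeled vertex set and surviving (non-loop) edges."""
    label = {v: v for v in vertices}

    def find(v):
        while label[v] != v:
            label[v] = label[label[v]]
            v = label[v]
        return v

    remaining = len(vertices)
    while remaining > target:
        u, v = random.choice(edges)      # uniform over surviving edges
        ru, rv = find(u), find(v)
        if ru != rv:
            label[ru] = rv
            remaining -= 1
    new_edges = [(find(u), find(v)) for u, v in edges if find(u) != find(v)]
    return list({find(v) for v in vertices}), new_edges

def rca(vertices, edges):
    """Recursive contraction [KS] sketch: returns the best cut value
    found; succeeds with probability Omega(1/log n)."""
    if not edges:
        return 0                         # graph already disconnected
    n = len(vertices)
    if n <= 2:
        return len(edges)
    target = max(2, int(n / math.sqrt(2)))
    return min(rca(*contract_to(vertices, edges, target)) for _ in range(2))
```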
Main Theorem
• On any capacitated, undirected graph, Algorithm RCA
  » runs in O*(n²) time with simple structures
  » finds the min-cut with probability Ω(1/log n)
• Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10⁻⁶) in O(n² log² n) time
Proof Outline
• The graph has O(n²) (capacitated) edges
• So O(n²) work to contract, then two subproblems of size n/√2
  » T(n) = 2T(n/√2) + O(n²) = O(n² log n)
• The algorithm fails only if both iterations fail
  » an iteration succeeds if its contractions and its recursion succeed
  » P(n) = 1 - [1 - ½ P(n/√2)]² = Ω(1/log n)
Failure Modes
• Monte Carlo algorithms always run fast and probably give you the right answer
• Las Vegas algorithms probably run fast and always give you the right answer
• To make a Monte Carlo algorithm Las Vegas, you need a way to check the answer
  » then repeat until the answer is right
• No fast min-cut check is known (flow is too slow!)
Cut Counting
• The original contraction algorithm finds any given min-cut with probability at least 2/n(n-1)
• Only one cut is found per run
• These are disjoint events, so the probabilities add
• So there are at most n(n-1)/2 min-cuts
  » otherwise the probabilities would sum to more than one
• Tight
  » a cycle has exactly this many min-cuts
Enumeration
• RCA as stated has a constant probability of finding any given min-cut
• If run O(log n) times, the probability of missing a given min-cut drops to 1/n³
• But there are only n² min-cuts
• So the probability of missing any is at most 1/n
• So, with probability 1 - 1/n, we find all of them
  » O(n² log³ n) time
Generalization
• If G has min-cut c, a cut of value αc is an α-mincut
• Lemma: the contraction algorithm finds any given α-mincut with probability Ω(n^{-2α})
  » proof: just add the factor α to the basic analysis
• Corollary: there are O(n^{2α}) α-mincuts
• Corollary: all of them can be found in O*(n^{2α}) time
  » just change the contraction factor in RCA
Summary
• A simple, fast min-cut algorithm
  » random selection avoids rare problem edges
• A generalization to near-minimum cuts
• A bound on the number of small cuts
  » the probabilistic method, run backwards
The Problem
• Input:
  » a graph G with n vertices
  » edge failure probabilities
    – for simplicity, fix a single p
• Output:
  » FAIL(p): the probability that G is disconnected by edge failures
Approximation Algorithms
• Computing FAIL(p) is #P-complete [V]
• An exact algorithm seems unlikely
• Approximation scheme:
  » given G, p, ε, outputs an ε-approximation
  » may be randomized:
    – succeeds with high probability
  » fully polynomial (FPRAS) if the runtime is polynomial in n, 1/ε
Monte Carlo Simulation
• Flip a coin for each edge, test the graph
• k failures in t trials: FAIL(p) ≈ k/t, and E[k/t] = FAIL(p)
• How many trials are needed for confidence?
  » "bad luck" on trials can yield a bad estimate
  » clearly need at least 1/FAIL(p)
• Chernoff bound: O*(1/ε²FAIL(p)) trials suffice to give probable accuracy within ε
  » time O*(m/ε²FAIL(p))
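A naïve simulation sketch in Python (helper names hypothetical); the union-find connectivity test is the O(m) work per trial:

```python
import random

def trial_fails(n, edges, p):
    """One trial: fail each edge independently with probability p and
    report whether the surviving graph is disconnected."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u, v in edges:
        if random.random() >= p:         # this edge survives
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components > 1

def estimate_fail(n, edges, p, t):
    """FAIL(p) ~ k/t; per the slide, t = O*(1/(eps^2 FAIL(p))) suffices."""
    return sum(trial_fails(n, edges, p) for _ in range(t)) / t
```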
Chernoff Bound
• Random variables Xᵢ ∈ [0,1]; sum X = Σ Xᵢ
• Bound the deviation from the expectation:
  Pr[ |X - E[X]| ≥ ε·E[X] ] < exp(-ε²E[X]/4)
• If E[X] ≥ 4(log n)/ε², we get "tight concentration"
  » deviation by ε has probability < 1/n
• Requires that no one variable is a big part of E[X]
Application
• Let Xᵢ = 1 if trial i is a failure, else 0
• Let X = X₁ + … + X_t
• Then E[X] = t·FAIL(p)
• Chernoff says X is within relative ε of E[X] with probability 1 - exp(-ε²·t·FAIL(p)/4)
• So choose t to cancel the other terms
  » "high probability": t = O(log n / ε²FAIL(p))
  » deviation by ε has probability < 1/n
Network reliability problem
• Random edge failures
  » estimate FAIL(p) = Pr[graph disconnects]
• Naïve Monte Carlo simulation
  » Chernoff bound gives "tight concentration":
    Pr[ |X - E[X]| ≥ ε·E[X] ] < exp(-ε²E[X]/4)
  » O(log n / ε²FAIL(p)) trials yield an expected O(log n / ε²) network failures: good for Chernoff
  » so we can estimate within ε in O*(m/ε²FAIL(p)) time
Rare Events
• When FAIL(p) is too small, it takes too long to collect sufficient statistics
• Solution: skew the trials to make the interesting event more likely
• But in a way that lets you recover the original probability
DNF Counting
• Given a DNF formula (an OR of ANDs):
  (e1 ∧ e2 ∧ e3) ∨ (e1 ∧ e4) ∨ (e2 ∧ e6)
• Each variable is set true with probability p; estimate Pr[formula true]
  » #P-complete
• [KL, KLM]: an FPRAS
  » skew the sampling to make true outcomes "common"
  » time linear in the formula size
Rewrite Problem
• Assume p = 1/2
  » then we just count satisfying assignments
• "Satisfaction matrix" S
  » Sᵢⱼ = 1 if the i-th assignment satisfies the j-th clause
• We want the number of nonzero rows
• Randomly sampling rows won't work
  » there might be too few nonzero rows
New Sample Space
• So normalize every nonzero row to sum to one (divide by its number of nonzeros)
  » now the sum of all entries is the desired value
  » so it suffices to estimate the average nonzero entry
Sampling Nonzeros
• We know the number of nonzeros per column:
  » to satisfy a given clause, all variables in the clause must be true
  » all other variables are unconstrained
• Estimate the average by random sampling:
  » we know the number of nonzeros per column
  » so we can pick a random column (weighted by its count)
  » then pick a random true-for-that-column assignment
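A sketch of the resulting estimator for p = 1/2, in the spirit of [KL, KLM] (a clause here is a set of variable indices that must all be true; names are illustrative):

```python
import random

def dnf_count_estimate(clauses, n_vars, samples):
    """Estimate the number of satisfying assignments of a DNF formula.
    Column j has 2^(n_vars - |clause j|) nonzeros; sample a uniform
    nonzero, weight it by 1/(#clauses that assignment satisfies),
    i.e. the normalized row entry from the slide."""
    nonzeros = [2 ** (n_vars - len(c)) for c in clauses]
    total = sum(nonzeros)
    acc = 0.0
    for _ in range(samples):
        # pick a column (clause) proportional to its nonzero count
        j = random.choices(range(len(clauses)), weights=nonzeros)[0]
        # pick a uniform assignment that satisfies clause j
        assign = [random.random() < 0.5 for _ in range(n_vars)]
        for v in clauses[j]:
            assign[v] = True             # force clause j to be satisfied
        satisfied = sum(all(assign[v] for v in c) for c in clauses)
        acc += 1.0 / satisfied           # the normalized row entry
    return total * acc / samples
```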
Few Samples Needed
• Suppose there are k clauses
• Then E[sample] > 1/k
  » 1 ≤ #satisfied clauses ≤ k
  » 1 ≥ sample value ≥ 1/k
• Adding O(k log n / ε²) samples gives a "large" mean
• So Chernoff says the sample mean is probably a good estimate
Reliability Connection
• Reliability as DNF counting:
  » one variable per edge, true if the edge fails
  » a cut fails if all its edges do (an AND of edge variables)
  » the graph fails if some cut does (an OR of cuts)
  » FAIL(p) = Pr[formula true]
• Problem: the DNF has 2ⁿ clauses
Focus on Small Cuts
• Fact: FAIL(p) ≥ p^c
• Theorem: if p^c = 1/n^{2+δ}, then Pr[some cut of value > αc fails] < n^{-αδ}
• Corollary: FAIL(p) ≈ Pr[some α-mincut fails], where α = 1 + 2/δ
• Recall: there are O(n^{2α}) α-mincuts
• Enumerate them with RCA, then run DNF counting
Proof of Theorem
• Given p^c = 1/n^{2+δ}
• At most n^{2α} cuts have value αc
• Each fails with probability p^{αc} = 1/n^{α(2+δ)}
• Pr[any cut of value αc fails] = O(n^{-αδ})
• Sum over all α
Algorithm
• RCA can enumerate all α-minimum cuts with high probability in O(n^{2α}) time
• Given the α-minimum cuts, we can ε-estimate the probability that one fails via Monte Carlo simulation for DNF counting (formula size O(n^{2α}))
• Corollary: when FAIL(p) < n^{-(2+δ)}, we can ε-approximate it in O*(cn^{2+4/δ}) time
Combine
• For large FAIL(p), use naïve Monte Carlo
• For small FAIL(p), use RCA plus DNF counting
• Balance: an ε-approximation in O(mn^{3.5}/ε²) time
• Implementations show this is practical for hundreds of nodes
• Again, there is no way to verify that the answer is correct
Summary
• Naïve Monte Carlo simulation works well for common events
• It needs adapting for rare events
• Cut structure and DNF counting let us do this for network reliability
Random Sampling
• A general tool for faster algorithms:
  » pick a small, representative sample
  » analyze it quickly (it's small)
  » extrapolate to the original (it's representative)
• Speed-accuracy tradeoff
  » a smaller sample means less time
  » but also less accuracy
Min-cut Duality
• [Edmonds]: min-cut = max tree packing
  » convert to a directed graph
  » pick a "source" vertex s (it doesn't matter which)
  » pack spanning trees directed away from s
• [Gabow] "augmenting trees":
  » add a tree in O*(m) time
  » min-cut c (via max packing) in O*(mc)
  » great if m and c are small…
Random Sampling
• [Gabow]'s scheme is great if m and c are small
• Random sampling
  » reduces m and c
  » scales cut values (in expectation)
  » if we pick half the edges, we expect half of each cut
• So find tree packings and cuts in samples
• Problem: maybe some cuts deviate a lot
Sampling Theorem
• Given a graph G, build a sample G(p) by including each edge independently with probability p
• A cut of value v in G has expected value pv in G(p)
• Definition: the "constant" ρ = 8(ln n)/ε²
• Theorem: with high probability, all cuts in G(ρ/c) have (1 ± ε) times their expected values
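Constructing the sample is one line of work; a minimal sketch with ρ as defined on the slide (the function name is illustrative):

```python
import math
import random

def build_sample(edges, n, c, eps):
    """Build G(rho/c): keep each edge independently with probability
    rho/c, where rho = 8 ln(n) / eps^2; with high probability every
    cut then stays within (1 +/- eps) of its expectation."""
    p = min(1.0, 8 * math.log(n) / eps ** 2 / c)
    return [e for e in edges if random.random() < p]
```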
A Simple Application
• [Gabow] packs trees in O*(mc) time
• Build G(ρ/c)
  » its minimum expected cut is ρ
  » by the theorem, its min-cut is probably near ρ
  » find the sample's min-cut in O*(m) time using [Gabow]
  » it corresponds to a near-min-cut of G
• Result: a cut within (1+ε) of the min-cut, in O*(m) time
Proof of Sampling: Idea
• The Chernoff bound says the probability of a large deviation in one cut's value is small
• Problem: there are exponentially many cuts; perhaps some deviate a great deal
• Solution: we showed there are few small cuts
  » only small cuts are likely to deviate much
  » but they are few, so the Chernoff bound applies to all of them
Proof of Sampling
• Sample with probability ρ/c:
  » a cut of value αc has mean αρ
  » [Chernoff]: it deviates from its expected size by more than ε with probability at most n^{-3α}
• At most n^{2α} cuts have value αc
• Pr[any cut of value αc deviates] = O(n^{-α})
• Sum over all α
Approximate Tree Packing
• Break the edges into c/ρ random groups
• Each looks like a sample at rate ρ/c
  » O*(mρ/c) edges each
  » each has minimum expected cut ρ
  » so the theorem says each has min-cut ≥ (1-ε)ρ
• So each group has a tree packing of size (1-ε)ρ
• [Gabow] finds each packing in O*((mρ/c)·ρ) time
  » so the overall time is (c/ρ)·O*((mρ/c)·ρ) = O*(mρ) = O*(m)
Las Vegas Algorithm
• The packing algorithm is Monte Carlo
• We previously found an approximate cut (faster)
• If the two are close, each "certifies" the other
  » the cut exceeds the optimum cut
  » the packing is below the optimum cut
• If not, re-run both
• Result: Las Vegas, expected time O*(m)
Exact Algorithm
• Randomly partition the edges into two groups
  » each is like a 1/2-sample: ε = O*(c^{-1/2})
• Recursively pack trees in each half
  » c/2 - O*(c^{1/2}) trees each
• Merge the packings
  » gives a packing of size c - O*(c^{1/2})
  » augment to a maximum packing: O*(mc^{1/2})
• T(m,c) = 2T(m/2, c/2) + O*(mc^{1/2}) = O*(mc^{1/2})
Analyze Trees
• Recall: [G] packs c (directed-)edge-disjoint spanning trees
• Corollary: in such a packing, some tree crosses the min-cut only twice
• To find the min-cut:
  » find a tree packing
  » find the smallest cut with at most 2 tree edges crossing it
• Problem: the packing takes O*(mc) time
Constraint Trees
• Min-cut c:
  » c directed trees
  » 2c directed min-cut edges
  » on average, two min-cut edges per tree
• Definition: a tree 2-crosses a cut if at most two of its edges cross it
Finding the Cut
• From the crossing tree edges, deduce the cut:
  » remove the (at most two) tree edges
  » no other edges cross the cut
  » so each resulting component is on one side
  » and opposite its "neighbor's" side
Sampling
• Solution: use G(ρ/c) with ε = 1/8
  » pack its O*(ρ) trees in O*(m) time
  » the original min-cut has O(ρ) edges in G(ρ/c)
  » some tree 2-crosses it in G(ρ/c)
  » …and thus 2-crosses it in G
• Analyze the O*(ρ) trees in G
  » time O*(m) per tree
  » Monte Carlo
Analyzing a Tree
• Root the tree; then each subtree defines a cut
• Use a dynamic program up from the leaves to determine the subtree cut values efficiently
• Given the cuts at the children of a node, compute the cut at the parent
• Definitions:
  » v↓ is the set of nodes below v
  » C(v) is the value of the cut defined by subtree v↓
Algorithm: 1-Crossing Trees
• Compute the edges' LCAs: O(m)
• Compute the "cuts" at the leaves
  » cut values = degrees
  » each edge is incident on at most two leaves
  » total time O(m)
• Dynamic program upwards: O(n)
• Total: O(m + n)
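A sketch of that bottom-up pass in Python, using the identity C(v) = (degree sum in v↓) - 2·(#edges with both endpoints in v↓), where "both endpoints in v↓" holds exactly when the edge's LCA lies in v↓. The helper names are hypothetical, and the naive LCA walk stands in for the talk's O(m) LCA step:

```python
def subtree_cut_values(n, edges, parent, depth, order):
    """For a rooted spanning tree (parent pointers, root's parent is
    itself, `order` lists vertices leaves-first), return C(v) = number
    of graph edges crossing the cut (v's subtree vs. the rest)."""
    def lca(x, y):                    # naive walk; fine for a sketch
        while depth[x] > depth[y]: x = parent[x]
        while depth[y] > depth[x]: y = parent[y]
        while x != y: x, y = parent[x], parent[y]
        return x

    D = [0] * n                       # becomes subtree degree sums
    L = [0] * n                       # becomes subtree LCA counts
    for x, y in edges:
        D[x] += 1; D[y] += 1
        L[lca(x, y)] += 1
    C = [0] * n
    for v in order:                   # leaves first, root last
        C[v] = D[v] - 2 * L[v]        # at a leaf this is just its degree
        if parent[v] != v:            # push subtree sums to the parent
            D[parent[v]] += D[v]
            L[parent[v]] += L[v]
    return C
```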
2-Crossing Trees
• A cut 2-crossed by the tree corresponds to a pair of subtrees:
  C(v↓ ∪ w↓) = C(v) + C(w) - 2C(v,w),
  where C(v,w) is the number of edges between the two subtrees
• n² table entries
  » fill them in O(n²) time with a dynamic program
Linear Time
• The bottleneck is the C(v,w) computations
• Avoid them: find the right "twin" w for each v, minimizing
  C(v) + min_w [ C(w) - 2C(v,w) ]
• Compute this using the addpath and minpath operations of dynamic trees [ST]
• Result: O(m log³ n) time (messy)
Problem Statement
• Given vertices, and a cost c_vw to buy an edge from v to w, find the minimum-cost purchase that creates a graph with the desired connectivity properties
• Example: minimum-cost k-connected graph
• Generally NP-hard
• Recent approximation algorithms [GW], [JV]
Integer Linear Program
• Variable x_vw = 1 if we buy edge vw
• Solution cost: Σ x_vw·c_vw
• Constraint: for every cut, Σ x_vw ≥ k
• Relaxing integrality gives a tractable LP
  » exponentially many cuts
  » but separation oracles exist (e.g., min-cut)
• What is the integrality gap?
Randomized Rounding
• Given the LP solution values x_vw
• Build a graph where edge vw is present with probability x_vw
• The expected cost is at most OPT: Σ x_vw·c_vw
• The expected number of edges crossing any cut satisfies the constraint
• If the expected number is large for every cut, the sampling theorem applies
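The rounding step itself is tiny; a sketch, assuming x and costs are dicts keyed by edge (hypothetical representation):

```python
import random

def round_lp(x, costs):
    """Randomized rounding: include each edge independently with its
    fractional LP value x[e] in [0, 1].  By linearity, E[cost] =
    sum(x[e] * costs[e]) and E[edges across any cut] = its LP value."""
    chosen = [e for e, frac in x.items() if random.random() < frac]
    return chosen, sum(costs[e] for e in chosen)
```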
k-Connected Subgraph
• The fractional solution is k-connected
• So every cut has (in expectation) k edges crossing it in the rounded solution
• The sampling theorem says every cut has at least k - (k log n)^{1/2} edges
• A close approximation for large k
• Can often repair: e.g., get a k-connected subgraph at cost 1 + ((log n)/k)^{1/2} times the minimum
s-t Min-Cuts
• Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε
• Applications:
  » min-cut in O*(mc) time [G]
  » approximate/exact in the sample: O*((mρ/c)·ρ) = O*(m)
  » s-t min-cut of value v: O*(mv)
  » approximate in O*(mv/c) time
• Trouble if c is small and v is large
The Problem
• Cut sampling relied on the Chernoff bound
• Chernoff bounds require that no one edge is a large fraction of the expectation of any cut it crosses
• At sample rate 1/c, each edge across a min-cut is too significant
• But: if an edge only crosses large cuts, then sample rate 1/c is OK!
Biased Sampling
• The original sampling theorem is weak when
  » m is large
  » c is small
• But if m is large
  » then G has dense regions
  » where c must be large
  » and where we can sample more sparsely
Problem                 Old Time       New Time
Approx. s-t min-cut     O*(mn)         O*(n²/ε²)
Approx. s-t max-flow    O*(m^{3/2})    O*(mn^{1/2}/ε)
Flow of value v         O*(mv)         O*(n^{11/9}v)
Approx. bisection       O*(m²)         O*(n²/ε²)

(effectively, m → n/ε² in weighted, undirected graphs)
Strong Components
• Definition: a k-strong component is a maximal vertex-induced subgraph with min-cut ≥ k
(figure: an example graph with its 2-strong and 3-strong components labeled)
Nonuniform Sampling
• Definition: an edge is k-strong if its endpoints are in the same k-strong component
  » stricter than having k-connected endpoints
• Definition: the strong connectivity c_e of edge e is the largest k for which e is k-strong
• Plan: sample dense regions lightly
Nonuniform Sampling
• Idea: if an edge is k-strong, then it lies in a k-connected graph
  » so it is "safe" to sample it with probability 1/k
• Problem: if we sample edges with different probabilities, E[cut value] gets messy
• Solution: if we sample e with probability p_e, give it weight 1/p_e
  » then E[cut value] = the original cut value
Compression Theorem
• Definition: given compression probabilities p_e, the compressed graph G[p_e]
  » includes edge e with probability p_e, and
  » gives it weight 1/p_e if included
• Note E[G[p_e]] = G
• Theorem: G[ρ/c_e]
  » approximates all cuts by ε
  » has O(ρn) edges
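A sketch of the compression step, assuming edge strengths (or the estimates that the Construction slide below supplies) are already available; names are illustrative:

```python
import math
import random

def compress(edges, strength, n, eps):
    """Build G[rho/c_e]: keep edge e with probability p_e = min(1,
    rho/c_e) and weight 1/p_e, where c_e is e's strong connectivity
    and rho = 8 ln(n) / eps^2.  E[weight of every cut] is unchanged."""
    rho = 8 * math.log(n) / eps ** 2
    compressed = []                      # list of (edge, weight) pairs
    for e in edges:
        p = min(1.0, rho / strength[e])
        if random.random() < p:
            compressed.append((e, 1.0 / p))
    return compressed
```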
Proof (Approximation)
• Basic idea: in a k-strong component, edges get sampled with probability ρ/k
  » so the original sampling theorem works there
• Problem: some edges may lie in stronger components, and are sampled less often
• Induct up from the strongest components:
  » apply the original sampling theorem inside each
  » then "freeze" it so it doesn't affect the weaker parts
Strength Lemma
• Lemma: Σ 1/c_e ≤ n
  » consider a connected component C of G
  » suppose C has min-cut k
  » then every edge e in C has c_e ≥ k
  » so the k edges crossing C's min-cut have Σ 1/c_e ≤ k·(1/k) = 1
  » delete these edges ("cost" 1)
  » repeat at most n - 1 times: no more edges!
Proof (Edge Count)
• Edge e is included with probability ρ/c_e
• So the expected number of edges is Σ ρ/c_e
• We saw Σ 1/c_e ≤ n
• So the expected number is at most ρn
Construction
• To sample, we must find the edge strengths
  » we can't exactly, but an approximation suffices
• Sparse certificates identify weak edges:
  » constructible in linear time [NI]
  » contain all edges crossing cuts of size ≤ k
  » iterate until the strong components emerge
• Iterate for the 2^i-strong edges, for all i
  » tricks make it strongly polynomial
Certificate Algorithm
• Repeat k times:
  » find a spanning forest
  » delete it
• Each iteration deletes one edge from every nonempty cut (the forest is spanning)
• So at the end, any edge crossing a cut of size ≤ k has been deleted
• [NI] merge all k iterations into O(m) total time
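A sketch of the certificate as literally k forest-peeling rounds (the real [NI] routine fuses all k rounds into one O(m) pass; names illustrative):

```python
def sparse_certificate(n, edges, k):
    """Peel off k spanning forests.  Their union (the certificate)
    contains every edge crossing a cut of size <= k; the leftover
    edges cross only cuts of size > k."""
    remaining = list(edges)
    certificate = []
    for _ in range(k):
        parent = list(range(n))

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        forest, rest = [], []
        for u, v in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))    # edge joins the spanning forest
            else:
                rest.append((u, v))
        certificate.extend(forest)
        remaining = rest
    return certificate, remaining
```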
Flows
• Uniform sampling led to flow algorithms
  » randomly partition the edges
  » merge the flows from each partition element
• Compression is problematic for flows
  » edge capacities are changed
  » so flow path capacities are distorted
  » a flow in the compressed graph doesn't fit in the original graph
Smoothing
• If an edge has strength c_e, divide it into ρ/c_e edges of capacity c_e/ρ each
  » this creates Σ ρ/c_e ≤ ρn edges
• Now each edge is only a 1/ρ fraction of any cut of its strong component
• So sampling a 1/ρ fraction works
• So dividing into groups works
• Yields a (1-ε) max-flow in O*(mn^{1/2}/ε) time
Cleanup
• The approximate max-flow can be made exact by augmenting paths
• Integrality problems:
  » augmenting paths are fast for small integer flows
  » but the breakup by smoothing ruins integrality
• Surmountable:
  » the flows in the dense and sparse parts are separable
• Result: max-flow in O*(n^{11/9}v) time
Conclusion
• Randomization is a crucial tool for algorithm design
• It often yields algorithms that are faster or simpler than their traditional counterparts
• In particular, it gives significant improvements for core problems in graph algorithms
Randomized Methods
• Random selection
  » if most candidate choices are "good", then a random choice is probably good
• Monte Carlo simulation
  » simulations estimate event likelihoods
• Random sampling
  » generate a small random subproblem
  » solve it, extrapolate to the whole problem
• Randomized rounding for approximation
Random Selection
• When most choices are good, make one at random
• The recursive contraction algorithm for minimum cuts
  » extremely simple (also to implement)
  » fast in theory and in practice [CGKLS]
Monte Carlo
• To estimate an event's likelihood, run trials
• Slow for very rare events
• Bias the samples to reveal the rare event
• An FPRAS for network reliability
Random Sampling
• Generate a representative subproblem
• Use it to estimate the solution to the whole
  » gives an approximate solution
  » may be quickly repaired to an exact solution
• Bias the sample toward "important" or "sensitive" parts of the problem
• New max-flow and min-cut algorithms
Randomized Rounding
• Convert fractional solutions to integral ones
• Gives approximation algorithms for integer programs
• "Sampling" from a well-designed sample space of feasible solutions
• Good approximations for network design
Generalization
• Our techniques work because undirected graphs are matroids
• All our results extend to, or are special cases of, matroids:
  » packing bases
  » finding minimum "quotients"
  » matroid optimization (MST)
Directed Graphs?
• Directed graphs are not matroids
• Directed graphs can have lots of minimum cuts
• Sampling doesn't appear to work
• Residual graphs for flows are directed
  » precluding the obvious recursive solutions to flow problems
Open Problems
• Flow in O*(nv) time (the m → n reduction is already complete)
  » eliminate the dependence on v
  » apply to weighted graphs with large flows
  » flow in O*(m) time?
• Las Vegas algorithms
  » finding good certificates
• Deterministic algorithms
  » deterministic construction of "samples"
  » deterministically compress a graph