Inference in Sparse Graphs with Pairwise Measurements and Side Information
Dylan Foster, Daniel Reichman, and Karthik Sridharan

Model
Basic problem: recover latent node variables using noisy measurements on the edges of a graph G = (V, E). Introduced in [Globerson-Roughgarden-Sontag-Yildirim '15].
• Fixed graph G = (V, E), with |V| = n and |E| = m.
• Ground-truth labels Y ∈ {±1}^V.
• Observe noisy edge labels X ∈ {±1}^E:
  X_uv = Y_u Y_v with prob. 1 − p, and X_uv = −Y_u Y_v with prob. p.
• Observe noisy vertex labels (side information) Z ∈ {±1}^V:
  Z_u = Y_u with prob. 1 − q, and Z_u = −Y_u with prob. q.
Goal: obtain small Hamming error
  Error(Ŷ) = Σ_{v ∈ V} 1{Ŷ_v(X, Z) ≠ Y_v},
aka partial recovery. Concretely: Error = O(h(p) · n), with h(p) → 0 as p → 0.

Motivation
• Community detection.
• Inference for structured prediction (e.g. image segmentation).
• Alignment/registration/synchronization, correlation clustering, genome assembly, ... many more!
Censored block model: [Abbe et al. '14] [Saade et al. '15] [Globerson-Yildirim-Roughgarden-Sontag '15] [Chen et al. '15] [Joachims-Hopcroft '05].

Theory-Practice Gap
There is a huge body of work on solving and approximating MAP, MLE, etc., but how do we establish tight bounds on statistical performance? Our question: how do recovery prospects change with the addition of side information?

Contributions
Key ideas:
• Characterize optimal recovery rates for trees.
• Lift the result to general graphs via tree decompositions.
• Non-trivial recovery rates for all connected graphs, including sparse graphs where recovery without side information is impossible.
• All rates are finite-sample and hold with high probability.
• All rates are achieved efficiently.

Result: Trees
Theorem: optimal recovery for trees. When G is a tree:
• There is an efficient algorithm Ŷ with Hamming error Error ≤ O(pn) w.h.p.
• There is a matching lower bound of Ω(pn).
• → Error = O(pn) for every connected G, by running the algorithm on a spanning tree!

Proof sketch: how do we take advantage of Chernoff? With high probability,
  Σ_{uv ∈ E} 1{X_uv ≠ Y_u Y_v} ≤ 2pn + O(1).
Define the hypothesis class
  F(X) = { Y' ∈ {±1}^V : Σ_{uv ∈ E} 1{X_uv ≠ Y'_u Y'_v} ≤ 2pn + O(1) }.
Then Y ∈ F(X) w.h.p., and |F(X)| ≈ O((1/p)^{pn}).
[Figure: F(X) depicted as a ball of radius O(pn) around the true labeling Y.]
Statistical learning reduction: take Ŷ to be the empirical risk minimizer against the side information,
  Ŷ = argmin_{Y' ∈ F(X)} Σ_{v ∈ V} 1{Y'_v ≠ Z_v}.
Rate for ERM:
  Σ_{v ∈ V} 1{Ŷ_v ≠ Y_v} ≤ Õ(log|F(X)| / ε²) w.h.p. over Z,
where ε is the advantage of the side information over random guessing. Since log|F(X)| = O(pn log(1/p)), this yields Error ≤ Õ(pn). (A toy code instantiation of this reduction appears at the end.)

Result: General Graphs
For a graph G = (V, E), let W = {W_1, ..., W_N} be a collection of subsets of V and let T = (W, F) be a tree graph over W. T is a tree decomposition if:
1. ∪_{W ∈ W} W = V.
2. For each uv ∈ E, some W ∈ W has u, v ∈ W.
3. For W_1, W_2, W_3 ∈ W, if W_2 is on the path from W_1 to W_3, then W_1 ∩ W_3 ⊆ W_2.

Main theorem: recovery from tree decomposition. Suppose we have:
• G' = (V, E') with E' ⊆ E.
• A tree decomposition T = (W, F) for G' with constant width and overlap.
• λ(G'(W)) ≥ ∆ for each W ∈ W, i.e. each induced subgraph is ∆-edge-connected.
Then there is an efficient Ŷ such that Error ≤ O(p^⌈∆/2⌉ · n).

Consequences:
• If G is connected, Error = O(p|V|).
• For √n × √n grids, the optimal Error is Θ(p²n); when the connectivity c ≤ 2, O(pn) is optimal.

Further examples (hypergrids, lattices, Newman-Watts): see the paper for more.
• √n × √n grid: we recover O(p²n), matching the optimal Θ(p²n) rate.
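As a concrete illustration of the observation model, here is a minimal simulation sketch. It is not from the paper: the path graph, the parameter values, and the function names are assumptions made for the demo. It draws (Y, X, Z) as defined in the Model panel and reports the Hamming error of the trivial estimator Ŷ = Z, which ignores the edge measurements entirely.

```python
import random

def simulate(edges, n, p, q, rng):
    """Draw (Y, X, Z) from the model on a fixed graph.

    Y : ground-truth labels in {+1, -1}^V.
    X : edge labels, X_uv = Y_u * Y_v flipped with probability p.
    Z : vertex labels (side information), Z_u = Y_u flipped with prob. q.
    """
    Y = [rng.choice([-1, 1]) for _ in range(n)]
    X = {(u, v): Y[u] * Y[v] * (-1 if rng.random() < p else 1)
         for (u, v) in edges}
    Z = [Y[u] * (-1 if rng.random() < q else 1) for u in range(n)]
    return Y, X, Z

def hamming_error(Y_hat, Y):
    """Number of vertices where the estimate disagrees with the truth."""
    return sum(1 for a, b in zip(Y_hat, Y) if a != b)

if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    edges = [(u, u + 1) for u in range(n - 1)]  # a path: the simplest tree
    p, q = 0.05, 0.3
    Y, X, Z = simulate(edges, n, p, q, rng)
    # Baseline that ignores the edge measurements: Y_hat = Z.
    # Its expected error is q*n; the tree algorithm above gets O(p*n).
    print("Hamming error of Y_hat = Z:", hamming_error(Z, Y))
```

The gap between the baseline's q·n and the achievable O(pn) is exactly what combining the edge measurements with the side information buys.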
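The statistical learning reduction from the proof sketch can also be run literally, by brute force, on a tiny instance. The sketch below is our illustration, not the paper's algorithm; the slack constant and all names are assumptions. It enumerates {±1}^V, keeps the labelings whose edge disagreements stay below the Chernoff-style cutoff 2pm + O(1) (this is the class F(X)), and returns the member closest to Z. The enumeration is exponential in n; the paper achieves the same guarantee efficiently.

```python
import itertools
import random

def edge_disagreements(Y_prime, X, edges):
    """Count edges uv with X_uv != Y'_u * Y'_v."""
    return sum(1 for (u, v) in edges if X[(u, v)] != Y_prime[u] * Y_prime[v])

def erm_over_class(X, Z, edges, n, p, slack=5):
    """Brute-force ERM over F(X), mirroring the proof sketch.

    F(X) = { Y' in {+1,-1}^V : edge disagreements <= 2*p*m + slack };
    among its members, return the labeling closest to Z.
    Exponential in n -- for illustration only.
    """
    threshold = 2 * p * len(edges) + slack  # Chernoff-style cutoff
    best, best_risk = None, float("inf")
    for Y_prime in itertools.product([-1, 1], repeat=n):
        if edge_disagreements(Y_prime, X, edges) > threshold:
            continue  # outside the hypothesis class F(X)
        risk = sum(1 for u in range(n) if Y_prime[u] != Z[u])  # distance to Z
        if risk < best_risk:
            best, best_risk = Y_prime, risk
    return list(best) if best is not None else list(Z)

if __name__ == "__main__":
    rng = random.Random(1)
    n = 12  # small enough for brute force over 2^n labelings
    edges = [(u, u + 1) for u in range(n - 1)]  # a path: the simplest tree
    p, q = 0.1, 0.2
    Y = [rng.choice([-1, 1]) for _ in range(n)]
    X = {(u, v): Y[u] * Y[v] * (-1 if rng.random() < p else 1)
         for (u, v) in edges}
    Z = [Y[u] * (-1 if rng.random() < q else 1) for u in range(n)]
    Y_hat = erm_over_class(X, Z, edges, n, p)
    print("Hamming error of ERM:", sum(1 for u in range(n) if Y_hat[u] != Y[u]))
```

Note that the side information is what breaks the global sign ambiguity here: from X alone, Y and −Y induce exactly the same edge labels, so minimizing over Z is what pins down the correct member of F(X).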