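Variational Algorithms for Marginal MAP
Qiang Liu, Alexander Ihler
Department of Computer Science, University of California, Irvine

Abstract
Marginal MAP tasks seek an optimal configuration of the marginal distribution over a subset of variables. Marginal MAP can be computationally much harder than more common inference tasks. We show:
• a general variational framework for marginal MAP problems
• analogues to the Bethe, tree-reweighted, and mean-field approximations
• novel upper bounds via the tree-reweighted free energy
• “mixed” message passing and CCCP-based solvers
• conditions for global or local optimality of the solutions
• close connections to EM and variational EM approaches

Graphical Models
Graphical models:
• Factors & exponential family form: $p(x;\theta) = \exp\big(\theta(x) - \Phi(\theta)\big)$, with $\theta(x) = \sum_{\alpha} \theta_\alpha(x_\alpha)$
• Factors are associated with cliques $\alpha$ of a graph $G = (V, E)$
Tasks (variables split as $V = A \cup B$; sum over $x_A$, max over $x_B$):
• Sum-inference (partition function, probability of evidence): $\Phi(\theta) = \log \sum_x \exp\theta(x)$
• Max-inference (MAP, MPE): $\Phi_\infty(\theta) = \max_x \theta(x)$
• Mixed-inference (marginal MAP, sometimes simply called MAP): $\Phi_{AB}(\theta) = \log \max_{x_B} \sum_{x_A} \exp\theta(x)$
[Figure: inference tasks arranged by difficulty; mixed-inference (max over B, sum over A) is harder than sum- or max-inference]
Mixed inference problems can be hard even on trees, since summing out the A variables can couple the B variables and destroy the tree structure (example from D. Koller and N. Friedman, 2009).
A-B trees:
• extend the notion of efficient structure to mixed inference
• ensure the graph structure remains a tree during inference
• two example sub-types, “Type 1” and “Type 2” [Figure: sum (A) and max (B) nodes in each sub-type]

Variational Form
• Reformulate inference as a distributional optimization problem
• Define the mixed log-partition function $\Phi_{AB}(\theta)$ and the conditional entropy $H_{A|B}(q) = H(q(x)) - H(q(x_B))$
Sum-inference: $\Phi(\theta) \ge \mathbb{E}_q[\theta(x)] + H(q(x))$ (with equality when $q = p$)
Mixed-inference: $\Phi_{AB}(\theta) \ge \mathbb{E}_q[\theta(x)] + H_{A|B}(q)$ (with equality when $q = p(x_A \mid x_B)\,\mathbb{1}(x_B = x_B^*)$ or similar)
[Figure: variational forms of sum-, max-, and mixed-inference; the mixed-inference form is marked “This work”]

Variational Approximations
Bethe approximation (exact on A-B trees)
• “Truncated” free energy: $H_{A|B} \approx \sum_{i \in A} H_i - \sum_{(ij) \in E_A \cup \partial_{AB}} I_{ij}$, where $E_A$ are the edges within A, $\partial_{AB}$ the edges between A and B, $H_i$ the singleton entropies, and $I_{ij}$ the pairwise mutual informations
Tree-reweighted approximation (convex combination of A-B trees)
• Dual in terms of edge appearances: $H_{A|B} \approx \sum_{i \in A} H_i - \sum_{(ij) \in E_A \cup \partial_{AB}} \rho_{ij} I_{ij}$, with $\rho_{ij}$ the edge appearance probabilities

Variational Algorithms
Mixed-product message passing
• Start with “standard” weighted message passing
• Generalize the zero-temperature limit results of Weiss et al. (2007)
• Apply the limit directly to the messages ($\rho_{ij} = 1$ for Bethe, edge appearance probabilities $\rho_{ij}$ for TRW)
• Message types: A → A ∪ B uses sum-product; B → B uses max-product; B → A matches max and sum
• “Match” updates are interpretable as a “local” marginal MAP problem
• Mixed marginals satisfy a reparameterization property
• Fixed points are locally optimal (similar to max-product results)
• Convergence can be an issue
Double-loop algorithms
• Decompose H into two parts, $H = H^+ - H^-$, and iteratively linearize $H^-$
• CCCP algorithm: take $H^+$, $H^-$ to be convex
• Can also take $H^+$ to be the Bethe approximation (non-convex)
• Iteratively solve sum-product and apply a truncation correction
Connections to EM
• Restrict to the mean-field-like product subspace $q(x) = q(x_A)\,q(x_B)$
• Coordinate-wise updates = EM in the primal:
  E-step: $q(x_A) \propto \exp\big(\mathbb{E}_{q(x_B)}[\theta(x)]\big)$, which is $p(x_A \mid x_B)$ when $q(x_B)$ is a point mass
  M-step: $x_B \leftarrow \arg\max_{x_B} \mathbb{E}_{q(x_A)}[\theta(x_A, x_B)]$

Experiments
Chain graphs
• $G_A$ is a tree
• TRW1: type-1 trees only
• TRW2: ½ type-1, ½ type-2
• Bethe: most accurate
• EM: gets stuck quickly (2-3 iterations)
Grid graphs
• Attractive or mixed potentials
• $G_A$ has cycles
• Similar trends
[Figure panels: Attractive, Mixed; y-axes: % correct solutions, energy relative error]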
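To make the three tasks (sum-, max-, and mixed-inference) concrete, here is a minimal brute-force sketch on a hypothetical 4-variable binary cycle with random log-potentials. The model, the A/B split, and the helper names (`theta`, `join`, `logsumexp`) are illustrative assumptions, not the poster's experiments; exhaustive enumeration is only feasible for tiny models.

```python
import itertools
import math
import random

# Hypothetical toy model: 4 binary variables on a 4-cycle with random
# unary and pairwise log-potentials (not a model from the poster).
random.seed(0)
N = 4
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
UNARY = {i: [random.gauss(0, 1) for _ in range(2)] for i in range(N)}
PAIR = {e: [[random.gauss(0, 1) for _ in range(2)] for _ in range(2)] for e in EDGES}
A = [1, 3]  # sum variables (illustrative choice)
B = [0, 2]  # max variables (illustrative choice)


def theta(x):
    """theta(x) = sum of unary and pairwise log-potentials."""
    score = sum(UNARY[i][x[i]] for i in range(N))
    return score + sum(PAIR[(i, j)][x[i]][x[j]] for (i, j) in EDGES)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def join(xa, xb):
    """Build a full configuration from the A part and the B part."""
    x = [0] * N
    for i, v in zip(A, xa):
        x[i] = v
    for i, v in zip(B, xb):
        x[i] = v
    return tuple(x)


configs = list(itertools.product([0, 1], repeat=N))
phi_sum = logsumexp([theta(x) for x in configs])  # sum-inference: log Z
phi_max = max(theta(x) for x in configs)          # max-inference: MAP score

# Mixed-inference: sum out x_A for each x_B, then maximize over x_B.
scores = {
    xb: logsumexp([theta(join(xa, xb)) for xa in itertools.product([0, 1], repeat=len(A))])
    for xb in itertools.product([0, 1], repeat=len(B))
}
xb_star = max(scores, key=scores.get)
phi_mix = scores[xb_star]

# B-part of the joint MAP configuration, for comparison with x_B*.
x_map = max(configs, key=theta)
xb_from_map = tuple(x_map[i] for i in B)
print(phi_max, phi_mix, phi_sum)
print(xb_star, xb_from_map)
```

The ordering $\Phi_\infty \le \Phi_{AB} \le \Phi$ always holds, since the mixed objective replaces some maximizations by sums. Note that `xb_from_map` (projecting the joint MAP onto B) need not equal the marginal MAP `xb_star`, which is why marginal MAP cannot in general be solved by running MAP and discarding the A variables.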
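The mixed-inference variational form (a lower bound on the marginal MAP log-value, with equality when q = p(A|B) 1(B=B*)) can be checked numerically on a tiny model. This sketch assumes a hypothetical 3-variable binary triangle with random pairwise log-potentials and A = {x0}, B = {x1, x2}; it writes the objective as $\mathbb{E}_q[\theta(x)] + H(q(x)) - H(q(x_B))$, i.e. expected score plus the conditional entropy of $x_A$ given $x_B$, and compares it with the exact value $\Phi_{AB} = \log \max_{x_B} \sum_{x_A} \exp\theta(x)$ computed by enumeration.

```python
import itertools
import math
import random

random.seed(1)
N = 3
A, B = [0], [1, 2]  # hypothetical split: sum over x0, maximize over (x1, x2)
EDGES = [(0, 1), (1, 2), (0, 2)]
PAIR = {e: [[random.gauss(0, 1) for _ in range(2)] for _ in range(2)] for e in EDGES}
CONFIGS = list(itertools.product([0, 1], repeat=N))


def theta(x):
    return sum(PAIR[(i, j)][x[i]][x[j]] for (i, j) in EDGES)


def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def b_part(x):
    return tuple(x[i] for i in B)


def mixed_objective(q):
    """E_q[theta] + H(q(x)) - H(q(x_B)): expected score plus H(x_A | x_B)."""
    q_b = {}
    for x, p in q.items():
        q_b[b_part(x)] = q_b.get(b_part(x), 0.0) + p
    expected = sum(p * theta(x) for x, p in q.items())
    return expected + entropy(q) - entropy(q_b)


# Exact Phi_AB = log max_{x_B} sum_{x_A} exp(theta(x)) by enumeration.
totals = {}
for x in CONFIGS:
    totals[b_part(x)] = totals.get(b_part(x), 0.0) + math.exp(theta(x))
xb_star = max(totals, key=totals.get)
phi_ab = math.log(totals[xb_star])


def random_q():
    """A random joint distribution over all configurations."""
    w = [random.random() for _ in CONFIGS]
    z = sum(w)
    return {x: wi / z for x, wi in zip(CONFIGS, w)}


# Every q gives a lower bound on Phi_AB ...
bounds = [mixed_objective(random_q()) for _ in range(200)]

# ... and the bound is tight at q*(x) = p(x_A | x_B*) * 1[x_B = x_B*].
q_star = {
    x: (math.exp(theta(x)) / totals[xb_star] if b_part(x) == xb_star else 0.0)
    for x in CONFIGS
}
print(max(bounds) <= phi_ab + 1e-9, abs(mixed_objective(q_star) - phi_ab) < 1e-9)
```

The tight point puts all of its B-mass on $x_B^*$, so $H(q(x_B)) = 0$ and the objective reduces to the log-normalizer of $p(x_A \mid x_B^*)$. Dropping the $-H(q(x_B))$ term recovers the ordinary sum-inference bound on $\Phi$.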
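The Connections to EM bullets (coordinate-wise updates on a mean-field-like product subspace) can be sketched as classical EM for marginal MAP. This is a minimal brute-force version under stated assumptions: a hypothetical 6-variable binary cycle with random pairwise log-potentials, an alternating A/B split, and hypothetical helpers `join`, `log_marginal`, and `em`. The E-step uses the exact conditional $p(x_A \mid x_B)$, and the M-step maximizes the expected score over $x_B$ by enumeration.

```python
import itertools
import math
import random

random.seed(2)
N = 6
A, B = [0, 2, 4], [1, 3, 5]                             # hypothetical sum/max split
EDGES = [(i, i + 1) for i in range(N - 1)] + [(0, 5)]   # a 6-cycle
PAIR = {e: [[random.gauss(0, 1.5) for _ in range(2)] for _ in range(2)] for e in EDGES}
XA = list(itertools.product([0, 1], repeat=len(A)))
XB = list(itertools.product([0, 1], repeat=len(B)))


def theta(x):
    return sum(PAIR[(i, j)][x[i]][x[j]] for (i, j) in EDGES)


def join(xa, xb):
    """Build a full configuration from the A part and the B part."""
    x = [0] * N
    for i, v in zip(A, xa):
        x[i] = v
    for i, v in zip(B, xb):
        x[i] = v
    return tuple(x)


def log_marginal(xb):
    """log sum_{x_A} exp(theta(x_A, x_B)): the marginal MAP objective up to log Z."""
    return math.log(sum(math.exp(theta(join(xa, xb))) for xa in XA))


def em(xb, iters=20):
    history = [log_marginal(xb)]
    for _ in range(iters):
        # E-step: q(x_A) = p(x_A | x_B) for the current point estimate x_B.
        w = [math.exp(theta(join(xa, xb))) for xa in XA]
        z = sum(w)
        q = [wi / z for wi in w]
        # M-step: choose x_B maximizing the expected score E_q[theta(x_A, x_B)].
        new_xb = max(XB, key=lambda b: sum(qi * theta(join(xa, b)) for qi, xa in zip(q, XA)))
        history.append(log_marginal(new_xb))
        if new_xb == xb:
            break  # fixed point: EM has converged (possibly to a local optimum)
        xb = new_xb
    return xb, history


xb_em, hist = em(XB[0])
phi_ab = max(log_marginal(b) for b in XB)  # exact optimum, for reference
print(hist, phi_ab)
```

Each EM step cannot decrease `log_marginal`, but the fixed point is only a local optimum and can fall short of `phi_ab`, which is consistent with the poster's observation that EM gets stuck quickly. The enumeration over `XB` in the M-step is a stand-in for whatever MAP solver a real implementation would use.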
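The “truncated” Bethe entropy keeps only the singleton entropies of A nodes and the mutual informations on edges that touch A, i.e. $\sum_{i \in A} H_i - \sum_{(ij) \in E_A \cup \partial_{AB}} I_{ij}$. The sketch below checks, for one hypothetical case only, that this matches the exact conditional entropy $H(x_A \mid x_B) = H(x) - H(x_B)$: a random chain-structured (Markov chain) distribution on 4 binary nodes, with B = {0, 1} forming a connected sub-chain and A = {2, 3}. It illustrates the entropy identity behind the exactness claim; it is not a proof of the general A-B tree result.

```python
import itertools
import math
import random

random.seed(3)
N = 4
EDGES = [(0, 1), (1, 2), (2, 3)]  # a chain
B, A = {0, 1}, {2, 3}             # hypothetical split; B is a connected sub-chain


def rand_dist(k):
    w = [random.random() + 0.1 for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]


# Markov chain q(x) = q(x0) * prod_k q(x_{k+1} | x_k): a tree-structured distribution.
init = rand_dist(2)
trans = [[rand_dist(2) for _ in range(2)] for _ in range(N - 1)]
joint = {}
for x in itertools.product([0, 1], repeat=N):
    p = init[x[0]]
    for k in range(N - 1):
        p *= trans[k][x[k]][x[k + 1]]
    joint[x] = p


def marginal(idx):
    """Marginal distribution of the variables in idx, by summing the joint."""
    m = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m


def H(idx):
    return -sum(p * math.log(p) for p in marginal(idx).values() if p > 0)


def mutual_info(i, j):
    return H([i]) + H([j]) - H([i, j])


exact = H(range(N)) - H(sorted(B))  # H(x_A | x_B)
# Truncated Bethe: A singletons minus mutual information on edges touching A.
truncated = sum(H([i]) for i in A) - sum(
    mutual_info(i, j) for (i, j) in EDGES if i in A or j in A
)
print(exact, truncated)
```

For a chain, $H(x) = \sum_i H_i - \sum_{(ij)} I_{ij}$ exactly, and because B is a connected sub-chain the same holds for $H(x_B)$ on the B edges; subtracting removes every B singleton and B-B edge term, leaving the truncated form.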