Efficient Feature Learning Using Perturb-and-MAP
Ke Li, Kevin Swersky and Richard Zemel

Abstract
• Perturb-and-MAP [1] has been shown to be effective for pairwise MRFs, yet its application to other kinds of graphical models has been limited.
• We demonstrate that Perturb-and-MAP is effective at learning features using graphical models with complex dependencies between variables.
• We also propose a method of designing perturbations so that the distribution induced by Perturb-and-MAP better approximates the Gibbs distribution.

Cardinality RBM
• The cardinality restricted Boltzmann machine (CaRBM) enforces a sparsity constraint over the hidden units: P(x, h) = (1/Z) exp(x^T W h + b^T x + c^T h) · ψ_k(Σ_j h_j), where ψ_k(m) = 1 if m ≤ k and 0 otherwise.
• Training requires sampling from P(h | x), which is non-trivial because the hidden units are not conditionally independent of each other.
• Swersky et al. [2] proposed a method to compute P(h | x) using message passing in O(Nk) time, where N is the number of hidden units.
• Using Perturb-and-MAP, if the input to each hidden unit is perturbed with Logistic(0,1) noise and MAP is performed using a selection algorithm, an approximate sample can be drawn in O(N) time.
• We found that the features learned with Perturb-and-MAP have greater discriminative capability.

Bipartite Matching
• Many tasks involve predicting the correct matching in a bipartite graph, e.g. image stitching, stereo reconstruction and video tracking.
• Our aim is to learn a descriptor for image patches that is tailored to matching key points across images.
• Our bipartite matching model is characterized by: P(M; θ) = (1/Z) exp(−(1/2) Σ_{i,j} M_ij ‖ϕ(x_i; θ) − ϕ(x′_j; θ)‖₂²) · Π_i ψ(Σ_j M_ij) · Π_j ψ(Σ_i M_ij), where ψ(m) = 1 if m = 1 and 0 otherwise, and M_ij = 1 if the i-th and j-th key points match and 0 otherwise.
• Training requires estimating an expectation over M using a sample from P(M; θ).
• As computing the partition function of P(M; θ) is #P-hard, sampling from P(M; θ) is challenging.
• If the model is perturbed with noise from the right distribution, approximate samples can be drawn in O(n³) time using the Hungarian algorithm.
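The Perturb-and-MAP sampler for this matching model can be sketched as follows. This is an illustrative sketch only: Gumbel(0,1) noise on the pairwise potentials stands in for the designed perturbations described elsewhere on the poster, and the function and variable names are our own.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def perturb_and_map_matching(phi_left, phi_right, rng):
    """Draw one approximate matching sample via Perturb-and-MAP.

    The negative energy of pairing point i with point j is
    -0.5 * ||phi(x_i) - phi(x'_j)||^2. We perturb each pairwise
    potential with Gumbel(0,1) noise (illustrative; the poster's
    designed perturbations would be used in practice) and solve the
    perturbed MAP problem with the Hungarian algorithm in O(n^3) time.
    """
    # n x n matrix of squared distances between descriptors
    d2 = ((phi_left[:, None, :] - phi_right[None, :, :]) ** 2).sum(-1)
    neg_energy = -0.5 * d2
    noise = rng.gumbel(loc=0.0, scale=1.0, size=neg_energy.shape)
    # linear_sum_assignment minimizes cost, so negate to maximize score
    rows, cols = linear_sum_assignment(-(neg_energy + noise))
    M = np.zeros_like(d2)
    M[rows, cols] = 1.0
    return M
```

The result is a permutation matrix M, i.e. each row and column sums to one, as required by the ψ constraints of the model.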
Designing Perturbations
• If the negative energy of each joint configuration is perturbed with i.i.d. Gumbel(0,1) noise, exact samples can be drawn from the Gibbs distribution using Perturb-and-MAP.
• In practice, reduced-order perturbations must be used to ensure tractability. As a result, the negative perturbed energies across joint configurations are no longer independent or Gumbel-distributed. We propose a way of designing perturbations so that the latter property is preserved.
• The negative perturbed energy of each joint configuration is distributed according to the sum of the individual perturbations.
• We find a distribution D(1) using numerical deconvolution that satisfies the following property: if X ~ Gumbel(0,1) is independent of Y ~ D(1), then X + Y ~ Gumbel(0,2).
• Define D(s) as a scaled version of D(1). Then if X ~ Gumbel(0, 2^(−(n−1))) is independent of Y_i ~ D(2^(−i)) for all i ∈ {1, …, n−1}, then X + Σ_{i=1}^{n−1} Y_i ~ Gumbel(0,1). Thus, by perturbing the model with noise from the above distributions, the negative perturbed energy of each joint configuration is guaranteed to follow a Gumbel(0,1) distribution.

Figure 1: The pdfs of Gumbel(0,1) and D(1)
Figure 2a: Comparison of reconstruction errors
Figure 2b: Comparison of prediction errors
Figure 3: Two frames and the ground-truth matching from the dataset
Figure 4: Comparison of test error rates

Ongoing Research
• We are exploring ways of combining D-perturbations to obtain perturbations of equal entropy while ensuring that the negative perturbed energies are approximately Gumbel-distributed.
• We are also investigating how closely the empirical marginals over configurations produced using different perturbation methods approximate the underlying Gibbs distribution.

Perturb-and-MAP
• Perturb-and-MAP is an approximate sampling method that leverages existing optimization algorithms for MAP inference.
• It works by perturbing the potentials with random noise and then performing MAP inference on the model with the perturbed potentials.
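As a quick illustration of this recipe, the following self-contained sketch (the 3-state θ is an arbitrary example of ours) perturbs every configuration's log-potential with Gumbel(0,1) noise, takes the argmax, and compares the empirical frequencies with the Gibbs distribution:

```python
import numpy as np

# Unnormalized log-potentials of a small discrete model (arbitrary example)
theta = np.array([1.0, 0.0, -0.5])
gibbs = np.exp(theta) / np.exp(theta).sum()   # target Gibbs distribution

rng = np.random.default_rng(0)
n_samples = 200_000
# Perturb every configuration's log-potential with iid Gumbel(0,1)
# noise and take the argmax: each argmax is one exact Gibbs sample.
noise = rng.gumbel(size=(n_samples, theta.size))
samples = np.argmax(theta + noise, axis=1)
empirical = np.bincount(samples, minlength=theta.size) / n_samples
```

With enough samples, `empirical` matches `gibbs` to within sampling error.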
• It relies on the following fact: if γ_1, …, γ_n ~ i.i.d. Gumbel(0,1), then P(θ_i + γ_i = max_j (θ_j + γ_j)) = exp(θ_i) / Σ_j exp(θ_j).
• If the energy of each joint configuration is perturbed, Perturb-and-MAP yields an exact sample.
• In a pairwise MRF, perturbing the unary and pairwise potentials has been shown empirically to produce results similar to perturbing each joint configuration.

References
[1] George Papandreou and Alan L. Yuille (2011). Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models. ICCV.
[2] Kevin Swersky, Danny Tarlow, Ilya Sutskever, Ruslan Salakhutdinov, Rich Zemel, and Ryan Adams (2012). Cardinality restricted Boltzmann machines. NIPS 25.

{keli,kswersky,zemel}@cs.toronto.edu
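The O(N) selection-based sampler from the Cardinality RBM section can be sketched as follows. This is a minimal sketch under our own naming conventions; the use of `np.argpartition` (average linear-time introselect) as the selection algorithm is our choice.

```python
import numpy as np

def perturb_and_map_carbm_sample(x, W, c, k, rng):
    """Approximate sample from P(h | x) for a CaRBM via Perturb-and-MAP.

    Perturb each hidden unit's input with Logistic(0,1) noise; MAP under
    the cardinality-k constraint then reduces to selecting the k largest
    perturbed inputs, which takes O(N) time on average.
    """
    inputs = W.T @ x + c                       # per-hidden-unit input
    noise = rng.logistic(loc=0.0, scale=1.0, size=inputs.shape)
    perturbed = inputs + noise
    h = np.zeros_like(perturbed)
    top_k = np.argpartition(perturbed, -k)[-k:]  # indices of k largest
    h[top_k] = 1.0
    return h
```

The returned vector is binary with exactly k active hidden units, so the cardinality constraint ψ_k is satisfied by construction.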