Page 1
ECE 6504: Advanced Topics in Machine Learning
Probabilistic Graphical Models and Large-Scale Learning
Dhruv Batra, Virginia Tech
Topics
– Markov Random Fields: MAP Inference
– Integer Programming, LP Formulation
– Dual Decomposition
Readings: KF 13.1–13.5, Barber 5.1, 28.9
Page 2
Administrativia
• HW1
 – Solutions and grades released
• HW2
 – Solutions released
 – Grades next week
• Project Presentations
 – When: April 22, 24
 – Where: in class
 – 5 min talk: main results
 – The semester ends about 2 weeks after that, so nearly finished results are expected
 – Slides due: April 21, 11:55pm
(C) Dhruv Batra 2
Page 3
Recap of Last Time
Page 4
MAP Inference
MAP Inference
Most likely assignment over variables $y_1, y_2, \ldots, y_n$
[Figure: image-labeling example with labels Person, Table, Plate]

$S(y) = \sum_{i \in V} \theta_i(y_i) + \sum_{(i,j) \in E} \theta_{ij}(y_i, y_j)$

$P(y) = \frac{1}{Z} e^{S(y)}$

– Node scores / local rewards: $\theta_i$ is a $k \times 1$ vector
– Edge scores / distributed prior: $\theta_{ij}$ is a $k \times k$ table
[Figure: distribution $P(y)$ over assignments $y$; the MAP assignment is its mode]
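The score $S(y)$ above can be made concrete by brute force. The sketch below uses a hypothetical 3-node chain with $k = 2$ labels and made-up scores: it evaluates $S(y)$ for every assignment and returns the maximizer. This is exact but exponential in $n$, which is why the rest of the lecture builds IP/LP machinery.

```python
import itertools
import numpy as np

# Tiny 3-node chain MRF with k = 2 labels; all scores are illustrative.
k = 2
theta_node = np.array([[1.0, 0.0],   # theta_1(y_1)
                       [0.0, 0.5],   # theta_2(y_2)
                       [0.2, 0.3]])  # theta_3(y_3)
# One k x k edge score table per chain edge (1-2) and (2-3);
# the large diagonal rewards encourage neighboring labels to agree.
theta_edge = np.array([[[2.0, 0.0], [0.0, 2.0]],
                       [[2.0, 0.0], [0.0, 2.0]]])

def score(y):
    """S(y) = sum_i theta_i(y_i) + sum_(i,j) theta_ij(y_i, y_j)."""
    s = sum(theta_node[i, y[i]] for i in range(len(y)))
    s += sum(theta_edge[e, y[e], y[e + 1]] for e in range(len(y) - 1))
    return s

# MAP by exhaustive enumeration over all k^n assignments: only feasible
# for tiny models, but it is the ground truth everything else approximates.
y_map = max(itertools.product(range(k), repeat=3), key=score)
print(y_map, score(y_map))
```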
Page 5
MAP Inference • Why is MAP difficult?
• What if we independently maximize the terms?
Page 6
MAP in Pairwise MRFs
• Over-Complete Representation
$G = (V, E)$, variables $X_1, X_2, \ldots, X_n$, each $X_i$ taking one of $k$ labels

Stack every node and edge score table into one long vector:
$\theta = [\,\theta_1(1) \ldots \theta_1(k) \mid \ldots \mid \theta_n(1) \ldots \theta_n(k) \mid \theta_{12}(1,1) \ldots \theta_{12}(k,k) \mid \ldots \mid \theta_{n-1,n}(1,1) \ldots \theta_{n-1,n}(k,k)\,]$
– each node block is $k \times 1$; each edge block is a flattened $k \times k$ table

Node indicators $\mu_i(s)$ are $k \times 1$ one-hot vectors, e.g.
– $x_i = 1 \;\mapsto\; (1, 0, 0, \ldots, 0)$
– $x_i = 2 \;\mapsto\; (0, 1, 0, \ldots, 0)$
and they occupy the blocks $[\,\mu_1(1) \ldots \mu_1(k) \mid \ldots \mid \mu_n(1) \ldots \mu_n(k)\,]$ of $\mu$
Page 7
MAP in Pairwise MRFs
• Over-Complete Representation
$G = (V, E)$, variables $x_1, x_2, \ldots, x_n$

$\theta = [\,\theta_1(1) \ldots \theta_1(k) \mid \ldots \mid \theta_n(1) \ldots \theta_n(k) \mid \theta_{12}(1,1) \ldots \theta_{12}(k,k) \mid \ldots \mid \theta_{n-1,n}(1,1) \ldots \theta_{n-1,n}(k,k)\,]$

Edge indicators $\mu_{ij}(s,t)$ are $k^2 \times 1$ one-hot vectors, e.g.
– $x_i = 1, x_j = 1 \;\mapsto\; (1, 0, 0, \ldots, 0)$
– $x_i = 1, x_j = 2 \;\mapsto\; (0, 1, 0, \ldots, 0)$

$\mu_x = [\,\mu_1(1) \ldots \mu_1(k) \mid \ldots \mid \mu_n(1) \ldots \mu_n(k) \mid \mu_{12}(1,1) \ldots \mu_{12}(k,k) \mid \ldots \mid \mu_{n-1,n}(1,1) \ldots \mu_{n-1,n}(k,k)\,]$

$S(x) = \theta \cdot \mu_x$
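A quick sketch of the over-complete representation with hypothetical numbers: the node and edge tables are stacked into one vector $\theta$, the indicator vector $\mu_x$ is built the same way, and $\theta \cdot \mu_x$ reproduces the score computed directly from the tables.

```python
import numpy as np

# Over-complete representation for a 3-node, k = 2 chain (made-up scores).
k, n = 2, 3
theta_node = np.array([[1.0, 0.0], [0.0, 0.5], [0.2, 0.3]])
theta_edge = np.array([[[2.0, 0.0], [0.0, 2.0]],
                       [[2.0, 0.0], [0.0, 2.0]]])
# Stack node tables, then flattened edge tables, into one long vector theta.
theta = np.concatenate([theta_node.ravel(), theta_edge.ravel()])

def mu_of(x):
    """Indicator vector mu_x: one-hot per node, one-hot per edge."""
    mu_nodes = np.zeros((n, k))
    mu_nodes[np.arange(n), x] = 1.0
    mu_edges = np.zeros((n - 1, k, k))
    for e in range(n - 1):
        mu_edges[e, x[e], x[e + 1]] = 1.0
    return np.concatenate([mu_nodes.ravel(), mu_edges.ravel()])

x = (0, 1, 1)
direct = theta_node[range(n), x].sum() + sum(
    theta_edge[e, x[e], x[e + 1]] for e in range(n - 1))
print(theta @ mu_of(x), direct)  # both equal S(x)
```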
Page 8
MAP in Pairwise MRFs • Integer Program
$\max_{\mu} \;\; \theta^T \mu$
s.t. $\sum_s \mu_i(s) = 1$ (unique label)
  $\sum_{s,t} \mu_{ij}(s,t) = 1$
  $\sum_s \mu_{ij}(s,t) = \mu_j(t)$ (consistent assignments)
  $\mu_i(s) \in \{0, 1\}$, $\mu_{ij}(s,t) \in \{0, 1\}$ (indicator variables)
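As a sanity check on the constraints above, the snippet below (a hypothetical 2-node, $k = 2$ model) verifies that the indicator vector of every integral assignment satisfies the unique-label, edge-normalization, and consistency constraints.

```python
import numpy as np

# Check that the indicator vectors of all integral assignments satisfy
# the integer-program constraints (hypothetical 2-node, k = 2 example).
k = 2
for x1 in range(k):
    for x2 in range(k):
        mu1 = np.eye(k)[x1]          # mu_1(s): one-hot node indicator
        mu2 = np.eye(k)[x2]          # mu_2(s)
        mu12 = np.outer(mu1, mu2)    # mu_12(s, t): one-hot edge indicator
        assert mu1.sum() == 1 and mu2.sum() == 1    # unique label per node
        assert mu12.sum() == 1                      # edge normalization
        assert np.allclose(mu12.sum(axis=1), mu1)   # consistency with node 1
        assert np.allclose(mu12.sum(axis=0), mu2)   # consistency with node 2
print("all integral assignments satisfy the constraints")
```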
Page 9
MAP in Pairwise MRFs • MAP Integer Program
[Figure: polytope with integral vertices $\mu_{x_1}, \ldots, \mu_{x_5}$]

$\max_{\mu} \;\; \theta^T \mu$
s.t. $A\mu = b$
  $\mu(\cdot) \in \{0, 1\}$
Page 10
MAP in Pairwise MRFs • MAP Linear Program
[Figure: the same polytope; the relaxation also allows fractional points]

$\max_{\mu} \;\; \theta^T \mu$
s.t. $A\mu = b$
  $\mu(\cdot) \in [0, 1]$

– Number of constraints (rows of $A$): $O(|E|)$
– Off-the-shelf solvers: CPLEX, Mosek, etc.
Page 11
Plan for today
• MRF Inference
 – (Specialized) MAP Inference
  • Integer Programming formulation
  • Linear Programming relaxation
 – Understanding the LP better: when is it tight? When is it not?
• Dual Decomposition
 – An algorithm for solving this LP
Page 12
MAP in Pairwise MRFs • MAP Integer Program
[Figure: polytope with integral vertices $\mu_{x_1}, \ldots, \mu_{x_5}$]

$\max_{\mu} \;\; \theta^T \mu$
s.t. $A\mu = b$
  $\mu(\cdot) \in \{0, 1\}$
Page 13
Marginal Polytope
Figure Credit: David Sontag
Page 14
MAP in Pairwise MRFs • MAP Linear Program
• Properties
 – If the LP optimum is integral, the MAP assignment has been found
 – The LP is always integral (tight) for trees
 – Efficient message-passing schemes exist for solving this LP
[Figure: polytope with integral vertices $\mu_{x_1}, \ldots, \mu_{x_5}$]

$\max_{\mu} \;\; \theta^T \mu$
s.t. $A\mu = b$
  $\mu(\cdot) \in [0, 1]$
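The LP above can be handed to any off-the-shelf solver. The sketch below builds the local-polytope constraints $A\mu = b$ for a hypothetical 2-node, $k = 2$ model and solves the relaxation with `scipy.optimize.linprog`; since a single edge is a tree, the optimum comes back integral, illustrating the tightness property for trees.

```python
import numpy as np
from scipy.optimize import linprog

# Local-polytope LP for a 2-node, k = 2 MRF (illustrative scores).
# Variable order: mu1(0), mu1(1), mu2(0), mu2(1),
#                 mu12(0,0), mu12(0,1), mu12(1,0), mu12(1,1)
theta = np.array([1.0, 0.0, 0.0, 0.5, 2.0, 0.0, 0.0, 2.0])

A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],    # sum_s mu1(s) = 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # sum_s mu2(s) = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_t mu12(0,t) = mu1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_t mu12(1,t) = mu1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_s mu12(s,0) = mu2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_s mu12(s,1) = mu2(1)
], dtype=float)
b_eq = np.array([1, 1, 0, 0, 0, 0], dtype=float)

# linprog minimizes, so negate theta to maximize theta^T mu over [0, 1]^8.
res = linprog(-theta, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8,
              method="highs")
print(-res.fun, res.x)  # on a tree the LP optimum is integral
```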
Page 15
LP Relaxation
• Block Coordinate / Subgradient Descent on the Dual
[Figure: edge $(i, j)$ with messages $\lambda_{ij \to j}$ and $\lambda_{ji \to i}$]
Page 16
LP Relaxation
• Block Coordinate / Subgradient Descent on the Dual
[Figure: messages $\lambda^{(t)}_{ij \to j}$ and $\lambda^{(t)}_{ji \to i}$ at iteration $t$]
Page 17
LP Relaxation
• Block Coordinate / Subgradient Descent on the Dual
[Figure: updated messages $\lambda^{(t+1)}_{ij \to j}$ and $\lambda^{(t+1)}_{ji \to i}$]
Distributed Message-Passing
Still inefficient!
Page 18
Linear Programming Duality
Figure Credit: David Sontag
Page 19
Dual Decomposition • For MAP Inference
– On board
Page 20
MAP in Pairwise MRFs • MAP Integer Program
$\max_{\mu} \;\; \theta^T \mu$
s.t. $A\mu = b$
  $\mu(\cdot) \in \{0, 1\}$

[Figure: polytope with integral vertices $\mu_{x_1}, \ldots, \mu_{x_5}$]
Page 21
MAP LP • Lagrangian Relaxation
$f(\lambda) = \max_{\mu \in C} \; \sum_i \theta_i \cdot \mu_i + \sum_{(i,j)} \theta_{ij} \cdot \mu_{ij} - \lambda \cdot (A\mu - b)$
 s.t. $\mu_i(\cdot), \mu_{ij}(\cdot) \in \{0, 1\}$

Dual: $\min_{\lambda \geq 0} f(\lambda)$
– $f(\lambda)$ is convex (non-smooth)
– $f(\lambda)$ is an upper bound on the MAP score for every $\lambda$

Subgradient descent: $\lambda^{(1)} = \lambda^{(0)} - \alpha \, \nabla f(\lambda^{(0)})$, and so on.
[Figure: $f(\lambda)$ plotted against $\lambda$, lying above the MAP score; subgradient steps $\nabla f(\lambda^{(0)})$, $\nabla f(\lambda^{(1)})$]
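The subgradient scheme above can be sketched end to end. The toy example below is a hypothetical 3-node cycle with $k = 2$ and made-up agreement-rewarding scores (all names and numbers are illustrative): it decomposes the problem into one slave subproblem per edge, maximizes each reparameterized slave independently, and takes diminishing subgradient steps on the multipliers that force the slaves' copies of each $x_i$ to agree. The resulting $f(\lambda)$ is always an upper bound on the MAP score and tightens toward it.

```python
import itertools
import numpy as np

# Dual decomposition for MAP on a 3-node cycle with k = 2 labels.
# Each edge is a slave; lam[(e, i)] are the Lagrange multipliers that
# push the slaves' copies of x_i toward agreement.
k = 2
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
theta_n = np.array([[0.0, 0.5], [0.2, 0.0], [0.2, 0.0]])
theta_e = {e: np.eye(k) for e in edges}  # reward label agreement
deg = {i: sum(i in e for e in edges) for i in nodes}

def map_score(x):
    return (sum(theta_n[i, x[i]] for i in nodes)
            + sum(theta_e[e][x[e[0]], x[e[1]]] for e in edges))

# Ground-truth MAP by enumeration (tiny model, so this is cheap).
map_val = max(map_score(x) for x in itertools.product(range(k), repeat=3))

lam = {(e, i): np.zeros(k) for e in edges for i in e}

def solve_slave(e):
    """Maximize the slave's reparameterized k x k score table."""
    i, j = e
    tab = (theta_e[e]
           + (theta_n[i] / deg[i] + lam[(e, i)])[:, None]
           + (theta_n[j] / deg[j] + lam[(e, j)])[None, :])
    s, t = np.unravel_index(np.argmax(tab), tab.shape)
    return (s, t), tab[s, t]

best_f = np.inf
for it in range(500):
    sols, f = {}, 0.0
    for e in edges:
        sols[e], val = solve_slave(e)
        f += val
    best_f = min(best_f, f)        # f(lambda) upper-bounds the MAP score
    step = 0.5 / (1 + it)          # diminishing subgradient step size
    for i in nodes:
        copies = [(e, sols[e][e.index(i)]) for e in edges if i in e]
        avg = np.zeros(k)
        for _, s in copies:
            avg[s] += 1.0 / deg[i]
        for e, s in copies:
            g = np.zeros(k)
            g[s] = 1.0
            lam[(e, i)] -= step * (g - avg)  # pull copies toward the average

print(best_f, map_val)  # best dual upper bound vs. true MAP score
```

On binary attractive models like this one the pairwise LP is tight, so the dual bound closes onto the true MAP score; on harder cycles a duality gap can remain even at the dual optimum.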