8/12/2019 Aug Lagrangian and ADMM
ST810 Lecture 24
Augmented Lagrangian method
Outline
Augmented Lagrangian method
ADMM
Final words
Augmented Lagrangian method
Consider minimizing f(x) subject to equality constraints g_i(x) = 0 for i = 1, \ldots, q.
Inequality constraints are ignored for simplicity.
Assume f and the g_i are smooth for simplicity.
At a constrained minimum, the Lagrange multiplier condition
  0 = \nabla f(x) + \sum_{i=1}^q \lambda_i \nabla g_i(x)
holds provided the gradients \nabla g_i(x) are linearly independent.
Augmented Lagrangian:
  \mathcal{L}_\rho(x, \lambda) = f(x) + \sum_{i=1}^q \lambda_i g_i(x) + \frac{\rho}{2} \sum_{i=1}^q g_i(x)^2
The penalty term (\rho/2) \sum_{i=1}^q g_i(x)^2 punishes violations of the equality constraints g_i(x) = 0.
Idea: optimize the augmented Lagrangian and adjust \lambda in the hope of matching the true Lagrange multipliers.
For large enough (finite) \rho, the unconstrained minimizer of the augmented Lagrangian coincides with the constrained solution of the original problem.
At convergence, the penalty gradient \rho \sum_i g_i(x) \nabla g_i(x) vanishes (since g_i(x) = 0) and we recover the standard multiplier rule.
Algorithm: take \rho initially large or gradually increase it; iterate:
Find the unconstrained minimum
  x^{(t+1)} \leftarrow \arg\min_x \mathcal{L}_\rho(x, \lambda^{(t)})
Update the multiplier vector
  \lambda_i^{(t+1)} \leftarrow \lambda_i^{(t)} + \rho g_i(x^{(t+1)}), \quad i = 1, \ldots, q
Intuition for updating \lambda: if x^{(t+1)} is the unconstrained minimum of \mathcal{L}_\rho(x, \lambda^{(t)}), then the stationarity condition says
  0 = \nabla f(x^{(t+1)}) + \sum_{i=1}^q \lambda_i^{(t)} \nabla g_i(x^{(t+1)}) + \rho \sum_{i=1}^q g_i(x^{(t+1)}) \nabla g_i(x^{(t+1)})
    = \nabla f(x^{(t+1)}) + \sum_{i=1}^q \left[ \lambda_i^{(t)} + \rho g_i(x^{(t+1)}) \right] \nabla g_i(x^{(t+1)})
so the updated multipliers \lambda^{(t+1)} are exactly the bracketed quantities in the Lagrange multiplier condition.
For non-smooth f, replace the gradient \nabla f by the subdifferential \partial f.
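To make the two steps concrete, here is a minimal NumPy sketch (our illustration, not from the slides) on a toy problem: minimize \|x\|_2^2 subject to x_1 + x_2 - 1 = 0, whose constrained solution is x^* = (1/2, 1/2) with multiplier \lambda^* = -1. The function names, step sizes, and iteration counts are all illustrative choices; the inner minimization is plain gradient descent.

```python
import numpy as np

# Toy problem: minimize f(x) = ||x||^2 subject to g(x) = x[0] + x[1] - 1 = 0.
# Constrained optimum: x* = (0.5, 0.5) with multiplier lambda* = -1.
f_grad = lambda x: 2.0 * x
g = lambda x: x[0] + x[1] - 1.0
g_grad = lambda x: np.array([1.0, 1.0])

def aug_lagrangian(x, lam, rho=10.0, outer=50, inner=200, lr=1e-2):
    """Augmented Lagrangian method: inner gradient descent + multiplier update."""
    for _ in range(outer):
        # x-update: (approximately) minimize L_rho(x, lam) with lam fixed
        for _ in range(inner):
            grad = f_grad(x) + lam * g_grad(x) + rho * g(x) * g_grad(x)
            x = x - lr * grad
        # multiplier update: lam <- lam + rho * g(x)
        lam = lam + rho * g(x)
    return x, lam

x_star, lam_star = aug_lagrangian(np.array([0.0, 0.0]), 0.0)
# x_star is approx (0.5, 0.5) and lam_star is approx -1
```

Note that the multiplier update is exactly the bracketed quantity from the stationarity condition evaluated at the inner minimizer.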
Example: basis pursuit
The basis pursuit problem seeks the sparsest solution subject to linear constraints:
  minimize \|x\|_1
  subject to Ax = b
Take \rho initially large or gradually increase it; iterate according to
  x^{(t+1)} \leftarrow \arg\min_x \|x\|_1 + \langle \lambda^{(t)}, Ax - b \rangle + \frac{\rho}{2} \|Ax - b\|_2^2 \quad (lasso)
  \lambda^{(t+1)} \leftarrow \lambda^{(t)} + \rho (Ax^{(t+1)} - b)
Converges in a finite (small) number of steps (Yin et al., 2008).
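A sketch of this iteration in NumPy (our illustration, not from the slides): the inner lasso subproblem is solved approximately by warm-started ISTA rather than a dedicated lasso solver, and all names and defaults are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the prox operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def basis_pursuit_al(A, b, rho=1.0, outer=100, inner=500):
    """Augmented Lagrangian (Bregman) iteration for min ||x||_1 s.t. Ax = b.
    Each lasso subproblem is solved approximately by warm-started ISTA."""
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    step = 1.0 / (rho * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(outer):
        for _ in range(inner):
            # ISTA step on ||x||_1 + <lam, Ax - b> + (rho/2)||Ax - b||^2
            grad = A.T @ (lam + rho * (A @ x - b))
            x = soft_threshold(x - step * grad, step)
        lam = lam + rho * (A @ x - b)               # multiplier update
    return x

# Demo: recover a 3-sparse vector from 20 random linear measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = basis_pursuit_al(A, b)
```

A production implementation would replace ISTA with FISTA or a coordinate-descent lasso solver, but the outer structure (lasso solve, then multiplier update) is the same.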
Remarks
The augmented Lagrangian method dates back to the late 1960s (Hestenes, 1969; Powell, 1969); the monograph by Bertsekas (1982) provides a general treatment.
Same as the Bregman iteration (Yin et al., 2008) proposed for basis pursuit (compressive sensing).
Equivalent to the proximal point algorithm applied to the dual; can be accelerated (Nesterov).
ADMM
Alternating direction method of multipliers.
Consider minimizing f(x) + g(y) subject to the affine constraint Ax + By = c.
The augmented Lagrangian is
  \mathcal{L}_\rho(x, y, \lambda) = f(x) + g(y) + \langle \lambda, Ax + By - c \rangle + \frac{\rho}{2} \|Ax + By - c\|_2^2
Idea: perform block descent on x and y and then update the multiplier vector:
  x^{(t+1)} \leftarrow \arg\min_x f(x) + \langle \lambda^{(t)}, Ax + By^{(t)} - c \rangle + \frac{\rho}{2} \|Ax + By^{(t)} - c\|_2^2
  y^{(t+1)} \leftarrow \arg\min_y g(y) + \langle \lambda^{(t)}, Ax^{(t+1)} + By - c \rangle + \frac{\rho}{2} \|Ax^{(t+1)} + By - c\|_2^2
  \lambda^{(t+1)} \leftarrow \lambda^{(t)} + \rho (Ax^{(t+1)} + By^{(t+1)} - c)
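As a concrete instance, taking f(x) = \frac{1}{2}\|Mx - b\|_2^2, g(y) = \tau\|y\|_1, A = I, B = -I, c = 0 recovers the lasso. A minimal NumPy sketch (our own illustration; variable names and defaults are not from the slides):

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the prox operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(M, b, tau, rho=1.0, iters=500):
    """ADMM for min (1/2)||Mx - b||^2 + tau * ||y||_1 subject to x - y = 0,
    i.e. the lasso in the f(x) + g(y), Ax + By = c form with A = I, B = -I, c = 0."""
    n = M.shape[1]
    x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    L = np.linalg.cholesky(M.T @ M + rho * np.eye(n))  # factor once, reuse
    for _ in range(iters):
        # x-update: quadratic, closed form via the cached Cholesky factor
        rhs = M.T @ b + rho * y - lam
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # y-update: prox of tau * ||.||_1 (soft-thresholding)
        y = soft_threshold(x + lam / rho, tau / rho)
        # multiplier update
        lam = lam + rho * (x - y)
    return y

# Sanity check with M = I: the lasso solution is soft_threshold(b, tau)
y_hat = admm_lasso(np.eye(3), np.array([3.0, -0.5, 1.0]), tau=1.0)
# y_hat is approx (2.0, 0.0, 0.0)
```

Caching the Cholesky factor of M^T M + \rho I makes every x-update two triangular solves, which is why ADMM scales well here.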
Example: fused lasso
The fused lasso problem minimizes
  \frac{1}{2} \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p-1} |\beta_{j+1} - \beta_j|
Define \theta = D\beta, where D is the (p-1) \times p first-difference matrix
  D = \begin{pmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \ddots \end{pmatrix}
so that (D\beta)_j = \beta_{j+1} - \beta_j.
Then we minimize \frac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\theta\|_1 subject to D\beta = \theta.
The augmented Lagrangian is
  \mathcal{L}_\rho(\beta, \theta, \nu) = \frac{1}{2} \|y - X\beta\|_2^2 + \lambda \|\theta\|_1 + \nu^T (D\beta - \theta) + \frac{\rho}{2} \|D\beta - \theta\|_2^2
ADMM:
The \beta update is a smooth quadratic problem.
The \theta update is a separable lasso problem (elementwise soft-thresholding).
Update the multipliers:
  \nu^{(t+1)} \leftarrow \nu^{(t)} + \rho (D\beta^{(t+1)} - \theta^{(t+1)})
The same algorithm applies to a general regularization matrix D (generalized lasso).
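The three updates can be sketched in NumPy as follows (our illustration, not from the slides; the demo uses X = I, and names and defaults are ours):

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the prox operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_fused_lasso(X, y, lam, rho=1.0, iters=1000):
    """ADMM for min (1/2)||y - X beta||^2 + lam * ||theta||_1
    subject to D beta = theta, with D the first-difference matrix."""
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)           # (p-1) x p, rows (-1, 1)
    beta, theta, nu = np.zeros(p), np.zeros(p - 1), np.zeros(p - 1)
    K = X.T @ X + rho * D.T @ D              # beta-update system matrix
    for _ in range(iters):
        # beta-update: smooth quadratic -> linear system
        beta = np.linalg.solve(K, X.T @ y + D.T @ (rho * theta - nu))
        # theta-update: separable lasso -> elementwise soft-thresholding
        theta = soft_threshold(D @ beta + nu / rho, lam / rho)
        # multiplier update
        nu = nu + rho * (D @ beta - theta)
    return beta

# Demo: with X = I and a large penalty, the estimate fuses to the mean of y
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0])
beta_hat = admm_fused_lasso(np.eye(5), y, lam=100.0)
# beta_hat is approx (2.6, 2.6, 2.6, 2.6, 2.6)
```

Swapping D for another regularization matrix gives the generalized lasso with no change to the algorithm; in practice one would also factor K once (e.g. by Cholesky) instead of re-solving each iteration.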
Remarks on ADMM
Related algorithms: the split Bregman iteration (Goldstein and Osher, 2009), Dykstra's (1983) alternating projection algorithm, ...
Proximal point algorithm applied to the dual.
Numerous applications in statistics and machine learning: lasso, generalized lasso, graphical lasso, (overlapping) group lasso, ...
Embraces distributed computing for big data (Boyd et al., 2011).
Final words
Take-home messages from this course
Statistics, the science of data analysis, is the applied mathematics of the 21st century.
Read the first few pages and the last few pages of Tukey (1962)'s "The future of data analysis" (posted on course website). They are a must for every statistician.
Big data era: wiki, WSJ, White House, McKinsey report, ...
Challenges:
  methodology: big p
  efficiency: big n and/or big p
  memory: big n; distributed computing via MapReduce (Hadoop), online algorithms
Links:
http://en.wikipedia.org/wiki/Big_data
http://online.wsj.com/article/SB10001424127887323751104578147311334491922.html
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
http://www.slideshare.net/fred.zimny/mckinsey-quarterlys-2011-report-the-challenge-and-opportunityof-big-data
Coding:
  Prototyping: R and Matlab
  A real programming language: C/C++, Fortran, Python
  Scripting languages: Python, Perl, JavaScript
Numerical linear algebra: use standard libraries (BLAS, LAPACK)! Sparse linear algebra is critical for exploiting sparsity structure in big data.
Optimization:
  Disciplined convex programming (LS, LP, QP, GP, SOCP, SDP)
  Convex programming is becoming a technology, just like least squares (LS). Many statisticians don't realize this.
  Specialized tools in statistics: EM/MM, Fisher scoring, Gauss-Newton, simulated annealing, ...
  Combinatorial optimization techniques: divide-and-conquer, dynamic programming, greedy algorithms, ...
About the final project
In your presentation:
  describe your research question
  describe which variables/features in the data are used
  describe the preprocessing procedure
  describe implementation details: language, software, algorithm, timing, ...
  describe the difficulties you met: which approaches are or are not working?
Send us your slides before your presentation so we can give better feedback.
(Partial) answers to your questions
Wei's group lasso (with equality constraint) problem: SOCP, accelerated proximal gradient (Nesterov) method, ADMM.
Tian's fused-lasso problem: QP, ADMM, accelerated proximal gradient method coupled with dynamic programming, re-parameterization to lasso, path algorithm.
Kehui's composite quantile regression problem: LP (although the original problem is non-convex).
Shikai: SDP.
Feel free to ask more.
References
Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122.
Dykstra, R. L. (1983). An algorithm for restricted least squares regression. J. Amer. Statist. Assoc., 78(384):837–842.
Goldstein, T. and Osher, S. (2009). The split Bregman method for l1-regularized problems. SIAM J. Img. Sci., 2:323–343.
Hestenes, M. R. (1969). Multiplier and gradient methods. J. Optimization Theory Appl., 4:303–320.
Powell, M. J. D. (1969). A method for nonlinear constraints in minimization problems. In Optimization (Sympos., Univ. Keele, Keele, 1968), pages 283–298. Academic Press, London.
Tukey, J. W. (1962). The future of data analysis. Ann. Math. Statist., 33:1–67.
Yin, W., Osher, S., Goldfarb, D., and Darbon, J. (2008). Bregman iterative algorithms for l1-minimization with applications to compressed sensing. SIAM J. Imaging Sci., 1(1):143–168.