Fast inertial dynamics for convex optimization. Convergence of FISTA algorithms.
Hedy ATTOUCH
Université Montpellier 2, ACSIOM, I3M UMR CNRS 5149
Joint work with J. Peypouquet, P. Redont, and Z. Chbani
CHALLENGES IN OPTIMIZATION FOR DATA SCIENCE
Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, July 1-2, 2015
H. ATTOUCH (Univ. Montpellier 2). Fast inertial dynamics for convex optimization. Convergence of FISTA algorithms.
1A. General presentation: dynamical system
Fast dynamical methods for convex minimization.
$\min \{\Phi(x) : x \in H\}$.
$H$ real Hilbert space, $\|x\|^2 = \langle x, x\rangle$; $\Phi : H \to \mathbb{R}$ convex, continuously differentiable, $\operatorname{argmin}\Phi \neq \emptyset$.
6F. Fast related algorithms. Nesterov method, FISTA.
$(\mathrm{AVD\text{-}algo})_\alpha$:
$y_k = x_k + \frac{k-1}{k+\alpha-1}\,(x_k - x_{k-1})$;
$x_{k+1} = \operatorname{prox}_{s\Phi}\big(y_k - s\nabla\Psi(y_k)\big)$.
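Theorem 7 (Chambolle-Dossal, Su-Boyd-Candès)
$\Phi : H \to \mathbb{R} \cup \{+\infty\}$ closed convex proper;
$\Psi : H \to \mathbb{R}$ convex differentiable, $\nabla\Psi$ $L$-Lipschitz continuous;
$S = \operatorname{argmin}(\Phi + \Psi) \neq \emptyset$, $s < \frac{1}{L}$, $\alpha > 3$.
Let $(x_k)$ be a sequence generated by $(\mathrm{AVD\text{-}algo})_\alpha$. Then,
$x_k \rightharpoonup x^* \in \operatorname{argmin}(\Phi + \Psi)$ weakly as $k \to +\infty$;
$(\Phi + \Psi)(x_k) - \min_H(\Phi + \Psi) \le \frac{C}{k^2}$;
$\sum_k k\|x_k - x_{k-1}\|^2 < +\infty$, $\|x_k - x_{k-1}\| \le \frac{C}{k}$.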
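The iteration of Theorem 7 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the separable test problem (a diagonal quadratic for Ψ, an l1 penalty for Φ), the data `d`, `b`, `lam`, the choice α = 4, the step s = 0.9/L and all function names are assumptions made so that the minimizer is available in closed form and the O(1/k²) behaviour of the values can be observed.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def avd_algo(grad_psi, prox_phi, x0, s, alpha, n_iter):
    """Run (AVD-algo)_alpha with x_0 = x_1 = x0; return [x_1, ..., x_{n_iter+1}]."""
    x_prev, x = x0.copy(), x0.copy()
    iterates = [x.copy()]
    for k in range(1, n_iter + 1):
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)  # extrapolation step
        x_prev, x = x, prox_phi(y - s * grad_psi(y), s)   # forward-backward step
        iterates.append(x.copy())
    return iterates

# Hypothetical separable test problem (an assumption, not from the slides):
#   Psi(x) = 0.5 * sum_i d_i (x_i - b_i)^2,   Phi(x) = lam * ||x||_1.
d = np.array([1.0, 0.5, 0.1, 0.05])
b = np.array([2.0, -1.0, 0.3, 3.0])
lam = 0.2
L = d.max()      # Lipschitz constant of grad Psi
s = 0.9 / L      # step size s < 1/L
alpha = 4.0      # alpha > 3

def theta(x):
    """Objective Phi + Psi."""
    return lam * np.abs(x).sum() + 0.5 * np.sum(d * (x - b) ** 2)

# Closed-form minimizer of this separable problem.
x_star = soft_threshold(b, lam / d)

iterates = avd_algo(lambda y: d * (y - b),
                    lambda v, step: soft_threshold(v, step * lam),
                    np.zeros_like(b), s, alpha, n_iter=1000)
gaps = np.array([theta(x) - theta(x_star) for x in iterates])
ks = np.arange(1, len(gaps) + 1)
print("max_k k^2 * gap_k =", np.max(ks ** 2 * gaps))
print("final gap         =", gaps[-1])
```

If the rate holds, the printed value of max k² · gap stays bounded as `n_iter` grows, which is one empirical way to read the constant C of the theorem on this particular example.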
6G. Fast related algorithms. Proof of convergence
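Step one. $k \mapsto E_k$ is the discrete counterpart of $E_\alpha(\cdot)$ (here $\Theta = \Phi + \Psi$ and $\Theta^* = \inf\Theta$):
$E_k = \frac{2s}{\alpha-1}(k+\alpha-2)^2\big(\Theta(x_k)-\Theta^*\big) + \frac{1}{\alpha-1}\big\|(k+\alpha-1)y_k - k x_k - (\alpha-1)x^*\big\|^2$.
$E_k$ is a strict Lyapunov function: for any $k \in \mathbb{N}$,
$E_k + \frac{2s}{\alpha-1}\big((\alpha-3)(k+\alpha-2)+1\big)\big(\Theta(x_k) - \inf\Theta\big) \le E_{k-1}$.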
Fast convergence properties:
$(\Phi + \Psi)(x_k) - \min_H(\Phi + \Psi) \le \frac{C}{k^2}$,
$\sum_k k\big((\Phi + \Psi)(x_k) - \min_H(\Phi + \Psi)\big) < +\infty$.
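The energy $E_k$ of Step one can be evaluated numerically along the iterates. The sketch below only checks the monotonicity $E_k \le E_{k-1}$, which is a consequence of the Lyapunov inequality. It assumes Θ = Φ + Ψ and uses a hypothetical separable test problem (data `d`, `b`, `lam`, with α = 4 and s = 0.9/L) chosen so that $x^*$ is known in closed form; none of these choices come from the slides.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Hypothetical problem: Psi(x) = 0.5 * sum_i d_i (x_i - b_i)^2, Phi = lam * ||.||_1.
d = np.array([1.0, 0.5, 0.1, 0.05])
b = np.array([2.0, -1.0, 0.3, 3.0])
lam = 0.2
s = 0.9 / d.max()   # s < 1/L with L = max(d)
alpha = 4.0         # alpha > 3
x_star = soft_threshold(b, lam / d)   # closed-form minimizer

def theta(x):
    """Theta = Phi + Psi (assumed reading of the slide's notation)."""
    return lam * np.abs(x).sum() + 0.5 * np.sum(d * (x - b) ** 2)

theta_star = theta(x_star)

def energy(k, x, y):
    """E_k as written in Step one, evaluated at (x_k, y_k)."""
    w = (k + alpha - 1) * y - k * x - (alpha - 1) * x_star
    value_term = (2 * s / (alpha - 1)) * (k + alpha - 2) ** 2 * (theta(x) - theta_star)
    return value_term + np.dot(w, w) / (alpha - 1)

x_prev = np.zeros_like(b)
x = np.zeros_like(b)
energies = []
weighted_gaps = []
for k in range(1, 301):
    y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)
    energies.append(energy(k, x, y))
    weighted_gaps.append(k * (theta(x) - theta_star))
    x_prev, x = x, soft_threshold(y - s * d * (y - b), s * lam)

energies = np.array(energies)
print("largest increase E_k - E_{k-1}:", np.max(np.diff(energies)))
print("partial sum of k * gap_k      :", np.sum(weighted_gaps))
```

Up to rounding, the largest increase should be nonpositive, and the partial sums of k · gap should stabilize as more iterations are added.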
6H. Fast related algorithms. Proof of convergence
Step two. The next step consists in obtaining the energy estimate
$\sum_k k\|x_k - x_{k-1}\|^2 < +\infty$,
the discrete version of the continuous energy estimate
$\int_{t_0}^{\infty} t\|\dot{x}(t)\|^2\,dt < +\infty$.
Step three. The final step is to apply Opial’s lemma. Using the previous estimates, it is a direct adaptation of the classical proof of the convergence of proximal-like inertial algorithms. The argument parallels the one based on the differential inequality, with $\|x_k - x^*\|^2$ instead of $\|x(t) - x^*\|^2$, and $x^* \in \operatorname{argmin}(\Phi + \Psi)$.
6I. A perturbed FISTA algorithm.
$(\mathrm{AVD})_{\alpha,g}$-algo:
$y_k = x_k + \frac{k-1}{k+\alpha-1}\,(x_k - x_{k-1})$;
$x_{k+1} = \operatorname{prox}_{s\Phi}\big(y_k - s(\nabla\Psi(y_k) - g_k)\big)$.
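A minimal Python sketch of the perturbed iteration, under stated assumptions: the test problem (diagonal quadratic Ψ, l1 penalty Φ), the data, α = 4, s = 0.9/L and the perturbation $g_k = v/k^3$ with a fixed unit vector $v$ are all hypothetical choices. With this $g_k$, $\sum_k k\|g_k\| = \sum_k 1/k^2$ is finite, so the summability condition of Theorem 8 is satisfied.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def perturbed_avd(grad_psi, prox_phi, perturbation, x0, s, alpha, n_iter):
    """Run the perturbed scheme with x_0 = x_1 = x0 and return the last iterate."""
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, n_iter + 1):
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)
        g_k = perturbation(k)                                   # error on the gradient
        x_prev, x = x, prox_phi(y - s * (grad_psi(y) - g_k), s)
    return x

# Hypothetical problem: Psi(x) = 0.5 * sum_i d_i (x_i - b_i)^2, Phi = lam * ||.||_1.
d = np.array([1.0, 0.5, 0.1, 0.05])
b = np.array([2.0, -1.0, 0.3, 3.0])
lam = 0.2
s = 0.9 / d.max()   # s < 1/L
alpha = 4.0         # alpha > 3
x_star = soft_threshold(b, lam / d)   # closed-form minimizer

def theta(x):
    """Objective Phi + Psi."""
    return lam * np.abs(x).sum() + 0.5 * np.sum(d * (x - b) ** 2)

direction = np.full(4, 0.5)   # unit-norm direction (hypothetical)

def perturbation(k):
    """g_k = direction / k^3, so that sum_k k * ||g_k|| = sum_k 1/k^2 < +inf."""
    return direction / k ** 3

n_iter = 2000
x_last = perturbed_avd(lambda y: d * (y - b),
                       lambda v, step: soft_threshold(v, step * lam),
                       perturbation, np.zeros_like(b), s, alpha, n_iter)
gap = theta(x_last) - theta(x_star)
weighted_sum = sum(k * np.linalg.norm(perturbation(k)) for k in range(1, n_iter + 1))
print("sum_k k*||g_k|| (partial):", weighted_sum)
print("objective gap at last iterate:", gap)
```

Replacing `perturbation` by a slowly decaying sequence such as `direction / k` violates the summability condition and gives a simple way to see, on this example, why the condition matters.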
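Theorem 8 (A-Chbani)
$\Phi : H \to \mathbb{R} \cup \{+\infty\}$ closed convex proper;
$\Psi : H \to \mathbb{R}$ convex differentiable, $\nabla\Psi$ $L$-Lipschitz continuous;
$S = \operatorname{argmin}(\Phi + \Psi) \neq \emptyset$, $s < \frac{1}{L}$, $\alpha > 3$,
$\sum_k k\|g_k\| < +\infty$.
Let $(x_k)$ be a sequence generated by $(\mathrm{AVD})_{\alpha,g}$-algo. Then,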
Let $x(\cdot)$ be a classical global solution of $(\mathrm{AVD})_{\alpha,\varepsilon}$, $\alpha > 1$.
Case 1: $\int_{t_0}^{+\infty} \frac{\varepsilon(t)}{t}\,dt < +\infty$. Then,
$\lim_{t\to+\infty} \Phi(x(t)) = \inf_H \Phi$, $\lim_{t\to+\infty} \|\dot{x}(t)\| = 0$.
Case 2: $\int_{t_0}^{+\infty} \frac{\varepsilon(t)}{t}\,dt = +\infty$. Then,
$\liminf_{t\to+\infty} \|x(t) - p\| = 0$,
where $p$ is the element of minimal norm of $\operatorname{argmin}\Phi$.
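$(\omega_n)_{n \ge 1}$ is a sequence of independent identically distributed variables:
$X_{n+1} = X_n - \varepsilon_n \nabla\varphi(X_n, \omega_{n+1})$.
The stochastic approximation can be numerically improved:
$X_{n+1} = X_n - \varepsilon_{n+1}\, \frac{\sum_{i}^{n} \varepsilon_i \nabla\varphi(X_i, \omega_{i+1})}{\sum_{i}^{n} \varepsilon_i}$.
Limit ODE ($n \to +\infty$, $\sum \varepsilon_n = +\infty$, $\sum \varepsilon_n^p < +\infty$ for some $p > 1$):
$s\ddot{X}(s) + \dot{X}(s) + \nabla\Phi(X(s)) = 0$.
Time rescaling $t = 2\sqrt{s}$ gives $\ddot{X}(t) + \frac{1}{t}\dot{X}(t) + \nabla\Phi(X(t)) = 0$.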
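The time rescaling can be checked by a direct chain-rule computation. The worked steps below are a sketch; the auxiliary name $Y$ for the rescaled trajectory is introduced here for clarity and does not appear on the slide.

```latex
% Set Y(t) = X(s) with t = 2\sqrt{s}, i.e. s = t^2/4 and dt/ds = 1/\sqrt{s} = 2/t.
\begin{align*}
\frac{dX}{ds} &= \frac{2}{t}\,\dot{Y}(t), \\
\frac{d^2X}{ds^2} &= \frac{2}{t}\,\frac{d}{dt}\!\left(\frac{2}{t}\,\dot{Y}(t)\right)
   = \frac{4}{t^2}\,\ddot{Y}(t) - \frac{4}{t^3}\,\dot{Y}(t), \\
s\,\frac{d^2X}{ds^2} + \frac{dX}{ds}
  &= \frac{t^2}{4}\left(\frac{4}{t^2}\,\ddot{Y}(t) - \frac{4}{t^3}\,\dot{Y}(t)\right) + \frac{2}{t}\,\dot{Y}(t)
   = \ddot{Y}(t) + \frac{1}{t}\,\dot{Y}(t).
\end{align*}
% Hence s X''(s) + X'(s) + \nabla\Phi(X(s)) = 0 becomes
% \ddot{Y}(t) + \frac{1}{t}\dot{Y}(t) + \nabla\Phi(Y(t)) = 0.
```

The rescaled equation has the form of the inertial dynamic with damping coefficient $\frac{\alpha}{t}$ and $\alpha = 1$.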
12. Annex 2. Complexity aspects
In 1983, Nemirovsky & Yudin proved lower bounds on the complexity of first-order methods (number of subgradient calls needed to achieve a given accuracy) for convex optimization under various regularity assumptions for the objective functions. See also Nesterov (2004).
1. They constructed convex, piecewise linear functions in dimensions $n > k$, where no first-order method can have function values more accurate than $O(1/\sqrt{k})$ after $k$ subgradient evaluations.
2. They also constructed convex quadratic functions in dimensions $n \ge 2k$ where no first-order method can have function values more accurate than $O(1/k^2)$ after $k$ gradient evaluations.
3. For strongly convex functions with Lipschitz continuous gradients, the known lower bounds on the complexity allow a dimension-independent linear rate of convergence $O(q^k)$ with $0 < q < 1$.
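To get a feel for how far apart these three regimes are, the following sketch computes the smallest $k$ for which each bound drops below a target accuracy. It is pure arithmetic on the rates: all hidden constants are set to 1, and the values `q = 0.9` and `eps = 1e-3` are hypothetical, so the numbers are only indicative.

```python
import math

def first_k_below(bound, eps, k_max=10**12):
    """Smallest integer k >= 1 with bound(k) <= eps, assuming bound is nonincreasing in k."""
    hi = 1
    while bound(hi) > eps:
        hi *= 2
        if hi > k_max:
            raise ValueError("accuracy not reached below k_max")
    lo = hi // 2   # either 0 or a k with bound(k) > eps
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if bound(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

q = 0.9      # hypothetical contraction factor for the linear rate
eps = 1e-3   # hypothetical target accuracy

# Rates from the three items above, with all constants set to 1 (an assumption).
bounds = {
    "nonsmooth, 1/sqrt(k)": lambda k: 1.0 / math.sqrt(k),
    "smooth, 1/k^2": lambda k: 1.0 / k ** 2,
    "strongly convex, q^k": lambda k: q ** k,
}

counts = {name: first_k_below(bound, eps) for name, bound in bounds.items()}
for name, k in counts.items():
    print(f"{name:>22}: k = {k}")
```

The doubling-then-bisection search avoids inverting each rate by hand, so another nonincreasing bound can be added to `bounds` without further changes.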
13. Perspective, open questions
Convergence of the orbits for $\alpha = 3$? Of the Nesterov algorithm?
Convergence of the values: exhibit concrete examples showing that $\alpha = 3$ is critical. Rate of convergence for $a(t) = \frac{\alpha}{t}$, $1 \le \alpha < 3$?
Find a Lyapunov function in the case $\frac{1}{t^\gamma}$, giving the rate of convergence $\frac{1}{t^{1+\gamma}}$ for $W(t)$.
Extend to the algorithmic part the convergence properties of the continuous dynamic (strong convergence...).
Adaptive restart for (DIN-AVD), without strong convexity.
Compare/combine with other rapid methods: multigrid, Newton-based methods, other types of friction (dry).
Show the $O(1/t^2)$ convergence of the values for (DIN-AVD), combining Hessian-driven and asymptotic vanishing damping.
Extension to a non-convex setting: for analytic potentials, the convergence theory for HBF (HJ) and DIN (AABR) still works.
References
B. Abbas, H. Attouch, and B. F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, JOTA, 161 (2014), No. 2, pp. 331–360.
S. Adly, H. Attouch, A. Cabot, Finite time stabilization of nonlinear oscillators subject to dry friction, Nonsmooth Mechanics and Analysis (ed. P. Alart, O. Maisonneuve, R.T. Rockafellar), Adv. in Math. and Mech., Kluwer, 12 (2006), pp. 289–304.
F. Alvarez, On the minimizing property of a second-order dissipative system in Hilbert spaces, SIAM J. Control Optim., 38 (2000), No. 4, pp. 1102–1119.
F. Alvarez, H. Attouch, An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping, Set-Valued Analysis, 9 (2001), pp. 3–11.
F. Alvarez, H. Attouch, Convergence and asymptotic stabilization for some damped hyperbolic equations with non-isolated equilibria, ESAIM Control Optim. Calc. of Var., 6 (2001), pp. 539–552.
F. Alvarez, H. Attouch, J. Bolte, P. Redont, A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics, J. Math. Pures Appl., 81 (2002), No. 8, pp. 747–779.
H. Attouch, G. Buttazzo, G. Michaille, Variational analysis in Sobolev and BV spaces. Applications to PDE’s and optimization, MPS/SIAM Series on Optimization, 6, SIAM, Philadelphia, PA, Second edition, 2014, 793 pages.
H. Attouch, A. Cabot, P. Redont, The dynamics of elastic shocks via epigraphical regularization of a differential inclusion, Adv. Math. Sci. Appl., 12 (2002), No. 1, pp. 273–306.
H. Attouch and M.-O. Czarnecki, Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria, J. Differential Equations, 179 (2002), pp. 278–310.
H. Attouch, X. Goudou and P. Redont, The heavy ball with friction method. The continuous dynamical system, global exploration of the local minima of a real-valued function, Commun. Contemp. Math., 2 (2000), No. 1, pp. 1–34.
H. Attouch, P.E. Maingé, P. Redont, A second-order differential system with Hessian-driven damping; Application to non-elastic shock laws, Differential Equations and Applications, 4 (2012), No. 1, pp. 27–65.
H. Attouch, J. Peypouquet, P. Redont, A dynamical approach to an inertial forward-backward algorithm for convex minimization, SIAM J. Optim., 24 (2014), No. 1, pp. 232–256.
H. Attouch and A. Soubeyran, Inertia and reactivity in decision making as cognitive variational inequalities, Journal of Convex Analysis, 13 (2006), pp. 207–224.
J.-B. Baillon, Un exemple concernant le comportement asymptotique de la solution du problème du
H. Bauschke and P. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert spaces, CMS Books in Mathematics, Springer, (2011).
A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (1) (2009), pp. 183–202.
H. Brezis, Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland, (1972).
H. Brezis, Asymptotic behavior of some evolution systems: Nonlinear evolution equations, Academic Press, New York, (1978), pp. 141–154.
R.E. Bruck, Asymptotic convergence of nonlinear contraction semigroups in Hilbert spaces, J. Funct. Anal., 18 (1975), pp. 15–26.
A. Cabot, Inertial gradient-like dynamical system controlled by a stabilizing term, J. Optim. Theory Appl., 120 (2004), pp. 275–303.
A. Cabot, H. Engler, S. Gadat, On the long time behavior of second-order differential equations with asymptotically small dissipation, Trans. AMS, 361 (2009), pp. 5983–6017.
A. Cabot, H. Engler, S. Gadat, Second order differential equations with asymptotically small dissipation and piecewise flat potentials, Electronic Journal of Differential Equations, 17 (2009), pp. 33–38.
A. Cabot, P. Frankel, Asymptotics for some semilinear hyperbolic equations with non-autonomous damping, J. Differential Equations, 252 (2012), pp. 294–322.
A. Chambolle, Ch. Dossal, On the convergence of the iterates of Fista, HAL Id: hal-01060130, https://hal.inria.fr/hal-01060130v3, submitted on 20 Oct 2014.
O. Güler, New proximal point algorithms for convex minimization, SIAM J. Optim., 2 (4), (1992), pp. 649–664.
A. Haraux, M. Jendoubi, Convergence of solutions of second-order gradient-like systems with analytic nonlinearities, Journal of Differential Equations, 144 (2), (1998).
A. Moudafi, M. Oliny, Convergence of a splitting inertial proximal method for monotone operators, J. Comput. Appl. Math., 155 (2), (2003), pp. 447–454.
Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, 27 (1983), pp. 372–376.
Y. Nesterov, Introductory lectures on convex optimization: A basic course, volume 87 of Applied Optimization, Kluwer Academic Publishers, Boston, MA, 2004.
Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, 103 (1) (2005), pp. 127–152.
Y. Nesterov, Gradient methods for minimizing composite objective function, CORE Discussion Papers, 2007.
Z. Opial, Weak convergence of the sequence of successive approximations for nonexpansive mappings, Bull. Amer. Math. Soc., 73 (1967), pp. 591–597.
B. O’Donoghue and E. J. Candès, Adaptive restart for accelerated gradient schemes, Found. Comput. Math., 2013.
J. Peypouquet and S. Sorin, Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time, Journal of Convex Analysis, 17 (2010), pp. 1113–1163.
D. A. Lorenz and T. Pock, An inertial forward-backward algorithm for monotone inclusions, J. Math. Imaging Vision, 2014.
A.S. Nemirovsky and D.B. Yudin, Problem complexity and method efficiency in optimization, Wiley, New York, 1983.
M. Schmidt, N. Le Roux, F. Bach, Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization, NIPS’11, Granada, Spain, 2011. <inria-00618152v3>, HAL.
W. Su, S. Boyd, E. J. Candès, A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method, Advances in Neural Information Processing Systems 27 (NIPS 2014).