The steepest descent method for computing Riemannian center of mass on Hadamard manifolds
João Xavier da Cruz Neto
Federal University of Piauí (UFPI)
Workshop on Optimization on Manifolds
Joint work with: G.C. Bento (UFG), J.C.O. Souza (UFPI), P.R. Oliveira (UFRJ) and S.D. Bitar (UFAM)
August 9, 2019
J.X. Cruz Neto Chemnitz - German August 9, 2019 1 / 39
Summary
1 Influence of the Curvature in the Convergence of Algorithms
2 Influence of the Kurdyka-Łojasiewicz property in the Convergence of Algorithms
3 Riemannian center of mass on Riemannian manifolds
4 Numerical experiments
Steepest descent method (SDM)
We recall the steepest descent method for solving the minimization problem

$$\min_{x \in M} f(x), \qquad (1)$$

where $f : M \to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz with constant $L > 0$. The steepest descent method generates a sequence as follows:
Steepest descent method
Algorithm 1 (Steepest Descent Method)
Initialization: Choose $x_0 \in M$;
Stopping rule: Given $x_k$, if $x_k$ is a critical point of $f$, then set $x_{k+p} = x_k$ for all $p \in \mathbb{N}$. Otherwise, compute the iterative step;
Iterative step: Take as the next iterate any $x_{k+1} \in M$ such that

$$x_{k+1} = \exp_{x_k}\big(-t_k \operatorname{grad} f(x_k)\big), \qquad (2)$$

where $t_k$ is some positive stepsize.
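The iteration (2) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `steepest_descent`, `exp_map`, `grad`, `step` and `tol` are hypothetical, the stepsize is kept constant, and the test problem lives in the Euclidean plane (a Hadamard manifold with zero curvature), where the exponential map reduces to $\exp_x(v) = x + v$.

```python
import math

def steepest_descent(x0, grad, exp_map, step, tol=1e-10, max_iter=10_000):
    """Iterate x_{k+1} = exp_{x_k}(-t_k grad f(x_k)) with constant t_k = step."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        # Stopping rule: x is (numerically) a critical point of f.
        if math.sqrt(sum(gi * gi for gi in g)) <= tol:
            break
        x = exp_map(x, [-step * gi for gi in g])
    return x

# Illustrative setting (an assumption): M = R^2, so exp_x(v) = x + v.
def euclidean_exp(x, v):
    return [xi + vi for xi, vi in zip(x, v)]

# f(x) = 0.5 * ||x - c||^2 has gradient x - c, Lipschitz with constant L = 1.
c = [1.0, -2.0]
grad_f = lambda x: [xi - ci for xi, ci in zip(x, c)]

x_star = steepest_descent([5.0, 5.0], grad_f, euclidean_exp, step=0.5)
```

On a curved manifold only `exp_map` and `grad` change; the loop itself stays the same.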
Steepest descent method
We consider the following two stepsize rules.

Fixed stepsize rule: given $\delta_1, \delta_2 > 0$ such that $L\delta_1 + \delta_2 < 1$, where $L$ is the Lipschitz constant of the gradient of $f$, take $\{t_k\}$ such that

$$t_k \in \left(\delta_1, \frac{2}{L}(1 - \delta_2)\right), \quad \forall k \ge 0. \qquad (3)$$

A sequence $\{t_k\}$ as in (3) is said to follow the fixed stepsize rule.
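As a small sanity check of the fixed stepsize rule, the sketch below computes the admissible interval, reading the upper bound in (3) as $(2/L)(1 - \delta_2)$. The helper name `fixed_step_interval` is hypothetical. The condition $L\delta_1 + \delta_2 < 1$ gives $\delta_1 < (1 - \delta_2)/L$, so the interval is never empty.

```python
def fixed_step_interval(L, delta1, delta2):
    """Return the open interval (delta1, (2/L) * (1 - delta2)) of rule (3)."""
    if L <= 0 or delta1 <= 0 or delta2 <= 0:
        raise ValueError("L, delta1 and delta2 must be positive")
    if L * delta1 + delta2 >= 1:
        raise ValueError("rule (3) requires L * delta1 + delta2 < 1")
    return delta1, 2.0 * (1.0 - delta2) / L

# Example values (assumptions): L = 4, delta1 = 0.1, delta2 = 0.2.
lo, hi = fixed_step_interval(L=4.0, delta1=0.1, delta2=0.2)
```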
Steepest descent method
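Let $\{t_k\}$ be a sequence obtained by

$$t_k := \max\left\{ 2^{-j} : j \in \mathbb{N},\ f\big(\exp_{x_k}(-2^{-j} \operatorname{grad} f(x_k))\big) \le f(x_k) - \alpha\, 2^{-j}\, \|\operatorname{grad} f(x_k)\|^2 \right\},$$

with $\alpha \in (0, 1)$. This stepsize rule is known as Armijo's search.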
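Armijo's search amounts to halving a trial step until the sufficient decrease condition holds. The sketch below assumes the search moves along $-\operatorname{grad} f(x_k)$, as in (2), and that $j$ starts at 0. The names `armijo_step` and `max_halvings` are hypothetical, and the test problem is Euclidean.

```python
def armijo_step(x, f, grad, exp_map, alpha=0.5, max_halvings=60):
    """Largest t = 2^{-j}, j = 0, 1, ..., such that
    f(exp_x(-t grad f(x))) <= f(x) - alpha * t * ||grad f(x)||^2."""
    g = grad(x)
    g_norm_sq = sum(gi * gi for gi in g)
    fx = f(x)
    t = 1.0
    for _ in range(max_halvings):
        trial = exp_map(x, [-t * gi for gi in g])
        if f(trial) <= fx - alpha * t * g_norm_sq:
            return t
        t *= 0.5
    return t

# Euclidean test problem (an assumption): f(x) = 2 * ||x||^2, grad f(x) = 4x.
f = lambda x: 2.0 * sum(xi * xi for xi in x)
grad_f = lambda x: [4.0 * xi for xi in x]
exp_map = lambda x, v: [xi + vi for xi, vi in zip(x, v)]

# t = 1 and t = 0.5 fail the test; t = 0.25 lands exactly on the minimizer.
t0 = armijo_step([1.0, 1.0], f, grad_f, exp_map, alpha=0.5)
```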
To obtain full convergence results in
Cruz Neto, J.X., Lima, L.L. and Oliveira, P.R., Geodesic algorithms in Riemannian geometry. Balkan J. Geom. Appl., 3 (1998), pp. 89-100,
the authors assume that M has nonnegative sectional curvature and f is a convex function.
Proximal Point Method
For a starting point $x_0 \in M$, the proximal point method for solving the optimization problem (1) generates a sequence $\{x_k\} \subset M$ as follows:

$$x_{k+1} \in \operatorname*{argmin}_{y \in M} \left\{ f(y) + \frac{\lambda_k}{2}\, d^2(y, x_k) \right\}, \qquad (4)$$

where $\{\lambda_k\}$ is a sequence of positive numbers such that $0 < a \le \lambda_k \le b$.
The authors assume that M is a Hadamard manifold and f is a convex function.
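To make iteration (4) concrete, the sketch below runs it in the simplest Hadamard manifold, the real line, for the convex function $f(y) = |y|$. In this special case (chosen only for illustration) the subproblem has the closed-form soft-thresholding solution. On a general manifold each subproblem needs its own solver. The names `prox_abs` and `proximal_point` are hypothetical.

```python
def prox_abs(x, lam):
    """argmin_y |y| + (lam / 2) * (y - x)^2 on M = R (soft-thresholding)."""
    thr = 1.0 / lam
    if x > thr:
        return x - thr
    if x < -thr:
        return x + thr
    return 0.0

def proximal_point(x0, lams):
    """Run iteration (4) for f(y) = |y| with parameters lams = [lam_0, lam_1, ...]."""
    xs = [x0]
    for lam in lams:
        xs.append(prox_abs(xs[-1], lam))
    return xs

# Constant parameters lam_k = 2, so a = b = 2 in the bound 0 < a <= lam_k <= b.
iterates = proximal_point(3.0, [2.0] * 8)
```

The iterates decrease by 1/λ per step until they reach the minimizer 0, where they stay.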
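Kurdyka-Łojasiewicz property
Definition
A proper and lower semicontinuous function $f : M \to \mathbb{R} \cup \{+\infty\}$ is said to satisfy the Kurdyka-Łojasiewicz property at $\bar{x} \in \operatorname{dom} \partial f$ if there exist $\eta \in\, ]0, +\infty]$, a neighbourhood $U$ of $\bar{x}$, and a continuous concave function $\varphi : [0, \eta[\, \to \mathbb{R}_+$ such that

$$\varphi(0) = 0, \quad \varphi \in C^1(0, \eta), \quad \varphi'(s) > 0, \ s \in\, ]0, \eta[;$$

$$\varphi'\big(f(x) - f(\bar{x})\big)\, \operatorname{dist}\big(0, \partial f(x)\big) \ge 1, \quad x \in U \cap [f(\bar{x}) < f < f(\bar{x}) + \eta].$$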
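A one-dimensional example may help in reading the definition. Take $M = \mathbb{R}$, $f(x) = x^2/2$, the critical point $0$, and the concave function $\varphi(s) = 2\sqrt{s}$. All of these are choices made here for illustration. Then $\varphi'(s) = 1/\sqrt{s}$ and the left-hand side of the inequality equals $\sqrt{2}$ for every $x \ne 0$, which the sketch below checks numerically.

```python
import math

f = lambda x: 0.5 * x * x                   # objective on M = R
df = lambda x: x                            # f is smooth, so dist(0, ∂f(x)) = |f'(x)|
phi_prime = lambda s: 1.0 / math.sqrt(s)    # derivative of phi(s) = 2 * sqrt(s)

x_bar = 0.0
# Sample points of U = ]-2, 2[ with f(x_bar) < f(x), i.e. x != 0.
samples = [k / 100.0 for k in range(-199, 200) if k != 0]
kl_values = [phi_prime(f(x) - f(x_bar)) * abs(df(x)) for x in samples]
```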
Analytic manifolds
For each $x \in M$, the distance function on M with base point x is defined by $d_x := d(x, \cdot)$.
Theorem
Let M be a finite dimensional, connected, complete, real analytic Riemannian manifold and $x \in M$. Then, $d_x$ is a subanalytic function.
Martin Tamm, Subanalytic sets in the calculus of variation, Acta Math. 146 (1981), no. 3-4, 167–199.
Therefore, $d_x$ satisfies the Kurdyka-Łojasiewicz property.
Stiefel manifolds
An important class of compact analytic manifolds whose sectional curvature need not have constant sign is the set $V_p(\mathbb{R}^n) := \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\}$ of $n \times p$ matrices with orthonormal columns.
T. Rapcsák, Sectional curvatures in nonlinear optimization, J. Global Optim. 40 (2008), no. 1-3, 375–388.
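The defining constraint $X^T X = I_p$ is easy to check numerically. The sketch below, which assumes NumPy is available, produces a point of $V_p(\mathbb{R}^n)$ from the Q factor of a reduced QR factorization of a random matrix. The sizes $n = 5$, $p = 3$ and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3

# The Q factor of a reduced QR factorization has orthonormal columns.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))

# Membership test for the Stiefel manifold: X^T X = I_p.
on_stiefel = np.allclose(X.T @ X, np.eye(p))
```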
Morse functions
Theorem
Let M be a manifold and denote by $C^r(M, \mathbb{R})$ the set of all $C^r$ functions $g : M \to \mathbb{R}$. The collection of all Morse functions $f : M \to \mathbb{R}$ forms a dense and open set in $C^r(M, \mathbb{R})$, $2 \le r \le +\infty$.
See Hirsch, Theorem 1.2, page 147:
M. W. Hirsch. 1976. Differential Topology. Springer-Verlag, New York.
Theorem
If $f : M \to \mathbb{R}$ is a Morse function, then f satisfies the Kurdyka-Łojasiewicz property.
Cruz Neto, J. X., Oliveira, P. R., Soares Junior, P. A., Soubeyran, A.: Learning how to play Nash and Alternating minimization method for structured nonconvex problems on Riemannian manifolds. J. Convex Anal., 20, 395-438 (2013)
Proximal Point Method
Assume that $x_0 \in \operatorname{dom} f$, $x \in M$ is an accumulation point of the sequence $\{x_k\}$, and f satisfies the Kurdyka-Łojasiewicz property at x. Then, $f(x_k) \to f(x)$ and the sequence $\{x_k\}$ converges to x, which is a critical point of f.
G. C. Bento, J. X. Cruz Neto, and P. R. Oliveira, A new approach to the proximal point method: convergence on general Riemannian manifolds, J. Optim. Theory Appl. 168 (2016), no. 3, 743–755.
Steepest Descent Method
Let M be a Hadamard manifold, $x_0 \in \operatorname{dom} f$, let $x^* \in M$ be an accumulation point of the sequence $\{x_k\}$, and assume that f satisfies the Kurdyka-Łojasiewicz property at $x^*$. Then, $\{x_k\}$ converges to $x^*$, which is a critical point of f.
Computing Riemannian center of mass on Riemannian manifolds
Consider the problem of computing the (global) Riemannian $L^p$ center of mass of the data set $\{a_i\}_{i=1}^n \subset M$ on a Riemannian manifold with respect to weights $0 \le w_i \le 1$ such that $\sum_{i=1}^n w_i = 1$. The Riemannian $L^p$ center of mass is defined as the solution set of the following problem

$$\min_{x \in M} f_p(x) := \frac{1}{p} \sum_{i=1}^n w_i\, d^p(x, a_i), \qquad (5)$$

for $1 \le p < \infty$. If $p = \infty$, the center of mass is defined as the set of minimizers of $\max_i d(x, a_i)$ in M.
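The objective in (5) only needs a distance function, so it can be evaluated generically. The sketch below uses the Euclidean plane as the manifold, which is an assumption made for the example. The function names `f_p` and `f_inf` mirror the notation above but are otherwise hypothetical.

```python
import math

def f_p(x, points, weights, dist, p):
    """Objective of problem (5): (1/p) * sum_i w_i * d(x, a_i)^p, 1 <= p < infinity."""
    return sum(w * dist(x, a) ** p for a, w in zip(points, weights)) / p

def f_inf(x, points, dist):
    """Objective for p = infinity: max_i d(x, a_i)."""
    return max(dist(x, a) for a in points)

# Euclidean plane as the manifold (an assumption made for the example).
dist = lambda x, y: math.dist(x, y)
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
weights = [0.5, 0.25, 0.25]

val1 = f_p((0.0, 0.0), points, weights, dist, p=1)    # 0.25*4 + 0.25*3
val2 = f_p((0.0, 0.0), points, weights, dist, p=2)    # 0.5*(0.25*16 + 0.25*9)
val_inf = f_inf((0.0, 0.0), points, dist)             # largest distance to a data point
```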
Computing Riemannian center of mass on Riemannian manifolds
The problem of computing the Riemannian center of mass has been extensively studied in both theory and applications:
Kristály, A., Moroşanu, G., Róth, Á.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)
Afsari, B., Tron, R., Vidal, R.: On the convergence of gradient descent for finding the Riemannian center of mass. SIAM J. Control Optim. 51, 2230-2260 (2013)
Bačák, M.: Computing medians and means in Hadamard spaces. SIAM J. Optim. 24, 1542-1566 (2014)
Bento, G. C., Bitar, S., Cruz Neto, J. X., Oliveira, P. R., Souza, J. C. O.: Computing Riemannian center of mass on Hadamard manifolds. J. Optim. Theory Appl. (to appear 2019)
Computing Riemannian center of mass on Hadamard manifolds
Proposition 1
Let $\{a_i\}_{i=1}^n \subset M$ be the data set and let $\gamma$ be a unit speed geodesic such that $\gamma(0) = x$, where $x \ne a_i$ for $i = 1, \dots, n$. Then, there exists a constant $\alpha \ge 0$ such that

$$\alpha \le \frac{d^2}{dt^2} (f_1 \circ \gamma)(t) \Big|_{t=0}.$$

Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then $\alpha > 0$.
Recall that the points $a_1, \dots, a_n$ are said to be collinear if they lie on the same geodesic, i.e., there exist $y \in M$, $v \in T_yM$ and $t_i \in \mathbb{R}$, $i = 1, \dots, n$, such that $a_i = \exp_y(t_i v)$ for each $i = 1, \dots, n$.
Computing Riemannian center of mass on Hadamard manifolds
Theorem
Let M be a simply connected, complete Riemannian manifold of nonpositive sectional curvature. Assume the points $P_i \in M$, $i = 1, \dots, n$, belong to a geodesic $\sigma : [0, 1] \to M$ such that $P_i = \sigma(t_i)$ with $0 \le t_1 \le \dots \le t_n \le 1$. Then:
1 the unique minimum point for $f_1$ is the middle point $P_{(n+1)/2}$ whenever n is odd;
2 the minimum points for $f_1$ are situated on $\sigma$, between $P_{n/2}$ and $P_{n/2+1}$, whenever n is even.
Kristály, A., Moroşanu, G., Róth, Á.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)
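The collinear case can be illustrated on the real line, which is itself a geodesic. The sketch assumes equal weights, and the constant factor $1/n$ is dropped because it does not affect the minimizers. It evaluates the sum of distances on a grid and locates its minimizers by brute force: for an odd number of points the minimizer is the middle point, and for an even number every point of the middle segment is a minimizer.

```python
def f1(x, points):
    """Unweighted sum of distances on the real line."""
    return sum(abs(x - a) for a in points)

points_odd = [0.0, 1.0, 4.0, 6.0, 10.0]     # n = 5, middle point is 4.0
points_even = [0.0, 1.0, 4.0, 6.0]          # n = 4, middle segment is [1.0, 4.0]

grid = [k / 10.0 for k in range(-20, 121)]  # candidate locations in [-2, 12]

best_odd = min(grid, key=lambda x: f1(x, points_odd))
min_even = min(f1(x, points_even) for x in grid)
argmins_even = [x for x in grid if abs(f1(x, points_even) - min_even) < 1e-9]
```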
Computing Riemannian center of mass on Hadamard manifolds
Proposition 2
Let C be a compact set such that $a_i \notin C$ for each $i = 1, \dots, n$. Then, the vector field $\operatorname{grad} f_1 : M \to TM$ is Lipschitz continuous on C.
Computing Riemannian center of mass on Hadamard manifolds
Proposition 3
The following statements hold:
(a) The function $f_1(x) = \sum_{i=1}^n w_i\, d(x, a_i)$ is convex;
(b) The problem (5), for $p = 1$, always has a solution. Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then the solution is unique;
(c) Let $i_0 \in \{1, \dots, n\}$ be an index such that $f_1(a_{i_0}) = \min_{i=1,\dots,n} f_1(a_i)$. Then, $a_{i_0}$ is a minimizer of $f_1$ on M if and only if

$$\left\| \sum_{i=1,\, i \ne i_0}^n w_i\, \frac{\exp^{-1}_{a_{i_0}} a_i}{d(a_i, a_{i_0})} \right\| \le w_{i_0}.$$
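The test in item (c) is cheap to evaluate. The sketch below does so in the Euclidean plane, where $\exp^{-1}_{a} b = b - a$. The plane, the data and the helper name `is_data_point_minimizer` are assumptions made for illustration. In both examples the index 0 attains the smallest value of $f_1$ among the data points, as item (c) requires.

```python
import math

def is_data_point_minimizer(i0, points, weights):
    """Check || sum_{i != i0} w_i (a_i - a_i0) / d(a_i, a_i0) || <= w_i0 in R^2."""
    ax, ay = points[i0]
    sx = sy = 0.0
    for i, ((px, py), w) in enumerate(zip(points, weights)):
        if i == i0:
            continue
        d = math.hypot(px - ax, py - ay)    # assumes distinct data points
        sx += w * (px - ax) / d
        sy += w * (py - ay) / d
    return math.hypot(sx, sy) <= weights[i0]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Heavy weight on a_0: the norm 0.2 * sqrt(2) is below w_0 = 0.6, so a_0 is the L1 center.
heavy = is_data_point_minimizer(0, points, [0.6, 0.2, 0.2])

# Lighter weight on a_0: the norm 0.35 * sqrt(2) exceeds w_0 = 0.3, so the center is not a data point.
light = is_data_point_minimizer(0, points, [0.3, 0.35, 0.35])
```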
Computing Riemannian center of mass on Hadamard manifolds
Proposition 4
The following statements hold:
(a) The function $f_2(x) = \frac{1}{2} \sum_{i=1}^n w_i\, d^2(x, a_i)$ is strictly convex and continuously differentiable, with gradient Lipschitz on compact sets;
(b) The problem of computing the Riemannian $L^2$ center of mass always has a unique solution.
Proposition 5
The function $f_2(x) = \frac{1}{2} \sum_{i=1}^n w_i\, d^2(x, a_i)$ satisfies the Kurdyka-Łojasiewicz inequality at every point of M.
Computing Riemannian center of mass on Hadamard manifolds
For any $x_0 \in M$, we have $x_k \in L_f(x_0)$ for all $k \in \mathbb{N}$, where $L_f(x_0) = \{x \in M : f(x) \le f(x_0)\}$ is a nonempty and compact set. We then consider a direction $d_q$ and $t_q$ small enough such that $f(\exp_{a_q} t_q d_q) < f(a_q)$, where q denotes the index in $\{1, \dots, n\}$ such that $f(a_q) = \min_{i=1,\dots,n} f(a_i)$. Setting $x_0 := \exp_{a_q} t_q d_q$, we have that $a_i \notin L_f(x_0)$ for each $i = 1, \dots, n$.
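One possible way to build such a starting point for $f_1$ is sketched below in the Euclidean plane. The choice of direction is an assumption of this sketch, not something stated in the slides: it is the weighted sum of unit vectors pointing from $a_q$ to the other data points, which is a descent direction whenever the test of Proposition 3(c) fails. The names `f1` and `initial_point` are hypothetical.

```python
import math

def f1(x, points, weights):
    return sum(w * math.dist(x, a) for a, w in zip(points, weights))

def initial_point(points, weights):
    """Return x0 with f1(x0) < min_i f1(a_i), or a_q itself if a_q is already optimal."""
    q = min(range(len(points)), key=lambda i: f1(points[i], points, weights))
    aq = points[q]
    s = [0.0, 0.0]
    for i, (a, w) in enumerate(zip(points, weights)):
        if i != q:
            d = math.dist(a, aq)            # assumes distinct data points
            s[0] += w * (a[0] - aq[0]) / d
            s[1] += w * (a[1] - aq[1]) / d
    norm = math.hypot(s[0], s[1])
    if norm <= weights[q]:
        return aq                           # Proposition 3(c): a_q minimizes f1
    u = (s[0] / norm, s[1] / norm)          # candidate direction d_q
    f_q = f1(aq, points, weights)
    t = 1.0
    for _ in range(60):                     # shrink t_q until f1 decreases
        x0 = (aq[0] + t * u[0], aq[1] + t * u[1])
        if f1(x0, points, weights) < f_q:
            return x0
        t *= 0.5
    raise RuntimeError("no decrease found")

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [0.3, 0.35, 0.35]
x0 = initial_point(points, weights)
```

Since $f_1(x_0)$ is strictly below the value at every data point, none of the $a_i$ lies in the sublevel set of $x_0$.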
Theorem 2
The sequence $\{x_k\}$ converges to the unique Riemannian $L^1$ center of mass of the data set $\{a_i\}_{i=1}^n$ as long as the points $a_i$, for $i = 1, \dots, n$, are not collinear.
Theorem 3
The sequence $\{x_k\}$ converges to the unique Riemannian $L^2$ center of mass of the data set $\{a_i\}_{i=1}^n$.
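As an illustration of Theorem 3, the sketch below applies the steepest descent iteration to $f_2$ on the manifold of symmetric positive definite matrices used in the numerical experiments. It relies on the standard identity $\operatorname{grad} f_2(X) = -\sum_i w_i \exp_X^{-1} a_i$, which is valid on Hadamard manifolds. The constant step 0.5, the helper names and the use of NumPy are assumptions of the sketch, not the authors' implementation.

```python
import numpy as np

def _apply(X, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, Q = np.linalg.eigh(X)
    return (Q * fun(w)) @ Q.T

def spd_exp(X, V):
    Xh, Xih = _apply(X, np.sqrt), _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ V @ Xih, np.exp) @ Xh

def spd_log(X, Y):
    Xh, Xih = _apply(X, np.sqrt), _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ Y @ Xih, np.log) @ Xh

def l2_center_of_mass(mats, weights, step=0.5, tol=1e-10, max_iter=500):
    X = mats[0].copy()
    for _ in range(max_iter):
        # grad f_2(X) = -sum_i w_i exp_X^{-1}(a_i)
        G = -sum(w * spd_log(X, A) for A, w in zip(mats, weights))
        # Riemannian norm of G at X: sqrt(tr(G X^{-1} G X^{-1}))
        Xinv = np.linalg.inv(X)
        if np.sqrt(max(np.trace(G @ Xinv @ G @ Xinv), 0.0)) <= tol:
            break
        X = spd_exp(X, -step * G)
    return X

# For commuting matrices the L2 center of mass is the weighted geometric mean.
mats = [np.diag([1.0, 4.0]), np.diag([9.0, 1.0])]
mean = l2_center_of_mass(mats, [0.5, 0.5])
```

For the two diagonal matrices above the result should be close to diag(3, 2), the entrywise geometric mean.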
Numerical experiments
Let $M := (S^m_{++}, \langle \cdot, \cdot \rangle)$ be the Riemannian manifold endowed with the Riemannian metric induced by the Euclidean Hessian of $\Psi(X) = -\ln\det X$,

$$\langle U, V \rangle = \operatorname{tr}(V \Psi''(X) U) = \operatorname{tr}(V X^{-1} U X^{-1}), \quad X \in M, \ U, V \in T_XM,$$

where $S^m_{++}$ is the cone of $m \times m$ symmetric positive definite matrices. In this case, for any $X, Y \in M$ the unique geodesic joining those two points is given by:

$$\gamma(t) = X^{1/2} \left( X^{-1/2} Y X^{-1/2} \right)^t X^{1/2}, \quad t \in [0, 1].$$
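The geodesic formula can be evaluated with one symmetric eigendecomposition per matrix function. The sketch below assumes NumPy. The helper names `_apply` and `spd_geodesic` and the two test matrices are hypothetical.

```python
import numpy as np

def _apply(X, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, Q = np.linalg.eigh(X)
    return (Q * fun(w)) @ Q.T

def spd_geodesic(X, Y, t):
    """gamma(t) = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, t in [0, 1]."""
    Xh = _apply(X, np.sqrt)
    Xih = _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ Y @ Xih, lambda w: w ** t) @ Xh

X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.array([[1.0, -0.2], [-0.2, 3.0]])

start = spd_geodesic(X, Y, 0.0)     # should reproduce X
middle = spd_geodesic(X, Y, 0.5)    # midpoint of the geodesic
end = spd_geodesic(X, Y, 1.0)       # should reproduce Y
```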
Numerical experiments
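Thus, for each $X \in M$, $\exp_X : T_XM \to M$ and $\exp_X^{-1} : M \to T_XM$ are given, respectively, by

$$\exp_X V = X^{1/2} e^{X^{-1/2} V X^{-1/2}} X^{1/2}, \qquad \exp_X^{-1} Y = X^{1/2} \ln\left(X^{-1/2} Y X^{-1/2}\right) X^{1/2}.$$

The squared Riemannian distance is

$$d^2(X, Y) = \operatorname{tr}\left(\ln^2\left(X^{-1/2} Y X^{-1/2}\right)\right) = \sum_{i=1}^m \ln^2 \lambda_i\left(X^{-1/2} Y X^{-1/2}\right), \qquad (6)$$

where $\lambda_i(\cdot)$ denotes the i-th eigenvalue of its argument.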
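These three formulas translate directly into code. The sketch below assumes NumPy and uses hypothetical helper names. It also checks two consistency properties: the logarithm inverts the exponential, and the distance (6) equals the Riemannian norm $\sqrt{\operatorname{tr}(V X^{-1} V X^{-1})}$ of $V = \exp_X^{-1} Y$.

```python
import numpy as np

def _apply(X, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, Q = np.linalg.eigh(X)
    return (Q * fun(w)) @ Q.T

def spd_exp(X, V):
    """exp_X(V) = X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2}."""
    Xh, Xih = _apply(X, np.sqrt), _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ V @ Xih, np.exp) @ Xh

def spd_log(X, Y):
    """exp_X^{-1}(Y) = X^{1/2} ln(X^{-1/2} Y X^{-1/2}) X^{1/2}."""
    Xh, Xih = _apply(X, np.sqrt), _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ Y @ Xih, np.log) @ Xh

def spd_dist(X, Y):
    """d(X, Y) from (6): square root of the sum of squared log-eigenvalues."""
    Xih = _apply(X, lambda w: 1.0 / np.sqrt(w))
    lam = np.linalg.eigvalsh(Xih @ Y @ Xih)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

X = np.array([[2.0, 0.5], [0.5, 1.0]])
V = np.array([[0.3, -0.1], [-0.1, 0.2]])    # symmetric tangent vector at X
Y = spd_exp(X, V)
V_back = spd_log(X, Y)
Xinv = np.linalg.inv(X)
norm_V = float(np.sqrt(np.trace(V @ Xinv @ V @ Xinv)))
d_XY = spd_dist(X, Y)
```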
Numerical experiments
In our simulations, we consider different scenarios determined by three parameters: the number n of matrices in the data set $\{Q_i\}_{i=1}^n$, the size $m \times m$ of the matrices, and the stopping tolerance $\varepsilon > 0$. The random matrices used in our tests are generated with a uniform (well conditioned) and a non-uniform (ill conditioned) distribution of the eigenvalues of each matrix of the data set. The ill conditioned data set is generated as follows:
Hence, the non-uniform distribution satisfies $\lambda_{\max} / \lambda_{\min} > 10^2$, where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and the smallest eigenvalues of each matrix of the data set, respectively.
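The generation procedure itself is not reproduced in this transcript, so the following is only a hypothetical way to obtain data sets with the stated property. It draws a random orthogonal basis and prescribes the spectrum so that the ratio $\lambda_{\max}/\lambda_{\min}$ equals a chosen value. The function `random_spd`, the eigenvalue ranges and the seed are all assumptions.

```python
import numpy as np

def random_spd(m, cond, rng):
    """Random m x m SPD matrix (m >= 2) with lambda_min = 1 and lambda_max = cond."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))    # random orthogonal basis
    eigs = rng.uniform(1.0, cond, size=m)
    eigs[0], eigs[-1] = 1.0, cond                       # pin the extreme eigenvalues
    return (Q * eigs) @ Q.T

rng = np.random.default_rng(0)
well = [random_spd(5, 2.0, rng) for _ in range(25)]     # lambda_max / lambda_min = 2
ill = [random_spd(5, 1.0e3, rng) for _ in range(25)]    # lambda_max / lambda_min = 1000 > 10^2

ratios_ill = [np.linalg.eigvalsh(A)[-1] / np.linalg.eigvalsh(A)[0] for A in ill]
ratios_well = [np.linalg.eigvalsh(A)[-1] / np.linalg.eigvalsh(A)[0] for A in well]
```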
Numerical experiments
The next figures plot some results for $m \times m$ matrices (m = 5, 10, 20, 40) for different data sets (n = 25, 50, 75, 100).
Figure: Uniform, m = 5 and ε = 10^{-8}
Figure: Non-uniform, m = 5 and ε = 10^{-8}
Figure: Uniform, m = 10 and ε = 10^{-8}
Figure: Non-uniform, m = 10 and ε = 10^{-8}
Figure: Uniform, m = 20 and ε = 10^{-8}
Figure: Non-uniform, m = 20 and ε = 10^{-8}
Figure: Uniform, m = 40 and ε = 10^{-8}
Figure: Non-uniform, m = 40 and ε = 10^{-8}
Thank you for your attention!