
The steepest descent method for computing Riemannian center of mass on Hadamard manifolds

Joao Xavier da Cruz Neto∗

Federal University of Piauí (UFPI)

Workshop on Optimization on Manifolds

Joint work with: G.C. Bento (UFG), J.C.O. Souza (UFPI), P.R. Oliveira (UFRJ) and S.D. Bitar (UFAM)

August 9, 2019

J.X. Cruz Neto, Chemnitz, Germany, August 9, 2019

Summary

1. Influence of the Curvature on the Convergence of Algorithms

2. Influence of the Kurdyka-Łojasiewicz Property on the Convergence of Algorithms

3. Riemannian Center of Mass on Riemannian Manifolds

4. Numerical Experiments


Steepest descent method (SDM)

We recall the steepest descent method for solving the following minimization problem

$$\min_{x\in M} f(x), \tag{1}$$

where $f : M \to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz continuous with constant $L > 0$. The steepest descent method generates a sequence as follows:


Steepest descent method

Algorithm 1 (Steepest Descent Method)

Initialization: Choose $x^0 \in M$.

Stopping rule: Given $x^k$, if $x^k$ is a critical point of $f$, then set $x^{k+p} = x^k$ for all $p \in \mathbb{N}$. Otherwise, compute the iterative step.

Iterative step: Take as the next iterate any $x^{k+1} \in M$ such that

$$x^{k+1} = \exp_{x^k}\big(-t_k \operatorname{grad} f(x^k)\big), \tag{2}$$

where $t_k$ is some positive stepsize.
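A minimal Python sketch of the iteration (2), assuming the caller supplies the manifold operations as plain functions. The names `steepest_descent`, `grad`, `exp_map`, `norm` and `stepsize` are hypothetical, and the exact test "x^k is a critical point" is replaced by the numerical test ||grad f(x^k)|| ≤ eps.

```python
import numpy as np

def steepest_descent(x0, grad, exp_map, norm, stepsize, eps=1e-8, max_iter=1000):
    """Sketch of Algorithm 1: x_{k+1} = exp_{x_k}(-t_k grad f(x_k)).

    grad(x)           -> Riemannian gradient of f at x (a tangent vector)
    exp_map(x, v)     -> exponential map exp_x(v)
    norm(x, v)        -> Riemannian norm of v in T_x M
    stepsize(k, x, g) -> positive stepsize t_k (fixed rule or Armijo search)
    Returns the last iterate and the number of iterations performed.
    """
    x = x0
    for k in range(max_iter):
        g = grad(x)
        # Numerical stand-in for the exact stopping rule "x_k is a critical point".
        if norm(x, g) <= eps:
            return x, k
        t = stepsize(k, x, g)
        x = exp_map(x, -t * g)
    return x, max_iter

# Usage on M = R^2 (a flat Hadamard manifold): exp_x(v) = x + v, f(x) = ||x - c||^2 / 2.
c = np.array([1.0, -2.0])
x_min, iters = steepest_descent(
    x0=np.zeros(2),
    grad=lambda x: x - c,
    exp_map=lambda x, v: x + v,
    norm=lambda x, v: np.linalg.norm(v),
    stepsize=lambda k, x, g: 0.5,
)
```

On a curved manifold only `exp_map`, `norm` and `grad` change; the loop itself is the same.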



We consider the following two stepsize rules.

We choose a sequence $\{t_k\}$ as follows: given $\delta_1, \delta_2 > 0$ such that $L\delta_1 + \delta_2 < 1$, where $L$ is the Lipschitz constant of the gradient map of $f$, take $\{t_k\}$ such that

$$t_k \in \Big(\delta_1, \frac{2}{L}(1-\delta_2)\Big), \quad \forall k \ge 0. \tag{3}$$

The sequence $\{t_k\}$ as in (3) is called the fixed stepsize rule.
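Reading the fixed stepsize rule as $t_k \in (\delta_1, \tfrac{2}{L}(1-\delta_2))$, the condition $L\delta_1 + \delta_2 < 1$ gives $L\delta_1 < 1 - \delta_2 < 2(1-\delta_2)$, so the interval is nonempty. A small sketch that validates the parameters and returns the interval (the function name is hypothetical):

```python
def fixed_stepsize_interval(L, delta1, delta2):
    """Return the open interval (delta1, 2(1 - delta2)/L) of the fixed stepsize rule.

    Requires L, delta1, delta2 > 0 and L*delta1 + delta2 < 1, which guarantees
    delta1 < 2(1 - delta2)/L, i.e. the interval is nonempty.
    """
    if L <= 0 or delta1 <= 0 or delta2 <= 0:
        raise ValueError("L, delta1 and delta2 must be positive")
    if L * delta1 + delta2 >= 1:
        raise ValueError("need L*delta1 + delta2 < 1")
    return delta1, 2.0 * (1.0 - delta2) / L

# Usage: any t_k strictly inside the interval is admissible, e.g. its midpoint.
lo, hi = fixed_stepsize_interval(2.0, 0.1, 0.2)
t_fixed = 0.5 * (lo + hi)
```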



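Let $\{t_k\}$ be the sequence obtained by

$$t_k := \max\Big\{2^{-j} : j \in \mathbb{N},\ f\big(\exp_{x^k}(-2^{-j}\operatorname{grad} f(x^k))\big) \le f(x^k) - \alpha\, 2^{-j}\, \|\operatorname{grad} f(x^k)\|^2\Big\},$$

with $\alpha \in (0, 1)$. This stepsize rule is the so-called Armijo search.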
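A minimal backtracking sketch of Armijo's search, assuming $j$ starts at $0$ (so $t_k \le 1$) and, for simplicity, that tangent vectors are stored in coordinates where the Riemannian norm is the Euclidean one. The name `armijo_stepsize` and the cap `max_halvings` are hypothetical.

```python
import numpy as np

def armijo_stepsize(f, exp_map, x, g, alpha=0.5, max_halvings=60):
    """Armijo search: largest t = 2^{-j}, j = 0, 1, 2, ..., such that
    f(exp_x(-t g)) <= f(x) - alpha * t * ||g||^2, where g = grad f(x).
    """
    fx = f(x)
    g_norm_sq = float(np.dot(g, g))  # Euclidean inner product: an assumption of this sketch
    t = 1.0
    for _ in range(max_halvings):
        if f(exp_map(x, -t * g)) <= fx - alpha * t * g_norm_sq:
            return t
        t *= 0.5
    return t  # safeguard: tiny step if the test never passed numerically
```

Because $\operatorname{grad} f$ is Lipschitz, the test is satisfied for all sufficiently small $t$, so the loop terminates well before the safeguard in exact arithmetic.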


Full convergence results for the SDM were obtained in:

Cruz Neto, J.X., Lima, L.L., Oliveira, P.R.: Geodesic algorithms in Riemannian geometry. Balkan J. Geom. Appl. 3, 89-100 (1998)

where the authors assume that M has nonnegative sectional curvature and f is a convex function.


Proximal Point Method

For a starting point $x^0 \in M$, the proximal point method for solving problem (1) generates a sequence $\{x^k\} \subset M$ as follows:

$$x^{k+1} \in \arg\min_{y\in M}\Big\{f(y) + \frac{\lambda_k}{2}\, d^2(y, x^k)\Big\}, \tag{4}$$

where $\{\lambda_k\}$ is a sequence of positive numbers such that $0 < a \le \lambda_k \le b$.

Ferreira, O.P., Oliveira, P.R.: Proximal point algorithm on Riemannian manifold. Optimization 51, 257-270 (2002)

The authors assume that M is a Hadamard manifold and f is a convex function.
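To make (4) concrete, here is an illustrative sketch on the simplest Hadamard manifold, $M = \mathbb{R}$ with $d(y, x) = |y - x|$, for the convex function $f(y) = |y|$ (our example, not one from the talk). The subproblem then has the closed-form soft-thresholding solution $\operatorname{sign}(x)\max(|x| - 1/\lambda, 0)$; the helper names are hypothetical.

```python
import numpy as np

def prox_abs(x, lam):
    """Closed-form solution of argmin_y |y| + (lam/2) (y - x)^2 on M = R."""
    return np.sign(x) * max(abs(x) - 1.0 / lam, 0.0)

def proximal_point_abs(x0, lams):
    """Proximal point iteration (4) for f(y) = |y|, one step per lambda_k in lams."""
    xs = [x0]
    for lam in lams:
        xs.append(prox_abs(xs[-1], lam))
    return xs

# Usage: with lambda_k = 1 each step moves the iterate by 1 towards the minimizer 0.
trajectory = proximal_point_abs(3.0, [1.0] * 5)
```

Larger $\lambda_k$ penalizes the distance to $x^k$ more strongly and therefore gives shorter steps.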








Kurdyka-Łojasiewicz property

Definition

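A proper and lower semicontinuous function $f : M \to \mathbb{R}\cup\{+\infty\}$ is said to satisfy the Kurdyka-Łojasiewicz property at $\bar x \in \operatorname{dom}\partial f$ iff there exist $\eta \in\, ]0,+\infty]$, a neighbourhood $U$ of $\bar x$, and a continuous concave function $\varphi : [0,\eta[\, \to \mathbb{R}_+$ such that

$\varphi(0) = 0$, $\varphi \in C^1(0,\eta)$, $\varphi'(s) > 0$ for all $s \in\, ]0,\eta[$;

$\varphi'\big(f(x) - f(\bar x)\big)\operatorname{dist}\big(0,\partial f(x)\big) \ge 1$ for all $x \in U \cap [f(\bar x) < f < f(\bar x) + \eta]$.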
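As a sanity check of the definition, a one-line worked example of our own (not from the talk): on $M = \mathbb{R}$, the function $f(x) = x^2$ satisfies the Kurdyka-Łojasiewicz property at $\bar x = 0$ with $\eta = +\infty$, $U = \mathbb{R}$ and $\varphi(s) = \sqrt{s}$.

```latex
% \varphi(s) = \sqrt{s} is continuous and concave on [0,\infty), \varphi(0) = 0,
% and \varphi'(s) = 1/(2\sqrt{s}) > 0 on ]0,\infty[.
% For every x with 0 = f(\bar x) < f(x), i.e. x \neq 0, we have \partial f(x) = \{2x\}, hence
\varphi'\big(f(x) - f(\bar x)\big)\,\operatorname{dist}\big(0, \partial f(x)\big)
  = \frac{1}{2\sqrt{x^2}}\,|2x|
  = \frac{2|x|}{2|x|} = 1 \ge 1 .
```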


Analytic manifolds

For each $x \in M$, the distance function on M with base point $x$ is defined by $d_x := d(x, \cdot)$.

Theorem

Let M be a finite-dimensional, connected, complete, real analytic Riemannian manifold and $x \in M$. Then $d_x$ is a subanalytic function.

Martin Tamm, Subanalytic sets in the calculus of variation, Acta Math. 146 (1981), no. 3-4, 167–199.

Therefore, since continuous subanalytic functions satisfy the Kurdyka-Łojasiewicz property, $d_x$ satisfies it as well.


Stiefel manifolds

An important class of analytic, compact manifolds whose sectional curvature need not have constant sign is the set $V_p(\mathbb{R}^n) := \{X \in \mathbb{R}^{n\times p} : X^T X = I_p\}$ of $n \times p$ matrices with orthonormal columns (the Stiefel manifold).

T. Rapcsak, Sectional curvatures in nonlinear optimization, J. Global Optim. 40 (2008), no. 1-3, 375–388.
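For readers who want a concrete element of $V_p(\mathbb{R}^n)$, a sketch that builds one from a reduced QR factorization and checks the defining equation $X^T X = I_p$. The QR construction and the name `random_stiefel_point` are our choices, not taken from the talk.

```python
import numpy as np

def random_stiefel_point(n, p, seed=0):
    """Return X in V_p(R^n) = {X in R^{n x p} : X^T X = I_p} via a reduced QR factorization."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, p))
    Q, R = np.linalg.qr(A)          # reduced QR: Q is n x p with orthonormal columns
    # Flip column signs so that diag(R) > 0, which makes the factorization unique.
    return Q * np.sign(np.diag(R))

X = random_stiefel_point(5, 3)
```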


Morse functions

Theorem

Let M be a manifold and denote by $C^r(M,\mathbb{R})$ the set of all $C^r$ functions $g : M \to \mathbb{R}$. The collection of all Morse functions $f : M \to \mathbb{R}$ forms a dense and open set in $C^r(M,\mathbb{R})$, $2 \le r \le +\infty$.

See Hirsch, Theorem 1.2, page 147:

M. W. Hirsch. 1976. Differential Topology. Springer-Verlag. New York.

Theorem

If $f : M \to \mathbb{R}$ is a Morse function, then $f$ satisfies the Kurdyka-Łojasiewicz property.

Cruz Neto, J. X., Oliveira, P. R., Soares Junior, P. A., Soubeyran, A.: Learning how to play Nash and alternating minimization method for structured nonconvex problems on Riemannian manifolds. J. Convex Anal. 20, 395-438 (2013)






Assume that $x^0 \in \operatorname{dom} f$, $\bar x \in M$ is an accumulation point of the sequence $\{x^k\}$ generated by the proximal point method (4), and $f$ satisfies the Kurdyka-Łojasiewicz property at $\bar x$. Then $f(x^k) \to f(\bar x)$ and the sequence $\{x^k\}$ converges to $\bar x$, which is a critical point of $f$.

G. C. Bento, J. X. Cruz Neto, and P. R. Oliveira, A new approach to the proximal point method: convergence on general Riemannian manifolds, J. Optim. Theory Appl. 168 (2016), no. 3, 743–755.


Steepest Descent Method

Let M be a Hadamard manifold, $x^0 \in \operatorname{dom} f$, and let $\{x^k\}$ be the sequence generated by Algorithm 1. If $x^* \in M$ is an accumulation point of $\{x^k\}$ and $f$ satisfies the Kurdyka-Łojasiewicz property at $x^*$, then $\{x^k\}$ converges to $x^*$, which is a critical point of $f$.








Computing Riemannian center of mass on Riemannian manifolds

Consider the problem of computing the (global) Riemannian $L^p$ center of mass of the data set $\{a_i\}_{i=1}^n \subset M$ on a Riemannian manifold with respect to weights $0 \le w_i \le 1$ such that $\sum_{i=1}^n w_i = 1$. The Riemannian $L^p$ center of mass is defined as the solution set of the following problem

$$\min_{x\in M} f_p(x) := \frac{1}{p}\sum_{i=1}^n w_i\, d^p(x, a_i), \tag{5}$$

for $1 \le p < \infty$. If $p = \infty$, the center of mass is defined as the set of minimizers of $\max_i d(x, a_i)$ in M.
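Every Euclidean space $\mathbb{R}^m$ is a (flat) Hadamard manifold, so the objective (5) can be evaluated there with the Euclidean distance. A minimal sketch, with the hypothetical name `center_of_mass_objective`; for $p = \infty$ it returns $\max_i d(x, a_i)$ and ignores the weights, as in the definition above.

```python
import numpy as np

def center_of_mass_objective(x, points, weights, p):
    """f_p(x) = (1/p) sum_i w_i d^p(x, a_i) for 1 <= p < inf, and max_i d(x, a_i) for p = inf,
    on M = R^m with the Euclidean distance d."""
    d = np.linalg.norm(points - x, axis=1)   # d(x, a_i) for every data point
    if np.isinf(p):
        return float(d.max())
    return float(np.dot(weights, d ** p) / p)
```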



The problem of computing the Riemannian center of mass has been extensively studied in both theory and applications:

Kristaly, A., Morosanu, G., Roth, A.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)

Afsari, B., Tron, R., Vidal, R.: On the convergence of gradient descent for finding the Riemannian center of mass. SIAM J. Control Optim. 51, 2230-2260 (2013)

Bacak, M.: Computing medians and means in Hadamard spaces. SIAM J. Optim. 24, 1542-1566 (2014)

Bento, G. C., Bitar, S., Cruz Neto, J. X., Oliveira, P. R., Souza, J. C. O.: Computing Riemannian center of mass on Hadamard manifolds. J. Optim. Theory Appl. (to appear 2019)
















Computing Riemannian center of mass on Hadamard manifolds

Proposition 1

Let $\{a_i\}_{i=1}^n \subset M$ be the data set and let $\gamma$ be a unit-speed geodesic such that $\gamma(0) = x$, where $x \ne a_i$ for $i = 1, \dots, n$. Then there exists a constant $\alpha \ge 0$ such that

$$\alpha \le \frac{d^2}{dt^2}(f_1\circ\gamma)(t)\Big|_{t=0}.$$

Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then $\alpha > 0$.

Recall that the points $a_1, \dots, a_n$ are said to be collinear if they lie on the same geodesic, i.e., there exist $y \in M$, $v \in T_yM$ and $t_i \in \mathbb{R}$, $i = 1, \dots, n$, such that $a_i = \exp_y(t_i v)$ for each $i = 1, \dots, n$.
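In $M = \mathbb{R}^m$ the geodesics are straight lines, so the definition reduces to a rank test: $a_1, \dots, a_n$ are collinear iff the differences $a_i - a_1$ span a subspace of dimension at most one. A sketch of that Euclidean special case only (the name `are_collinear` and the tolerance are hypothetical):

```python
import numpy as np

def are_collinear(points, tol=1e-10):
    """Collinearity test on M = R^m: the points lie on one line (geodesic) iff the
    differences a_i - a_1 span a subspace of dimension <= 1."""
    points = np.asarray(points, dtype=float)
    diffs = points - points[0]
    return bool(np.linalg.matrix_rank(diffs, tol=tol) <= 1)
```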



Theorem

Let M be a simply connected, complete Riemannian manifold of nonpositive sectional curvature. Assume that the points $P_i \in M$, $i = 1, \dots, n$, belong to a geodesic $\sigma : [0, 1] \to M$ such that $P_i = \sigma(t_i)$ with $0 \le t_1 \le \dots \le t_n \le 1$. Then, for $f_1$ with equal weights $w_i = 1/n$:

1. the unique minimum point of $f_1$ is $P_{(n+1)/2}$ whenever $n$ is odd;

2. the minimum points of $f_1$ lie on $\sigma$, between $P_{n/2}$ and $P_{n/2+1}$, whenever $n$ is even.
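Restricted to the geodesic itself, the theorem is the familiar statement that a median minimizes a sum of absolute deviations: on a Hadamard manifold every geodesic is minimizing, so for a constant-speed $\sigma$ of length $\ell$ one has $d(\sigma(s), \sigma(t_i)) = \ell\,|s - t_i|$. The sketch below checks only this one-dimensional reduction by grid search (it does not verify minimality over all of M); the helper names are hypothetical.

```python
import numpy as np

def f1_on_geodesic(s, t, length=1.0):
    """Equal-weight f_1 at sigma(s) for data P_i = sigma(t_i), using
    d(sigma(s), sigma(t_i)) = length * |s - t_i| along a minimizing geodesic."""
    return length * float(np.mean(np.abs(s - np.asarray(t))))

def minimize_on_grid(t, num=1001):
    """Grid search for the minimizer of f_1 along sigma over s in [0, 1]."""
    grid = np.linspace(0.0, 1.0, num)
    vals = np.array([f1_on_geodesic(s, t) for s in grid])
    return float(grid[np.argmin(vals)])

# n = 5 (odd): the minimizer is t_3, i.e. P_{(n+1)/2}.
s_odd = minimize_on_grid([0.0, 0.1, 0.5, 0.9, 1.0])
```

For even $n$, e.g. $t = (0, 0.2, 0.6, 1)$, $f_1$ is constant on $[t_2, t_3]$, matching the second item of the theorem.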




Proposition 2

Let $C$ be a compact set such that $a_i \notin C$ for each $i = 1, \dots, n$. Then the vector field $\operatorname{grad} f_1 : M \to TM$ is Lipschitz continuous on $C$.
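On a Hadamard manifold, $\operatorname{grad} f_1(x) = -\sum_i w_i \exp_x^{-1} a_i / d(x, a_i)$ for $x \ne a_i$, which in $\mathbb{R}^m$ becomes a weighted sum of unit vectors $(x - a_i)/\|x - a_i\|$. Each unit vector flips direction across $a_i$, which is why Lipschitz continuity can only hold on sets that stay away from the data. A sketch for the Euclidean case (the name `grad_f1` is hypothetical):

```python
import numpy as np

def grad_f1(x, points, weights):
    """grad f_1(x) = sum_i w_i (x - a_i) / ||x - a_i|| on M = R^m, valid only for x != a_i."""
    diffs = x - points                      # -exp_x^{-1}(a_i) in R^m
    dists = np.linalg.norm(diffs, axis=1)   # d(x, a_i)
    if np.any(dists == 0.0):
        raise ValueError("f_1 is not differentiable at a data point")
    return (weights[:, None] * diffs / dists[:, None]).sum(axis=0)
```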



Proposition 3

The following statements hold:

(a) The function $f_1(x) = \sum_{i=1}^n w_i\, d(x, a_i)$ is convex;

(b) Problem (5) with $p = 1$ always has a solution. Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then the solution is unique;

(c) Let $i_0 \in \{1, \dots, n\}$ be an index such that $f_1(a_{i_0}) = \min_{i=1,\dots,n} f_1(a_i)$. Then $a_{i_0}$ is a minimizer of $f_1$ on M if and only if

$$\Bigg\| \sum_{i=1,\, i\ne i_0}^{n} w_i\, \frac{\exp^{-1}_{a_{i_0}} a_i}{d(a_i, a_{i_0})} \Bigg\| \le w_{i_0}.$$
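Item (c) gives a finite test for whether the best data point already solves the $L^1$ problem. A sketch of that test in $M = \mathbb{R}^m$, where $\exp^{-1}_{a} b = b - a$, assuming pairwise distinct data points; the function name is hypothetical.

```python
import numpy as np

def data_point_is_l1_minimizer(points, weights):
    """Test of Proposition 3(c) on M = R^m (pairwise distinct data points assumed).

    Returns (i0, ok): i0 minimizes f_1 over the data points, and ok is True
    iff a_{i0} minimizes f_1 on the whole space.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    f1_at_data = [np.dot(weights, np.linalg.norm(points - a, axis=1)) for a in points]
    i0 = int(np.argmin(f1_at_data))
    diffs = np.delete(points, i0, axis=0) - points[i0]     # exp_{a_{i0}}^{-1}(a_i), i != i0
    w = np.delete(weights, i0)
    r = (w[:, None] * diffs / np.linalg.norm(diffs, axis=1)[:, None]).sum(axis=0)
    return i0, bool(np.linalg.norm(r) <= weights[i0])
```

With a dominant weight on one point the test succeeds; with equal weights on three points of a right triangle it fails, so the $L^1$ center lies strictly inside.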



Proposition 4


(a) The function $f_2(x) = \frac{1}{2}\sum_{i=1}^n w_i\, d^2(x, a_i)$ is strictly convex and continuously differentiable, with gradient Lipschitz continuous on compact sets;

(b) The problem of computing the Riemannian $L^2$ center of mass always has a unique solution.

Proposition 5

The function $f_2(x) = \frac{1}{2}\sum_{i=1}^n w_i\, d^2(x, a_i)$ satisfies the Kurdyka-Łojasiewicz inequality at every point of M.
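In $\mathbb{R}^m$, $\operatorname{grad} f_2(x) = -\sum_i w_i \exp_x^{-1} a_i = \sum_i w_i (x - a_i)$, which is 1-Lipschitz because the weights sum to one, and the unique $L^2$ center is the weighted mean. A sketch of the SDM with a fixed stepsize in this flat case; $t = 0.5$ lies in $(\delta_1, \tfrac{2}{L}(1-\delta_2))$ for, e.g., $\delta_1 = \delta_2 = 0.1$ and $L = 1$. The function name and defaults are hypothetical.

```python
import numpy as np

def l2_center_sdm(points, weights, t=0.5, eps=1e-10, max_iter=1000):
    """SDM with fixed stepsize t for f_2(x) = 1/2 sum_i w_i ||x - a_i||^2 on M = R^m.

    Here exp_x(v) = x + v and exp_x^{-1}(a) = a - x, so
    grad f_2(x) = -sum_i w_i exp_x^{-1}(a_i) = sum_i w_i (x - a_i).
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = points[0].copy()
    for _ in range(max_iter):
        g = (weights[:, None] * (x - points)).sum(axis=0)
        if np.linalg.norm(g) <= eps:         # numerical stopping rule
            break
        x = x - t * g                        # exp_x(-t grad f_2(x))
    return x

center = l2_center_sdm([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]], [0.5, 0.25, 0.25])
```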






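Since the SDM is a descent method, for any $x^0 \in M$ we have $x^k \in L_f(x^0) := \{x \in M : f(x) \le f(x^0)\}$ for all $k \in \mathbb{N}$, and $L_f(x^0)$ is a nonempty compact set. Let $q \in \{1, \dots, n\}$ be an index such that $f(a_q) = \min_{i=1,\dots,n} f(a_i)$, and consider a direction $d_q \in T_{a_q}M$ and $t_q > 0$ small enough that $f(\exp_{a_q} t_q d_q) < f(a_q)$ (such a direction exists whenever $a_q$ is not a minimizer of $f$; cf. Proposition 3(c)). Setting $x^0 := \exp_{a_q} t_q d_q$, we have $a_i \notin L_f(x^0)$ for each $i = 1, \dots, n$, because $f(x^0) < f(a_q) \le f(a_i)$.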
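A sketch of this initialization for the $L^1$ problem in $\mathbb{R}^m$. The choice of $d_q$ is ours (hypothetical): the unit vector along $r = \sum_{i\ne q} w_i (a_i - a_q)/\|a_i - a_q\|$, along which the one-sided directional derivative of $f_1$ at $a_q$ equals $w_q - \|r\|$, negative exactly when the test of Proposition 3(c) fails. Pairwise distinct data points are assumed.

```python
import numpy as np

def f1(x, points, weights):
    """f_1(x) = sum_i w_i ||x - a_i|| on M = R^m."""
    return float(np.dot(weights, np.linalg.norm(points - x, axis=1)))

def l1_initial_point(points, weights, t0=1.0, max_halvings=60):
    """Return (x0, is_data_minimizer): x0 = exp_{a_q}(t_q d_q) with f_1(x0) < f_1(a_q)."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    vals = [f1(a, points, weights) for a in points]
    q = int(np.argmin(vals))
    aq = points[q]
    others = np.delete(points, q, axis=0) - aq
    w = np.delete(weights, q)
    r = (w[:, None] * others / np.linalg.norm(others, axis=1)[:, None]).sum(axis=0)
    if np.linalg.norm(r) <= weights[q]:
        return aq, True                  # a_q already minimizes f_1 (Proposition 3(c))
    d = r / np.linalg.norm(r)            # unit descent direction at a_q (our choice)
    t = t0
    for _ in range(max_halvings):
        x0 = aq + t * d                  # exp_{a_q}(t d) in R^m
        if f1(x0, points, weights) < vals[q]:
            return x0, False
        t *= 0.5
    raise RuntimeError("no decrease found; check the data")
```

Any such $x^0$ keeps every $a_i$ outside the sublevel set, so $f_1$ is smooth along the whole SDM trajectory.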

Theorem 2

The sequence $\{x^k\}$ converges to the unique Riemannian $L^1$ center of mass of the data set $\{a_i\}_{i=1}^n$ as long as the points $a_i$, $i = 1, \dots, n$, are not collinear.

Theorem 3

The sequence $\{x^k\}$ converges to the unique Riemannian $L^2$ center of mass of the data set $\{a_i\}_{i=1}^n$.
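An illustration of the $L^1$ statement in the flat case $M = \mathbb{R}^2$: SDM with Armijo search started at a point whose sublevel set excludes the data. For an equilateral triangle with equal weights the $L^1$ center (Fermat point) is the centroid, which gives a checkable answer. This is a sketch under those assumptions, not the code used for the talk's experiments; the name `sdm_l1_center` and its defaults are hypothetical.

```python
import numpy as np

def sdm_l1_center(points, weights, x0, alpha=0.5, eps=1e-7, max_iter=10_000):
    """SDM with Armijo search for f_1(x) = sum_i w_i ||x - a_i|| on M = R^m.

    Assumes x0 lies in a sublevel set that excludes every a_i, so f_1 is
    differentiable along the whole (monotonically decreasing) trajectory.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)

    def f1(x):
        return float(np.dot(weights, np.linalg.norm(points - x, axis=1)))

    def grad_f1(x):
        diffs = x - points
        return (weights[:, None] * diffs / np.linalg.norm(diffs, axis=1)[:, None]).sum(axis=0)

    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f1(x)
        g_norm_sq = float(g @ g)
        if np.sqrt(g_norm_sq) <= eps:        # numerical stopping rule
            break
        fx, t = f1(x), 1.0
        for _j in range(60):                 # Armijo: t = 2^{-j}, j = 0, 1, 2, ...
            if f1(x - t * g) <= fx - alpha * t * g_norm_sq:
                break
            t *= 0.5
        x = x - t * g                        # exp_x(-t grad f_1(x)) in R^m
    return x

triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])
fermat = sdm_l1_center(triangle, np.ones(3) / 3, x0=[0.3, 0.1])
```

The tolerance `eps = 1e-7` is deliberately moderate: the Armijo decrease is quadratic in the gradient norm and falls below floating-point resolution for much smaller gradients.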















Numerical experiments

Let $M := (\mathbb{S}^m_{++}, \langle\cdot,\cdot\rangle)$ be the Riemannian manifold endowed with the Riemannian metric induced by the Euclidean Hessian of $\Psi(X) = -\ln\det X$,

$$\langle U, V\rangle = \operatorname{tr}\big(V\Psi''(X)U\big) = \operatorname{tr}\big(VX^{-1}UX^{-1}\big), \quad X \in M,\ U, V \in T_XM,$$

where $\mathbb{S}^m_{++}$ denotes the cone of $m \times m$ symmetric positive definite matrices. In this case, for any $X, Y \in M$ the unique geodesic joining these two points is given by

$$\gamma(t) = X^{1/2}\big(X^{-1/2}YX^{-1/2}\big)^t X^{1/2}, \quad t \in [0, 1].$$
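The geodesic formula only needs real powers of SPD matrices, which can be computed from an eigendecomposition. A NumPy sketch (the helper names are hypothetical); for commuting diagonal matrices the midpoint $\gamma(1/2)$ is the entrywise geometric mean, which gives an easy check.

```python
import numpy as np

def spd_power(A, s):
    """A^s for a symmetric positive definite A, via A = V diag(lam) V^T."""
    A = 0.5 * (A + A.T)                 # symmetrize against round-off
    lam, V = np.linalg.eigh(A)
    return (V * lam ** s) @ V.T

def spd_geodesic(X, Y, t):
    """gamma(t) = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, with gamma(0) = X and gamma(1) = Y."""
    Xh, Xmh = spd_power(X, 0.5), spd_power(X, -0.5)
    return Xh @ spd_power(Xmh @ Y @ Xmh, t) @ Xh
```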



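Thus, for each $X \in M$, $\exp_X : T_XM \to M$ and $\exp^{-1}_X : M \to T_XM$ are given, respectively, by

$$\exp_X V = X^{1/2} e^{X^{-1/2}VX^{-1/2}} X^{1/2}, \qquad \exp^{-1}_X Y = X^{1/2} \ln\big(X^{-1/2}YX^{-1/2}\big) X^{1/2}.$$

Moreover, the Riemannian distance satisfies

$$d^2(X, Y) = \operatorname{tr}\big(\ln^2(X^{-1/2}YX^{-1/2})\big) = \sum_{i=1}^m \ln^2 \lambda_i\big(X^{-1/2}YX^{-1/2}\big), \tag{6}$$

where $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue.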
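A NumPy sketch of $\exp_X$, $\exp_X^{-1}$ and the distance (6), again applying scalar functions to eigenvalues of symmetric matrices. The helper names are hypothetical; the checks below confirm $\exp_X(\exp_X^{-1} Y) = Y$, the symmetry of $d$, and $\|\exp_X^{-1} Y\|_X = d(X, Y)$ for the metric $\operatorname{tr}(VX^{-1}UX^{-1})$.

```python
import numpy as np

def sym_fun(A, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix A."""
    A = 0.5 * (A + A.T)                 # guard against round-off asymmetry
    lam, V = np.linalg.eigh(A)
    return (V * fun(lam)) @ V.T

def spd_exp(X, V):
    """exp_X V = X^{1/2} e^{X^{-1/2} V X^{-1/2}} X^{1/2}."""
    S, Si = sym_fun(X, np.sqrt), sym_fun(X, lambda l: 1.0 / np.sqrt(l))
    return S @ sym_fun(Si @ V @ Si, np.exp) @ S

def spd_log(X, Y):
    """exp_X^{-1} Y = X^{1/2} ln(X^{-1/2} Y X^{-1/2}) X^{1/2}."""
    S, Si = sym_fun(X, np.sqrt), sym_fun(X, lambda l: 1.0 / np.sqrt(l))
    return S @ sym_fun(Si @ Y @ Si, np.log) @ S

def spd_dist(X, Y):
    """d(X, Y) = sqrt(sum_i ln^2 lambda_i(X^{-1/2} Y X^{-1/2})), cf. (6)."""
    Si = sym_fun(X, lambda l: 1.0 / np.sqrt(l))
    M = Si @ Y @ Si
    lam = np.linalg.eigvalsh(0.5 * (M + M.T))
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```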



In our simulations, we consider different scenarios, taking into account three parameters: the number $n$ of matrices in the data set $\{Q_i\}_{i=1}^n$, the size $m \times m$ of the matrices, and the tolerance $\varepsilon > 0$ of the stopping rule. The random matrices used in our tests are generated with a uniform (well-conditioned) and a non-uniform (ill-conditioned) distribution of the eigenvalues of each matrix of the data set. The ill-conditioned data set is generated as follows:

Hence, the non-uniform distribution satisfies $\lambda_{\max}/\lambda_{\min} > 10^2$, where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and the smallest eigenvalues of each matrix of the data set, respectively.
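The slides do not reproduce the exact generator, so the following is only a hypothetical scheme that produces the two kinds of data described: $Q = U\operatorname{diag}(\lambda)U^T$ with a random orthogonal $U$, eigenvalues uniform in $[1, 2]$ for the well-conditioned case, and eigenvalues spread over $[1, 10^3]$ (with both endpoints forced) for the ill-conditioned case, so that $\lambda_{\max}/\lambda_{\min} = 10^3 > 10^2$.

```python
import numpy as np

def random_spd(m, rng, ill_conditioned=False):
    """Hypothetical generator of an m x m SPD matrix Q = U diag(lam) U^T (requires m >= 2)."""
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))   # random orthogonal matrix
    if ill_conditioned:
        lam = 10.0 ** rng.uniform(0.0, 3.0, size=m)    # eigenvalues spread over [1, 10^3]
        lam[0], lam[-1] = 1.0, 1e3                     # force lambda_max / lambda_min = 10^3
    else:
        lam = rng.uniform(1.0, 2.0, size=m)            # lambda_max / lambda_min <= 2
    Q = (U * lam) @ U.T
    return 0.5 * (Q + Q.T)

rng = np.random.default_rng(1)
data_ill = [random_spd(5, rng, ill_conditioned=True) for _ in range(10)]
data_uni = [random_spd(5, rng) for _ in range(10)]
```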



The next figures plot some results for $m \times m$ matrices ($m = 5, 10, 20, 40$) and different data sets ($n = 25, 50, 75, 100$).

Figure: Uniform distribution, $m = 5$ and $\varepsilon = 10^{-8}$

Figure: Non-uniform distribution, $m = 5$ and $\varepsilon = 10^{-8}$
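A sketch of how the $L^2$ center of mass on $\mathbb{S}^m_{++}$ can be computed with the SDM and the stopping rule $\|\operatorname{grad} f_2(X_k)\| \le \varepsilon$. With $S = X^{1/2}$, $-\operatorname{grad} f_2(X) = \sum_i w_i \exp_X^{-1} Q_i = S B S$ with $B = \sum_i w_i \ln(S^{-1}Q_iS^{-1})$, and for the metric above $\|SBS\|_X = \|B\|_F$. The fixed stepsize $t = 0.5$ is an illustrative assumption, not the value used in the reported experiments, and the function names are hypothetical.

```python
import numpy as np

def sym_fun(A, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix A."""
    A = 0.5 * (A + A.T)
    lam, V = np.linalg.eigh(A)
    return (V * fun(lam)) @ V.T

def spd_l2_center(Qs, weights, t=0.5, eps=1e-8, max_iter=500):
    """SDM with fixed stepsize t for f_2(X) = 1/2 sum_i w_i d^2(X, Q_i) on S^m_{++}.

    Returns the final iterate and the number of iterations performed.
    """
    X = np.array(Qs[0], dtype=float)
    for k in range(max_iter):
        S = sym_fun(X, np.sqrt)
        Si = sym_fun(X, lambda l: 1.0 / np.sqrt(l))
        # B = S^{-1} (-grad f_2(X)) S^{-1} = sum_i w_i ln(X^{-1/2} Q_i X^{-1/2})
        B = sum(w * sym_fun(Si @ Q @ Si, np.log) for w, Q in zip(weights, Qs))
        if np.linalg.norm(B) <= eps:          # ||grad f_2(X_k)||_X = ||B||_F <= eps
            return X, k
        X = S @ sym_fun(t * B, np.exp) @ S    # X_{k+1} = exp_X(-t grad f_2(X))
    return X, max_iter

# Commuting data: the L^2 center of diag(1, 4) and diag(4, 1) is 2 I.
X_star, n_iter = spd_l2_center([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])], [0.5, 0.5])
```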
























Thank you for your attention!

