Optimization Algorithms on
Homogeneous Spaces
WITH APPLICATIONS IN LINEAR SYSTEMS THEORY
Robert Mahony
March 1994
Presented in partial fulfilment of the requirements
for the degree of Doctor of Philosophy
at the Australian National University
Department of Systems Engineering
Research School of Information Sciences and Engineering
Australian National University
Acknowledgements
I would like to thank my supervisors John Moore and Iven Mareels for their support, insight,
technical help and for teaching me to enjoy research. Thanks to Uwe Helmke for his enthusiasm
and support and Wei-Yong Yan for many useful suggestions. I would also like to thank the
other staff and students of the department for providing an enjoyable and exciting environment
for work, especially the students from lakeview for not working too hard. I reserve a special
thanks for Peter Kootsookos because I owe him one.
I have been lucky enough to visit Unite Auto, Catholic University of Leuven, Louvain-la-Neuve
and the Department of Mathematics, University of Regensburg, for extended periods during
my studies and thank the staff and students of both institutions for their support.
A number of people have made helpful comments and contributions to the results contained in
this thesis. In particular, I would like to thank George Bastin, Guy Campion, Kenneth Driessel,
Ed Henrich, Ian Hiskens, David Hill and David Stewart, as well as several anonymous reviewers.
Apart from the support of the Australian National University I have also received additional
financial support from the following sources:
The Cooperative Research Centre for Robust and Adaptive Systems, funded by the Aus-
tralian Commonwealth Government under the Cooperative Research Centres Program.
Grant I-0184-078.06/91 from the G.I.F., the German-Israeli Foundation for Scientific
Research and Development
Boeing Commercial Aircraft Corporation.
Lastly I thank Pauline Allingham for her support and care throughout my doctorate.
Statement of Originality
The work presented in this thesis is the result of original research done by myself, in col-
laboration with others, while enrolled in the Department of Systems Engineering as a Doctor
of Philosophy student. It has not been submitted for any other degree or award in any other
university or educational institution.
Following is a list of publications in refereed journals and conference proceedings completed
while I was a Doctor of Philosophy student. Much of the technical discussion given in this
thesis is based on work described in the papers numbered [1,2,5,6,10,11] from the list below.
The remaining papers cover material I chose not to include in this thesis.
Journal Papers:
1. R. E. Mahony and U. Helmke. System assignment and pole placement for symmetric
realisations. Submitted to Journal of Mathematical Systems, Estimation and Control,
1994.
2. R. E. Mahony, U. Helmke, and J. B. Moore. Gradient algorithms for principal component
analysis. Submitted to Journal of the Australian Mathematical Society, 1994.
3. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and
implications for Lyapunov direct stability methods. To appear in Journal of Mathematical
Systems, Estimation and Control, 1994.
4. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Non-linear feedback laws for
output regulation. Draft version, 1994.
5. J. B. Moore, R. E. Mahony, and U. Helmke. Numerical gradient algorithms for eigenvalue
and singular value calculations. SIAM Journal of Matrix Analysis, 15(3), 1994.
Conference Papers:
6. R. E. Mahony, U. Helmke, and J. B. Moore. Pole placement algorithms for symmetric
realisations. In Proceedings of IEEE Conference on Decision and Control, San Antonio,
U.S.A., 1993.
7. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and
implications for Lyapunov stability methods. In Proceedings of the 12th World Congress
of the International Federation of Automatic Control, Sydney, Australia, 1993.
8. R. E. Mahony and I. M. Mareels. Non-linear feedback laws for output stabilization.
Submitted to the IEEE Conference on Decision and Control, 1994.
9. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Output regulation for systems
linear in the input. In Conference on Mathematical Theory of Networks and Systems,
Regensburg, Germany, 1993.
10. R. E. Mahony and J. B. Moore. Recursive interior-point linear programming algo-
rithm based on Lie-Brockett flows. In Proceedings of the International Conference on
Optimisation: Techniques and Applications, Singapore, 1992.
11. J. B. Moore, R. E. Mahony, and U. Helmke. Recursive gradient algorithms for eigenvalue
and singular value decompositions. In Proceedings of the American Control Conference,
Chicago, U.S.A., 1992.
Robert Mahony
Abstract
Constrained optimization problems are commonplace in linear systems theory. In many cases
the constraint set is a homogeneous space and the additional geometric insight provided by
the Lie-group structure provides a framework in which to tackle the numerical optimization
task. The fundamental advantage of this approach is that algorithms designed and implemented
using the geometry of the homogeneous space explicitly preserve the constraint set.
In this thesis the numerical solution of a number of optimization problems constrained
to homogeneous spaces is considered. The first example studied is the task of determining
the eigenvalues of a symmetric matrix (or the singular values of an arbitrary matrix) by inter-
polating known gradient flow solutions using matrix exponentials. Next the related problem
of determining principal components of a symmetric matrix is discussed. A continuous-time
gradient flow is derived that leads to a discrete exponential interpolation of the continuous-time
flow which converges to the desired limit. A comparison to classical algorithms for the same
task is given. The third example discussed, this time drawn from the field of linear systems
theory, is the task of arbitrary pole placement using static feedback for a structured class of
linear systems.
The remainder of the thesis provides a review of the underlying theory relevant to the three
examples considered and develops a mathematical framework in which the proposed numerical
algorithms can be understood. This framework leads to a general form for a solution to any
optimization problem on a homogeneous space. An important consequence of the theoretical
review is that it develops the mathematical tools necessary to understand more sophisticated
numerical algorithms. The thesis concludes by proposing a quadratically convergent numerical
optimization method, based on the Newton-Raphson algorithm, which evolves explicitly on a homogeneous space.
Work on gradient dynamical systems for such problems (Smith 1991, Helmke & Moore 1990, Brockett 1991b, Helmke et al. 1994) has led to the
design of numerical algorithms based explicitly on the dynamical systems developed. Recent
advances in such techniques are discussed in the articles (Chu 1992, Brockett 1993, Moore,
Mahony & Helmke 1994). These methods are essentially based on classical unconstrained
optimization methodologies reformulated on the constraint set.
Unconstrained scalar optimization techniques fall roughly into three categories (Aoki 1971):
i) Methods that use only the cost-function values.
ii) Methods that use first order derivatives of the cost function.
iii) Methods that use second (and higher) order derivatives of the cost function.
12 Introduction Chapter 1
Methods of the first type tend not to be useful except for linear search and non-smooth
optimization problems, due to computational cost. An excellent survey of early techniques such
as pattern searches, relaxation methods, Rosenbrock's and Powell's methods, as well as random
search methods and some other variations of these ideas, is contained in Aoki (1971, section
4.7). Other good references for these methods are the books (Luenberger 1973, Minoux 1986).
Recent developments are discussed in the collection of articles (Kumar 1991).
The fundamental method of type ii) is the gradient descent method. For a potential
f : R^n → R with the gradient denoted Df = (∂f/∂x_1, …, ∂f/∂x_n)^T, the method of gradient descent
is

x_{k+1} = x_k − s_k Df(x_k),

where s_k > 0 is some pre-specified sequence of real positive numbers known as step-sizes. Here
the integer k indexes the iterations of the numerical algorithm, acting like a discrete-time variable
for the solution sequence {x_k}_{k=0}^∞. A suitable choice of step-size s_k is any sequence such that
s_k → 0 as k → ∞ and Σ_{k=1}^∞ s_k = ∞. Polyak (1966) showed that provided f satisfies certain
convexity assumptions then the solution sequence of the gradient descent algorithm converges
to the minimum of f. The optimal gradient descent method is known as the method of steepest
descent (Cauchy 1847, Curry 1944), where the step-size is chosen at each step by

s_k = argmin_{s ≥ 0} f(x_k − s Df(x_k)).

Here "argmin" means the value of s that minimises f(x_k − s Df(x_k)). The method of
steepest descent has the advantage of being associated with strong global convergence theory
(Minoux 1986, Theorem 4.4, pg. 86). The step-size selection procedure is usually completed
using a linear search algorithm or using some estimation technique based on approximations of
f(x_k − s_k Df(x_k)). Using a linear search technique generally provides a faster but less reliable
algorithm, while a good approximation technique will inherit the strong global convergence
theory of the optimal method. The disadvantage of the overall approach is the linear rate
of convergence of the solution sequence {x_k} to the desired limit (even for optimal step-size
selection). Nevertheless, when reliability rather than the rate of convergence is the primary
concern, the steepest descent method or an approximate suboptimal gradient descent
method remains a preferred numerical algorithm.
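The diminishing step-size scheme above can be sketched in a few lines. The quadratic test potential and the particular sequence s_k = 1/(k + 2) below are illustrative choices for the example, not taken from the thesis.

```python
import numpy as np

def gradient_descent(df, x0, steps):
    """Gradient descent x_{k+1} = x_k - s_k Df(x_k) with a pre-specified
    diminishing step-size sequence: s_k -> 0 while sum s_k diverges."""
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        s = 1.0 / (k + 2)            # s_k -> 0 and the series sum s_k = infinity
        x = x - s * df(x)
    return x

# Example potential f(x) = ||x||^2 / 2, so Df(x) = x and the minimum is at 0
x_final = gradient_descent(lambda x: x, [4.0, -3.0], steps=500)
```

With this step-size rule the iterates contract towards the minimum, but only at the slow (linear or worse) rate discussed above.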
x1-1 Historical Perspective 13
There are a number of algorithms which improve on the convergence properties of the
steepest descent method. Of these only the Newton-Raphson method is important for the se-
quel; however, it is worth mentioning that multi-step methods, combining a series of estimates
x_{k−1}, …, x_{k−p} and derivatives Df(x_{k−1}), …, Df(x_{k−p}), can be devised which converge with
superlinear, quadratic and higher orders of convergence, but which have much weaker conver-
gence results associated with them than the steepest descent methods. The most prominent of
these methods are the accelerated steepest descent methods (Forsythe 1968) and the method of
conjugate gradients (Fletcher & Reeves 1964).
The Newton-Raphson method falls into the third category and relies on the idea of approx-
imating the scalar function f(x) by its truncated Taylor series

f(x) ≈ f(x_k) + (x − x_k)^T Df(x_k) + (1/2)(x − x_k)^T D²f(x_k)(x − x_k),

where D²f(x_k) is the square matrix with ij'th entry ∂²f(x_k)/∂x_i∂x_j. If f(x) is quadratic then this
approximation is exact and the optimal minimum can be found in a single step

x* = x_k − (D²f(x_k))^{−1} Df(x_k).
Of course, in general this will not be true, but if the approximation is fairly good one would
expect the residual error ||x* − x_{k+1}|| to be of order² O(||x* − x_k||³). Indeed, the Newton-
Raphson algorithm is the most natural algorithm that displays quadratic convergence proper-
ties. A disadvantage of the Newton-Raphson algorithm is the cost of determining the inverse
(D²f(x_k))^{−1}, and a number of methods have been devised to reduce the computational cost of
this calculation. The most common of these are the Davidon-Fletcher-Powell approach (Davi-
don 1959, Fletcher & Powell 1963) and a rank-2 correction formula independently derived by
²The big O order notation: ||x* − x_{k+1}|| is of order O(||x* − x_k||³) means that there exist real numbers
B > 0 and ε > 0 such that

||x* − x_{k+1}|| / ||x* − x_k||³ ≤ B,

for all ||x* − x_k|| ≤ ε. If ||x* − x_{k+1}|| is of order O(||x* − x_k||³) then it follows, using the little o order
notation, that ||x* − x_{k+1}|| is of order o(||x* − x_k||²),

lim_{k→∞} ||x* − x_{k+1}|| / ||x* − x_k||² = 0.

Thus the error bound at each step decreases like a quadratic function around the limit point. Methods with this
convergence behaviour are known as quadratically convergent.
Broyden (1970), Fletcher (1970), Goldfarb (1970) and Shanno (1970). An excellent review of
these methods is provided by Minoux (1986, Chapter 4).
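The basic (unmodified) Newton-Raphson iteration described above can be sketched as follows; the quartic test potential is a hypothetical choice for illustration. Solving a linear system at each step sidesteps forming the explicit inverse mentioned as the method's main cost.

```python
import numpy as np

def newton_raphson(df, d2f, x0, steps):
    """Newton-Raphson iteration x_{k+1} = x_k - (D^2 f(x_k))^{-1} Df(x_k),
    implemented by solving D^2 f(x_k) d = Df(x_k) instead of inverting."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - np.linalg.solve(d2f(x), df(x))
    return x

# Example non-quadratic potential f(x) = sum_i (x_i^4 + x_i^2), minimised at 0
df = lambda x: 4.0 * x**3 + 2.0 * x           # gradient Df(x)
d2f = lambda x: np.diag(12.0 * x**2 + 2.0)    # Hessian D^2 f(x), diagonal here
x_final = newton_raphson(df, d2f, [1.0, -1.0], steps=10)
```

The error shrinks quadratically near the limit, so a handful of iterations already reaches machine-precision accuracy on this example.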
The approach to optimization described above is closely related to the task of numerically
approximating the solution of an ordinary differential equation. Indeed, the gradient descent
method is just the Euler method (Butcher 1987, Section 20) applied to determine the solution
of the gradient differential equation
ẋ = −Df(x),    x(0) = x_0,    (1.1.2)

where f : R^n → R (Euler's original work is republished in the monograph (Euler 1913)). The
Euler method is rarely used in modern numerical analysis since it is only a first order method.
That is, the error between x_{k+1} and x(h; x_k) (the solution of (1.1.2) with x(0) = x_k evaluated
at time h) is o(h),

lim_{h→0} ||x_{k+1} − x(h; x_k)|| / h = 0.
(Naturally, this translates to a linear convergence rate for the gradient descent method.) More
advanced numerical integration methods exist, the most common of which in engineering
applications are the Runge-Kutta methods (Butcher 1987, Section 22) or linear multi-step
methods (Butcher 1987, Section 23).
The idea of stability for a numerical approximation of the solution to an initial value
problem is usually described in terms of the ability of the numerical method to accurately
reproduce the behaviour of the continuous-time solution. Thus, if one is considering the scalar
linear differential equation
ẋ = qx,    x(0) = x_0 ∈ C,    (1.1.3)

where q ∈ C is a fixed complex number with real part Re(q) < 0, then the solution x(t) → 0
as t → ∞. A numerical approximation to this problem is loosely said to be stable if the
approximation also converges to zero. A Runge-Kutta method, with step size h, is said to be A-
stable if the numerical solution of the scalar linear differential equation given above converges
to zero for any z = hq lying in the complex left half plane. Thus, for any real positive step-size
selection h > 0 and any linear system with Re(q) < 0, an A-stable Runge-Kutta method
solution of (1.1.3) will converge to zero. The concept of AN-stability captures the same
qualitative behaviour for non-autonomous linear systems (Burrage 1978). A strengthening of
the concept of A-stability for contractive numerical problems (cf. the review article (Stuart
& Humphries 1994)), termed B-stability, was proposed by Butcher (1975) and can also be
generalised to non-autonomous systems (BN-stability) (Burrage & Butcher 1979). In this
paper Burrage and Butcher also introduced the important concept of "algebraic stability",
which they showed implies both B- and BN-stability. Algebraic stability is a condition on the
parameters that define a Runge-Kutta method which has relevance to many different stability
problems (Stuart & Humphries 1994) and even to questions of existence and uniqueness of
solutions to implicit Runge-Kutta methods (Cooper 1986). For systems with Re(q) ≪ 0
the continuous-time solution to (1.1.3) will converge very quickly to zero and one would
like this behaviour to be replicated in the numerical solution. A definition of L-stability
due to Ehle (1973), which captures this idea, is also a strengthening of standard A-stability.
There are a number of useful numerical schemes that do not satisfy A-stability and weaker
definitions of stability are available, the most common of which are "A(α)-stability" (Widlund
1967, Dahlquist 1978) (the numerical solution of (1.1.3) must converge to zero for any z = hq
with Re(z) < −α, where α > 0 is a small real number) and "stiff stability" (Gear 1968) (the
method must be stable for all z ∈ {z ∈ C | |arg(−z)| < θ} for some small real number
θ > 0).
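As a concrete illustration (not taken from the thesis), the simplest explicit and implicit methods can be compared on the test equation (1.1.3): the forward Euler method has amplification factor 1 + hq and so fails A-stability, while the backward (implicit) Euler method has factor 1/(1 − hq) and is A-stable. The stiff values q = −10, h = 1 below are arbitrary choices for the demonstration.

```python
# Test equation (1.1.3): x' = qx with Re(q) < 0; the true solution decays to 0.
q, h = -10.0, 1.0                    # stiff example: z = hq = -10

x_fwd, x_bwd = 1.0, 1.0
for _ in range(20):
    x_fwd = (1.0 + h * q) * x_fwd    # forward Euler: |1 + hq| = 9 > 1, diverges
    x_bwd = x_bwd / (1.0 - h * q)    # backward Euler: 1/|1 - hq| = 1/11 < 1, decays

# Although the continuous-time solution decays, the forward Euler iterates
# blow up, while the A-stable backward Euler iterates converge to zero.
```

The same experiment with |1 + hq| < 1 (for example h = 0.1) makes forward Euler stable again, which is exactly why its stability region, unlike an A-stable method's, constrains the step-size.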
The unifying idea behind each of these stability definitions is the ability of the numeri-
cal method to replicate the properties of the continuous-time solution that is being approxi-
mated. The classical definitions of stability discussed above consider only simple convergence
behaviour of systems (A- and AN -stability for linear decay problems, L-stability for fast
convergence rates, B- and BN -stability for contractive problems). Another important class
of differential equations are those which preserve certain quantities, for example energy or a
Hamiltonian. Numerical methods for these two classes of problems (conservative and Hamilto-
nian systems) have been the subject of considerable research recently. Methods for conservative
systems are discussed in the articles (Greenspan 1974, Greenspan 1984). Methods for Hamilto-
nian systems are of more relevance to the present work. These methods can be divided roughly
into two types (Sanz-Serna 1991), firstly methods that are classical numerical differential equa-
tion solvers which happen also to preserve a Hamiltonian, and secondly methods which are
constructed explicitly from generating functions for solving Hamiltonian systems. The earlier
methods were based on generating functions (Ruth 1983, Channell 1983, Menyuk 1984, Feng
1985, Zhong & Marsden 1988). When it was observed that these methods could often be
interpreted as Runge-Kutta methods with particular properties, people became interested in
exactly which Runge-Kutta methods would have the property of preserving a Hamiltonian.
1988, Suris 1989, Lasagni 1988). Application of these ideas to engineering problems associ-
ated with equations of motion of a rigid body has been undertaken by Crouch, Grossman and
Yan (1992). Crouch, Grossman and Yan are also working on related integration techniques
for engineering problems (Crouch & Grossman 1994, Crouch, Grossman & Yan 1994). A
recent review article for Hamiltonian integration methods is Sanz-Serna (1991). Interestingly,
the characterisation of Runge-Kutta methods that preserve Hamiltonians is related to the alge-
braic construction first described when defining algebraic stability (Burrage & Butcher 1979).
Indeed, Stuart and Humphries (1994) describe a number of connections between early sta-
bility theory and modern numerical methods for Hamiltonian and conservative systems. In
Stuart and Humphries (1994) the concept of numerical stability (the question of whether, and
in what sense, the dynamical properties of a continuous-time flow are inherited by a discrete
numerical approximation) is defined. This concept is sometimes termed "practical stability"
and is closely related to the definition of constraint stability given on page 2. I have opted
not to use the term numerical stability to describe the algorithms proposed in the sequel since
the optimization tasks considered require two types of numerical stability, preservation of a
constraint and convergence to a limit.
In certain cases the Toda lattice, the double-bracket flow and related dynamical systems can be
interpreted as completely integrable Hamiltonian flows (Bloch 1985a, Bloch et al. 1992, Bloch
1990b). In these cases one could think to apply the modern Hamiltonian integration techniques
discussed by Sanz-Serna (1991). To do this however, one would have to consider the various
differential equations as Hamiltonian flows on Rn and the insight gained by considering the
solution in matrix space would be lost.
Several authors have looked directly at discretizing flows on Lie-groups and homogeneous
spaces. Moser and Veselov (1991) considered discrete versions of classical mechanical systems.
Chu (1992) considered discrete methods for inverse singular value problems based on dynamical
systems insights, while Brockett (1993), Smith (1993) and Moore et al. (1994) have studied
more deliberate discretizations of gradient flows on Lie-groups and homogeneous spaces.
1.1.3 Linear Systems Theory and Pole Placement Results
Textbooks on feedback control and linear systems theory are those of Kailath (1980), Wonham
(1985) and Sontag (1990). An excellent reference for classical linear quadratic methods is the
book (Anderson & Moore 1971) or the more recent book (Anderson & Moore 1990). A recent
review article on developments in pole placement theory is Byrnes (1989).
The field of systems engineering during the mid seventies was the scene of a developing
understanding of the mathematical and geometric foundation of linear systems theory. Sem-
inal work by Kalman (1963), among others, set a foundation of mathematical systems theory
which led people naturally to use algebraic geometric tools to solve some of the fundamental
questions that arose. This led to a strong geometric framework for linear systems theory
being developed in the late seventies and early eighties (Bitmead & Anderson 1977, Martin
& Herman 1979, Hazewinkel 1979, Byrnes, Hazewinkel, Martin & Rouchaleau 1980, Helmke
1984, Falb 1990). See also the conference proceedings (Martin & Hermann 1977b, Byrnes &
Martin 1980). The development of the Toda lattice was of considerable interest to researchers
working in linear systems theory in the late seventies and led to several new developments in
scaling actions on spaces of rational functions in system theory (Byrnes 1978, Krishnaprasad
1979, Brockett & Krishnaprasad 1980). More recently Nakamura (1989) has shown a con-
nection between the Toda lattice and the study of moduli spaces of controllable linear systems.
Also Brockett and Faybusovich (1991) have made connections with realization theory.
One of the principal questions in linear systems theory that remained unanswered until
recently is how the natural frequencies or poles of a multi-input multi-output
system are affected by changing feedback gain. In the case where the full state of a multi-input
multi-output state space system is available as output, Wonham (1967) showed that arbitrary
pole placement is equivalent to complete controllability of the system. The case for output
feedback (when only part of the state is available directly from the output) was found to be
far more difficult. Indeed, even after the theory of optimal linear quadratic methods was far
advanced (Anderson & Moore 1971) an understanding of the output feedback pole placement
problem remained elusive. A few preliminary results on pole shifting were obtained in the
early seventies (for example Davison and Wang (1973)), which led to the first important result,
obtained independently by Davison and Wang (1975) and Kimura (1975). Given a linear
system with n states, m inputs and p outputs, the result stated that for almost all controllable
and observable linear state-space systems for which

m + p − 1 ≥ n,

the poles of that system could be almost arbitrarily changed using output feedback.
In 1977 Hermann and Martin published a pair of articles (Hermann & Martin 1977, Martin
& Hermann 1977a) which used the dominant morphism theorem to show that mp ≥ n is a
necessary and sufficient condition for output feedback pole placement if one allows complex
gain matrices. Observe that if m, p > 1 then mp ≥ m + p − 1, and thus the results obtained
by Hermann and Martin are stronger than those obtained earlier, apart from the disadvantage
of requiring complex feedback. Unfortunately, their results do not generalise to real feedback
gains, though it was hoped that the condition mp ≥ n would also be necessary and sufficient
for real output feedback pole placement. However, Willems and Hesselink (1978) soon gave a
counterexample (m = 2, p = 2, n = 4) showing that the condition mp ≥ n is not sufficient
for arbitrary pole placement using real feedback.
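The interplay between the two conditions is easy to check numerically on the Willems-Hesselink counterexample; the snippet below is only an arithmetic illustration of the inequalities discussed above.

```python
# Willems-Hesselink counterexample: m = 2 inputs, p = 2 outputs, n = 4 states.
m, p, n = 2, 2, 4

kimura_condition = (m + p - 1 >= n)   # Davison-Wang / Kimura (real gains)
hermann_martin = (m * p >= n)         # Hermann-Martin (complex gains)

# The example satisfies mp >= n (indeed mp = n = 4) yet fails m + p - 1 >= n,
# and arbitrary real pole placement is nonetheless impossible, so mp >= n
# cannot be sufficient over the reals.
```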
The case mp = n was studied by Brockett and Byrnes (1979, 1981) using tools from
algebraic geometry and constructions on Grassmannian manifolds. By using these ideas
Brockett and Byrnes generalised Nyquist and root locus plots to multi-input and multi-output
systems; however, though useful, their results only applied in the case mp = n and fell short
of completely characterising the pole placement map even in this case. In Byrnes (1983)
the Ljusternik-Schnirelmann category of real Grassmannians is used to improve on Kimura's
original result. There were no other significant advances on this problem
during the mid eighties. A recent review article (Byrnes 1989) outlines the early results as well
as describing the state of the art towards the end of the eighties.
Recently Wang and Rosenthal have made new contributions to the problem of output
feedback pole placement (Wang 1989, Rosenthal 1989, Rosenthal 1992). Most recently Wang
(1992) has given a necessary and sufficient condition for pole placement using the central
projection model. Given a linear system with n states, m inputs and p outputs, Wang has shown
that arbitrary output feedback pole placement is possible for any strictly proper controllable
and observable plant with mp > n. If the plant is only proper then almost arbitrary pole
placement is possible. The case mp = n is still not fully understood.
Little has been done to study classes of linear systems and the pole placement map. In
Martin and Hermann (1977a) pole placement for linear Hamiltonian systems was considered.
More recently Mahony et al. (1993) (cf. Chapter 4) studied pole placement for symmetric
state-space systems. Simultaneous pole placement for multiple systems is also a problem that
has had little study. Ghosh (1988) has written a paper on this topic using algebro-geometric
techniques and recently Blondel (1992) and Blondel, Campion and Gevers (1993) have also
contributed. Such problems can also be tackled using the ideas outlined by Mahony and Helmke
(1993) (cf. Chapter 4). The development of efficient numerical methods for pole placement by
output feedback is a challenge. Methods from matrix calculus have been applied by Godbout
and Jordan (1989) and more recently, gradient descent methods have been proposed (Mahony
et al. 1993) (cf. Section 4.6).
1.2 Summary of Results
The thesis is divided into seven chapters. Chapter 1 provides an overview of the subject matter
considered. Chapters 2 to 4 consider three example optimization problems in detail. The first
problem discussed is a smooth optimization problem which can be used to solve the symmetric
eigenvalue problem. A considerable amount is known about the continuous-time gradient
dynamical systems associated with this optimization problem and the development builds on
this knowledge to generate a recursive numerical algorithm. The next problem considered is an
optimization problem related to principal component analysis. A discussion of the continuous-
time gradient flow is given before a numerical algorithm is developed. The connection of
the numerical method proposed and classical numerical linear algebraic algorithms for the
same task is investigated. The third example, drawn from the field of linear systems theory,
is the task of pole placement for the class of symmetric linear systems. A discussion of the
geometry of the task is undertaken yielding results with the flavour of traditional pole placement
results. Continuous-time gradient flows are derived and used to investigate the structure of
the optimization problem. A numerical method is also proposed based on the continuous-time
gradient flow.
The latter chapters approach the subject from a theoretical perspective. In Chapter 5
a theoretical foundation is laid in which the algorithms proposed in Chapters 2 to 4 may be
understood. Chapter 6 goes on to consider the particular numerical algorithms proposed in detail
and provides a template for designing numerical optimization algorithms for any constrained
optimization problem on a homogeneous space. Later in Chapter 6 a more sophisticated
numerical algorithm based on the Newton-Raphson algorithm is developed in a general context.
The algorithm is applied to a specific problem (the symmetric eigenvalue problem) to provide
an example of how to use the theory in practice. Concluding remarks are contained in Chapter
7.
The principal results contained in Chapters 2 to 6 are summarised below.
Chapter 2: In this chapter a numerical algorithm, termed the double-bracket algorithm,

H_{k+1} = e^{−α_k [H_k, N]} H_k e^{α_k [H_k, N]},

is proposed for computing the eigenvalues of an arbitrary symmetric matrix. For suitably small
α_k, termed time-steps, the algorithm is an approximation of the solution to the continuous-
time double-bracket equation. Since the matrix exponential of a skew-symmetric matrix is
orthogonal, it follows that this iteration has the important property of preserving the spectrum
of the iterates. That is, the eigenvalues of H_k remain constant for all k. By choosing a suitable
diagonal target matrix N, the sequence H_k will converge to a diagonal matrix from which the
eigenvalues of H_0 can be directly determined. To ensure that the algorithm converges, a suitable
step-size α_k must be chosen at each step. Two possible choices of schemes are presented
along with analysis showing that the algorithm converges to the desired matrix for almost all
initial conditions. A related algorithm for determining the singular values of an arbitrary (not
necessarily square) matrix is proposed and is shown to be equivalent to the double-bracket
equation applied to an augmented symmetric system. An analysis of convergence behaviour
showing linear convergence to the desired limit points is presented. Associated with the main
algorithms presented for the computation of the eigenvalues or singular values of matrices are
algorithms evolving on Lie-groups of orthogonal matrices which compute the full eigenspace
decompositions of given matrices.
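A minimal sketch of the iteration follows. The fixed time-step α = 0.01 and the 4 × 4 test matrices are illustrative choices, not the step-size selection schemes analysed in the chapter; the numpy-only exponential of a skew-symmetric matrix via a Hermitian eigendecomposition is likewise an implementation convenience, not a construction from the thesis.

```python
import numpy as np

def expm_skew(B):
    """Matrix exponential of a real skew-symmetric B, so exp(B) is orthogonal.
    Since iB is Hermitian, an eigendecomposition gives an exact formula."""
    w, V = np.linalg.eigh(1j * B)
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real

def double_bracket_step(H, N, alpha):
    """One iteration H_{k+1} = exp(-a[H,N]) H exp(a[H,N]).  The Lie bracket
    [H,N] = HN - NH is skew-symmetric for symmetric H and N, so the update
    is an orthogonal conjugation and preserves the spectrum of H exactly."""
    Q = expm_skew(alpha * (H @ N - N @ H))
    return Q.T @ H @ Q

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A + A.T                              # arbitrary symmetric matrix H_0
N = np.diag([4.0, 3.0, 2.0, 1.0])        # diagonal target matrix
eigs_initial = np.sort(np.linalg.eigvalsh(H))
off_initial = np.linalg.norm(H - np.diag(np.diag(H)))

for _ in range(500):
    H = double_bracket_step(H, N, alpha=0.01)

off_final = np.linalg.norm(H - np.diag(np.diag(H)))
# The spectrum is unchanged while the off-diagonal part decays towards zero.
```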
The material presented in this chapter was first published in the conference article (Moore,
Mahony & Helmke 1992). A journal paper based on an expanded version of the conference
paper is to appear this year (Moore et al. 1994).
Chapter 3: In this chapter an investigation is undertaken of the properties of Oja's learning
equation

Ẋ = −XX^T N X + N X,    N = N^T ∈ R^{n×n},

evolving on the set of matrices {X ∈ R^{n×m} | X^T X = I_m}, the Stiefel manifold of real
n × m matrices, where n ≥ m are integers. This differential equation was proposed by Oja
(1982, 1989) as a model for learning in certain neural networks. Explicit proofs of convergence
for the flow are presented which extend the results in Yan et al. (1994) so that no genericity
assumption is required on the eigenvalues of N. The homogeneous nature of the Stiefel
manifold allows one to develop an explicit numerical method (a discrete-time system evolving
on the Stiefel manifold) for principal component analysis. The method is based on a modified
gradient ascent algorithm for maximising the scalar potential
R_N(X) = tr(X^T N X),
known as the generalised Rayleigh quotient. Proofs of convergence for the numerical algorithm
proposed are given as well as some modifications and observations aimed at reducing the
computational cost of implementing the algorithm on a digital computer. The discrete method
proposed is similar to the classical power method and steepest ascent methods for determining
the dominant p-eigenspace of a matrix N. Indeed, in the case where p = 1 (for a particular
choice of time-step) the discretization is shown to be equivalent to the power method. When
p > 1, however, there are subtle differences between the power method and the proposed
method.
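The p = 1 case can be made concrete with a small numerical sketch (a Python/NumPy environment is assumed; the function name, step length and test matrix are illustrative assumptions, not the scheme analysed in Chapter 3). An ascent step for the generalised Rayleigh quotient, re-orthonormalised by a QR factorisation, is exactly a power iteration with the shifted matrix I + αN:

```python
import numpy as np

def rayleigh_ascent_step(X, N, alpha):
    """One illustrative ascent step for R_N(X) = tr(X^T N X): move along
    the ambient gradient direction N X, then re-orthonormalise back onto
    the Stiefel manifold {X : X^T X = I} with a QR factorisation."""
    Q, _ = np.linalg.qr(X + alpha * (N @ X))
    return Q

rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((6, 6)))
N = Q0 @ np.diag(np.arange(1.0, 7.0)) @ Q0.T        # eigenvalues 1,...,6
X = np.linalg.qr(rng.standard_normal((6, 1)))[0]    # p = 1 column, unit norm
for _ in range(300):
    X = rayleigh_ascent_step(X, N, alpha=0.5)

# For p = 1 each step is (I + alpha*N)X followed by normalisation, i.e. a
# power iteration with I + alpha*N, so X aligns (up to sign) with the
# dominant eigenvector and R_N(X) approaches the largest eigenvalue.
rayleigh = float(X.T @ N @ X)
```

For p > 1 the QR re-orthonormalisation no longer reduces to a simple column scaling, which is one source of the subtle differences mentioned above.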
The chapter is based on the journal paper (Mahony, Helmke & Moore 1994). Applications
of the same ideas have also been considered in the field of linear programming (Mahony &
Moore 1992).
Chapter 4: In this chapter, the task of pole placement is considered for a structured class
of systems (those with symmetric state space realisations) for which, to my knowledge, no
previous pole placement results are available. The assumption of symmetry of the realisation,
besides having a natural network theoretic interpretation, simplifies the geometric analysis
considerably. It is shown that a symmetric state space realisation can be assigned arbitrary
(real) poles via symmetric output feedback if and only if there are at least as many system inputs
as states. This result is surprising since a naive counting argument (comparing the number of
free variables, m(m+1)/2, of a symmetric output feedback gain to the number of poles, n, of a
symmetric realization having m inputs and n states) would suggest that m(m+1)/2 ≥ n is
sufficient for pole placement. To investigate the problem further, gradient flows of least squares
cost criteria (functions of the matrix entries of realisations) are derived on smooth manifolds
of output feedback equivalent symmetric realisations. Limiting solutions to these flows occur
at minima of the cost criteria and relate directly to finding optimal feedback gains for system
assignment and pole placement problems. Cost criteria are proposed for solving the tasks of
system assignment, pole placement, and simultaneous multiple system assignment.
The theoretical material contained in Sections 4.1 to 4.4 along with the simulations in
Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the numerical
method proposed in Section 4.6 was presented at the 1993 Conference on Decision and Control
(Mahony et al. 1993). Much of the material presented in this chapter was developed in
conjunction with the results contained in the monograph (Helmke & Moore 1994b, Section 5.3),
which focusses on general linear systems.
Chapter 5: In this chapter a brief review of the relevant theory associated with developing
numerical methods on homogeneous spaces is presented. The focus of the development is
on classes of homogeneous spaces encountered in engineering applications and the simplest
theoretical constructions which provide a mathematical foundation for the numerical methods
proposed. A discussion is given of the relationship between gradient flows on Lie-groups and
homogeneous spaces (related by a group action) which motivates the choice of a particular
Riemannian structure for a homogeneous space. Convergence behaviour of gradient flows
is also considered. The curves used in constructing numerical methods in Chapters 2 to 4
are all based on matrix exponentials and the theory of the exponential map as a Lie-group
homomorphism is reviewed to provide a theoretical foundation for this choice. Moreover, a
characterisation of the geodesics associated with the Levi-Civita connection (derived from a
given Riemannian metric) is discussed and conditions are given on when the matrix exponential
maps to a geodesic curve on a Lie-group. Finally, an explicit discussion of the relationship
between geodesics on Lie-groups and homogeneous spaces is given.
Much of the material presented is standard or at least easily accessible to people working
in the fields of Riemannian geometry and Lie-groups. However, this material is not standard
knowledge for researchers in the field of systems engineering. Moreover, the development
strongly emphasizes the aspects of the general theory that are relevant to problems in linear
systems theory.
Chapter 6: In this chapter the gradient descent methods developed in Chapters 2 to 4 are
reviewed in the context of the theoretical developments of Chapter 5. The conclusion is that
the proposed algorithms are modified gradient descent algorithms where geodesics are used to
replace the straight line interpolation of the classical gradient descent method. This provides a
template for a simple numerical approach suitable for solving any scalar optimization problem
on a homogeneous space. Later in Chapter 6 a coordinate free Newton-Raphson method is
proposed which evolves explicitly on a Lie-group. This method is proposed in a general form
with convergence analysis and then used to generate a quadratically convergent numerical
method for the symmetric eigenvalue problem. A comparison is made to the QR algorithm
applied to an example taken from Golub and Van Loan (1989, pg. 424) which shows that the
Newton-Raphson method proposed converges in the same number of iterations as the classical
QR method.
Chapter 2
Numerical Gradient Algorithms for
Eigenvalue Calculations
A traditional algebraic approach to determining the eigenvalue and eigenvector structure of an
arbitrary matrix is the QR algorithm. In the early eighties it was observed that the QR algorithm
is closely related to a continuous-time differential equation which had become known through
study of the Toda lattice. Symes (1982), and Deift et al. (1983) showed that for tridiagonal
real symmetric matrices, the QR algorithm is a discrete-time sampling of the solution to a
continuous-time differential equation. This result was generalised to full complex matrices by
Chu (1984a), and Watkins and Elsner (1989b) provided further insight in the late eighties.
Brockett (1988) studied dynamical matrix flows generated by the double Lie-bracket1
equation,
Ḣ = [H, [H, N]],    H(0) = H_0,
for constant symmetric matrices N and H0. This differential equation is termed the double-
bracket equation, and solutions of this equation are termed double-bracket flows. Similar matrix
differential equations appear, earlier than the references given above, in the physics literature. An
¹The Lie-bracket of two square matrices X, Y ∈ R^{n×n} is

[X, Y] = XY − YX.

If X = X^T and Y = Y^T are symmetric matrices then [X, Y]^T = −[X, Y] is a skew symmetric matrix.
example is the Landau-Lifschitz-Gilbert equation of micromagnetics,

dm̂/dt = 1/(1 + γ²) ( m̂ × H + γ m̂ × (m̂ × H) ),    |m̂|² = 1,

considered in a limiting parameter regime in which the damping term m̂ × (m̂ × H) dominates.
In this equation m̂, H ∈ R³ and the cross-product
is equivalent to a Lie-bracket operation. The relationship between this type of differential
equation and certain problems in linear algebra, however, has only recently been investigated.
An important property of the double-bracket equation is that its solutions have constant
spectrum (i.e. the eigenvalues of a solution remain the same for all time) (Chu & Driessel
1990, Helmke & Moore 1994b). By suitable choice of the matrix parameter N, Brockett (1988)
showed that the double-bracket flow can be used to diagonalise real symmetric matrices (and
hence compute their eigenvalues), sort lists, and even to solve linear programming problems.
In independent work by Driessel (1986), Chu and Driessel (1990), Smith (1991) and Helmke
and Moore (1990), a similar gradient flow approach was developed for the task of computing
the singular values of a general non-symmetric, non-square matrix. The differential equation
obtained in these approaches is almost identical to the double-bracket equation. In Helmke
and Moore (1990), it is shown that these flows can also be derived as special cases of the
double-bracket equation for a non-symmetric matrix, suitably augmented to be symmetric.
When the double-bracket equation is viewed as a dynamical solution to linear algebra
problems (Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b) one is led naturally
to consider numerical methods based on the insight provided by the double-bracket flow. In
particular, the double-bracket flow evolves on a smooth submanifold of matrix space, the
set of all symmetric matrices with a given spectrum (Helmke & Moore 1994b, pg. 50). A
numerical method with such a property is termed constraint stable (cf. page 2). Such methods
are particularly of interest when accuracy or robustness of a given computation is an important
consideration. Robustness is of particular interest for engineering applications where input data
will usually come with added noise and uncertainty. As a consequence when one considers
numerical approximation of solutions to the double-bracket equation it is important to study
those methods which preserve the important structure of the double-bracket flow.
For the particular problem of determining the eigenvalues of a symmetric matrix, there
are many well tested and fast numerical methods available. It is not so much to challenge
26 Numerical Gradient Algorithms for Eigenvalue Calculations Chapter 2
established algorithms in speed or efficiency that one would study numerical methods based
on the double-bracket equation. Rather, with the developing theoretical understanding of a
number of related differential matrix equations (many of which have important applications in
linear systems theory, for example the area of balanced realizations (Imae, Perkins & Moore
1992, Perkins et al. 1990)), one may look upon a detailed study of numerical methods based
on the double-bracket flow as providing a stepping stone to a new set of robust and adaptive
computational methods in linear systems theory.
The material presented in this chapter was first published in the conference article (Moore
et al. 1992). A journal paper based on an expanded version of the conference paper is to appear
this year (Moore et al. 1994).
In this chapter, I propose a numerical algorithm, termed the double-bracket algorithm,
for computing the eigenvalues of an arbitrary symmetric matrix,

H_{k+1} = e^{−α_k [H_k, N]} H_k e^{α_k [H_k, N]}.
For suitably small α_k, termed time-steps, the algorithm is an approximation of the solution to the
continuous-time double-bracket equation. Since the matrix exponential of a skew symmetric
matrix is orthogonal, it is seen that this iteration has the important property of preserving the
spectrum of the iterates. It is shown that for suitable choices of time-steps the double-bracket
algorithm inherits the same equilibria and limit points as the double-bracket flow and displays
linear convergence to its limit. A related algorithm for determining the singular values of an
arbitrary (not necessarily square) matrix is proposed and is shown to be equivalent to the double-
bracket equation applied to an augmented symmetric system. An analysis of convergence
behaviour showing linear convergence to the desired limit points is presented. Associated with
the main algorithms presented for the computation of the eigenvalues or singular values of
matrices are algorithms which compute the full eigenspace decompositions of given matrices.
These algorithms also display linear convergence to the desired limit points.
The chapter is divided into seven sections. In Section 2.1 the double-bracket algorithm is
introduced and the basic convergence results are presented. Section 2.2 deals with choosing
step-size selection schemes, and proposes two valid methods for generating the time-steps
α_k. Section 2.3 discusses the question of stability and proves that the double-bracket algorithm
has a unique attractive fixed point under assumptions that both the step-size selection schemes
proposed in Section 2.2 satisfy. The remainder of the chapter deals with computing the singular
values of an arbitrary matrix (Section 2.4), and computing the full spectral decomposition of
symmetric (or arbitrary) matrices (Section 2.5). A number of computational issues are briefly
mentioned in Section 2.6 and Section 2.7 considers some remaining open issues.
2.1 The Double-Bracket Algorithm
In this section a brief review of the continuous-time double-bracket equation is given with
emphasis on its interpretation as a gradient flow. The double-bracket algorithm is introduced
and conditions are given which guarantee convergence of the algorithm to the desired limit
point.
Let N and H be real symmetric matrices, and consider the potential function

Φ(H) := ‖H − N‖²    (2.1.1)
      = ‖H‖² + ‖N‖² − 2 tr(NH),

where the norm used is the Frobenius norm

‖X‖² := tr(X^T X) = ∑_{i,j} x_ij²,

with x_ij the elements of X. Note that Φ(H) measures the least squares difference between the
elements of H and the elements of N. Let M(H_0) be the set of orthogonally similar matrices,
generated by some symmetric initial condition H_0 = H_0^T ∈ R^{n×n}. Then

M(H_0) = {U^T H_0 U | U ∈ O(n)},    (2.1.2)
where O�n� denotes the group of all n�n real orthogonal matrices. It is shown in Helmke and
Moore (1994b, pg. 48) that M�H0� is a smooth compact Riemannian manifold with explicit
forms given for its tangent space and Riemannian metric. Furthermore, in the articles (Bloch,
Brockett & Ratiu 1990, Chu & Driessel 1990) the gradient of Φ(H), with respect to the
normal Riemannian metric² on M(H_0) (Helmke & Moore 1994b, pg. 50), is shown to be
grad Φ(H) = −[H, [H, N]]. Consider the gradient flow given by the solution of

Ḣ = −grad Φ(H)    (2.1.3)
  = [H, [H, N]],    with H(0) = H_0,
which is termed the double-bracket flow (Brockett 1988, Chu & Driessel 1990). Thus, the
double-bracket flow is a gradient flow which acts to decrease, or minimise, the least squares
potential Φ on the manifold M(H_0). Note that from (2.1.1), this is equivalent to increasing,
or maximising, tr(NH). The matrix H_0 is termed the initial condition, and the matrix N is
referred to as the target matrix.
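The algebraic expansion used in (2.1.1) is easy to verify numerically. The fragment below (Python/NumPy assumed; purely illustrative) checks that ‖H − N‖² = ‖H‖² + ‖N‖² − 2 tr(NH) for symmetric H and N, where the cross term uses tr(H^T N) = tr(NH):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
H, N = (A + A.T) / 2, (B + B.T) / 2      # symmetric test matrices

def frob2(X):
    """Squared Frobenius norm ||X||^2 = tr(X^T X) = sum of squared entries."""
    return float(np.trace(X.T @ X))

lhs = frob2(H - N)
rhs = frob2(H) + frob2(N) - 2 * float(np.trace(N @ H))
# lhs and rhs agree to machine precision
```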
The double-bracket algorithm proposed in this chapter is

H_{k+1} = e^{−α_k [H_k, N]} H_k e^{α_k [H_k, N]},    (2.1.4)

for arbitrary symmetric n × n matrices H_0 and N, and some suitably small scalars α_k, termed
time-steps. Consider the curve H_{k+1}(t) = e^{−t[H_k, N]} H_k e^{t[H_k, N]}, where H_{k+1}(0) = H_k and
H_{k+1} = H_{k+1}(α_k) is the (k+1)'th iteration of (2.1.4). Observe that

d/dt ( e^{−t[H_k, N]} H_k e^{t[H_k, N]} ) |_{t=0} = [H_k, [H_k, N]],

and thus e^{−t[H_k, N]} H_k e^{t[H_k, N]} is a first approximation of the double-bracket flow at H_k ∈
M(H_0). It follows that for small α_k, the solution to (2.1.3) evaluated at time t = α_k with
H(0) = H_k is approximately H_{k+1} = H_{k+1}(α_k).
It is easily seen from above that stationary points of (2.1.3) will be fixed points of (2.1.4). In
general, (2.1.4) may have more fixed points than just the stationary points of (2.1.3), however,
Proposition 2.1.5 shows that this is not the case for suitable choice of time-step α_k. The term
equilibrium point is used to refer to fixed points of the algorithm which are also stationary
points of (2.1.3).
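A minimal numerical sketch of the iteration (2.1.4) follows (Python/NumPy assumed; the eigendecomposition-based matrix exponential and the constant time-step are illustrative assumptions, not the step-size selection schemes analysed below):

```python
import numpy as np

def expm_via_eig(A):
    """Matrix exponential through the (complex) eigendecomposition;
    adequate here since the exponents are small skew-symmetric matrices."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def double_bracket_step(H, N, alpha):
    """H_{k+1} = exp(-a[H,N]) H exp(a[H,N]): conjugation by an orthogonal
    matrix, so the spectrum of the iterates is preserved exactly."""
    B = H @ N - N @ H                    # Lie-bracket [H, N], skew symmetric
    E = expm_via_eig(alpha * B)
    return E.T @ H @ E                   # E^T = exp(-alpha [H,N]) since B is skew

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
H = Q.T @ np.diag([3.0, 1.0, 5.0, 2.0, 4.0]) @ Q   # eigenvalues {1,...,5}
N = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])             # target matrix
for _ in range(1500):
    H = double_bracket_step(H, N, alpha=0.01)
# H tends to diag(1,...,5): the eigenvalues appear on the diagonal,
# ordered to match the ordering of the target matrix N.
```

The constant step 0.01 respects the bound 1/(2‖H_0‖₂‖N‖₂) discussed below, which is what guarantees the monotonic decrease of the potential in this sketch.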
To implement (2.1.4) it is necessary to specify the time-steps α_k. This is accomplished by
considering functions γ_N : M(H_0) → R⁺ and setting α_k := γ_N(H_k). The function γ_N is
termed the step-size selection scheme.
²A brief discussion of the derivation of gradient flows on Riemannian manifolds is given in Sections 5.3 and 5.4.
Condition 2.1.1 Let γ_N : M(H_0) → R⁺ be a step-size selection scheme for the double-
bracket algorithm on M(H_0). Then γ_N is well defined and continuous on all of M(H_0),
except possibly those points H ∈ M(H_0) where HN = NH. Furthermore, there exist real
numbers B ≥ β > 0, such that B ≥ γ_N(H) ≥ β for all H ∈ M(H_0) where γ_N is well defined.
Remark 2.1.2 The variable step-size selection scheme proposed in this chapter is discontinuous at all the points H ∈ M(H_0) such that [H, N] = 0. □
Remark 2.1.3 Observe that the definition of a step-size selection scheme depends implicitly
on the matrix parameter N. Indeed, γ_N may be thought of as a function in two matrix variables,
N and H. □
Condition 2.1.4 Let N be a diagonal n × n matrix with distinct diagonal entries μ_1 > μ_2 >
⋯ > μ_n.
Let λ_1 > λ_2 > ⋯ > λ_r be the eigenvalues of H_0 with associated algebraic multiplicities
n_1, …, n_r satisfying ∑_{i=1}^r n_i = n. Since H_0 is symmetric, the eigenvalues of H_0 are all real
and the diagonalisation of H_0 is

Λ := diag(λ_1 I_{n_1}, …, λ_r I_{n_r}),    (2.1.5)
where I_{n_i} is the n_i × n_i identity matrix. For generic initial conditions H_0 and a target matrix
N that satisfies Condition 2.1.4, the continuous-time equation (2.1.3) converges exponentially
fast to Λ (Brockett 1988, Helmke & Moore 1994b). Thus, the eigenvalues of H_0 are the
diagonal entries of the limiting value of the solution to (2.1.3). The double-bracket algorithm
behaves similarly to (2.1.3) for small α_k and, given a suitable step-size selection scheme,
should converge to the same equilibrium as the continuous-time equation.
The eigenvalues of the linearisation (2.3.1) can be read directly from (2.3.2) as
1 − d(μ_i − μ_j)(λ_i − λ_j), for i ≠ j and λ_i ≠ λ_j. Since λ_i ≥ λ_j when i ≤ j, then if
d ≤ 1/(2‖H_0‖₂ ‖N‖₂) (where ‖X‖₂ is the induced matrix 2-norm, the maximum singular value of X)
it is easily verified that |1 − d(μ_i − μ_j)(λ_i − λ_j)| < 1. It follows that Λ is asymptotically stable with
rate of convergence at least linear. The linear scaling factor for the convergence error is
max_{i≠j, λ_i≠λ_j} {1 − d(μ_i − μ_j)(λ_i − λ_j)}.
Remark 2.3.3 As ‖N‖₂ ‖H_0‖₂ ≤ 2‖N‖ ‖H_0‖, the constant step-size selection scheme (2.2.5)
is an example of such a selection scheme. □
Remark 2.3.4 Let γ_N : M(H_0) → R⁺ be a step-size selection scheme that satisfies Condition
2.1.1 and (2.1.7), and is also continuous on all of M(H_0). Let Λ be the locally asymptotically
stable equilibrium point given by (2.1.5). Set δ = γ_N(Λ) and observe that the linearisation
of the double-bracket algorithm at Λ is given by (2.3.1) with d replaced by δ. Recall that the
L_N scheme, defined in (2.2.10), is continuous, with limiting value L_N(Λ) = 1/(4‖H_0‖ ‖N‖). Thus,
Λ is an exponentially asymptotically stable equilibrium point for the double-bracket algorithm
equipped with the step-size selection scheme L_N. □
To show that the double-bracket algorithm is exponentially stable at Λ for the variable step-size
selection scheme is technically difficult due to the discontinuous nature of that scheme at equilibrium
points. A full proof of the following proposition is given in Moore et al. (1994).
Proposition 2.3.5 Let N satisfy Condition 2.1.4 and let the time-steps be generated by the variable step-size selection scheme
given by Lemma 2.2.4. The iterative algorithm (2.1.4) has a unique linearly attractive equilibrium point Λ given by (2.1.5).
To give an indication of the behaviour of the double-bracket algorithm two plots of a
simulation have been included, Figures 2.3.1 and 2.3.2. The simulation was run on a random
Figure 2.3.1: A plot of the diagonal elements of each iteration, H_k, of the double-bracket
algorithm (2.1.4) run on a 7 × 7 initial condition H_0 with eigenvalues {1, …, 7}. The target
matrix N was chosen to be diag(1, …, 7).
Figure 2.3.2: The potential Φ(H_k) = ‖H_k − N‖² for the double-bracket algorithm (2.1.4).
7 × 7 symmetric initial value matrix with eigenvalues 1, …, 7. The target matrix N is chosen
as diag(1, …, 7) and as a consequence the minimum potential is Φ(Λ) = 0. Figure 2.3.1 is a
plot of the diagonal entries of the recursive estimate H_k. The off-diagonal entries converge to
zero as the diagonal entries converge to the eigenvalues of H_0. Figure 2.3.2 is a plot of the
potential ‖H_k − N‖² versus the iteration k. This plot clearly shows the monotonically decreasing
nature of the potential at each step of the algorithm.
The results of Sections 2.1, 2.2 and 2.3 are summarised in the following theorem.
Theorem 2.3.6 Let H_0 = H_0^T be a real symmetric n × n matrix with eigenvalues λ_1 ≥ ⋯ ≥
λ_n. Let N ∈ R^{n×n} satisfy Condition 2.1.4, and let γ_N be either the constant step-size selection
(2.2.5) or the variable step-size selection (2.2.9). The double-bracket algorithm

H_{k+1} = e^{−α_k [H_k, N]} H_k e^{α_k [H_k, N]},
α_k = γ_N(H_k),

with initial condition H_0, has the following properties:
i) The recursion is isospectral.
ii) If H_k is a solution of the double-bracket algorithm, then Φ(H_k) = ‖H_k − N‖² is strictly
monotonically decreasing for every k ∈ N where [H_k, N] ≠ 0.
iii) Fixed points of the recursive equation are characterised by matrices H ∈ M(H_0) such
that

[H, N] = 0.

iv) Fixed points of the recursion are exactly the stationary points of the double-bracket
equation. These points are termed equilibrium points.
v) Let H_k, k = 1, 2, …, be a solution to the double-bracket algorithm, then H_k converges
to a matrix H_∞ ∈ M(H_0), [H_∞, N] = 0, an equilibrium point of the recursion.
vi) All equilibrium points of the double-bracket algorithm are strictly unstable, except
Λ = diag(λ_1, …, λ_n), which is locally asymptotically stable.
vii) The rate of convergence of the double-bracket algorithm to the unique asymptotically
stable equilibrium point is at least linear in a neighbourhood of Λ.
2.4 Singular Value Computations
In this section the task of determining the singular values of an arbitrary matrix is considered.
A singular value decomposition of a matrix H_0 ∈ R^{m×n}, m ≥ n, is a matrix decomposition

H_0 = V^T Σ U,    (2.4.1)
where V ∈ O(m), U ∈ O(n) and

Σ = [ diag(σ_1 I_{n_1}, …, σ_r I_{n_r}) ]
    [ 0_{(m−n)×n}                     ].    (2.4.2)

Here σ_1 > σ_2 > ⋯ > σ_r ≥ 0 are the distinct singular values of H_0, occurring with
multiplicities n_1, …, n_r such that ∑_{i=1}^r n_i = n. By convention the singular values of a matrix
are chosen to be non-negative. It should be noted that though such a decomposition always
exists and Σ is unique, there is no unique choice of orthogonal matrices V and U.
Let S(H_0) be the set of all matrices orthogonally congruent to H_0,

S(H_0) = {V^T H_0 U ∈ R^{m×n} | V ∈ O(m), U ∈ O(n)}.    (2.4.3)
It is shown in Helmke and Moore (1994b, pg. 89) that S(H_0) is a smooth compact Riemannian
manifold with explicit forms given for its tangent space and Riemannian metric. Following
the articles (Chu 1986, Chu & Driessel 1990, Helmke & Moore 1990, Helmke & Moore
1994b, Smith 1991) consider the task of calculating the singular values of a matrix H0, by
minimising the least squares cost function Φ : S(H_0) → R⁺, Φ(H) = ‖H − N‖². It is
shown in Helmke and Moore (1990, 1994b) that Φ achieves a unique local and global minimum
at the point Σ ∈ S(H_0). Moreover, in the articles (Helmke & Moore 1990, Helmke & Moore
1994b, Smith 1991) the explicit form for the gradient grad Φ is calculated. The minimising
gradient flow is

Ḣ = −grad Φ(H)    (2.4.4)
  = H{H, N} − {H^T, N^T}H,

with H(0) = H_0 the initial condition. Here the generalised Lie-bracket

{X, Y} := X^T Y − Y^T X = −{X, Y}^T

is used.
Condition 2.4.1 Let N be an m × n matrix, with m ≥ n,

N = [ diag(μ_1, …, μ_n) ]
    [ 0_{(m−n)×n}       ],

where μ_1 > μ_2 > ⋯ > μ_n > 0 are strictly positive, distinct real numbers.
For generic initial conditions, and a target matrix N that satisfies Condition 2.4.1, it is
known that (2.4.4) converges exponentially fast to Σ ∈ S(H_0) (Helmke & Moore 1990, Smith
1991). For H_0 and N constant m × n matrices, the singular value algorithm proposed is

H_{k+1} = e^{−α_k {H_k^T, N^T}} H_k e^{α_k {H_k, N}}.    (2.4.5)

This algorithm is analogous to the double-bracket algorithm (2.1.4).
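To make the iteration concrete, here is a minimal numerical sketch of (2.4.5) with a small constant time-step (Python/NumPy assumed; the helper names and the eigendecomposition-based exponential are illustrative assumptions, not the step-size schemes analysed below):

```python
import numpy as np

def expm_via_eig(A):
    """Matrix exponential via the (complex) eigendecomposition; adequate
    for the small skew-symmetric exponents that occur in the iteration."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def gen_bracket(X, Y):
    """Generalised Lie-bracket {X, Y} = X^T Y - Y^T X (skew symmetric)."""
    return X.T @ Y - Y.T @ X

def sv_step(H, N, alpha):
    """H_{k+1} = exp(-a{H^T,N^T}) H exp(a{H,N}): an orthogonal equivalence
    transformation, so the singular values of H are preserved exactly."""
    L = expm_via_eig(alpha * gen_bracket(H.T, N.T))   # m x m orthogonal factor
    R = expm_via_eig(alpha * gen_bracket(H, N))       # n x n orthogonal factor
    return L.T @ H @ R

m, n = 5, 3
rng = np.random.default_rng(2)
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = np.zeros((m, n))
S[:n, :n] = np.diag([6.0, 4.0, 2.0])
H = V.T @ S @ U                                # singular values {6, 4, 2}
N = np.zeros((m, n))
N[:n, :n] = np.diag([3.0, 2.0, 1.0])           # target satisfying Condition 2.4.1
for _ in range(2000):
    H = sv_step(H, N, alpha=0.01)
# H tends to S: the singular values appear on the diagonal, ordered to
# match the ordering of the target entries, with zero rows below.
```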
Lemma 2.4.2 Let H_0, N be m × n matrices. For any H ∈ R^{m×n} define the map H ↦ Ĥ ∈ R^{(m+n)×(m+n)}, where

Ĥ = [ 0_{m×m}  H       ]
    [ H^T      0_{n×n} ].    (2.4.6)

For any sequence of real numbers α_k, k = 1, 2, …, the iterations

H_{k+1} = e^{−α_k {H_k^T, N^T}} H_k e^{α_k {H_k, N}},    (2.4.7)

with initial condition H_0, and

Ĥ_{k+1} = e^{−α_k [Ĥ_k, N̂]} Ĥ_k e^{α_k [Ĥ_k, N̂]},    (2.4.8)

with initial condition Ĥ_0, are equivalent.
Proof Consider the iterative solution to (2.4.8), and evaluate the multiplication in the block
form of (2.4.6). This gives two equivalent iterative solutions, one the transpose of the other,
both of which are equivalent to the iterative solution to (2.4.7).
Remark 2.4.3 Note that Ĥ_0 and N̂ are symmetric (m+n) × (m+n) matrices, and that as a
result, the iteration (2.4.8) is just the double-bracket algorithm (2.1.4). □
Remark 2.4.4 The equivalence given by this lemma is complete in every way. In particular,
H_∞ is an equilibrium point of (2.4.7) if and only if Ĥ_∞ is an equilibrium point of (2.4.8).
Similarly, H_k → H_∞ if and only if Ĥ_k → Ĥ_∞ as k → ∞. □
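The equivalence of Lemma 2.4.2 is easy to check numerically for a single step. In the sketch below (Python/NumPy assumed; names illustrative), the top-right m × n block of one augmented double-bracket step (2.4.8) reproduces one singular value step (2.4.7) exactly:

```python
import numpy as np

def expm_via_eig(A):
    """Matrix exponential via the (complex) eigendecomposition."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def gen_bracket(X, Y):
    """Generalised Lie-bracket {X, Y} = X^T Y - Y^T X."""
    return X.T @ Y - Y.T @ X

def hat(X, m, n):
    """The symmetric augmentation (2.4.6): hat(X) = [[0, X], [X^T, 0]]."""
    return np.block([[np.zeros((m, m)), X], [X.T, np.zeros((n, n))]])

m, n, alpha = 4, 2, 0.05
rng = np.random.default_rng(3)
H0 = rng.standard_normal((m, n))
N = rng.standard_normal((m, n))

# one singular value step (2.4.7)
H1 = (expm_via_eig(-alpha * gen_bracket(H0.T, N.T))
      @ H0 @ expm_via_eig(alpha * gen_bracket(H0, N)))

# one double-bracket step (2.4.8) on the augmented symmetric matrices
Hh, Nh = hat(H0, m, n), hat(N, m, n)
B = Hh @ Nh - Nh @ Hh                    # the bracket [hat(H0), hat(N)]
Hh1 = expm_via_eig(-alpha * B) @ Hh @ expm_via_eig(alpha * B)

# Hh1 equals hat(H1): the two recursions agree block-by-block
```

The block structure works because [Ĥ, N̂] is block diagonal with blocks {H^T, N^T} and {H, N}, so the exponential factors split accordingly.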
This leads one to consider step-size selection schemes for the singular value algorithm
induced by selection schemes which were derived in Section 2.2 for the double-bracket algorithm. Indeed, if γ_N̂ : M(Ĥ_0) → R⁺ is a step-size selection scheme for (2.1.4) on M(Ĥ_0),
and H_k ∈ S(H_0), then one can define a time-step α_k for the singular value algorithm by

α_k = γ_N̂(Ĥ_k).    (2.4.9)

Thus, if (2.4.8) equipped with a step-size selection scheme γ_N̂ satisfies Condition 2.1.1
and (2.1.7), then from Lemma 2.4.2, (2.4.7) will satisfy similar conditions. For the sake of
simplicity the following development considers only the constant step-size selection scheme
(2.2.5) and the variable step-size selection (2.2.9).
Theorem 2.4.5 Let H_0, N be m × n matrices where m ≥ n and N satisfies Condition 2.4.1.
Let γ_N̂ : M(Ĥ_0) → R⁺ be either the constant step-size selection (2.2.5), or the variable
step-size selection (2.2.9). The singular value algorithm

H_{k+1} = e^{−α_k {H_k^T, N^T}} H_k e^{α_k {H_k, N}},
α_k = γ_N̂(Ĥ_k),

with initial condition H_0, has the following properties:
i) The singular value algorithm is a self-equivalent (singular value preserving) recursion
on the manifold S(H_0).
ii) If H_k is a solution of the singular value algorithm, then Φ(H_k) = ‖H_k − N‖² is strictly
monotonically decreasing for every k ∈ N where {H_k, N} ≠ 0 and {H_k^T, N^T} ≠ 0.
iii) Fixed points of the recursive equation are characterised by matrices H ∈ S(H_0) such
that

{H, N} = 0 and {H^T, N^T} = 0.    (2.4.10)

Fixed points of the recursion are exactly the stationary points of the singular value
gradient flow (2.4.4) and are termed equilibrium points.
iv) Let H_k, k = 1, 2, …, be a solution to the singular value algorithm, then H_k converges
to a matrix H_∞ ∈ S(H_0), an equilibrium point of the recursion.
v) All equilibrium points of the singular value algorithm are strictly unstable except Σ,
given by (2.4.2), which is locally asymptotically stable with at least linear rate of
convergence.
Proof To prove part i) note that the generalised Lie-bracket {X, Y} = −{X, Y}^T is skew
symmetric, and thus (2.4.5) is an orthogonal congruence transformation and preserves the
singular values of H_k. Also note that the potential Φ(H_k) = ½ Φ(Ĥ_k). Moreover, Lemma
2.4.2 shows that the sequence Ĥ_k is a solution to the double-bracket algorithm and thus,
from Proposition 2.1.5, ½ Φ(Ĥ_k) must be monotonically decreasing for all k ∈ N such that
[Ĥ_k, N̂] ≠ 0, which is equivalent to (2.4.10). This proves part ii), and part iii) follows by noting
that if {H_k^T, N^T} = 0 and {H_k, N} = 0, then H_{k+l} = H_k for l = 1, 2, …, and H_k is a fixed
point of (2.4.5). Moreover, since Φ(H_k) is strictly monotonically decreasing for all {H_k, N} ≠ 0
and {H_k^T, N^T} ≠ 0, these points can be the only fixed points. It is known that these are
the only stationary points of (2.4.4) (Helmke & Moore 1990, Helmke & Moore 1994b, Smith
1991).
In order to prove iv) one needs the following characterisation of equilibria of the singular
value algorithm.
Lemma 2.4.6 Let N satisfy Condition 2.4.1 and γ_N̂ be either the constant step-size selection (2.2.5), or the variable step-size selection (2.2.9). The singular value algorithm (2.4.5)
equipped with time-steps α_k = γ_N̂(Ĥ_k), has exactly 2^n n!/∏_{i=1}^r (n_i!) distinct equilibrium
points in S(H_0). Furthermore, these equilibrium points are characterised by the matrices

[ π^T Σ̃ π S   ]
[ 0_{(m−n)×n} ],

where Σ̃ = diag(σ_1 I_{n_1}, …, σ_r I_{n_r}), π is an n × n permutation matrix, and S = diag(±1, …, ±1) is a sign matrix.
Proof Equilibrium points of (2.4.5) are characterised by the two conditions (2.4.10). Since N
satisfies Condition 2.4.1, then setting H = (h_ij) one has that {H, N} = 0 is equivalent to

μ_j h_ji − μ_i h_ij = 0,    for i = 1, …, n, j = 1, …, n.

Similarly, the condition {H^T, N^T} = 0 is equivalent to

μ_j h_ij − μ_i h_ji = 0,    for i = 1, …, n, j = 1, …, n,
h_ij μ_j = 0,    for i = n+1, …, m, j = 1, …, n.

By manipulating these relationships, and using the distinct, positive nature of the μ_i, it is
easily shown that h_ij = 0 for i ≠ j. Using the fact that (2.4.5) is self-equivalent, the only
possible matrices of this form which have the same singular values as H_0 are characterised
as above. A simple counting argument shows that the number of distinct equilibrium points is
2^n n!/∏_{i=1}^r (n_i!).
The proof of part iv) is now directly analogous to the proof of part c) of Proposition 2.1.5. It
remains only to prove part v), which involves the stability analysis of the equilibrium points
characterised by (2.4.10). It is not possible to directly apply the results obtained in Section 2.3
to the double-bracket algorithm for Ĥ_k, since N̂ does not satisfy Condition 2.1.4. However, for
the constant step-size selection scheme induced by (2.2.5), and using analogous arguments to
those used in Lemmas 2.3.1 and 2.3.2, it follows that Σ is the unique locally attractive equilibrium
point of the singular value algorithm. Similarly, by linearizing (2.4.5) for continuous step-size
selection schemes at the point Σ, it can be shown that the rate of convergence is at least linear in
a neighbourhood of Σ. Thus, using Lemma 2.4.2 it follows that Σ̂ is the unique exponentially
attractive equilibrium point of the double-bracket algorithm on M(Ĥ_0). To obtain the same
results for the variable step-size selection scheme (2.2.9) one applies Proposition 2.3.5 to the
double-bracket algorithm on M(Ĥ_0) and uses the equivalence given by Lemma 2.4.2 to obtain
the same result for the singular value algorithm (2.4.5). This completes the proof.
Remark 2.4.7 The above theorem holds true for any time-steps α_k = γ_N̂(Ĥ_k) induced by a
step-size selection scheme γ_N̂ which satisfies Condition 2.1.1, such that Theorem 2.3.6 holds.
□
2.5 Associated Orthogonal Algorithms
In addition to finding eigenvalues or singular values of a matrix it is often desired to determine
the full eigen-decomposition of a matrix, i.e. the eigenvectors related to each eigenvalue. As-
sociated with the double-bracket algorithm and singular value algorithm there are algorithms
evolving on the set of orthogonal matrices which converge to the matrix of orthonormal eigen-
vectors (for the double-bracket algorithm) and separate matrices of left and right orthonormal
singular directions (for the singular value algorithm). To simplify the subsequent analysis one
imposes a genericity condition on the initial condition H0.
Condition 2.5.1 If H_0 = H_0^T ∈ R^{n×n} is a real symmetric matrix then assume that H_0 has
distinct eigenvalues λ_1 > ⋯ > λ_n. If H_0 ∈ R^{m×n}, where m ≥ n, then assume that H_0 has
distinct singular values σ_1 > ⋯ > σ_n > 0.
For a sequence of positive real numbers α_k, k = 1, 2, …, the associated orthogonal
double-bracket algorithm is

U_{k+1} = U_k e^{α_k [U_k^T H_0 U_k, N]},    U_0 ∈ O(n),    (2.5.1)

where H_0 = H_0^T ∈ R^{n×n} is symmetric. For an arbitrary initial condition H_0 ∈ R^{m×n} the
associated orthogonal singular value algorithm is

V_{k+1} = V_k e^{α_k {U_k^T H_0^T V_k, N^T}},    V_0 ∈ O(m),    (2.5.2)
U_{k+1} = U_k e^{α_k {V_k^T H_0 U_k, N}},    U_0 ∈ O(n).
Note that in each case the exponents of the exponential terms are skew symmetric and thus, the
recursions will remain orthogonal.
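The orthogonality-preservation property is easy to check numerically. The sketch below is not from the thesis: it fixes a small constant step-size $\alpha$ rather than using the step-size selection schemes (2.2.5) or (2.2.9), uses `scipy.linalg.expm` for the matrix exponential, and assumes the sign convention of (2.5.1) as reconstructed above. It iterates the recursion and verifies that $U_k$ stays orthogonal while the potential $\|U_k^T H_0 U_k - N\|^2$ of Remark 2.5.2 decreases.

```python
import numpy as np
from scipy.linalg import expm

def orth_double_bracket_step(U, H0, N, alpha):
    """One step of the associated orthogonal double-bracket recursion (2.5.1).
    The exponent alpha*[U^T H0 U, N] is skew symmetric, so each exponential
    factor is orthogonal and U_k remains in O(n)."""
    H = U.T @ H0 @ U
    B = H @ N - N @ H                 # Lie bracket [H, N]
    return U @ expm(alpha * B)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H0 = A + A.T                                 # symmetric matrix to be diagonalised
N = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])       # target matrix, distinct diagonal entries

phi = lambda U: np.linalg.norm(U.T @ H0 @ U - N) ** 2   # potential of Remark 2.5.2
U = np.eye(5)
phi_init = phi(U)
for _ in range(500):
    U = orth_double_bracket_step(U, H0, N, alpha=0.01)   # heuristic fixed step

print(np.linalg.norm(U.T @ U - np.eye(5)))   # orthogonality preserved to machine precision
print(phi_init, phi(U))                      # the potential has decreased
```

The fixed step `alpha=0.01` is an illustrative assumption only; the selection schemes of Section 2.2 give guaranteed decrease.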
Let $H_0 = H_0^T \in \mathbb{R}^{n\times n}$ and consider the map $g : O(n) \to M(H_0)$, $U \mapsto U^T H_0 U$, which is a smooth surjection. If $U_k$ is a solution to (2.5.1) observe that

g(U_{k+1}) = e^{-\alpha_k [g(U_k), N]} g(U_k) e^{\alpha_k [g(U_k), N]}

is the double-bracket algorithm (2.1.4). Thus, $g$ maps the associated orthogonal double-bracket algorithm with initial condition $U_0$, to the double-bracket algorithm with initial condition $U_0^T H_0 U_0$, on $M(U_0^T H_0 U_0) = M(H_0)$.
Remark 2.5.2 Consider the potential function $\phi : O(n) \to \mathbb{R}_+$, $\phi(U) = \|U^T H_0 U - N\|^2$, on the set of orthogonal $n\times n$ matrices. Using the standard induced Riemannian metric from $\mathbb{R}^{n\times n}$ on $O(n)$, the associated orthogonal gradient flow is (Brockett 1988, Chu 1984a, Chu & Driessel 1990, Helmke & Moore 1994b)

\dot{U} = -\mathrm{grad}\, \phi(U) = U [U^T H_0 U, N].

$\Box$
Theorem 2.5.3 Let $H_0 = H_0^T$ be a real symmetric $n\times n$ matrix that satisfies Condition 2.5.1. Let $N \in \mathbb{R}^{n\times n}$ satisfy Condition 2.1.4, and let $\alpha_N$ be either the constant step-size selection (2.2.5) or the variable step-size selection (2.2.9). The recursion

U_{k+1} = U_k e^{\alpha_k [U_k^T H_0 U_k, N]}, \qquad U_0 \in O(n),
\alpha_k = \alpha_N(H_k),
referred to as the associated orthogonal double-bracket algorithm, has the following properties:
i) A solution $U_k$, $k = 1, 2, \ldots$, to the associated orthogonal double-bracket algorithm remains orthogonal.
ii) Let $\phi(U) = \|U^T H_0 U - N\|^2$ be a map from $O(n)$ to the set of non-negative reals $\mathbb{R}_+$. Let $U_k$, $k = 1, 2, \ldots$, be a solution to the associated orthogonal double-bracket algorithm. Then $\phi(U_k)$ is strictly monotonically decreasing for every $k \in \mathbb{N}$ where $[U_k^T H_0 U_k, N] \neq 0$.
iii) Fixed points of the algorithm are characterised by matrices $U \in O(n)$ such that

[U^T H_0 U, N] = 0.

There are exactly $2^n n!$ distinct fixed points.
iv) Let $U_k$, $k = 1, 2, \ldots$, be a solution to the associated orthogonal double-bracket algorithm, then $U_k$ converges to an orthogonal matrix $U_\infty$, a fixed point of the algorithm.
v) All fixed points of the associated orthogonal double-bracket algorithm are strictly unstable, except those $2^n$ points $U_\infty \in O(n)$ such that

U_\infty^T H_0 U_\infty = \Lambda,

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Such points $U_\infty$ are locally asymptotically stable with at least linear rate of convergence and $H_0 = U_\infty \Lambda U_\infty^T$ is an eigenspace decomposition of $H_0$.
Proof Part i) follows directly from the orthogonal nature of $e^{\alpha_k [U_k^T H_0 U_k, N]}$. Note that in part ii) the definition of $\phi$ can be expressed in terms of the map $g(U) = U^T H_0 U$ from $O(n)$ to $M(H_0)$ and the double-bracket potential $\psi(H) = \|H - N\|^2$ of (2.1.1), i.e.

\phi(U_k) = \psi(g(U_k)).

Observe that $g(U_0) = U_0^T H_0 U_0$, and thus, $g(U_k)$ is the solution of the double-bracket algorithm with initial condition $U_0^T H_0 U_0$. As the step-size selection scheme $\alpha_N$ is either (2.2.5) or (2.2.9), then $g(U_k)$ satisfies (2.1.7). This ensures that part ii) holds.
If $U_k$ is a fixed point of the associated orthogonal double-bracket algorithm then $g(U_k)$ is a fixed point of the double-bracket algorithm with initial condition $U_0^T H_0 U_0$. Thus, from Proposition 2.1.5, $[g(U_k), N] = [U_k^T H_0 U_k, N] = 0$. Moreover, if $[U_k^T H_0 U_k, N] = 0$ for some given $k \in \mathbb{N}$, then by inspection $U_{k+l} = U_k$ for $l = 1, 2, \ldots$, and $U_k$ is a fixed point of the associated orthogonal double-bracket algorithm. From Lemma 2.1.6 it follows that if $U$ is a fixed point of the algorithm then $U^T H_0 U = \pi^T \Lambda \pi$ for some permutation matrix $\pi$. By inspection any orthogonal matrix $W = U S \pi^T$, where $S$ is a sign matrix $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$, is also a fixed point of the recursion, and indeed, any two fixed points are related in this manner. A simple counting argument shows that there are exactly $2^n n!$ distinct matrices of this form.
To prove iv), note that since $g(U_k)$ is a solution to the double-bracket algorithm, it converges to a limit point $H_\infty \in M(H_0)$, $[H_\infty, N] = 0$ (Proposition 2.1.5). Thus $U_k$ must converge to the preimage set of $H_\infty$ via the map $g$. Condition 2.5.1 ensures that the set generated by the preimage of $H_\infty$ is a finite distinct set, any two elements $U_\infty^1$ and $U_\infty^2$ of which are related by $U_\infty^1 = U_\infty^2 S$, $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$. Convergence to a particular element of this preimage follows since $\alpha_k [U_k^T H_0 U_k, N] \to 0$ as in Proposition 2.1.5.
To prove part v), observe that the dimension of $O(n)$ is the same as the dimension of $M(H_0)$, due to the genericity Condition 2.5.1. Thus $g$ is locally a diffeomorphism on $O(n)$, which forms an exact equivalence between the double-bracket algorithm and the associated orthogonal double-bracket algorithm. Restricting $g$ to a local region, the stability structure of the equilibria is preserved under the map $g^{-1}$. Thus, all fixed points of the associated orthogonal double-bracket algorithm are locally unstable except those that map via $g$ to the unique locally asymptotically stable equilibrium of the double-bracket recursion. Observe that due to the monotonicity of $\phi(U_k)$ a locally unstable equilibrium is also globally unstable.
The proof of the equivalent result for the singular value algorithm is completely analogous to the above proof.
Theorem 2.5.4 Let $H_0 \in \mathbb{R}^{m\times n}$, where $m \geq n$, satisfy Condition 2.5.1. Let $N \in \mathbb{R}^{m\times n}$ satisfy Condition 2.4.1. Let the time-step $\alpha_k$ be given by

\alpha_k = \alpha_{\hat{N}}(\hat{H}_k),

where $\alpha_{\hat{N}}$ is either the constant step-size selection (2.2.5), or the variable step-size selection scheme (2.2.9), on $M(\hat{H}_0)$. The recursion

V_{k+1} = V_k e^{\alpha_k \{U_k^T H_0^T V_k, N^T\}}, \qquad V_0 \in O(m),
U_{k+1} = U_k e^{\alpha_k \{V_k^T H_0 U_k, N\}}, \qquad U_0 \in O(n),
referred to as the associated orthogonal singular value algorithm, has the following properties:
i) Let $(V_k, U_k)$ be a solution to the associated orthogonal singular value algorithm, then both $V_k$ and $U_k$ remain orthogonal.
ii) Let $\phi(V, U) = \|V^T H_0 U - N\|^2$ be a map from $O(m) \times O(n)$ to the set of non-negative reals $\mathbb{R}_+$, then $\phi(V_k, U_k)$ is strictly monotonically decreasing for every $k \in \mathbb{N}$ where $\{V_k^T H_0 U_k, N\} \neq 0$ and $\{U_k^T H_0^T V_k, N^T\} \neq 0$. Moreover, fixed points of the algorithm are characterised by matrix pairs $(V, U) \in O(m) \times O(n)$ such that

\{V^T H_0 U, N\} = 0 \quad \text{and} \quad \{U^T H_0^T V, N^T\} = 0.
iii) Let $(V_k, U_k)$, $k = 1, 2, \ldots$, be a solution to the associated orthogonal singular value algorithm, then $(V_k, U_k)$ converges to a pair of orthogonal matrices $(V_\infty, U_\infty)$, a fixed point of the algorithm.
iv) All fixed points of the associated orthogonal singular value algorithm are strictly unstable, except those points $(V_\infty, U_\infty) \in O(m) \times O(n)$ such that

V_\infty^T H_0 U_\infty = \Sigma,

where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{m\times n}$. Each such point $(V_\infty, U_\infty)$ is locally exponentially asymptotically stable and $H_0 = V_\infty \Sigma U_\infty^T$ is a singular value decomposition of $H_0$.
2.6 Computational Considerations
There are several issues involved in the implementation of the double-bracket algorithm as
a numerical tool which have not been dealt with in the body of this chapter. Design and
implementation of efficient code has not been considered and would depend heavily on the
nature of the hardware on which such a recursion would be run. As each iteration requires
the calculation of a time-step, an exponential and a k � 1 estimate it is likely that it would
be best to consider applications in parallel processing environments. Certainly in a standard
computational environment the exponential calculation would limit the possible areas of useful
application of the algorithms proposed.
It is also possible to consider approximations of the double-bracket algorithm which have good computational properties. For example, consider a (1,1) Padé approximation to the matrix exponential

e^{\alpha_k [H_k, N]} \approx (2 I_n + \alpha_k [H_k, N]) (2 I_n - \alpha_k [H_k, N])^{-1}.

Such an approach has the advantage that, as $[H_k, N]$ is skew symmetric, the Padé approximation will be orthogonal, and will preserve the isospectral nature of the double-bracket algorithm. Similarly an $(n, n)$ Padé approximation of the exponential for any $n$ will also be orthogonal.
There are difficulties involved in obtaining direct step-size selection schemes based on Padé approximate double-bracket algorithms. Trying to guarantee that the potential $\psi$ is monotonically decreasing for such schemes, by choosing step-size selection schemes based on the estimation techniques developed in Section 2.2, yields time-steps which are prohibitively small. A good heuristic choice of step-size selection scheme, however, can be made based on the selection schemes given in this chapter, and simulations indicate that the Padé approximate double-bracket algorithm is viable when this is done.
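A minimal numerical illustration of why the (1,1) Padé (Cayley) approximation is attractive: for skew-symmetric $[H_k, N]$ the approximant is exactly orthogonal, so the approximate double-bracket step is exactly isospectral even though the exponential itself is only approximated. The sketch below is not from the thesis, and the fixed step-size `alpha = 0.05` is an arbitrary heuristic choice of the kind discussed above.

```python
import numpy as np

def pade_double_bracket_step(H, N, alpha):
    """Double-bracket step with e^{-a[H,N]} replaced by its (1,1) Pade (Cayley)
    approximant Q = (2I - a[H,N])(2I + a[H,N])^{-1}.  Since [H,N] is skew
    symmetric, Q is exactly orthogonal, so H -> Q H Q^T is exactly isospectral."""
    n = H.shape[0]
    B = H @ N - N @ H                  # [H, N], skew symmetric
    Q = (2 * np.eye(n) - alpha * B) @ np.linalg.inv(2 * np.eye(n) + alpha * B)
    return Q @ H @ Q.T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
H = A + A.T
N = np.diag([4.0, 3.0, 2.0, 1.0])
eigs_before = np.sort(np.linalg.eigvalsh(H))
for _ in range(100):
    H = pade_double_bracket_step(H, N, alpha=0.05)   # heuristic fixed step
eigs_after = np.sort(np.linalg.eigvalsh(H))

print(np.max(np.abs(eigs_after - eigs_before)))      # spectrum preserved
```

No eigenvalue drift accumulates over the iterations, which is precisely the constraint-stability property the exact exponential provides.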
2.7 Open Questions and Further Work
One of the fundamental problems tackled in this chapter is the task of step-size selection. The
best step-size selection scheme developed (2.2.9) is unsatisfactory in several ways; it is not
continuous at critical points of the cost function and it is computationally expensive to evaluate.
A better general understanding of the step-size selection task would be desirable. In particular, it may be possible to develop line search techniques that are guaranteed to converge to the optimal step-size, obviating the need for approximations.
One of the primary motivations for studying the symmetric eigenvalue problem from a dy-
namical systems perspective is the potential for applications to on-line and adaptive processes.
It is instructive to consider how the double-bracket algorithm can be modified to deal with
time-varying data. Subsection 2.7.1 is by no means a comprehensive treatment of this issue,
nevertheless, it provides an indication of how such a task may be approached. To go beyond
the treatment of Subsection 2.7.1 it would be desirable to consider a particular application and
refine the algorithm to provide a useful numerical technique.
2.7.1 Time-Varying Double-Bracket Algorithms
Consider a sequence of ‘input’ matrices $A_k = A_k^T$ for which an estimate of the eigenvalues of $A_k$ at each time $k$ is required. One assumes that the spectrum of each $A_k$ is related, for
example the sequence Ak is slowly varying with k. If the sequence Ak is a noisy observation
of some time-varying process or contains occasional large deviations then a sensible algorithm
for estimating the spectrum of Ak�1 would exploit the full data sequence A0� � � � � Ak along
with the new data Ak�1 to generate a new estimate. A gradient descent algorithm achieves this
in a fundamental manner since each new estimate is based on a small change in the previous
estimate which in turn is based on the data sequence up to time k. The issue of constraint
stability is of importance in such situations since the presence of small errors in the constraint
at each step may eventually lead the estimates to stray some distance from the true spectrum.
Given a symmetric matrix $H_0 = H_0^T$ and a diagonal matrix $N = \mathrm{diag}(\mu_1, \ldots, \mu_n)$ satisfying Condition 2.1.4 consider the potential

\phi(U) = \|U^T H_0 U - N\|^2.

In Section 2.5 the relationship $U^T H_0 U \in M(H_0)$ was exploited to display the connections between the double-bracket algorithm and the associated orthogonal algorithm. However, it is also possible to rewrite the potential as

\phi(U) = \|U N U^T - H_0\|_F^2.

Similarly, the associated orthogonal algorithm itself can be rewritten with the matrix $N$ modified by an orthogonal congruence transformation

U_{k+1} = U_k e^{\alpha_k [U_k^T H_0 U_k, N]} = e^{\alpha_k [H_0, U_k N U_k^T]} U_k.

The advantage of this formulation is the fact that the matrix $H_0$ appears explicitly in the algorithm. The time-varying associated orthogonal double-bracket algorithm is defined to be

U_{k+1} = e^{\alpha_k [A_k, U_k N_k U_k^T]} U_k, \qquad U_0 = I.   (2.7.1)
An estimate of the spectral decomposition of $A_{k+1}$ is given by $H_{k+1} = U_{k+1}^T A_{k+1} U_{k+1}$. Observe that the eigenvector decomposition of $A_{k+1}$ is derived from the data sequence up to time $k$ and is applied to $A_{k+1}$ to approximate a spectral decomposition $A_{k+1} \approx U_{k+1} H_{k+1} U_{k+1}^T$, where it is hoped that $H_{k+1}$ is nearly diagonal.
If $A_k \equiv H_0$ is constant it is easily seen that the time-varying associated orthogonal algorithm reduces to the standard associated orthogonal algorithm. Also each step of the time-varying algorithm will reduce the potential, $\|A_k - U_{k+1} N U_{k+1}^T\| \leq \|A_k - U_k N U_k^T\|$. Thus, as long as the sequence of matrices $A_k$ does not vary too quickly with time the proposed algorithm should converge to and track the spectral decomposition of $A_k$.
Remark 2.7.1 A time-varying dual singular value decomposition algorithm is fully analogous to the development given above. $\Box$
Remark 2.7.2 If the sequence $A_k$ is a stationary stochastic process it may be sensible to replace the driving term $A_k$ in (2.7.1) by the sample average $B_n = \frac{1}{n} \sum_{k=1}^{n} A_k$. $\Box$
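The congruence identity underlying the time-varying formulation, $U_k e^{\alpha_k [U_k^T H_0 U_k, N]} = e^{\alpha_k [H_0, U_k N U_k^T]} U_k$, is what makes (2.7.1) reduce to the standard associated orthogonal algorithm when $A_k \equiv H_0$. The sketch below (not from the thesis; it assumes the sign conventions reconstructed above and uses `scipy.linalg.expm` for the matrix exponential) checks the identity numerically at an arbitrary orthogonal $U_k$.

```python
import numpy as np
from scipy.linalg import expm

def bracket(X, Y):
    """Matrix Lie bracket [X, Y] = XY - YX."""
    return X @ Y - Y @ X

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
H0 = A + A.T
N = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # an arbitrary orthogonal U_k
alpha = 0.1

# Right-multiplied form of the associated orthogonal algorithm ...
lhs = U @ expm(alpha * bracket(U.T @ H0 @ U, N))
# ... equals the left-multiplied form in which H0 appears explicitly.
rhs = expm(alpha * bracket(H0, U @ N @ U.T)) @ U

print(np.linalg.norm(lhs - rhs))   # the two forms agree to machine precision
```

The identity follows from $U e^{B} U^T = e^{U B U^T}$ and $U [U^T H_0 U, N] U^T = [H_0, U N U^T]$ for orthogonal $U$.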
Chapter 3
Gradient Algorithms for Principal
Component Analysis
The problem of principal component analysis of a symmetric matrix $N = N^T$ is that of finding an eigenspace of specified dimension $p \geq 1$ which corresponds to the maximal $p$ eigenvalues of
N . There are a number of classical algorithms available for computing dominant eigenspaces
(principal components) of a symmetric matrix. A good reference for standard numerical
methods is Golub and Van Loan (1989).
There has been considerable interest in the last decade in using dynamical systems to solve
linear algebra problems (cf. the review (Chu 1988) and the recent monograph (Helmke &
Moore 1994b)). It is desirable to consider the relationship between such methods and classical
algebraic methods. For example, Deift et al. (1983) investigated a matrix differential equation
based on the Toda flow, the solution of which (evaluated at integer times) is exactly the sequence
of iterates generated by the standard QR algorithm. In general, dynamical system solutions of
linear algebra problems do not interpolate classical methods exactly. Discrete computational
methods based on dynamical system solutions to a given problem provide a way of comparing
classical algorithms with dynamical system methods. Recent work on developing numerical
methods based on dynamical systems insight is contained in Brockett (1993) and Moore et al.
(1994).
Concentrating on the problem of principal component analysis, Ammar and Martin (1986)
have studied the power method (for determining the dominant p-dimensional eigenspace of
a symmetric matrix) as a recursion on the Grassmannian manifold Gp�Rn�, the set of all p-
dimensional subspaces of Rn. Using local coordinate charts on Gp�Rn� Ammar and Martin
(1986) show that the power method is closely related to the solution of a matrix Riccati
differential equation. Unfortunately, the solution to a matrix Riccati equation may diverge to
infinity in finite time. Such solutions correspond to solutions that do not remain in the original
local coordinate chart. Principal component analysis has also been studied by Oja (1982, 1989)
in relation to understanding the learning performance of a single layer neural network with n
inputs and p neurons. Oja’s analysis involves computing the limiting solution of an explicit
matrix differential equation evolving on Rn�p (there is no requirement for local coordinate
representations). The evolution of the system corresponds to the ‘learning’ procedure of the
neural network while the columns of the limiting solution span the principal component of the
covariance matrix $N = E\{u_k u_k^T\}$ (where $E\{u_k u_k^T\}$ is the expectation of $u_k u_k^T$) of the vector random process $u_k \in \mathbb{R}^n$, $k = 1, 2, \ldots$, with which the network was ‘trained’. Recent work by Yan et al. (1994) has provided a rigorous analysis of the learning equation proposed by Oja.
Not surprisingly, it is seen that the solution to Oja’s learning equations is closely related to the
solution of a Riccati differential equation.
In this chapter I investigate the properties of Oja’s learning equation restricted to the Stiefel
manifold (the set of all n � p real matrices with orthonormal columns). Explicit proofs of
convergence for the flow are presented which extend the results of Yan et al. (1994) and Helmke and Moore (1994b, pg. 26) so that no genericity assumption is required on the eigenvalues of $N$.
The homogeneous nature of the Stiefel manifold is exploited to develop an explicit numerical
method (a discrete-time system evolving on the Stiefel manifold) for principal component
analysis. The method proposed is a gradient descent algorithm modified to evolve explicitly
on St(p, n). A step-size must be selected for each iteration and a suitable selection scheme is
proposed. Proofs of convergence for the proposed algorithm are given as well as modifications
and observations aimed at reducing the computational cost of implementing the algorithm on
a digital computer. The discrete method proposed is similar to the classical power method and
steepest ascent methods for determining the dominant p-eigenspace of a matrix N . Indeed, in
the case where p � 1 (for a particular choice of time-step) the discretization is shown to be the
power method. When p � 1, however, there are subtle differences between the methods.
This chapter is based on the journal paper (Mahony et al. 1994). Applications of the same
ideas have also been considered in the field of linear programming (Mahony & Moore 1992).
The chapter is organised into five sections including the introduction. Section 3.1 reviews
the derivation of the continuous-time matrix differential equation considered and gives a general
proof of convergence. In Section 3.2 a discrete-time iteration based on the results in Section 3.1
is proposed along with a suitable choice of time-step. Section 3.3 considers two modifications of
the scheme to reduce the computational cost of implementing the proposed numerical algorithm.
Section 3.4 considers the relationship of the proposed algorithm to classical methods while
Section 3.5 indicates further possibilities arising from this development.
3.1 Continuous-Time Gradient Flow
In this section a dynamical systems solution to the problem of finding the principal component
of a matrix is developed. The approach is based on computing the gradient flow associated
with a generalised Rayleigh quotient function. The reader is referred to Warner (1983) for
technical details on Lie-groups and homogeneous spaces.
Let $N = N^T$ be a real symmetric $n\times n$ matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ and an associated set of orthonormal eigenvectors $v_1, \ldots, v_n$. A maximal $p$-dimensional eigenspace, or maximal $p$-eigenspace, of $N$ is $\mathrm{sp}\{v_1, \ldots, v_p\}$, the subspace of $\mathbb{R}^n$ spanned by $\{v_1, \ldots, v_p\}$. If $\lambda_p > \lambda_{p+1}$ then the maximal $p$-eigenspace of $N$ is unique. If $\lambda_p = \lambda_{p+1} = \cdots = \lambda_{p+r}$, for some $r > 0$, then any subspace $\mathrm{sp}\{v_1, \ldots, v_{p-1}, w\}$, where $w \in \mathrm{sp}\{v_p, v_{p+1}, \ldots, v_{p+r}\}$, is a maximal $p$-eigenspace of $N$.
For $p$ an integer with $1 \leq p \leq n$, let

St(p, n) = \{X \in \mathbb{R}^{n\times p} \mid X^T X = I_p\},   (3.1.1)

where $I_p$ is the $p\times p$ identity matrix, denote the Stiefel manifold of real orthogonal $n\times p$ matrices. For $X \in$ St(p, n), the columns of $X$ are orthonormal basis vectors for a $p$-dimensional subspace of $\mathbb{R}^n$.
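A point of St(p, n) is conveniently produced from any full-rank $n\times p$ matrix by a thin QR factorisation. The sketch below is illustrative rather than from the thesis; it takes the generalized Rayleigh quotient studied in this section to be $R_N(X) = \mathrm{tr}(X^T N X)$, whose maximum over St(p, n) is the sum of the $p$ largest eigenvalues of $N$, attained on the maximal $p$-eigenspace.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2

# A point of St(p, n) from any full-rank n x p matrix via thin QR.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
print(np.linalg.norm(X.T @ X - np.eye(p)))   # X^T X = I_p to machine precision

# R_N(X) = tr(X^T N X) at a basis of the maximal p-eigenspace of a diagonal N:
# the value is the sum of the p largest eigenvalues.
N = np.diag([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
X_max = np.eye(n)[:, :p]                     # eigenvectors of the two largest eigenvalues
print(np.trace(X_max.T @ N @ X_max))         # 11.0 = 6 + 5
```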
Lemma 3.1.1 The Stiefel manifold St(p, n) is a smooth, compact, $(np - \frac{1}{2}p(p+1))$-dimensional submanifold of $\mathbb{R}^{n\times p}$. The tangent space of St(p, n) at a point $X$ is given by

T_X St(p, n) = \{\Xi X - X \Psi \mid \Xi \in Sk(n),\ \Psi \in Sk(p)\}.

Proof Consider the compact Lie group $G = O(n) \times O(p)$ acting on St(p, n) via the map $\sigma((U, V), X) := U X V^T$. It is easily verified that $\sigma$ is a smooth, transitive, group action on St(p, n). Since $G$ is compact
it follows that St(p, n) is a compact embedded submanifold of $\mathbb{R}^{n\times p}$ (Helmke & Moore 1994b, pg. 352). The tangent space of St(p, n) at a point $X \in$ St(p, n) is given by the image of the linearization of $\sigma_X : G \to$ St(p, n), $\sigma_X(U) := \sigma(U, X)$, at the identity element of $G$ (Gibson 1979, pg. 75). Recall that the tangent space of $O(n)$ at the identity is $T_{I_n} O(n) = Sk(n)$ (Helmke & Moore 1994b, pg. 349) and consequently that the tangent space at the identity of $G$ is $T_{(I_n, I_p)} G = Sk(n) \times Sk(p)$. Computing the linearization of $\sigma_X$ gives

D\sigma_X|_{(I_n, I_p)} (\Xi, \Psi) = \Xi X - X \Psi,

where $D\sigma_X|_{(I_n, I_p)}(\Xi, \Psi)$ is the Fréchet derivative of $\sigma_X$ at $(I_n, I_p)$ in direction $(\Xi, \Psi) \in T_{(I_n, I_p)} G$.
¹Given a function $f : M \to N$ between two smooth manifolds, a regular point $p \in M$ is a point where the tangent map $T_p f : T_p M \to T_{f(p)} N$ is surjective. Given $q \in N$, let $U = \{p \in M \mid f(p) = q\}$; then $U$ is known as a regular level set if for each $p \in U$ the map $T_p f$ is surjective. It can be shown (using the inverse function theorem) that regular level sets are embedded submanifolds of $M$ (Hirsch 1976, pg. 22).
The Euclidean inner product on $\mathbb{R}^{n\times n} \times \mathbb{R}^{p\times p}$ is

\langle (\Xi_1, \Psi_1), (\Xi_2, \Psi_2) \rangle = \mathrm{tr}(\Xi_1^T \Xi_2) + \mathrm{tr}(\Psi_1^T \Psi_2).

This induces a non-degenerate inner product on $T_{(I_n, I_p)} G$. Given $X \in$ St(p, n), the linearization $T_{(I_n, I_p)} \sigma_X$ decomposes the identity tangent space into

T_{(I_n, I_p)} G = \ker T_{(I_n, I_p)} \sigma_X \oplus \mathrm{dom}\, T_{(I_n, I_p)} \sigma_X,

where $\ker T_{(I_n, I_p)} \sigma_X$ is the kernel of $T_{(I_n, I_p)} \sigma_X$ and
Proof The columns of a critical point $X \in$ St(p, n) of $R_N$ span a $p$-eigenspace of $N$ and thus, the eigenvalues of $X^T N X$ are a subset of $p$ eigenvalues of $N$. Index this subset by a vector $r = (r_1, r_2, \ldots, r_q)$, such that each $r_i$ is an integer between zero and $n_i$ inclusive ($0 \leq r_i \leq n_i$), the sum $\sum_{i=1}^q r_i = p$, and each $r_i$ represents the algebraic multiplicity of $\lambda_i$ as an eigenvalue of $X^T N X$. It follows directly that $R_N(X) = \sum_{i=1}^q r_i \lambda_i$ and thus the collection of sets $L_r$ are the critical level sets of $R_N$.
Since $N$ is symmetric there exists $U_* \in O(n)$ such that $N = U_*^T \Lambda U_*$ and

\Lambda = \mathrm{diag}(\lambda_1 I_{n_1}, \lambda_2 I_{n_2}, \ldots, \lambda_q I_{n_q}),

with $I_{n_i}$ the $n_i \times n_i$ identity matrix. To show that the critical level sets $L_r$ are embedded submanifolds of St(p, n) it is convenient to consider the problem where $N$ is replaced by $\Lambda$ directly. In this case the critical level sets $L_r^\Lambda$ of $R_\Lambda$ on St(p, n) are exactly $L_r^\Lambda = U_* L_r$. The map $X \mapsto U_* X$ is a diffeomorphism of St(p, n) into itself which preserves submanifold structure.
Let $H = O(n_1) \times O(n_2) \times \cdots \times O(n_q) \times O(p)$ and observe that $H$ is a compact Lie group. Given an arbitrary index $r$ consider the map $\gamma : H \times L_r^\Lambda \to L_r^\Lambda$,

\gamma((U_1, U_2, \ldots, U_q, V), X) = U X V^T, \qquad \text{where } U = \mathrm{diag}(U_1, U_2, \ldots, U_q).

Observe that $U^T \Lambda U = \Lambda$ and consequently $R_\Lambda(\gamma((U, V), X)) = R_\Lambda(X)$. Moreover,

\mathrm{grad}\, R_\Lambda(\gamma((U, V), X)) = [\Lambda, U X X^T U^T] U X V^T = U [\Lambda, X X^T] X V^T = 0,

since it is assumed that $\mathrm{grad}\, R_\Lambda(X) = [\Lambda, X X^T] X = 0$. It follows that $\gamma$ is a group action of $H$ on $L_r^\Lambda$. If $X$ and $Y$ are both elements of $L_r^\Lambda$ then $X^T \Lambda X$ and $Y^T \Lambda Y$ have the same eigenvalues and are orthogonally similar, i.e. there exists $V \in O(p)$ such that $V^T X^T \Lambda X V = Y^T \Lambda Y$. By inspection one can find $U = (U_1, U_2, \ldots, U_q)$ such that $U X V = Y$, which shows that $\gamma$ is a transitive group action on $L_r^\Lambda$. It follows that $L_r^\Lambda$ is itself a homogeneous space (with compact Lie transformation group) and hence is an embedded submanifold of St(p, n) (Helmke & Moore 1994b, pg. 352). Since $X \mapsto U_* X$ is a diffeomorphism of St(p, n) into itself this shows that $L_r = U_*^T L_r^\Lambda$ is also an embedded submanifold of St(p, n).
Observe that any curve $Y(t) = U(t) X V^T(t)$, $Y(0) = X$, lying in $L_r$ will satisfy $[N, Y(t) Y(t)^T] Y(t) = 0$. Similarly, it is easily verified that any curve (passing through $Y(0) = X \in L_r$) satisfying this equality must lie in $L_r$. Thus, the tangent space $T_X L_r$ is given by the equivalence classes of the derivatives at time $t = 0$ of curves $Y(t)$ such that $[N, Y(t) Y(t)^T] Y(t) = 0$. Let $\dot{U}(0) = \Xi \in Sk(n)$ and $\dot{V}(0) = \Psi \in Sk(p)$, so that $\dot{Y}(0) = \Xi X - X \Psi$; then

\frac{d}{dt} \left( [N, Y(t) Y(t)^T] Y(t) \right) \Big|_{t=0} = [N, \dot{Y}(0) X^T + X \dot{Y}(0)^T] X + [N, X X^T] \dot{Y}(0)
= [N, [\Xi, X X^T]] X,

since $[N, X X^T] = 0$ (cf. part ii) Theorem 3.1.3). But this is just the definition (3.1.9) and the result is proved.
Now at a critical point of $R_N$ the Hessian $\mathcal{H} R_N$ is a well defined bilinear map from $T_X$ St(p, n) to the reals (Helmke & Moore 1994b, pg. 344). Let $(\Xi_1 X - X \Psi_1) \in T_X$ St(p, n) and $(\Xi_2 X - X \Psi_2) \in T_X$ St(p, n) be arbitrary; then

\mathcal{H} R_N (\Xi_1 X - X \Psi_1, \Xi_2 X - X \Psi_2) = D_{\Xi_1 X - X \Psi_1} \left( D_{\Xi_2 X - X \Psi_2} R_N \right)(X)
= D_{\Xi_1 X - X \Psi_1} \mathrm{tr}([X X^T, N] \Xi_2)
= \mathrm{tr}([[\Xi_1, X X^T], N] \Xi_2).

Observe that $[\Xi_1, X X^T]$ is skew symmetric since $X X^T$ is symmetric and $\Xi_1$ is skew symmetric. Similarly, $[[\Xi_1, X X^T], N]$ is skew symmetric. Since $\Xi_1$ and $\Xi_2$ are arbitrary, $\mathcal{H} R_N$ is degenerate in exactly those tangent directions $(\Xi X - X \Psi) \in T_X$ St(p, n) for which $[[\Xi, X X^T], N] = 0$. But this corresponds exactly to (3.1.9) and one concludes that the Hessian $\mathcal{H} R_N$ degenerates only on the tangent space of $L_r$. It is now possible to apply Lemma 5.5.2 to complete the proof of part iii).
Part iv) of the theorem is verified by explicitly evaluating the derivative of (3.1.8).
Remark 3.1.5 In the case $1 < p < n$ no exact solution to (3.1.7) is known; however, for $X(t)$ a solution to (3.1.7) the solution for $H(t) = X(t) X(t)^T$ is known, since

\dot{H}(t) = \dot{X} X^T + X \dot{X}^T
= N X X^T + X X^T N - 2 X X^T N X X^T
= N H(t) + H(t) N - 2 H(t) N H(t).   (3.1.10)

$H(0) = X_0 X_0^T$ and this equation is a Riccati differential equation (Yan, Helmke & Moore 1994). $\Box$
3.2 A Gradient Descent Algorithm
In this section a numerical algorithm for solving (3.1.7) is proposed. The algorithm is based
on a gradient descent algorithm modified to ensure that each iteration lies in St(p, n).
Let $X_0 \in$ St(p, n) and consider the recursive algorithm generated by

X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k,   (3.2.1)

for a sequence of positive real numbers $\alpha_k$, termed time-steps. The algorithm generated by (3.2.1) is referred to as the Rayleigh gradient algorithm. The Lie-bracket $[X_k X_k^T, N]$ is skew symmetric and consequently $e^{-\alpha_k [X_k X_k^T, N]}$ is orthogonal and $X_{k+1} \in$ St(p, n). Observe also that

\frac{d}{d\tau} e^{-\tau [X_k X_k^T, N]} X_k \Big|_{\tau = 0} = (I_n - X_k X_k^T) N X_k = \mathrm{grad}\, R_N(X_k),

the gradient of $R_N$ at $X_k$. Thus, $e^{-\tau [X_k X_k^T, N]} X_k$ represents a curve in St(p, n), passing through $X_k$ at time $\tau = 0$, and with first derivative equal to $\mathrm{grad}\, R_N(X_k)$. The linearization of $X_{k+1}(\tau) = e^{-\tau [X_k X_k^T, N]} X_k$ around $\tau = 0$ is

X_{k+1}(\tau) = X_k + \tau\, \mathrm{grad}\, R_N(X_k) + \text{(higher order terms)}.

The higher order terms modify the basic gradient descent algorithm on $\mathbb{R}^{n\times p}$ to ensure that the interpolation occurs along curves in St(p, n). For suitably small time-steps $\alpha_k$, it is clear that (3.2.1) will closely approximate the gradient descent algorithm on $\mathbb{R}^{n\times p}$.
To implement the Rayleigh gradient algorithm it is necessary to choose a time-step $\alpha_k$ for each step of the recursion. A convenient criterion for determining suitable time-steps is to maximise the change in potential

\Delta R_N(X_k, \alpha_k) = R_N(X_{k+1}) - R_N(X_k).   (3.2.2)

It is possible to use line search techniques to determine the optimal time-step for each iteration of the algorithm. Completing a line search at each step of the iteration, however, is computationally expensive and often results in worse stability properties for the overall algorithm. Instead, a simple deterministic formula for the time-step, based on maximising a lower bound $\Delta R_N^l(X_k, \tau)$ for (3.2.2), is provided.
Lemma 3.2.1 For any $X_k \in$ St(p, n) such that $\mathrm{grad}\, R_N(X_k) \neq 0$, the recursive estimate $X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k$, where

\alpha_k = \frac{\|[X_k X_k^T, N]\|^2}{2 \sqrt{p}\, \|N [X_k X_k^T, N]^2\|},   (3.2.3)

satisfies $\Delta R_N(X_k, \alpha_k) = R_N(X_{k+1}) - R_N(X_k) > 0$.
Proof Denote $X_{k+1}(\tau) = e^{-\tau [X_k X_k^T, N]} X_k$ for an arbitrary time-step $\tau$. Direct calculations show

\Delta R_N'(X_k, \tau) = -2\, \mathrm{tr}\left( X_{k+1}(\tau)^T N [X_k X_k^T, N] X_{k+1}(\tau) \right),
\Delta R_N''(X_k, \tau) = -4\, \mathrm{tr}\left( X_{k+1}(\tau)^T N [X_k X_k^T, N]^2 X_{k+1}(\tau) \right).

Taylor's formula for $\Delta R_N(X_k, \tau)$ gives

\Delta R_N(X_k, \tau) = -2\tau\, \mathrm{tr}\left( X_k^T N [X_k X_k^T, N] X_k \right)
\quad - 4\tau^2 \int_0^1 \mathrm{tr}\left( X_{k+1}(s\tau)^T N [X_k X_k^T, N]^2 X_{k+1}(s\tau) \right) (1 - s)\, ds
\geq 2\tau \|[X_k X_k^T, N]\|^2 - 4\tau^2 \int_0^1 \|X_{k+1}(s\tau) X_{k+1}(s\tau)^T\|\, \|N [X_k X_k^T, N]^2\| (1 - s)\, ds
= 2\tau \|[X_k X_k^T, N]\|^2 - 2\tau^2 \sqrt{p}\, \|N [X_k X_k^T, N]^2\| =: \Delta R_N^l(X_k, \tau).

The quadratic nature of $\Delta R_N^l(X_k, \tau)$ yields a unique maximum occurring at $\tau = \alpha_k$ given by (3.2.3). Observe that if $\mathrm{grad}\, R_N(X_k) \neq 0$ then $\|[X_k X_k^T, N]\|^2 \neq 0$ and thus $\Delta R_N^l(X_k, \alpha_k) > 0$. The result follows since $\Delta R_N(X_k, \alpha_k) \geq \Delta R_N^l(X_k, \alpha_k) > 0$.
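Lemma 3.2.1 gives everything needed for a working implementation. The sketch below is not from the thesis: it takes all norms to be Frobenius norms, uses `scipy.linalg.expm` for the exponential, and adds a small numerical guard (an assumption, not part of (3.2.3)) for the case where the bracket has already vanished. It runs the Rayleigh gradient algorithm (3.2.1) with the step-size (3.2.3) and checks that the iterates stay on St(p, n) while $R_N(X_k)$ climbs towards $\lambda_1 + \cdots + \lambda_p$.

```python
import numpy as np
from scipy.linalg import expm

def rayleigh_gradient_step(X, N):
    """One step of the Rayleigh gradient algorithm (3.2.1) with the time-step
    (3.2.3): alpha = ||[XX^T, N]||^2 / (2 sqrt(p) ||N [XX^T, N]^2||),
    all norms Frobenius."""
    p = X.shape[1]
    B = X @ X.T @ N - N @ X @ X.T            # [X X^T, N], skew symmetric
    den = 2.0 * np.sqrt(p) * np.linalg.norm(N @ B @ B)
    if den < 1e-30:                          # numerically a critical point (guard, an assumption)
        return X
    alpha = np.linalg.norm(B) ** 2 / den
    return expm(-alpha * B) @ X

rng = np.random.default_rng(4)
N = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))
for _ in range(100):
    X = rayleigh_gradient_step(X, N)

print(np.linalg.norm(X.T @ X - np.eye(2)))   # stays on St(2, 5)
print(np.trace(X.T @ N @ X))                 # approaches lambda_1 + lambda_2 = 9 here
```

For this example the iterates converge to a basis of the maximal 2-eigenspace of $N$, in line with Theorem 3.2.2.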
Theorem 3.2.2 Let $N = N^T$ be a real symmetric $n\times n$ matrix and let $p$ be an integer with $1 \leq p \leq n$. Denote the eigenvalues of $N$ by $\lambda_1 \geq \cdots \geq \lambda_n$. For a given estimate $X_k \in$ St(p, n), let $\alpha_k$ be given by (3.2.3). The Rayleigh gradient algorithm

X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k,
has the following properties.
i) The algorithm defines an iteration on St(p, n).
ii) Fixed points of the algorithm are critical points of $R_N$, i.e. points $X \in$ St(p, n) such that $[X X^T, N] = 0$. The columns of a fixed point of (3.2.1) form a basis for a $p$-dimensional eigenspace of $N$.
iii) If $X_k$, for $k = 1, 2, \ldots$, is a solution to the algorithm then the real sequence $R_N(X_k)$ is strictly monotonically increasing unless there is some $k \in \mathbb{N}$ with $X_k$ a fixed point of the algorithm.
iv) Let $X_k$, for $k = 1, 2, \ldots$, be a solution to the algorithm, then $X_k$ converges to a critical level set of $R_N$ on St(p, n).
v) All critical level sets of RN are unstable except the set for which the Rayleigh quotient
is maximised. The columns of an element of the maximal critical level set form a basis
for the maximal eigenspace of N .
Proof Part i) follows from the observation that $e^{-\alpha_k [X_k X_k^T, N]}$ is orthogonal. Part ii) is a direct consequence of Lemma 3.2.1 (since $\Delta R_N(X_k, \alpha_k) = 0$ if and only if $X_k$ is a fixed point) and Theorem 3.1.3. Part iii) also follows directly from Lemma 3.2.1.
To prove part iv) observe that since St(p, n) is a compact set, $R_N(X_k)$ is a bounded monotonically increasing sequence which must converge. As a consequence $X_k$ converges to some level set of $R_N$ such that for any $X$ in this set $\Delta R_N(X, \alpha(X)) = 0$. Lemma 3.2.1 ensures that any $X$ in this set is a fixed point of the recursion.
If $X$ is a fixed point of the recursion whose columns do not span the maximal $p$-dimensional eigenspace of $N$ then it is clear that there exists an orthogonal matrix $U \in O(n)$, with $\|U - I_n\|$ arbitrarily small, such that $R_N(UX) > R_N(X)$. As a consequence, the initial condition $X_0 = UX$ ($\|X_0 - X\|$ small) will give rise to a sequence of matrices $X_k$ that diverges from the level set containing $X$ (Lemma 3.2.1). This proves the first statement of v), while the attractive nature of the remaining fixed points follows from La Salle's principle of invariance along with the Lyapunov function $V(X) = \sum_{i=1}^p \lambda_i - R_N(X)$.
Remark 3.2.3 It is difficult to characterise the exact basin of attraction for the set of matrices whose columns span the maximal $p$-eigenspace of $N$. It is conjectured that the attractive basin for this set is all of St(p, n) except for the other critical points. $\Box$
Remark 3.2.4 For a fixed initial condition $X_0 \in$ St(p, n) let $X_k$ be the solution to (3.2.1). Define $H_k = X_k X_k^T$ and observe

H_{k+1} = e^{-\alpha_k [H_k, N]} H_k e^{\alpha_k [H_k, N]}.   (3.2.4)

Thus, $H_k$ can be written as a recursion on the set of symmetric rank-$p$ projection matrices $\{H \in \mathbb{R}^{n\times n} \mid H = H^T,\ H^2 = H,\ \mathrm{rank}\, H = p\}$. The algorithm generated in this manner is known as the double-bracket algorithm (cf. Chapter 2), a discretization of the continuous-time double-bracket equation (3.1.10). $\Box$
3.3 Computational Considerations
In this section two issues related to implementing (3.2.1) in a digital environment are discussed. Results in both the following subsections are aimed at reducing the computational cost associated with estimating the matrix exponential $e^{-\alpha_k [X_k X_k^T, N]}$, a transcendental $n\times n$ matrix function. The result presented in Subsection 3.3.1 is also important in Section 3.4.
3.3.1 An Equivalent Formulation
To implement (3.2.1) on conventional computer architecture the main computational cost for each step of the algorithm lies in computing the $n\times n$ matrix exponential $e^{-\alpha_k [X_k X_k^T, N]}$. The following result provides an equivalent formulation of the algorithm which involves the related $p\times p$ transcendental matrix functions “cos” and “sinc”.
Define the matrix function $\mathrm{sinc} : \mathbb{R}^{p\times p} \to \mathbb{R}^{p\times p}$ by the convergent infinite sum

\mathrm{sinc}(A) = I_p - \frac{A^2}{3!} + \frac{A^4}{5!} - \frac{A^6}{7!} + \cdots.

Observe that $A\, \mathrm{sinc}(A) = \sin(A)$ and thus, if $A$ is invertible, $\mathrm{sinc}(A) = A^{-1} \sin(A)$. Define the matrix function $\cos(A)$ by an analogous power series expansion. The matrix functions $\cos$ and $\mathrm{sinc}$ are related by $\cos^2(A) = I_p - A^2 \mathrm{sinc}^2(A)$.
Lemma 3.3.1 Given $N = N^T$ a real symmetric $n\times n$ matrix with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$, let $\alpha_k$, for $k = 1, 2, \ldots$, be a sequence of real positive numbers. For $X_0 \in$ St(p, n) an initial condition that is not a critical point of $R_N(X)$, then

X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k
= X_k \left( \cos(\alpha_k Y_k) - \alpha_k X_k^T N X_k\, \mathrm{sinc}(\alpha_k Y_k) \right) + \alpha_k N X_k\, \mathrm{sinc}(\alpha_k Y_k),   (3.3.1)

where the power series expansions for $\cos(\alpha_k Y_k)$ and $\mathrm{sinc}(\alpha_k Y_k)$ are determined by the positive semi-definite matrix $Y_k^2 \in \mathbb{R}^{p\times p}$,

Y_k^2 = X_k^T N^2 X_k - (X_k^T N X_k)^2.   (3.3.2)
Remark 3.3.2 The matrix Y_k need not be explicitly calculated as the power series expansions
of sinc and cos depend only on Y_k². □
Proof The proof follows from a power series expansion of e^{−α_k [X_k X_k^T, N]} X_k,

X_{k+1} = ( Σ_{l=0}^∞ (1/l!) (−α_k [X_k X_k^T, N])^l ) X_k.   (3.3.3)

Simple algebraic manipulations lead to the relation

[X_k X_k^T, N]² X_k = −X_k Y_k²,   (3.3.4)

where Y_k² is defined by (3.3.2). Pre-multiplying (3.3.4) by −X_k^T provides an alternative formula
for Y_k²,

Y_k² = X_k^T [X_k X_k^T, N]^T [X_k X_k^T, N] X_k,

which is positive semi-definite.
Using (3.3.4) it is possible to rewrite (3.3.3) as a power series in (−Y_k²),

X_{k+1} = Σ_{m=0}^∞ ( ((−α_k)^{2m}/(2m)!) X_k (−Y_k²)^m + ((−α_k)^{2m+1}/(2m+1)!) (X_k (X_k^T N X_k) − N X_k) (−Y_k²)^m ),   (3.3.5)

where the first and second terms in the summation follow from the even and odd powers
of [X_k X_k^T, N]^l X_k respectively. Rewriting this as two separate power series in (−Y_k²),

X_{k+1} = X_k Σ_{m=0}^∞ (α_k^{2m}/(2m)!) (−Y_k²)^m − α_k (X_k (X_k^T N X_k) − N X_k) Σ_{m=0}^∞ (α_k^{2m}/(2m+1)!) (−Y_k²)^m
        = X_k cos(α_k Y_k) − α_k (X_k (X_k^T N X_k) − N X_k) sinc(α_k Y_k),

and the result follows by rearranging terms. □
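The equivalence established in Lemma 3.3.1 is easy to confirm numerically. The sketch below (my own illustration, not part of the thesis; NumPy is assumed) sums the full n × n exponential as a truncated power series and compares it with the p × p formulation (3.3.1), where cos(α_k Y_k) and sinc(α_k Y_k) are evaluated via an eigendecomposition of Y_k²:

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated power series for the matrix exponential (adequate for small ||M||)."""
    out, power = np.eye(M.shape[0]), np.eye(M.shape[0])
    for l in range(1, terms):
        power = power @ M / l
        out += power
    return out

rng = np.random.default_rng(0)
n, p, alpha = 6, 2, 0.1
N = rng.standard_normal((n, n)); N = N + N.T              # symmetric N
X, _ = np.linalg.qr(rng.standard_normal((n, p)))          # X in St(p, n)

Omega = X @ X.T @ N - N @ X @ X.T                         # [X X^T, N]
lhs = expm_series(-alpha * Omega) @ X                     # direct n x n update

A = X.T @ N @ X
Y2 = X.T @ N @ N @ X - A @ A                              # (3.3.2)
w, V = np.linalg.eigh(Y2)
y = np.sqrt(np.clip(w, 0.0, None))
cosY = V @ np.diag(np.cos(alpha * y)) @ V.T
sincY = V @ np.diag(np.sinc(alpha * y / np.pi)) @ V.T     # np.sinc(x) = sin(pi x)/(pi x)
rhs = X @ (cosY - alpha * A @ sincY) + alpha * N @ X @ sincY   # (3.3.1)
```

Only p × p transcendental functions are computed on the right-hand side, which is the point of the reformulation when p is much smaller than n.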
3.3.2 Padé Approximations of the Exponential
It is also of interest to consider approximate methods for calculating matrix exponentials. In
particular, one is interested in methods that will not violate the constraint X_{k+1} ∈ St(p, n). A
standard approximation used for calculating the exponential function is a Padé approximation
of order (n, m), where n ≥ 0 and m ≥ 0 are integers (Golub & Van Loan 1989, pg. 557). For
example, a (1,1) Padé approximation of the exponential is

e^{−α_k [X_k X_k^T, N]} ≈ (I_n + (α_k/2)[X_k X_k^T, N])^{−1} (I_n − (α_k/2)[X_k X_k^T, N]).

A key observation is that when n = m and the exponent is skew symmetric the resulting Padé
approximant is orthogonal. Thus,

X_{k+1} = (I_n + (α_k/2)[X_k X_k^T, N])^{−1} (I_n − (α_k/2)[X_k X_k^T, N]) X_k,   (3.3.6)
with initial condition X_0 ∈ St(p, n), defines an iteration on St(p, n) which approximates the
Rayleigh gradient algorithm (3.2.1). Of course, in practice one would use an algorithm such as
Gaussian elimination (Golub & Van Loan 1989, pg. 92) to solve the linear system of equations

(I_n + (α_k/2)[X_k X_k^T, N]) X_{k+1} = (I_n − (α_k/2)[X_k X_k^T, N]) X_k

for X_{k+1} rather than computing the inverse explicitly.
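A sketch of one (1,1) Padé step (my own illustration, not part of the thesis; NumPy is assumed). Because the Cayley-type factor is orthogonal for a skew-symmetric exponent, the constraint X_{k+1} ∈ St(p, n) is preserved to machine precision, even though the exponential itself is only approximated:

```python
import numpy as np

def pade_step(X, N, alpha):
    """One step of (3.3.6): solve the linear system rather than forming an inverse."""
    n = X.shape[0]
    Omega = X @ X.T @ N - N @ X @ X.T            # skew-symmetric exponent
    I = np.eye(n)
    return np.linalg.solve(I + 0.5 * alpha * Omega, (I - 0.5 * alpha * Omega) @ X)

rng = np.random.default_rng(1)
n, p, alpha = 5, 2, 0.01
N = rng.standard_normal((n, n)); N = N + N.T
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
X1 = pade_step(X, N, alpha)
```

The step agrees with the exponential update to second order in α_k, so for small time-steps the two iterations remain close while orthonormality is maintained exactly.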
The algorithm defined by (3.3.6) can also be rewritten in a form similar to that obtained in
Lemma 3.3.1. Consider the power series expansion

(I_n + (α_k/2)[X_k X_k^T, N])^{−1} = Σ_{i=0}^∞ ( −(α_k/2)[X_k X_k^T, N] )^i.

From here it is easily shown that

X_{k+1} = −X_k + ( 2X_k − α_k (X_k (X_k^T N X_k) − N X_k) ) (I_p + (α_k²/4) Y_k²)^{−1},   (3.3.7)

where Y_k² ∈ R^{p×p} is given by (3.3.2).
3.4 Comparison with Classical Algorithms
In this section the relationship between the Rayleigh gradient algorithm (3.2.1) and some classical
algorithms for determining the maximal eigenspace of a symmetric matrix is investigated.
A good discussion of the power method and the steepest ascent method for determining a single
maximal eigenvalue of a symmetric matrix is given by Faddeev and Faddeeva (1963). Practical
issues arising in implementing these algorithms, along with direct generalizations to eigenspace
methods, are covered by Golub and Van Loan (1989).
3.4.1 The Power Method
In this subsection the algorithm (3.2.1) in the case where p = 1 is considered. It is shown that
for a certain choice of time-step α_k the algorithm (3.2.1) is the classical power method.
Recall that St(1, n) = S^{n−1}, the (n − 1)-dimensional sphere in R^n.
Theorem 3.4.1 Given N = N^T a real symmetric n × n matrix with eigenvalues λ_1 ≥ ··· ≥ λ_n.
For x_k ∈ S^{n−1} let α_k be given by

α_k = y_k² / ( 2√2 ||N|| ||[x_k x_k^T, N]||² ),   (3.4.1)

where y_k ∈ R is given by

y_k = ( x_k^T N² x_k − (x_k^T N x_k)² )^{1/2}.   (3.4.2)
For x_0 ∈ St(1, n) = S^{n−1} an arbitrary initial condition then:
i) The formula

x_{k+1} = e^{−α_k [x_k x_k^T, N]} x_k

defines a recursive algorithm on S^{n−1}.
ii) Fixed points of the rank-1 Rayleigh gradient algorithm are the critical points of r_N on
S^{n−1}, and are exactly the eigenvectors of N.
iii) If x_k, for k = 1, 2, …, is a solution to the Rayleigh gradient algorithm, then the real
sequence r_N(x_k) is strictly monotonically increasing, unless x_k is an eigenvector of N.
iv) For a given x_k ∈ S^{n−1} which is not an eigenvector of N, then y_k ≠ 0 and

x_{k+1} = ( cos(α_k y_k) − x_k^T N x_k sin(α_k y_k)/y_k ) x_k + ( sin(α_k y_k)/y_k ) N x_k.   (3.4.3)
v) Let x_k, for k = 1, 2, …, be a solution to the rank-1 Rayleigh gradient algorithm; then x_k
converges to an eigenvector of N.
vi) All eigenvectors of N, considered as fixed points of (3.4.3), are unstable, except the
eigenvector corresponding to the maximal eigenvalue λ_1, which is exponentially stable.
Proof Parts i)-iii) follow directly from Theorem 3.2.2. To see part iv) observe that y_k =
||grad r_N(x_k)||, and y_k = 0 if and only if grad r_N(x_k) = 0, that is, if and only if x_k is an
eigenvector of N. The recursive iteration (3.4.3) now follows directly from Lemma 3.3.1, with the
substitution sinc(α_k y_k) = sin(α_k y_k)/(α_k y_k). Parts v) and vi) again follow directly from
Theorem 3.2.2. □
Remark 3.4.2 Equation (3.4.3) involves only the vector computations N x_k, x_k^T N x_k and
(N x_k)^T (N x_k). This structure is especially of interest when sparse or structured matrices N are
considered. □
A geodesic (or great circle) on S^{n−1}, passing through x at time t = 0, can be written

γ(t) = cos(t) x + sin(t) V,   (3.4.4)

where V = γ̇(0) is a unit vector orthogonal to x. Choosing V(x_k) = grad r_N(x_k)/||grad r_N(x_k)||,
x = x_k, and evaluating γ(t) at time t = α_k ||grad r_N(x_k)|| gives (3.4.3). Thus, (3.4.3) is a geodesic
interpolation of (3.1.8), the solution to the rank-1 Rayleigh gradient flow (3.1.7).
For a symmetric n × n matrix N = N^T the classical power method is computed using the
recursive formula (Golub & Van Loan 1989, pg. 351)

z_k = N x_k,   (3.4.5)
x_{k+1} = z_k/||z_k||.

The renormalisation operation is necessary if the algorithm is to be numerically stable. The
following lemma shows that for N positive semi-definite and a particular choice of α_k the
rank-1 Rayleigh gradient algorithm (3.4.3) is exactly the power method (3.4.5).
Lemma 3.4.3 Given N = N^T a positive semi-definite n × n matrix. For x_k ∈ S^{n−1} (not an
eigenvector of N) then ||grad r_N(x_k)|| < ||N x_k||. Let α_k be given by

α_k = (1/||grad r_N(x_k)||) sin^{−1}( ||grad r_N(x_k)|| / ||N x_k|| ),   (3.4.6)

where sin^{−1}( ||grad r_N(x_k)|| / ||N x_k|| ) ∈ [0, π/2). Then

N x_k/||N x_k|| = ( cos(α_k y_k) − x_k^T N x_k sin(α_k y_k)/y_k ) x_k + ( sin(α_k y_k)/y_k ) N x_k,

where y_k is given by (3.4.2).

Figure 3.4.1: The geometric relationship between the power method iterate and the iterate generated by (3.4.3).
Proof Observe that ||grad r_N(x_k)||² = y_k² = ||N x_k||² − (x_k^T N x_k)² > 0 and thus ||grad r_N(x_k)||
< ||N x_k||. Consider the 2-dimensional linear subspace sp{x_k, N x_k} of R^n. The new estimate
x_{k+1} generated using either (3.4.3) or (3.4.5) will lie in sp{x_k, N x_k} (cf. Figure 3.4.1). Setting

N x_k/||N x_k|| = ( cos(τ y_k) − x_k^T N x_k sin(τ y_k)/y_k ) x_k + ( sin(τ y_k)/y_k ) N x_k

for τ > 0 and observing that x_k and N x_k are linearly independent, then

cos(τ y_k) − x_k^T N x_k sin(τ y_k)/y_k = 0   and   sin(τ y_k)/y_k = 1/||N x_k||.

Since N ≥ 0 is positive semi-definite, a real solution to the first relation exists for which
τ y_k ∈ [0, π/2). The time-step value is now obtained by computing the smallest positive root
of the second relation. □
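The equivalence in Lemma 3.4.3 can be checked directly. In the sketch below (my own, not part of the thesis; NumPy is assumed) N is made positive definite, the time-step (3.4.6) is computed, and one step of (3.4.3) is compared with the normalised power method iterate:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
N = M @ M.T + n * np.eye(n)                     # positive definite N
x = rng.standard_normal(n); x /= np.linalg.norm(x)

Nx = N @ x
rho = x @ Nx                                    # Rayleigh quotient x^T N x
grad = Nx - rho * x                             # grad r_N(x), with y = ||grad||
y = np.linalg.norm(grad)
alpha = np.arcsin(y / np.linalg.norm(Nx)) / y   # time-step (3.4.6)

# One step of the rank-1 Rayleigh gradient algorithm (3.4.3)
x_next = (np.cos(alpha * y) - rho * np.sin(alpha * y) / y) * x \
         + (np.sin(alpha * y) / y) * Nx
```

Since cos(α_k y_k) = x_k^T N x_k/||N x_k|| and sin(α_k y_k) = y_k/||N x_k|| for this choice of α_k, the x_k component cancels and only the renormalised N x_k survives.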
Choosing N > 0 positive definite in Lemma 3.4.3 ensures that (3.4.3) and (3.4.5) converge
‘generically’ to the same eigenvector. Conversely, if N is symmetric with eigenvalues
λ_1 ≥ ··· ≥ λ_n, 0 > λ_n and |λ_n| > |λ_i|, i ≠ n, then the power method will converge to the eigenvector
associated with λ_n while (3.4.3) (equipped with time-step (3.4.1)) will converge to the
eigenvector associated with λ_1. Nevertheless, one may still choose α_k using (3.4.6), with the
inverse sin operation chosen to lie in the interval

sin^{−1}( ||grad r_N(x_k)|| / ||N x_k|| ) ∈ (π/2, π],

such that (3.4.3) and (3.4.5) are equivalent. In this case the geodesics corresponding to each
iteration of (3.4.3) describe great circles travelling almost from pole to pole of the sphere.
3.4.2 The Steepest Ascent Algorithm
The gradient ascent algorithm for the Rayleigh quotient r_N is the recursion (Faddeev &
Faddeeva 1963, pg. 430)

z_k = x_k + s_k grad r_N(x_k),   (3.4.7)
x_{k+1} = z_k/||z_k||,

where s_k > 0 is a real number termed the step-size. It is easily verified that the (k+1)'th iterate
of (3.4.7) will also lie in the 2-dimensional linear subspace sp{x_k, N x_k} of R^n. Indeed, for x_k
not an eigenvector of N, (3.4.3) and (3.4.7) are equivalent when
s_k = tan(α_k y_k)/y_k.   (3.4.8)
The optimal step-size for the steepest ascent algorithm (i.e. r_N(x_{k+1}(s_k^{opt})) ≥ r_N(x_{k+1}(s_k))
for any s_k ∈ R) is (Faddeev & Faddeeva 1963, pg. 433)
Proof Assume that there exists a matrix G and a scalar α_k > 0 such that (3.4.12) holds.
Observe that rank(e^{−α_k [X_k X_k^T, N]} X_k) = p and thus rank(N X_k) = p. Similarly G ∈ R^{p×p} is
non-singular.
Pre-multiplying (3.4.12) by G^T X_k^T N and using the constraint relation G^T X_k^T N² X_k G =
I_p gives

I_p = G^T X_k^T N e^{−α_k [X_k X_k^T, N]} X_k.
Since one need only consider the case where G is invertible it follows that

G^{−1} = X_k^T e^{α_k [X_k X_k^T, N]} N X_k.

Lengthy matrix manipulations yield

X_k^T [X_k X_k^T, N]^{2l} N X_k = (−1)^l Y_k^{2l} X_k^T N X_k,   for l = 0, 1, …,

and

X_k^T [X_k X_k^T, N]^{2l+1} N X_k = (−1)^l Y_k^{2l+2},   for l = 0, 1, …

Expanding e^{α_k [X_k X_k^T, N]} as a power series in Y_k² and then grouping terms suitably (cf. Subsection
3.3.1) one obtains

G^{−1} = cos(α_k Y_k) X_k^T N X_k + sin(α_k Y_k) Y_k.

Using (3.3.1) for e^{−α_k [X_k X_k^T, N]} X_k, then (3.4.12) becomes

N X_k = e^{−α_k [X_k X_k^T, N]} X_k G^{−1}
      = ( X_k cos(α_k Y_k) − α_k [X_k X_k^T, N] X_k sinc(α_k Y_k) ) ( cos(α_k Y_k) X_k^T N X_k + sin(α_k Y_k) Y_k ).

Pre-multiplying this by X_k^T yields

X_k^T N X_k = cos²(α_k Y_k) X_k^T N X_k + cos(α_k Y_k) sin(α_k Y_k) Y_k,

and thus

sin²(α_k Y_k) X_k^T N X_k = cos(α_k Y_k) sin(α_k Y_k) Y_k.

This shows that (3.4.12) implies (3.4.13). If α_k solves (3.4.13) then defining G^{−1} =
X_k^T e^{α_k [X_k X_k^T, N]} N X_k ensures (3.4.12) also holds, which completes the proof. □
Writing Y_k = Σ_{i=1}^p σ_i y_i y_i^T, where {y_1, …, y_p} is a set of orthonormal eigenvectors for Y_k
whose eigenvalues are denoted σ_i ≥ 0 for i = 1, …, p, then (3.4.13) becomes

Σ_{i=1}^p sin²(α_k σ_i) y_i y_i^T X_k^T N X_k = Σ_{i=1}^p σ_i cos(α_k σ_i) sin(α_k σ_i) y_i y_i^T.

Fixing i and pre-multiplying by y_i^T while also post-multiplying by y_i gives the following p
equations for α_k:

either sin(α_k σ_i) = 0 or cot(α_k σ_i) = (1/σ_i) y_i^T X_k^T N X_k y_i,

for i = 1, …, p. It follows either, from the first relation, that α_k σ_i = mπ for some integer m, or,
from the second relation, that

α_k = (1/σ_i) cot^{−1}( (1/σ_i) y_i^T X_k^T N X_k y_i ),

for each i = 1, …, p. One can easily confirm from this that the p equations will fail to have a
consistent solution for arbitrary choices of X_k and N. Thus, generically the Rayleigh gradient
algorithm (3.2.1) does not correspond to the generalised power method (3.4.11) for any choice
of rescaling operation or time-step selection.
3.5 Open Questions and Further Work
There remains the issue of characterising the basin of attraction for the Rayleigh gradient
algorithm. Simulations indicate that the only points which are not contained in this set are
the non-minimal critical points of the generalised Rayleigh quotient; however, proving this is
likely to be very difficult. Another area where further insight would be desirable is the
implementation of the (1,1) Padé approximation algorithm (3.3.6). It would seem likely that
for the time-steps generated by (3.2.3) the (1,1) Padé approximation algorithm would inherit all
the properties of the gradient descent algorithm. This appears to be the case in the simulation
studies undertaken.
In the earlier comparison between the Rayleigh gradient algorithm and classical numerical
linear algebra algorithms no account was taken of the various inverse shift algorithms which
tend to be the accepted computational methods. Incorporating the idea of origin shifts into
dynamical systems solutions of such linear algebra problems is an important question that has
not yet been satisfactorily understood.
In Subsection 3.4.1 it was shown that the rank-1 Rayleigh gradient algorithm is closely
related to the power method. Also related to the power method is an inverse shift algorithm
known as the Rayleigh iteration. Let N = N^T ∈ R^{n×n} be a symmetric matrix and let x_k ∈ R^n
be some vector which is not an eigenvector of N; then a single step of the Rayleigh iteration is

ρ_k = x_k^T N x_k / (x_k^T x_k),
z_{k+1} = (N − ρ_k I_n)^{−1} x_k,
x_{k+1} = z_{k+1}/||z_{k+1}||.
The Rayleigh iteration converges cubically in a local neighbourhood of any eigenvector of N
(Parlett 1974). By comparing the Rayleigh iteration to the power method and the Rayleigh
gradient algorithm, one is led to consider an ordinary differential equation of the form

ẋ = (N − r_N(x) I_n)^{−1} x − ( x^T (N − r_N(x) I_n)^{−1} x ) x,

where r_N(x) = x^T N x/(x^T x) is the Rayleigh quotient. In the vicinity of an eigenvector of N this
differential equation becomes singular and displays finite-time convergence to the eigenvector
of N corresponding to the eigenvalue λ_{i*}, which is the smallest eigenvalue of N such that
λ_{i*} ≥ r_N(x_0). The connection between singularly perturbed dynamical systems and shifted
numerical linear algebra methods is of considerable interest. There is also a connection to
the theory of differential/algebraic systems. For example, the ordinary differential equation
mentioned above is equivalent to the differential/algebraic system

ẋ = z − (x^T z) x,
0 = x − (N − r_N(x) I_n) z.
Chapter 4
Pole Placement for Symmetric Realisations
A classical problem in systems theory is that of pole placement or eigenvalue assignment of
linear systems via constant gain output feedback. This is clearly a difficult task and despite a
number of important results (cf. Byrnes (1989) for an excellent survey), a complete solution
giving necessary and sufficient conditions for a solution to exist has not been developed. It
has recently been shown that (strictly proper) linear systems with mp > n can be assigned
arbitrary poles using real output feedback (Wang 1992). Here n denotes the McMillan degree
of a system having m inputs and p outputs. Of course, if mp < n for a given linear system then
generic pole assignment is impossible, even when complex feedback gain is allowed (Hermann
& Martin 1977). The case mp = n remains unresolved, though a number of interesting results
are available (Hermann & Martin 1977, Willems & Hesselink 1978, Brockett & Byrnes 1981).
Present results do not apply to output feedback systems with symmetries or structured feedback
systems. More generally, one is also interested in situations where an optimal state feedback
gain is sought such that the closed loop response of the system is a best approximation of a
desired response, though the exact response may be unobtainable. In such cases one would
still hope to find a constructive method to compute the optimal feedback that achieves the
best approximation. The problem appears to be too difficult to tackle directly, however, and
algorithmic solutions are an attractive alternative.
The development given in this chapter is loosely related to a number of recent articles. In
particular, Brockett (1989a) considers a least squares matching task, motivated by problems
in computer vision algorithms, that is related to the system approximation problem, though
his work does not include the effects of feedback. There is also an article by Chu (1992) in
which dynamical system methods are developed for solving inverse singular value problems,
a topic that is closely related to the pole placement question. The simultaneous multiple
system assignment problem considered is a generalisation of the single system problem and is
reminiscent of Chu’s (1991a) approach to simultaneous reduction of several real matrices.
In this chapter I consider a structured class of systems (those with symmetric state space
realisations) for which, to my knowledge, no previous pole placement results are available.
The assumption of symmetry of the realisation, besides having a natural network theoretic
interpretation, simplifies the geometric analysis considerably. It is shown that a symmetric
state space realisation can be assigned arbitrary (real) poles via output feedback if and only if
there are at least as many system inputs as states. This result is surprising since a naive counting
argument (comparing the number of free variables m(m+1)/2 of symmetric output feedback
gain to the number of poles n of a symmetric realization having m inputs and n states) would
suggest that m(m+1)/2 ≥ n is sufficient for pole placement. To investigate the problem further
gradient flows of least squares cost criteria (functions of the matrix entries of realisations) are
derived on smooth manifolds of output feedback equivalent symmetric realisations. Limiting
solutions to these flows occur at minima of the cost criteria and relate directly to finding optimal
feedback gains for system assignment and pole placement problems. Cost criteria are proposed
for solving the tasks of system assignment, pole placement, and simultaneous multiple system
assignment.
The material presented in this chapter is based on the articles (Mahony & Helmke 1993,
Mahony et al. 1993). The theoretical material contained in Sections 4.1 to 4.4 along with the
simulations in Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the
numerical method proposed in Section 4.6 was presented at the 1993 Conference on Decision
and Control (Mahony et al. 1993). Much of the material presented in this chapter was developed
in conjunction with the results in the monograph (Helmke & Moore 1994b, Section 5.3), which
focusses on general linear systems.
The chapter is divided into seven sections. In Section 4.1 the specific problems considered
in the sequel are formulated and necessary conditions for generic pole placement and system
assignment are given. Section 4.2 develops the geometry of the set of symmetric state space
systems necessary for the development in later sections. In Section 4.3, a dynamical systems
approach to computing systems assignment problems for the class of symmetric state space
realizations is proposed while Section 4.4 applies the previous results to the pole placement and
the simultaneous multiple system assignment problems. A number of numerical investigations
are given in Section 4.5 which substantiate the theory presented in Sections 4.1 to 4.4. In Section
4.6 a numerical algorithm for computing feedback gains for the pole placement problem is
presented. The chapter concludes with a discussion of open questions and future work in
Section 4.7.
4.1 Statement of the Problem
In this section a brief review of symmetric systems is presented before the precise formulations
of the problems considered in the sequel are given and a pole placement result for symmetric
state space realizations is proved. The reader is referred to Anderson and Vongpanitlerd (1973)
for background material on network theory.
A symmetric transfer function is a proper rational matrix function G(s) ∈ R^{m×m} such that

G(s) = G(s)^T.
For any such transfer function there exists a minimal signature symmetric realisation (Anderson
& Vongpanitlerd 1973, pg. 324)

ẋ = Ax + Bu,
y = Cx,

of G(s) such that (A I_{pq})^T = A I_{pq} and C^T = I_{pq} B, with I_{pq} = diag(I_p, −I_q), a diagonal
matrix with its first p diagonal entries 1 and the remaining diagonal entries −1. A signature symmetric
realisation is a dynamical model of an electrical network constructed from p capacitors
and q inductors and any number of resistors.
Static linear symmetric output feedback is introduced to a state space model via a feedback
law

u = Ky + v,   K = K^T,

leading to the “closed loop” system

ẋ = (A + BKC) x + Bv,
y = B^T x.   (4.1.1)

In particular, symmetric output feedback, where K = K^T ∈ R^{m×m}, preserves the structure of
signature symmetric realisations and is the only output feedback transformation that has this
property.
A symmetric state space system (also symmetric realisation) is a linear dynamical system

ẋ = Ax + Bu,   A = A^T,   (4.1.2)
y = B^T x,   (4.1.3)

with x ∈ R^n, u, y ∈ R^m, A ∈ R^{n×n}, B ∈ R^{n×m}. Without loss of generality assume
that m ≤ n, B is full rank and B^T B = I_m, the m × m identity matrix. Symmetric state
space systems correspond to linear models of electrical RC-networks, constructed entirely
of capacitors and resistors. The networks are characterised by the property that the Cauchy-Maslov¹
index coincides with the McMillan degree. The matrix pair (A, B) ∈ S(n) × O(n, m),
where S(n) = {X ∈ R^{n×n} | X = X^T} is the set of symmetric n × n matrices and O(n, m) =
{Y ∈ R^{n×m} | Y^T Y = I_m}, is used to represent a linear system of the form (4.1.2) and (4.1.3).
The set O(n, m) is the Stiefel manifold (a smooth nm − m(m+1)/2 dimensional submanifold
of R^{n×m}) of n × m matrices with orthonormal columns (cf. Lemma 3.1.1).
Two symmetric state space systems (A_1, B_1) and (A_2, B_2) are called output feedback
equivalent if

(A_2, B_2) = ( Θ(A_1 + B_1 K B_1^T) Θ^T, Θ B_1 )   (4.1.4)
¹The Cauchy-Maslov index for a real rational transfer function z(s) is defined as the number of jumps of z(s) from −∞ to +∞ less the number from +∞ to −∞. Bitmead and Anderson (1977) generalise the Cauchy-Maslov index to real symmetric rational matrix transfer functions and show that it is equal to p − q, the signature of I_{pq} (Bitmead & Anderson 1977, Corollary 3.3).
holds for some Θ ∈ O(n) = {U ∈ R^{n×n} | U^T U = I_n}, the set of n × n orthogonal matrices, and
K ∈ S(m), the set of symmetric m × m matrices. Thus the system (A_2, B_2) is obtained from
(A_1, B_1) using an orthogonal change of basis Θ ∈ O(n) in the state space R^n and a symmetric
feedback transformation K ∈ S(m). It is easily verified that output feedback equivalence is
an equivalence relation on the set of symmetric state space systems.
Consider the following problem for the class of symmetric state space systems.
Problem A Given (A, B) ∈ S(n) × O(n, m) a symmetric state space system, let (F, G) ∈
S(n) × O(n, m) be a symmetric state space system which possesses the desired system structure.
Consider the potential

Φ : R^{n×n} × O(n, m) → R,
Φ(A, B) := ||A − F||² + 2||B − G||²,

where ||X||² = tr(X^T X) is the Frobenius matrix norm. Find a symmetric state space system
(A_min, B_min) which minimises Φ over the set of all systems output feedback equivalent to
(A, B). Equivalently, find a pair of matrices (Θ_min, K_min) ∈ O(n) × S(m) such that

φ(Θ, K) := ||Θ(A + BKB^T)Θ^T − F||² + 2||ΘB − G||²

is minimised over O(n) × S(m). □
Such a formulation is particularly of interest when structural properties of the desired
realisations are specified. For example, one may wish to choose the “target system” (F, G)
with certain structural zeros. If an exact solution to the system assignment problem exists (i.e.
Φ(A_min, B_min) = 0) it is easily seen that (A_min, B_min) will have the same structural zeros as
(F, G). For general linear systems it is known that the system assignment problem (for general
feedback) is generically solvable only if there are as many inputs and as many outputs as states.
Lemma 4.1.1 Let n and m be integers, n ≥ m, and let (F, G) ∈ S(n) × O(n, m). Consider
matrix pairs (A, B) ∈ S(n) × O(n, m).
a) If m = n then for any matrix pair (A, B) of the above form, there exist matrices
Θ ∈ O(n) and K ∈ S(m) such that

Θ(A + BKB^T)Θ^T = F,   ΘB = G.

b) If m < n then the set of (A, B) ∈ S(n) × O(n, m) for which an exact solution to the
system assignment problem exists is measure zero in S(n) × O(n, m). (I.e. for almost all
systems (A, B) ∈ S(n) × O(n, m) no exact solution to the system assignment problem
exists.)
Proof If m = n then O(n, m) = O(n) and B^T = B^{−1}. For any (A, B) ∈ S(n) × O(n)
choose (Θ, K) = (GB^T, G^T F G − B^T A B). Thus,

Θ(A + BKB^T)Θ^T = GB^T A B G^T + GB^T B (G^T F G − B^T A B) B^T B G^T = F,

and ΘB = GB^T B = G.
To prove part b) observe that since output feedback equivalence is an equivalence relation,
the set of systems for which the system assignment problem is solvable is exactly the set of
systems which are output feedback equivalent to (F, G). Consider the set

F(F, G) = { (Θ(F + GKG^T)Θ^T, ΘG) | Θ ∈ O(n), K ∈ S(m) }.

It is shown in Section 4.2 (Lemma 4.2.1) that F(F, G) is a smooth submanifold of S(n) ×
O(n, m). But F(F, G) is the image of O(n) × S(m) via the continuous map (Θ, K) ↦
(Θ(F + GKG^T)Θ^T, ΘG) and necessarily has dimension at most dim O(n) × S(m) = n(n−1)/2 +
m(m+1)/2. The dimension of S(n) × O(n, m) however is n(n+1)/2 + (nm − m(m+1)/2)
(Helmke & Moore 1994b, pg. 24). Thus,

dim S(n) × O(n, m) − dim O(n) × S(m) = (n − m)(m + 1),

which is strictly positive for 0 < m < n. Thus, for m < n the set F(F, G) is a submanifold
of S(n) × O(n, m) of non-zero co-dimension and therefore has measure zero. □
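The constructive choice of (Θ, K) in the proof of part a) is easy to test numerically when m = n. The sketch below (my own illustration, not part of the thesis; NumPy is assumed) draws a random pair (A, B) ∈ S(n) × O(n) and a random target (F, G) and verifies both assignment equations:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
sym = lambda M: 0.5 * (M + M.T)                 # projection onto S(n)
orth = lambda M: np.linalg.qr(M)[0]             # random orthogonal factor

A, F = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))
B, G = orth(rng.standard_normal((n, n))), orth(rng.standard_normal((n, n)))

Theta = G @ B.T                                 # orthogonal change of basis
K = G.T @ F @ G - B.T @ A @ B                   # symmetric feedback gain
closed = Theta @ (A + B @ K @ B.T) @ Theta.T
```

The cancellation relies on B and G being orthogonal, which is precisely why the construction is confined to the case m = n.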
A similar task to Problem A is that of pole placement for symmetric state space realizations.
The pole placement task for symmetric systems is: given an arbitrary set of numbers s_1
≥ ··· ≥ s_n in R and an initial m × m symmetric transfer function G(s) = G^T(s) with a
symmetric realisation, find a symmetric matrix K ∈ S(m) such that the poles of G_K(s) =
(I_m − G(s)K)^{−1} G(s) are exactly s_1, …, s_n. Rather than tackle this problem directly, consider
the following variant of the problem.
Problem B Given (A, B) ∈ S(n) × O(n, m) a symmetric state space system, let F ∈ S(n)
be a symmetric matrix. Define

Φ(A, B) := ||A − F||²,   φ(Θ, K) := ||Θ(A + BKB^T)Θ^T − F||².

Find a symmetric state space system (A_min, B_min) which minimises Φ over the set of all output
feedback equivalent systems to (A, B). Respectively, find a pair of matrices (Θ_min, K_min) ∈
O(n) × S(m) which minimises φ over O(n) × S(m). □
Problem B minimises a cost criterion that assigns the full eigenstructure of the closed loop
system. Two symmetric matrices have the same eigenstructure (up to orthogonal similarity
transformation) if and only if they have the same eigenvalues (since any symmetric matrix may
be diagonalised via an orthogonal similarity transformation). Thus, Problem B is equivalent
to solving the pole placement problem for symmetric systems (assigning the eigenvalues of
the closed loop system). The advantage of considering Problem B rather than a standard
formulation of the pole placement task lies in the smooth nature of the optimization problem
obtained.
It is of interest to consider generic conditions on symmetric state space systems for the
existence of an exact solution to Problem B (i.e. the existence of (Θ_min, K_min) such that
φ(Θ_min, K_min) = 0). This is exactly the classical pole placement question about which much
is known for general linear systems (Byrnes 1989, Wang 1992). The following result answers
(at least in part) this question for symmetric state space systems. It is interesting to note that
the necessary conditions for “generic” pole placement for symmetric state space systems are
much stronger than those for general linear systems.
Lemma 4.1.2 Let n and m be integers, n ≥ m, and let F ∈ S(n) be a real symmetric matrix.
Consider matrix pairs (A, B) ∈ S(n) × O(n, m).
a) If m = n then for any matrix pair (A, B) of the above form, there exist matrices
Θ ∈ O(n) and K = K^T ∈ R^{m×m} such that

Θ(A + BKB^T)Θ^T = F.   (4.1.5)

b) If m < n then there exists an open set of matrix pairs (A, B) ∈ S(n) × O(n, m) of the
above form such that eigenstructure assignment (to the matrix F) is impossible.
Proof Part a) follows directly from Lemma 4.1.1.
Observe that the set of matrix pairs {(A, B) | A = BB^T A BB^T} ⊂ S(n) × O(n, m) is
Zariski closed in S(n) × O(n, m) and consequently of measure zero (cf. Martin and Hermann
(1977a) for a basic discussion of Zariski closed sets). There exists a matrix pair (A, B) ∈
S(n) × O(n, m) and matrices Θ ∈ O(n) and K = K^T ∈ R^{m×m} such that (4.1.5) is satisfied
and A ≠ BB^T A BB^T, or else part b) is trivially true. Direct manipulations of (4.1.5), recalling
that B^T B = I_m, yield

K = B^T (Θ^T F Θ − A) B.

Substituting this back into (4.1.5) gives

Θ^T F Θ = (A − BB^T A BB^T) + BB^T Θ^T F Θ BB^T.

Observe that

tr( (A − BB^T A BB^T)^T BB^T Θ^T F Θ BB^T ) = tr( (BB^T (A − BB^T A BB^T) BB^T) Θ^T F Θ ) = 0,

and taking the squared Frobenius norm of Θ^T F Θ gives

||F||² = ||A − BB^T A BB^T||² + ||BB^T Θ^T F Θ BB^T||²,

recalling that the Frobenius norm is invariant under orthogonal transformations. It follows
directly that ||F||² ≥ ||A − BB^T A BB^T||².
Since (A, B) was chosen deliberately such that A ≠ BB^T A BB^T, one may consider the
related matrix pair (A_ε, B_ε) = (εA, B), where ε = ( (||F||² + 1)/||A − BB^T A BB^T||² )^{1/2}. By construction

||A_ε − B_ε B_ε^T A_ε B_ε B_ε^T||² = ||F||² + 1 > ||F||²

and no solution to the eigenstructure assignment problem exists for the system (A_ε, B_ε).
Moreover, the map (A, B) ↦ ||A − BB^T A BB^T||² is continuous and it follows that there
is an open neighbourhood of systems around (A_ε, B_ε) for which the eigenstructure assignment
task cannot be solved. □
Remark 4.1.3 It follows directly from the proof of Lemma 4.1.2 that eigenstructure assignment
of a symmetric state space system (A, B) ∈ S(n) × O(n, m) to an arbitrary closed loop matrix
F ∈ S(n) is possible only if

||F||² ≥ ||A − BB^T A BB^T||². □
Remark 4.1.4 One may weaken the hypothesis of Lemma 4.1.2 considerably to deal with
matrix pairs (A, B) ∈ S(n) × R^{n×m}, for which B is not constrained to satisfy B^T B = I_m
and for which m may be greater than n. The analogous statement is that eigenstructure
assignment is generically possible if and only if rank B ≥ n. The proof is similar to that given
above, observing that the projection operator BB^T (for B^T B = I_m) is related to the general
projection operator B(B^T B)†B^T, where † represents the pseudo-inverse of a matrix. For
example, the feedback matrix yielding exact system assignment for rank B ≥ n is

K = (B^T B)† B^T (F − A) B (B^T B)†. □
A further problem considered is that of simultaneous multiple system assignment. This is a
difficult problem about which very little is presently known. The approach taken is to consider
a generalisation of the cost criterion φ for a single system.
Problem C For any integer N ∈ N let (A_1, B_1), …, (A_N, B_N) and (F_1, G_1), …, (F_N, G_N)
be two sets of N symmetric state space systems. Define

φ_N(Θ, K) := Σ_{i=1}^N ||Θ(A_i + B_i K B_i^T)Θ^T − F_i||² + 2 Σ_{i=1}^N ||ΘB_i − G_i||².

Find a pair of matrices (Θ_min, K_min) ∈ O(n) × S(m) which minimises φ_N over O(n) × S(m). □
4.2 Geometry of Output Feedback Orbits
It is necessary to briefly review the Riemannian geometry of the spaces on which the optimization
problems stated in Section 4.1 are posed. The reader is referred to Helgason (1978) and
the development in Chapter 5 for technical details on Lie-groups and homogeneous spaces, and
to Helmke and Moore (1994b) for a development of dynamical systems methods for optimization
along with applications in linear systems theory.
The set O(n) × S(m) forms a Lie group under the group operation (Θ_1, K_1) · (Θ_2, K_2) =
(Θ_1 Θ_2, K_1 + K_2). It is known as the output feedback group for symmetric state space systems.
The tangent spaces of O(n) × S(m) are

T_{(Θ,K)}( O(n) × S(m) ) = { (ΩΘ, Ψ) | Ω ∈ Sk(n), Ψ ∈ S(m) },

where Sk(n) = {Ω ∈ R^{n×n} | Ω = −Ω^T} is the set of n × n skew symmetric matrices. The
Euclidean inner product on R^{n×n} × R^{n×m} is given by

⟨(A, B), (X, Y)⟩ = tr(A^T X) + tr(B^T Y).   (4.2.1)

By restriction, this induces a non-degenerate inner product on the tangent space T_{(I_n,0)}( O(n) ×
S(m) ) = Sk(n) × S(m). The Riemannian metric considered on O(n) × S(m) is the right
invariant group metric

⟨(Ω_1 Θ, Ψ_1), (Ω_2 Θ, Ψ_2)⟩ = 2 tr(Ω_1^T Ω_2) + 2 tr(Ψ_1^T Ψ_2).
The right invariant group metric is generated by the induced inner product on $T_{(I_n,0)}(O(n)\times S(m))$, mapped to each tangent space by the linearization of the diffeomorphism $(\Phi, k) \mapsto (\Phi\Theta, k+K)$ for $(\Theta, K) \in O(n)\times S(m)$. It is readily verified that this defines a Riemannian metric which corresponds, up to a scaling factor, to the induced Riemannian metric on $O(n)\times S(m)$ considered as a submanifold of $\mathbb{R}^{n\times n}\times\mathbb{R}^{n\times m}$. The scaling factor 2 serves to simplify the algebraic expressions obtained in the sequel.
Let $(A,B) \in S(n)\times O(n,m)$ be a symmetric state space system. The symmetric output feedback orbit of $(A,B)$ is the set
\[ \mathcal{F}(A,B) = \{(\Theta(A + BKB^T)\Theta^T,\ \Theta B) \mid \Theta \in O(n),\ K \in S(m)\} \tag{4.2.2} \]
of all symmetric realisations that are output feedback equivalent to $(A,B)$. Observe that no assumption on the controllability of the matrix pair $(A,B)$ is made.
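As a sanity check on the group structure and the orbit (4.2.2), the following sketch (an assumed random instance, not from the text) verifies numerically that acting twice by the output feedback group agrees with acting once by the product $(\Theta_1\Theta_2, K_1+K_2)$:

```python
# Numerical check that the output feedback action is a group action:
# (Theta1, K1) . ((Theta2, K2) . (A, B)) == (Theta1 Theta2, K1 + K2) . (A, B)
import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 3
sym = lambda X: (X + X.T) / 2
A = sym(rng.standard_normal((n, n)))
B = np.linalg.qr(rng.standard_normal((n, m)))[0]     # B^T B = I_m

def act(Th, K, A_, B_):
    """Output feedback action (Theta, K) . (A, B)."""
    return Th @ (A_ + B_ @ K @ B_.T) @ Th.T, Th @ B_

Th1 = np.linalg.qr(rng.standard_normal((n, n)))[0]
Th2 = np.linalg.qr(rng.standard_normal((n, n)))[0]
K1 = sym(rng.standard_normal((m, m)))
K2 = sym(rng.standard_normal((m, m)))

A2, B2 = act(Th1, K1, *act(Th2, K2, A, B))           # act twice
A1, B1 = act(Th1 @ Th2, K1 + K2, A, B)               # act once by the product
assert np.allclose(A2, A1) and np.allclose(B2, B1)
```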
Lemma 4.2.1 The symmetric output feedback orbit $\mathcal{F}(A,B)$ is a smooth submanifold of $S(n)\times O(n,m)$ with tangent space at a point $(A,B)$ given by
b) There exists a global minimum $(A_{\min}, B_{\min}) \in \mathcal{F}(A,B)$ of $\Psi$,
\[ \Psi(A_{\min}, B_{\min}) = \inf\{\Psi(A,B) \mid (A,B) \in \mathcal{F}(A,B)\}. \]
c) The submanifold $\mathcal{F}(A,B) \subset S(n)\times O(n,m)$ is closed in $S(n)\times\mathbb{R}^{n\times m}$.
Proof To prove part a), choose $\varepsilon > 0$ such that the sublevel set $J = \{(\Theta,K) \mid \psi(\Theta,K) \le \varepsilon\}$ is non empty. Then $\psi|_J : J \to [0,\varepsilon]$ is a continuous map² from a compact space into the
² Let $f : M \to N$ be a map between two sets $M$ and $N$. Let $U \subset M$ be a subset of $M$; then $f|_U : U \to N$ is the restriction of $f$ to the set $U$.
reals, and the minimum value theorem (Munkres 1975, pg. 175) ensures the existence of $(\Theta_{\min}, K_{\min})$. The proof of part b) is analogous to that for part a).
To prove c), assume that $\mathcal{F}(A,B)$ is not closed. Choose a boundary point $(F,G) \in \overline{\mathcal{F}(A,B)} \setminus \mathcal{F}(A,B)$ in the closure³ of $\mathcal{F}(A,B)$. By part b) there exists a minimum $(A_{\min}, B_{\min}) \in \mathcal{F}(A,B)$ such that
\[ \Psi(A_{\min}, B_{\min}) = \inf\{\Psi(A,B) \mid (A,B) \in \mathcal{F}(A,B)\} = 0, \]
since $(F,G)$ is in the closure of $\mathcal{F}(A,B)$. But this implies $\|A_{\min} - F\|^2 + 2\|B_{\min} - G\|^2 = 0$ and consequently $(A_{\min}, B_{\min}) = (F,G)$. This contradicts the assumption that $(F,G) \notin \mathcal{F}(A,B)$. $\square$
Having determined the existence of a solution to the system assignment problem, one may consider the problem of computing the global minima of the cost functions $\Psi$ and $\psi$.
Theorem 4.3.3 Let $(A,B), (F,G) \in S(n)\times O(n,m)$ be symmetric state space systems and let $\Psi$ measure the Euclidean distance between two symmetric realisations. Then
a) The gradient of $\Psi$ with respect to the normal metric is
\[ \operatorname{grad}\Psi(A,B) = \begin{pmatrix} -[A,\ [A,F] + BG^T - GB^T] + BB^T(A-F)BB^T \\ ([A,F] + BG^T - GB^T)B \end{pmatrix}. \tag{4.3.2} \]
b) The critical points of $\Psi$ are characterised by
\[ [A,F] = GB^T - BG^T, \qquad 0 = B^T(A-F)B. \tag{4.3.3} \]
³ Let $U \subset M$ be a subset of a topological space $M$. The closure of $U$, denoted $\overline{U}$, is the intersection of all closed sets in the topology which contain the set $U$.
c) Solutions of the gradient flow $(\dot A, \dot B) = -\operatorname{grad}\Psi(A,B)$,
\[ \dot A = [A,\ [A,F] + BG^T - GB^T] - BB^T(A-F)BB^T, \qquad \dot B = -([A,F] + BG^T - GB^T)B, \tag{4.3.4} \]
exist for all time $t \ge 0$ and remain in $\mathcal{F}(A,B)$.
d) Any solution to (4.3.4) converges as $t \to \infty$ to a connected set of matrix pairs $(A,B) \in \mathcal{F}(A,B)$ which satisfy (4.3.3) and lie in a single level set of $\Psi$.
Proof The gradient is computed using the identities⁴
are well defined. Here $I$ represents the identity operator and $B^TB = I_m$ by assumption. The tangent space of $O(n)$ at a point $\Theta$ is $T_\Theta O(n) = \{\Omega\Theta \mid \Omega \in Sk(n)\}$ with Riemannian metric $\langle \Omega_1\Theta, \Omega_2\Theta \rangle = 2\operatorname{tr}(\Omega_1^T\Omega_2)$, corresponding to the right invariant group metric on $O(n)$.
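The gradient flow (4.3.4) can be explored numerically. The sketch below is a crude forward-Euler discretisation; the step size, dimensions and random instance are illustrative assumptions, not values from the text. It checks that the potential $\Psi$ decreases along the computed trajectory:

```python
# Forward-Euler sketch of the gradient flow (4.3.4) on the orbit F(A,B).
import numpy as np

def lie(X, Y):
    """Matrix commutator [X, Y]."""
    return X @ Y - Y @ X

rng = np.random.default_rng(1)
n, m = 5, 4
sym = lambda X: (X + X.T) / 2
A = sym(rng.standard_normal((n, n)))
F = sym(rng.standard_normal((n, n)))
B = np.linalg.qr(rng.standard_normal((n, m)))[0]   # B^T B = I_m
G = np.linalg.qr(rng.standard_normal((n, m)))[0]

def Psi(A_, B_):
    return np.linalg.norm(A_ - F)**2 + 2 * np.linalg.norm(B_ - G)**2

dt, steps = 1e-4, 3000
vals = [Psi(A, B)]
for _ in range(steps):
    S = lie(A, F) + B @ G.T - G @ B.T              # skew symmetric direction
    P = B @ B.T
    A, B = A + dt * (lie(A, S) - P @ (A - F) @ P), B - dt * S @ B
    vals.append(Psi(A, B))

assert vals[-1] < vals[0]                          # Psi has decreased
```

A small fixed step suffices here only for illustration; the simulations reported later in the chapter use an adaptive Runge-Kutta integrator instead.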
Theorem 4.3.8 Let $(A,B), (F,G) \in S(n)\times O(n,m)$ be symmetric state space systems. Define
\[ \psi_* : O(n) \to \mathbb{R}, \qquad \psi_*(\Theta) := \|Q(A - \Theta^TF\Theta)\|^2 + 2\|\Theta B - G\|^2. \tag{4.3.12} \]
Then,
a) The gradient of $\psi_*$ with respect to the right invariant group metric is
\[ \operatorname{grad}\psi_*(\Theta) = \big([\Theta Q(A - \Theta^TF\Theta)\Theta^T, F] + \Theta BG^T - GB^T\Theta^T\big)\Theta. \]
b) The critical points $\Theta \in O(n)$ of $\psi_*$ are characterised by
\[ [F, \Theta Q(A - \Theta^TF\Theta)\Theta^T] = \Theta BG^T - GB^T\Theta^T, \]
and correspond exactly to the orthogonal matrix component of the critical points (4.3.8) of $\psi$.
c) The negative gradient flow minimising $\psi_*$ is
\[ \dot\Theta = \big([F, \Theta Q(A - \Theta^TF\Theta)\Theta^T] + GB^T\Theta^T - \Theta BG^T\big)\Theta, \qquad \Theta(0) = \Theta_0. \tag{4.3.13} \]
Solutions to this flow exist for all time $t \ge 0$ and converge as $t \to \infty$ to a connected set of critical points contained in a level set of $\psi_*$.
Proof The gradient and the critical point characterisation are proved as for Theorem 4.3.3. The equivalence of the critical points is easily seen by solving (4.3.8) for $\Theta$ independently of $K$. Part c) follows from the observation that (4.3.13) is a gradient flow on a compact manifold.
Fixing $\Theta$ constant in the second line of (4.3.9) yields a linear differential equation in $K$ with solution
\[ K(t) = e^{-t}\big(K(0) + B^T(A - \Theta^TF\Theta)B\big) - B^T(A - \Theta^TF\Theta)B. \]
It follows that $K(t) \to -B^T(A - \Theta^TF\Theta)B$ as $t \to \infty$. Observe that
\[ \psi_*(\Theta) = \|Q(A - \Theta^TF\Theta)\|^2 + 2\|\Theta B - G\|^2 = \|\Theta\big(A - BB^T(A - \Theta^TF\Theta)BB^T\big)\Theta^T - F\|^2 + 2\|\Theta B - G\|^2 = \psi\big(\Theta, -B^T(A - \Theta^TF\Theta)B\big). \]
Recall also that for exact system assignment it has been shown that $K = B^T(\Theta^TF\Theta - A)B$, Lemma 4.1.2. Thus, it is reasonable to consider solutions $\Theta(t)$ of (4.3.13) along with the continuously changing feedback gain
\[ K(t) = B^T\big(\Theta(t)^TF\Theta(t) - A\big)B \tag{4.3.14} \]
as an approach to solving least squares system assignment problems. A numerical scheme based on this approach is presented in Section 4.6.
4.4 Least Squares Pole Placement and Simultaneous System Assignment
Having developed the necessary tools it is a simple matter to derive gradient flow solutions to
Problem B and Problem C described in Section 4.1.
Corollary 4.4.1 (Pole Placement) Let $(A,B) \in S(n)\times O(n,m)$ be a symmetric state space system and let $F \in S(n)$ be a given symmetric matrix. Define
\[ \Phi : \mathcal{F}(A,B) \to \mathbb{R}, \quad \Phi(A,B) = \|A - F\|^2, \qquad \phi : O(n)\times S(m) \to \mathbb{R}, \quad \phi(\Theta,K) = \|\Theta(A + BKB^T)\Theta^T - F\|^2, \]
then
a) The gradients of $\Phi$ and $\phi$ with respect to the normal and the right invariant group metric respectively are
\[ \operatorname{grad}\Phi(A,B) = \begin{pmatrix} -[A,[A,F]] + BB^T(A-F)BB^T \\ [A,F]B \end{pmatrix} \tag{4.4.1} \]
and
\[ \operatorname{grad}\phi(\Theta,K) = \begin{pmatrix} [\Theta(A + BKB^T)\Theta^T, F]\Theta \\ B^T(A + BKB^T - \Theta^TF\Theta)B \end{pmatrix}. \tag{4.4.2} \]
b) The critical points of $\Phi$ and $\phi$ are characterised by
\[ [A,F] = 0, \qquad B^T(A-F)B = 0, \tag{4.4.3} \]
and
\[ [\Theta(A + BKB^T)\Theta^T, F] = 0, \qquad B^T(\Theta^TF\Theta - A)B = K, \tag{4.4.4} \]
respectively.
c) Solutions of the gradient flow $(\dot A, \dot B) = -\operatorname{grad}\Phi(A,B)$,
\[ \dot A = [A,[A,F]] - BB^T(A-F)BB^T, \qquad \dot B = -[A,F]B, \tag{4.4.5} \]
exist for all time $t \ge 0$ and remain in $\mathcal{F}(A,B)$. Moreover, any solution of (4.4.5) converges as $t \to \infty$ to a connected set of matrix pairs $(A,B) \in \mathcal{F}(A,B)$ which satisfy (4.4.3) and lie in a single level set of $\Phi$.
d) Solutions of the gradient flow $(\dot\Theta, \dot K) = -\operatorname{grad}\phi(\Theta,K)$,
\[ \dot\Theta = -[\Theta(A + BKB^T)\Theta^T, F]\Theta, \qquad \dot K = -B^T(A + BKB^T - \Theta^TF\Theta)B, \tag{4.4.6} \]
exist for all time $t \ge 0$ and remain in a bounded subset of $O(n)\times S(m)$. Moreover, as $t \to \infty$ any solution of (4.4.6) converges to a connected subset of critical points in $O(n)\times S(m)$ which are contained in a single level set of $\phi$.
e) If $(\Theta(t), K(t))$ is a solution to (4.4.6) then $\big(\Theta(t)(A + BK(t)B^T)\Theta^T(t),\ \Theta(t)B\big)$ is a solution of (4.4.5).
Proof Consider the symmetric state space system $(A,B) \in S(n)\times O(n,m)$ and the matrix pair $(F, G_0) \in S(n)\times\mathbb{R}^{n\times m}$ where $G_0$ is the $n\times m$ zero matrix. Observe that $\Psi(A,B) = \Phi(A,B) + 2\|B\|^2$ and similarly $\psi(\Theta,K) = \phi(\Theta,K) + 2\|B\|^2$, where $\Psi$ and $\psi$ are given by
(4.3.1) and (4.3.6) respectively. Since the norm $\|B\|^2$ is constant on $\mathcal{F}(A,B)$, the structure of the above optimization problems is exactly that considered in Theorem 4.3.3 and Theorem 4.3.6. The results follow as direct corollaries. $\square$
Similar to the discussion at the end of Section 4.3, the pole placement problem can be solved by a gradient flow evolving on the orthogonal group $O(n)$ alone.
Corollary 4.4.2 Let $(A,B) \in S(n)\times O(n,m)$ be a symmetric state space system and let $F \in S(n)$ be a symmetric matrix. Define
\[ \phi_* : O(n) \to \mathbb{R}, \qquad \phi_*(\Theta) := \|Q(A - \Theta^TF\Theta)\|^2, \]
where $Q(X) = (I - P)(X) = X - BB^TXBB^T$ (4.3.11). Then,
a) The gradient of $\phi_*$ with respect to the right invariant group metric is
\[ \operatorname{grad}\phi_*(\Theta) = [\Theta Q(A - \Theta^TF\Theta)\Theta^T, F]\Theta. \]
b) The critical points $\Theta \in O(n)$ of $\phi_*$ are characterised by
\[ [F, \Theta Q(A - \Theta^TF\Theta)\Theta^T] = 0, \]
and correspond exactly to the orthogonal matrix component of the critical points (4.4.4) of $\phi$.
c) The negative gradient flow minimising $\phi_*$ is
\[ \dot\Theta = [F, \Theta Q(A - \Theta^TF\Theta)\Theta^T]\Theta, \qquad \Theta(0) = \Theta_0. \tag{4.4.7} \]
Solutions to this flow exist for all time $t \ge 0$ and converge as $t \to \infty$ to a connected set of critical points contained in a level set of $\phi_*$.
Proof Consider the matrix pair $(F, G_0) \in S(n)\times\mathbb{R}^{n\times m}$ where $G_0$ is the $n\times m$ zero matrix. It is easily verified that $\psi_*(\Theta) = \phi_*(\Theta) + 2\|B\|^2$ where $\psi_*$ is given by (4.3.12). The corollary follows as a direct consequence of Theorem 4.3.8. $\square$
Simultaneous system assignment is known to be a hard problem which generically does
not have an exact solution. The best that can be hoped for is an approximate solution provided
by a suitable numerical technique. The following discussion is a direct generalisation of the
development given in Section 4.3. The generalisation is similar to that employed by Chu
(1991a) when considering the simultaneous reduction of real matrices.
For any integer $N \in \mathbb{N}$ let $(A_1,B_1),\ldots,(A_N,B_N) \in S(n)\times O(n,m)$ be given symmetric state space systems. The output feedback orbit for the multiple system case is
\[ \mathcal{F}\big((A_1,B_1),\ldots,(A_N,B_N)\big) := \big\{\big(\Theta(A_1 + B_1KB_1^T)\Theta^T, \Theta B_1\big),\ldots,\big(\Theta(A_N + B_NKB_N^T)\Theta^T, \Theta B_N\big) \mid \Theta \in O(n),\ K \in S(m)\big\}. \]
An analogous argument to Lemma 4.2.1 shows that $\mathcal{F}((A_1,B_1),\ldots,(A_N,B_N))$ is a smooth submanifold. Indeed, $\mathcal{F}((A_1,B_1),\ldots,(A_N,B_N))$ is a Riemannian manifold when equipped with the normal metric, defined analogously to the normal metric on $\mathcal{F}(A,B)$.
Corollary 4.4.3 For any integer $N \in \mathbb{N}$ let $(A_1,B_1),\ldots,(A_N,B_N)$ and $(F_1,G_1),\ldots,(F_N,G_N)$ be two sets of $N$ symmetric state space systems. Define
\[ \Psi_N : \mathcal{F}\big((A_1,B_1),\ldots,(A_N,B_N)\big) \to \mathbb{R}, \qquad \Psi_N\big((A_1,B_1),\ldots,(A_N,B_N)\big) := \sum_{i=1}^N \big(\|A_i - F_i\|^2 + 2\|B_i - G_i\|^2\big) \]
and
\[ \psi_N : O(n)\times S(m) \to \mathbb{R}, \qquad \psi_N(\Theta,K) := \sum_{i=1}^N \big(\|\Theta(A_i + B_iKB_i^T)\Theta^T - F_i\|^2 + 2\|\Theta B_i - G_i\|^2\big). \]
Then,
a) The negative gradient flows of $\Psi_N$ and $\psi_N$ with respect to the normal and the right invariant group metric are
\[ \dot A_i = \Big[A_i,\ \sum_{j=1}^N \big([A_j,F_j] + B_jG_j^T - G_jB_j^T\big)\Big] - \sum_{j=1}^N B_iB_j^T(A_j - F_j)B_jB_i^T, \qquad \dot B_i = -\sum_{j=1}^N \big([A_j,F_j] + B_jG_j^T - G_jB_j^T\big)B_i, \tag{4.4.8} \]
for $i = 1,\ldots,N$, and
\[ \dot\Theta = -\sum_{j=1}^N \big([\Theta(A_j + B_jKB_j^T)\Theta^T, F_j] + \Theta B_jG_j^T - G_jB_j^T\Theta^T\big)\Theta, \qquad \dot K = -\sum_{j=1}^N B_j^T(A_j + B_jKB_j^T - \Theta^TF_j\Theta)B_j, \tag{4.4.9} \]
respectively.
b) The critical points of $\Psi_N$ and $\psi_N$ are characterised by
\[ \sum_{j=1}^N [A_j,F_j] = \sum_{j=1}^N \big(G_jB_j^T - B_jG_j^T\big), \qquad \sum_{j=1}^N B_j^T(A_j - F_j)B_j = 0, \tag{4.4.10} \]
and
\[ \sum_{j=1}^N [\Theta(A_j + B_jKB_j^T)\Theta^T, F_j] = \sum_{j=1}^N \big(G_jB_j^T\Theta^T - \Theta B_jG_j^T\big), \qquad NK = \sum_{j=1}^N B_j^T(\Theta^TF_j\Theta - A_j)B_j, \tag{4.4.11} \]
respectively.
c) Solutions of the gradient flow (4.4.8) exist for all time $t \ge 0$ and remain in $\mathcal{F}((A_1,B_1),\ldots,(A_N,B_N))$. Moreover, any solution of (4.4.8) converges as $t \to \infty$ to a connected set of matrix pairs $((A_1,B_1),\ldots,(A_N,B_N)) \in \mathcal{F}((A_1,B_1),\ldots,(A_N,B_N))$ which satisfy (4.4.10) and lie in a single level set of $\Psi_N$.
d) Solutions of the gradient flow (4.4.9) exist for all time $t \ge 0$ and remain in a bounded subset of $O(n)\times S(m)$. Moreover, as $t \to \infty$ any solution of (4.4.9) converges to a connected subset of critical points in $O(n)\times S(m)$ which are contained in a single level set of $\psi_N$.
e) If $(\Theta(t), K(t))$ is a solution to (4.4.9) then $(A_i(t), B_i(t)) = \big(\Theta(A_i + B_iKB_i^T)\Theta^T,\ \Theta B_i\big)$, for $i = 1,\ldots,N$, is a solution of (4.4.8).
Proof Observe that the potentials $\Psi_N$ and $\psi_N$ are linear sums of potentials of the form $\Psi$ and $\psi$ considered in Theorem 4.3.3 and Theorem 4.3.6. The proof is then a simple generalisation of the arguments employed in the proofs of these theorems. $\square$
4.5 Simulations
A number of simulation studies have been completed to investigate the properties of the gradient flows presented and to obtain general information about the system assignment and pole placement problems⁵.
In the following simulations the solutions of the ordinary differential equations considered were numerically estimated using the MATLAB function ODE45. This function integrates ordinary differential equations using the Runge-Kutta-Fehlberg method with automatic step size selection. Numerical integration is undertaken using a fourth order Runge-Kutta method while the accuracy of each iteration over the step length is checked against a fifth order method. At each step of the integration the step length is reduced until the error between the fourth and fifth order method estimates is less than a pre-specified constant $E > 0$. In the simulations undertaken the error bound was set to $E = 1\times 10^{-7}$; this allowed for reasonable accuracy without excessive computational cost.
Due to Lemma 4.1.1 one does not expect to see convergence of the solution of (4.3.4) to an exact solution of the system assignment problem for arbitrary initial conditions (unless $n = m$, in which case a solution can be computed algebraically). The typical behaviour of solutions to
⁵ Indeed, computing the gradient flows (4.3.4) and (4.4.1) has already helped in the understanding of the pole placement and system assignment tasks, since it was the non-convergence of the original simulations that led to a further investigation of the existence of exact solutions to the problems, and eventually to Lemmas 4.1.1 and 4.1.2.
Figure 4.5.1: Plot of $\Psi(A(t), B(t))$ versus $t$ for $(A(t), B(t))$ a typical solution to (4.3.4).
(4.3.4) is shown in Figure 4.5.1, where the potential $\Psi(A(t), B(t))$, for $(A(t), B(t))$ a solution to (4.3.4), is plotted versus time. The potential is plotted on a $\log_{10}$ scaled axis for all the plots presented, to display the linear convergence properties of the solution. The initial conditions $(A_0, B_0) \in S(5)\times O(5,4)$ and the target system $(F,G) \in S(5)\times O(5,4)$ were randomly generated apart from symmetry and orthogonality requirements. The state dimension, $n = 5$, and the input and output dimension, $m = 4$, were arbitrarily chosen. Similar behaviour is obtained in all simulations for any choice of $n$ and $m$ with $m < n$. In Figure 4.5.1, observe that the potential converges to a non-zero constant, $\lim_{t\to\infty}\Psi(A(t), B(t)) \approx 9.3$. For the limiting value of the solution to be an exact solution of the system assignment problem one would require $\lim_{t\to\infty}\Psi(A(t), B(t)) = 0$.
In contrast, Lemma 4.1.2 ensures only that the pole placement task is not solvable on some open set of symmetric state space systems, but leaves open the question of whether other open sets of systems exist for which the pole placement problem is solvable. Simulations show that the pole placement problem is indeed solvable for some open sets of symmetric state space systems. Figure 4.5.2 shows a plot of the potential $\Phi(A(t), B(t))$ (cf. Corollary 4.4.1) versus time for $(A(t), B(t))$ a solution to (4.4.5). The initial conditions and target matrix here were the initial conditions $(A_0, B_0)$ and the state matrix $F$, from $(F,G)$, used to generate Figure 4.5.1. The plot clearly shows that the potential converges exponentially (linearly in the $\log_{10}$ versus unscaled plot) to zero. Consequently, the solution $(A(t), B(t))$ converges to an exact solution of the pole placement problem, $\lim_{t\to\infty}A(t) = F$. Comparing Figures 4.5.1 and 4.5.2, and recalling that they were generated using the same initial conditions, one sees explicitly that
Simulation    Phi(A(40), B(40))
 1            2.63e-10
 2            2.09e-9
 3            5.65e-9
 4            3.35e-10
 5            3.16e-11
 6            1.62e-11
 7            1.05e-10
 8            3.68e-10
 9            1.20e-8
10            2.72e-8
Table 4.5.1: Potentials $\Phi(A_i(40), B_i(40))$ for experiments $i = 1,\ldots,10$, where $(A_i(t), B_i(t))$ is a solution to (4.4.5) with initial conditions $(A_i(0), B_i(0)) = (A_0 + N_i,\ U_iB_0) \in S(n)\times O(n,m)$. Here $N_i = N_i^T$ is a randomly generated symmetric matrix with $\|N_i\| \le 0.25$ and $U_i \in O(n)$ is a randomly generated orthogonal matrix with $\|U_i - I_n\| \le 0.25$.
the system assignment problem is strictly more difficult than the pole placement problem.
One may ask whether the particular initial condition $(A_0, B_0)$ lies in an open set of initial conditions for which the pole placement problem can be exactly solved. A series of ten simulations was completed, integrating (4.4.5) for initial conditions $(A_i, B_i)$ close to $(A_0, B_0)$, $\|A_0 - A_i\|, \|B_0 - B_i\| \le 0.5$. Each integration was carried out over a time interval of forty seconds and the final potential $\Phi(A(40), B(40))$ for each simulation is given in Table 4.5.1. The plot of $\Phi$ versus time for each simulation was qualitatively the same as Figure 4.5.2. It is my conclusion from this that the pole placement problem can be exactly solved for all initial conditions in a neighbourhood of $(A_0, B_0)$.
Remark 4.5.1 It may appear reasonable that the pole placement problem could be solved for all initial conditions with state matrix $A_0$ in a neighbourhood of the desired structure $F$. In fact simulations have shown this to be false.
Let $C \in O(n, n-m)$ be a matrix orthogonal to $B$ (i.e. $B^TC = 0$). Observe that a solution to the pole placement problem requires $\Theta^TF\Theta = A + BKB^T$ and thus
\[ \Theta^TF\Theta C - AC = 0 \iff F\Theta C - \Theta AC = 0. \]
Since $A$ and $C$ are specified by the initial condition (the span of $C$ is the important object)
Figure 4.5.2: Plot of $\Phi(A(t), B(t))$ versus $t$ for $(A(t), B(t))$ a solution to (4.4.5) with initial conditions $(A_0, B_0)$ for which the solution $(A(t), B(t))$ converges to a global minimum of $\Phi$.
then $\Theta \in \mathbb{R}^{n\times n}$ must lie in the linear subspace defined by the kernel of the linear map $\Theta \mapsto F\Theta C - \Theta AC$. Of course $\Theta$ must also lie in the set of orthogonal matrices, and the intersection of the kernel of $\Theta \mapsto F\Theta C - \Theta AC$ with the set of orthogonal matrices provides an exact criterion for the existence of a solution to the pole placement problem.
The difficulty for initial conditions where $\|A_0 - F\|$ is small is related to the fact that the solution to the pole placement problem for initial conditions $(A_0, B_0) = (F, B_0)$ (i.e. the state matrix already has the desired structure) is given by the matrix pair $(I_n, 0) \in O(n)\times S(m)$ in the output feedback group. The matrix $I_n$ lies at an extremity of $O(n)$ in $\mathbb{R}^{n\times n}$ and it is reasonable that small perturbations of $(A_0, B_0)$ may shift the kernel of the linear map $\Theta \mapsto F\Theta C - \Theta A_0C$ such that it no longer intersects $O(n)$. $\square$
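The kernel criterion of the remark can be made concrete with Kronecker products, using the identity $\operatorname{vec}(F\Theta C - \Theta AC) = (C^T \otimes F - (AC)^T \otimes I_n)\operatorname{vec}(\Theta)$ for column-major vec. The sketch below (a hypothetical instance, constructed to be exactly assignable) checks that the solving $\Theta$ lies in the kernel of this linear map:

```python
# Kernel criterion of Remark 4.5.1 via Kronecker products.
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = np.linalg.qr(rng.standard_normal((n, m)))[0]
Q = np.linalg.qr(B, mode="complete")[0]
C = Q[:, m:]                                   # orthonormal complement, B^T C = 0

# Construct an exactly assignable target: F = Theta (A + B K B^T) Theta^T.
K = rng.standard_normal((m, m)); K = (K + K.T) / 2
Th = np.linalg.qr(rng.standard_normal((n, n)))[0]
F = Th @ (A + B @ K @ B.T) @ Th.T

# Matrix of the linear map Theta -> F Theta C - Theta A C (column-major vec).
M = np.kron(C.T, F) - np.kron((A @ C).T, np.eye(n))
vecTh = Th.flatten(order="F")
# Theta solves the pole placement problem, so it lies in the kernel of M.
assert np.allclose(M @ vecTh, 0.0, atol=1e-9)
```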
An advantage, mentioned in Section 4.3, in computing the limiting solution of (4.4.7) (Figure 4.5.3) compared to computing the full gradient flow (4.4.5) (Figure 4.5.2) is the associated reduction in the order of the O.D.E. that must be solved. Interestingly, it appears that the solutions of the projected flow (4.4.7) also converge more quickly than those of (4.4.5). Figure 4.5.3 shows the potential $\phi_*(\Theta(t))$ (cf. Corollary 4.4.2) versus time for $\Theta(t)$ a solution to (4.4.7). The initial condition for this simulation was $\Theta_0 = I_n$, while the specified symmetric state space system used for computing the norm $\phi_*$ was $(A_0, B_0)$, the initial conditions for Figures 4.5.1 and 4.5.2. Observe that from time $t \approx 1.2$ to $t \approx 2$, Figure 4.5.3 displays unexpected behaviour which I interpret to be numerical error. The presence of this error is not surprising since the potential (and consequently the gradient) is of order
Table 4.5.2: Linear rate of convergence for the solution of (4.4.5), given by $\alpha$, and of (4.4.7), given by $\alpha_*$. The final column shows the ratio between the rates of convergence for the two differential equations.
$10^{-12} \approx E^2$, where $E$ is the error bound chosen for the ODE routine in MATLAB. The relationship of the numerical error to the order of the potential was checked by adjusting the error bound $E$ in a number of early simulations.
The exponential (linear) convergence rates of the solutions to (4.4.7) and to (4.4.5) are computed by reading off the slope of the linear sections of plots 4.5.2 and 4.5.3. For the example shown in Figures 4.5.2 and 4.5.3, convergence of the solutions is characterised by
\[ \Phi(A(t), B(t)) \approx e^{-\alpha t}, \quad \alpha \approx 2.05, \qquad \phi_*(\Theta(t)) \approx e^{-\alpha_* t}, \quad \alpha_* \approx 53, \]
where $(A(t), B(t))$ is a solution to (4.4.5) and $\Theta(t)$ is a solution to (4.4.7). Five separate experiments were completed in which the two flows were computed for randomly generated target matrices and initial conditions with $n = 5$ and $m = 4$. The linear convergence rates computed from these five experiments are given in Table 4.5.2. I deduce that solutions of (4.4.7) converge around twenty times faster than solutions of (4.4.5) when the systems considered have five states and four inputs and outputs. A brief study of the behaviour of systems with other numbers of states and inputs indicates that the ratio between the convergence rates is around an order of magnitude.
In the system assignment problem, Lemma 4.1.1 ensures that an exact solution to the system assignment problem does not generically exist. The gradient flow (4.3.4), however, will certainly converge to a connected set of local minima of the potential $\Psi$, Theorem 4.3.3. An important question to ask concerns the structure that the critical level set associated with the local minima of $\Psi$ may have. In particular, one may ask whether the level set is a single point or is
Figure 4.5.3: Plot of $\phi_*(\Theta(t))$ versus $t$ for $\Theta(t)$ a solution to (4.4.7) with initial condition $\Theta(0) = I_n$, the identity matrix. The potential $\phi_*(\Theta) := \|(A_0 - \Theta^TF\Theta) - B_0B_0^T(A_0 - \Theta^TF\Theta)B_0B_0^T\|^2$ is computed with respect to the initial conditions $(A_0, B_0)$ used in Figures 4.5.1 and 4.5.2.
it a submanifold (at least locally) of F�A�B�.
Remark 4.5.2 Observe that critical level sets of $\Psi$ are given by two algebraic conditions, $\|\operatorname{grad}\Psi(A,B)\| = 0$ and $\Psi(A,B) = \Psi_0$ for some fixed $\Psi_0$; thus they are algebraic varieties of the closed submanifold $\mathcal{F}(A,B) \subset \mathbb{R}^{n\times n}\times\mathbb{R}^{n\times m}$. It follows, apart from a set of measure zero in $\mathcal{F}(A,B)$ (singularities of the algebraic conditions), that the critical sets will locally have submanifold structure in $\mathcal{F}(A,B)$. $\square$
Rather than consider the computationally huge task of mapping out the local minima of $\Psi$ by integrating (4.3.4) for many different initial conditions in $\mathcal{F}(A,B)$, it is possible to obtain some qualitative information in the vicinity of a given local minimum. Choosing any initial condition and integrating (4.3.4) for a suitable time interval, an estimate of a local minimum $(A^*, B^*)$ is obtained. If this point is an isolated minimum then it should be locally attractive. By choosing a number of initial conditions $(A_i, B_i)$ in the vicinity of $(A^*, B^*)$ and integrating (4.3.4) a second time, one obtains new estimates of local minima $(A_i^*, B_i^*)$. If $(A^*, B^*)$ approximates an isolated local minimum then the ratio
\[ r_i = \frac{\|(A_i^*, B_i^*) - (A^*, B^*)\|}{\|(A_i, B_i) - (A^*, B^*)\|} \tag{4.5.1} \]
should be approximately zero. If $(A^*, B^*)$ is not isolated then one expects the ratio $r_i$ to be significantly non-zero. Of course $r_i$ should be less than one on average since the flow is
Figure 4.5.4: Plot of the frequency distribution of $r_i$ given by (4.5.1), computed for the limiting values of 100 simulations with initial conditions close to $(A^*, B^*)$.
convergent. The difficulty in this approach is deciding on suitable time intervals for the various integrations. The first time interval was determined by repeatedly integrating over longer and longer time intervals (for the same initial conditions) until the norm difference between the final values was less than $1\times 10^{-8}$. An initial time interval of two hundred seconds was found to be suitable. Each subsequent simulation was integrated over a time interval of fifty seconds. The results of one hundred measurements of the ratio $r_i$ for a given estimated local minimum $(A^*, B^*)$ are plotted as a frequency plot, Figure 4.5.4. The frequency divisions for this plot are 0.05; thus, in the one hundred experiments undertaken, eleven experiments yielded an estimate of $r_i$ between 0.325 and 0.375. It is obvious from Figure 4.5.4 that the probability of $r_i$ being zero is small, and one concludes that the critical level sets of $\Psi$ have a local submanifold structure of non-zero dimension. In particular, the local minima of $\Psi$ are not isolated.
4.6 Numerical Methods for Symmetric Pole Placement
In this section a numerical algorithm, based on the continuous-time flow (4.4.7) coupled with the feedback gain (4.3.14), is proposed. The algorithm is analogous to those discussed in Chapters 2 and 3.
Let $(A,B)$ be a symmetric output feedback system and let $F \in S(n)$ possess the desired
closed loop eigenstructure. For $\Theta_0 \in O(n)$ consider the iterative algorithm generated by
\[ \Theta_{i+1} = \Theta_i e^{-\alpha_i[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)]}, \tag{4.6.1} \]
\[ K_i = B^T(\Theta_i^TF\Theta_i - A)B, \tag{4.6.2} \]
for $i \in \mathbb{N}$ and $\alpha_i$ a sequence of positive real numbers termed time-steps. Observe that the Lie-bracket $[\Theta_i^TF\Theta_i, Q(\Theta_i^TF\Theta_i - A)]$ is skew symmetric; hence $e^{-\alpha_i[\Theta_i^TF\Theta_i, Q(\Theta_i^TF\Theta_i - A)]}$ is orthogonal and $\Theta_{i+1}$ lies in $O(n)$.
To motivate the algorithm observe that
\[ \frac{d}{d\tau}\Big(\Theta_i e^{-\tau[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)]}\Big)\Big|_{\tau=0} = -\Theta_i[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)] = [F,\ \Theta_iQ(A - \Theta_i^TF\Theta_i)\Theta_i^T]\Theta_i, \]
the negative gradient of $\phi_*$ at $\Theta_i$ (cf. Corollary 4.4.2). Thus, $\Theta_i e^{-\tau[\Theta_i^TF\Theta_i, Q(\Theta_i^TF\Theta_i - A)]}$ represents a curve in $O(n)$, passing through $\Theta_i$ at time $\tau = 0$, with first derivative equal to $-\operatorname{grad}\phi_*(\Theta_i)$. Indeed, the algorithm proposed can be thought of as a modified gradient descent algorithm where, instead of straight line interpolation, the curves $\Theta_i e^{-\tau[\Theta_i^TF\Theta_i, Q(\Theta_i^TF\Theta_i - A)]}$ are used.
To implement (4.6.1) it is necessary to choose a time-step $\alpha_i$ for each step of the recursion. A convenient criterion for determining a suitable time-step is to minimise the smooth function
\[ \Delta\phi_*(\Theta_i, \alpha_i) = \phi_*(\Theta_{i+1}) - \phi_*(\Theta_i). \tag{4.6.3} \]
In particular, one would like to ensure that $\Delta\phi_*(\Theta_i, \alpha)$ is strictly negative unless $\Theta_i$ is a critical point of $\phi_*$. The following argument is analogous to the derivation of the step-size selection schemes given in Section 2.2.
Lemma 4.6.1 Let $(A,B)$ be a controllable⁶ symmetric output feedback system and let $F \in S(n)$, $F \ne 0$, possess the desired closed loop eigenstructure. For any $\Theta_i \in O(n)$ such that
⁶ I.e. the controllability matrix $[B\ AB\ A^2B\ \cdots\ A^{n-1}B]$ is full rank. It is easily shown that controllability of $(A,B)$ ensures that $Q(A) \ne 0$.
$\operatorname{grad}\phi_*(\Theta_i) \ne 0$, the recursive estimate $\Theta_{i+1} = \Theta_i e^{-\alpha_i[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)]}$, where
\[ \alpha_i = \frac{1}{4\|F\|\big(\|P(\Theta_i^TF\Theta_i)\| + \|Q(A)\|\big)}, \tag{4.6.4} \]
satisfies $\Delta\phi_*(\Theta_i, \alpha_i) = \phi_*(\Theta_{i+1}) - \phi_*(\Theta_i) < 0$.
Proof Let $\Theta_{i+1}(\tau) = \Theta_i e^{-\tau[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)]}$ for an arbitrary time-step $\tau$ and define $X_i = \Theta_i^TF\Theta_i$ and $X_{i+1}(\tau) = \Theta_{i+1}(\tau)^TF\Theta_{i+1}(\tau)$. The Taylor expansion for $X_{i+1}(\tau)$ is
The controllability of $(A,B)$, along with the assumptions $\operatorname{grad}\phi_*(\Theta_i) \ne 0$ and $F \ne 0$, ensures that the quadratic coefficient of $\Delta\phi_*^u(\Theta_i, \tau)$ does not vanish, and it is easily seen that its unique minimum is strictly negative and occurs for $\tau = \alpha_i$ of (4.6.4). The result follows since $0 > \Delta\phi_*^u(\Theta_i, \alpha_i) \ge \Delta\phi_*(\Theta_i, \alpha_i)$. $\square$
Theorem 4.6.2 Let $(A,B)$ be a controllable symmetric output feedback system and let $F \in S(n)$, $F \ne 0$, possess the desired eigenstructure. For a given estimate $\Theta_i \in O(n)$, let $\alpha_i$ be given by (4.6.4). The algorithm (4.6.1),
\[ \Theta_{i+1} = \Theta_i e^{-\alpha_i[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)]}, \tag{4.6.5} \]
has the following properties.
a) The algorithm defines an iteration on $O(n)$.
b) Fixed points of the algorithm are the equilibrium points of (4.4.7).
c) If $\Theta_i$ is a solution to (4.6.5) then the real sequence $\phi_*(\Theta_i)$ is strictly monotonically decreasing unless there is some $i \in \mathbb{N}$ with $\Theta_i$ a fixed point of the algorithm.
d) Any solution $\Theta_i$ to (4.6.5) converges as $i \to \infty$ to a set of equilibrium points contained in a level set of $\phi_*$.
Proof Part a) follows from the observation that $e^{-\alpha_i[\Theta_i^TF\Theta_i,\ Q(\Theta_i^TF\Theta_i - A)]}$ is orthogonal. Fixed points of the recursion are those $\Theta_i$ for which the first derivative of $\Theta_{i+1}(\tau)$ vanishes (Lemma 4.6.1) and correspond exactly to the equilibrium points of (4.4.7). This proves part b), while part c) is a corollary of Lemma 4.6.1.
To prove part d), observe that $O(n)$ is a compact set, and thus $\phi_*(\Theta_i)$, a bounded monotonically decreasing sequence, must converge. This implies that $\Delta\phi_*(\Theta_i, \alpha_i) \to 0$ as $i \to \infty$. It follows that $\Theta_i$ converges to a level set of $\phi_*$ such that for any $\Theta$ in this set $\Delta\phi_*(\Theta, \alpha) = 0$. Lemma 4.6.1 ensures that any point in this set is an equilibrium point of (4.4.7). $\square$
Remark 4.6.3 Observe that there is an associated sequence of realisations
\[ A_i = \Theta_i(A + BK_iB^T)\Theta_i^T, \qquad B_i = \Theta_iB, \]
for any solution $(\Theta_i, K_i)$ of (4.6.1) and (4.6.2). $\square$
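A minimal implementation of the recursion (4.6.1)-(4.6.2) with the time-step (4.6.4) might look as follows. This is a sketch under the reconstructed formulas above; the random test instance is an assumption (controllability holds generically) and the matrix exponential keeps each iterate on $O(n)$:

```python
# Sketch of the discrete pole placement iteration (4.6.1)-(4.6.2).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n, m = 5, 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = np.linalg.qr(rng.standard_normal((n, m)))[0]       # B^T B = I_m
F = rng.standard_normal((n, n)); F = (F + F.T) / 2

Pi = B @ B.T
P = lambda X: Pi @ X @ Pi                              # projection P
Q = lambda X: X - Pi @ X @ Pi                          # Q = I - P, cf. (4.3.11)
lie = lambda X, Y: X @ Y - Y @ X
phi_star = lambda Th: np.linalg.norm(Q(A - Th.T @ F @ Th))**2

Th = np.eye(n)
costs = [phi_star(Th)]
for _ in range(200):
    X = Th.T @ F @ Th
    alpha = 1.0 / (4 * np.linalg.norm(F) *
                   (np.linalg.norm(P(X)) + np.linalg.norm(Q(A))))
    Th = Th @ expm(-alpha * lie(X, Q(X - A)))          # step (4.6.1)
    costs.append(phi_star(Th))
K = B.T @ (Th.T @ F @ Th - A) @ B                      # gain (4.6.2)

assert np.allclose(Th.T @ Th, np.eye(n), atol=1e-8)    # iterates stay in O(n)
assert costs[-1] < costs[0]                            # cost has decreased
```

Since the bracket $[X, Q(X-A)]$ is skew symmetric, `expm` of its negative multiple is orthogonal, which is exactly the property used in part a) of Theorem 4.6.2.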
A primary aim in developing the algorithm (4.6.1) is to provide a reliable numerical tool with which to investigate the structure of the pole placement (system assignment) problem for symmetric realisations. Figure 4.6.1 is a simulation for a fifth order symmetric state space system with four inputs. The initial condition is $\Theta_0 = I_n$, the identity matrix, and the algorithm is run for 1000 steps. The linear convergence properties of the algorithm are shown by the linear appearance of the log versus iteration plot, Figure 4.6.1. The time-step selection for this simulation is displayed in Figure 4.6.2 and indicates both the non-linear nature of the selection
Figure 4.6.1: Iteration versus $\phi_*(\Theta_i)$, showing linear convergence properties.
scheme as well as its limiting behaviour. The existence of a limit of the time-step selection scheme (4.6.4) as $i \to \infty$ ensures that the linearization of (4.6.1) around a critical point exists. By computing this linearization the linear convergence properties displayed in Figure 4.6.1 can be confirmed theoretically.
Simulation studies have shown the presence of many local minima of the cost potential $\phi_*$. Figure 4.6.3 is a plot of both the cost $\phi_*$ and the norm of the gradient $\|\operatorname{grad}\phi_*(\Theta_i)\|^2$ for a simulation of a seventh order symmetric state space system with four inputs. The system was chosen such that an exact solution to the pole placement problem existed. Thus the global minimum of $\phi_*$ was known to be zero; however, Figure 4.6.3 shows the cost $\phi_*$ converging to a constant while the gradient converges to zero. The algorithm (4.6.1) provides a reliable numerical method to investigate the presence and position of such local minima.
Figure 4.6.3: Iteration versus both the potential $\phi_*(\Theta_i)$ and the norm of the gradient $\|[F, \Theta Q(A - \Theta^TF\Theta)\Theta^T]\Theta\|^2$.
4.7 Open Questions and Further Work
An important question that has not been addressed in this chapter is that of understanding the equilibrium conditions for the various dynamical systems in the context of classical systems theory. It would be nice to relate conditions such as (4.4.3) to properties such as the frequency response of the achieved system. Unfortunately, even finding a relationship between the desired and the achieved pole positions appears to be difficult. The discussion of Problem C, multiple system assignment, is another area that would benefit from further work. The results presented in this chapter are far from comprehensive.
A natural extension of the theory presented in this chapter is to consider more general systems. For example, a class of systems $(A,B,C)$ with a given Cauchy-Maslov index (i.e. $(AI_{pq})^T = AI_{pq}$ and $C^T = I_{pq}B$ where $I_{pq} = \operatorname{diag}(I_p, -I_q)$) could be approached using the same techniques developed earlier. The Lie transformation group associated with the set of such systems is
\[ G = \{T \in \mathbb{R}^{n\times n} \mid T^TI_{pq}T = I_{pq},\ \det(T) \ne 0\}, \]
which has identity tangent space (or Lie-algebra)
\[ \mathfrak{g} = \{\Omega \in \mathbb{R}^{n\times n} \mid (\Omega I_{pq})^T = -\Omega I_{pq}\}, \]
the set of signature skew symmetric matrices.
Related to the general construction for systems with an arbitrary Cauchy-Maslov index is the problem for Hamiltonian linear systems. These are systems $(A,B,C)$ where $(AJ)^T = AJ$ and $C^T = JB$, where
\[ J = \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}. \]
The set of Hamiltonian linear systems is a homogeneous space with Lie transformation group
\[ Sp(n,\mathbb{R}) = \{T \in \mathbb{R}^{2n\times 2n} \mid T^TJT = J,\ \det(T) \ne 0\}, \]
termed the symplectic group. The Lie-algebra associated with the symplectic group is the set of $2n\times 2n$ Hamiltonian matrices
\[ \operatorname{Ham}(n,\mathbb{R}) = \{\Omega \in \mathbb{R}^{2n\times 2n} \mid (\Omega J)^T - \Omega J = 0\}. \]
Hamiltonian systems are important for modelling mechanical systems.
One may also consider pole placement problems on the set of general linear systems. A
discussion of some basic results is contained in the monograph (Helmke & Moore 1994b,
Section 5.3). One area in which these results could be extended is to consider dynamic output
feedback. Assume that one knows the degree d of a dynamic compensator applied to a given
linear state space system. The dynamics of the closed loop system can be modelled by the
differential equation
\begin{align*}
\dot{x} &= Ax + Bu \\
y &= Cx \\
\dot{w} &= Gw + Cx \\
u &= Fw + Ky,
\end{align*}
where the feedback law u is allowed to depend both on the dynamic compensator state w and
the direct output y. This system can be rewritten as an augmented system with static feedback
$$\frac{d}{dt}\begin{pmatrix} x \\ w \end{pmatrix} = \begin{pmatrix} A & 0 \\ C & G \end{pmatrix}\begin{pmatrix} x \\ w \end{pmatrix} + \begin{pmatrix} B \\ 0 \end{pmatrix} u,$$
$$\begin{pmatrix} y \\ w \end{pmatrix} = \begin{pmatrix} C & 0 \\ 0 & I_d \end{pmatrix}\begin{pmatrix} x \\ w \end{pmatrix}, \qquad u = \begin{pmatrix} K & F \end{pmatrix}\begin{pmatrix} y \\ w \end{pmatrix}.$$
Once the system is written in this form it is amenable to analysis via the general linear theory
presented in Helmke and Moore (1994b, Section 5.3). Of course, one could also exploit the
structure of the augmented problem itself to reduce computational cost and ensure that the roles
of system and compensator states do not become confused.
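As a concrete sketch (my own illustration, not from the thesis), the augmented system above can be assembled numerically and the closed-loop poles of the statically-fed-back augmented plant inspected. The function and matrix names here are hypothetical, and NumPy is assumed:

```python
import numpy as np

def augmented_closed_loop(A, B, C, G, K, F):
    """Closed-loop state matrix of the augmented system
    [x; w]' = [[A, 0], [C, G]] [x; w] + [[B], [0]] u,  with static
    output feedback u = [K F] [y; w],  [y; w] = [[C, 0], [0, I]] [x; w]."""
    n, d = A.shape[0], G.shape[0]
    Aa = np.block([[A, np.zeros((n, d))], [C, G]])
    Ba = np.vstack([B, np.zeros((d, B.shape[1]))])
    Ca = np.block([[C, np.zeros((C.shape[0], d))],
                   [np.zeros((d, n)), np.eye(d)]])
    Ka = np.hstack([K, F])
    return Aa + Ba @ Ka @ Ca

# Example: a double integrator with a first-order dynamic compensator
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
G = np.array([[-2.0]])   # compensator dynamics
K = np.array([[-3.0]])   # direct output gain
F = np.array([[1.0]])    # compensator state gain

Acl = augmented_closed_loop(A, B, C, G, K, F)
print(np.sort(np.linalg.eigvals(Acl).real))
```

Once the closed-loop matrix is available in this form, any pole placement or eigenvalue-assignment analysis for static feedback applies verbatim to the pair of augmented matrices.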
Gradient descent methods could also be used to compute canonical forms for system
realizations. For example, to compute the companion form of a given state matrix A consider
the smooth cost function
$$\phi(A) = \sum_{i=2}^{n}\ \sum_{j \neq i-1} A_{ij}^2 + \sum_{i=2}^{n} \left(A_{i,i-1} - 1\right)^2$$
on the homogeneous space
$$S(A) = \{TAT^{-1} \mid T \in \mathbb{R}^{n\times n},\ \det(T) \neq 0\}.$$
Given that computing canonical forms is often an ill-conditioned numerical problem, dynamical
system techniques and related numerical gradient descent algorithms, with their strong
stability properties, may prove to be an important numerical tool in certain situations.
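The cost above is easily evaluated on small examples. The following sketch (my own, with hypothetical names and NumPy assumed) computes the cost and confirms that it vanishes exactly on matrices whose rows $2, \ldots, n$ are in companion form:

```python
import numpy as np

def companion_cost(A):
    """phi(A): penalise, in rows 2..n, every entry except the subdiagonal
    entry (which should equal 1).  Zero exactly at companion-form matrices,
    whose first row is unconstrained."""
    n = A.shape[0]
    cost = 0.0
    for i in range(1, n):                      # rows 2..n (0-based)
        for j in range(n):
            if j == i - 1:
                cost += (A[i, j] - 1.0) ** 2   # subdiagonal entry -> 1
            else:
                cost += A[i, j] ** 2           # every other entry -> 0
    return cost

# A companion matrix (free first row, ones on the subdiagonal) costs zero
Acomp = np.array([[2.0, -1.0, 3.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  1.0, 0.0]])
print(companion_cost(Acomp))
print(companion_cost(np.eye(3)))
```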
Chapter 5
Gradient Flows on Lie-Groups and
Homogeneous Spaces
The optimization problems considered in Chapters 2, 3 and 4 are all problems where the
constraint set is a homogeneous space. In each case the approach taken is to consider a suitable
Riemannian metric on the homogeneous space and compute the maximising (or minimising)
gradient flow. The limiting value of a solution of the gradient flow (for arbitrary initial condition)
then provides an estimate of the maximum (or minimum). The numerical methods discussed
in Chapters 2 to 4 are closely related to each other. They each rely on using a ‘standard’ curve
lying within the homogeneous space, which can be assigned an arbitrary initial condition and
arbitrary initial tangent vector, to interpolate the solution of the continuous-time gradient flow.
Thus, for an arbitrary point in the constraint set one estimates the solution of the gradient flow
by travelling a short distance along the ‘standard’ curve starting from the present estimate with
initial tangent vector equal to the gradient at that point. It is natural to ask whether there is an
underlying structure on which the numerical solutions proposed in Chapters 2 to 4 are based
and, if there is, to what degree such an approach can be applied to a generic optimization
problem on a homogeneous space.
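To make the idea concrete, here is an illustration of my own (not an algorithm from the thesis): Brockett's double-bracket flow $\dot{H} = [H,[H,N]]$ evolves on the isospectral orbit of a symmetric matrix, and stepping along the exponential curve $t \mapsto e^{-t[H,N]} H e^{t[H,N]}$ from the current iterate is exactly the 'travel along a standard curve with initial tangent equal to the gradient' scheme described above. NumPy is assumed, and the truncated-series matrix exponential is a simplification adequate for small step sizes:

```python
import numpy as np

def expm(X, terms=30):
    """Truncated power series for the matrix exponential (fine for small norm)."""
    E, P = np.eye(X.shape[0]), np.eye(X.shape[0])
    for k in range(1, terms):
        P = P @ X / k
        E = E + P
    return E

def double_bracket_step(H, N, alpha):
    """One step along t -> e^{-t[H,N]} H e^{t[H,N]}, whose initial velocity
    at t = 0 is the double-bracket direction [H, [H, N]]."""
    Omega = H @ N - N @ H              # [H, N] is skew-symmetric
    U = expm(-alpha * Omega)           # (approximately) orthogonal
    return U @ H @ U.T                 # stays on the isospectral orbit

N = np.diag([3.0, 2.0, 1.0])
H = np.array([[2.0, 1.0, 0.5],
              [1.0, 1.0, 0.3],
              [0.5, 0.3, 0.0]])
for _ in range(200):
    H = double_bracket_step(H, N, 0.1)
print(np.round(H, 4))   # approximately diagonal
```

The iterates remain symmetric with (essentially) fixed spectrum, and the off-diagonal entries decay: the curve interpolates the continuous-time gradient flow while never leaving the constraint set.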
With the developing interest in dynamical system solutions to linear algebraic problems
(Symes 1982, Deift et al. 1983, Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b)
during the eighties there came an interest in the potential of continuous realizations of classical
problems as efficient numerical methods (Chu 1988). Interestingly, it took several years
before the connection between dynamical systems and linear algebraic problems was examined
in the other direction, namely, whether one can use the insights and understanding developed by
studying problems from the dynamical systems perspective to design efficient numerical algorithms
for problems in linear algebra. Recently Chu (1992) has shown that the insight provided
by a geometric understanding of a structured inverse eigenvalue problem yields a better understanding
of a quadratically convergent algorithm first proposed by Friedland et al. (1987).
Perhaps more directly based on the dynamical systems literature is the work by Brockett (1993)
that looks at the design of gradient algorithms on the adjoint orbits of compact Lie-groups. The
methods proposed in Chapters 2 to 4 are gradient descent algorithms constructed explicitly on
the homogeneous space (Moore et al. 1992, Mahony et al. 1993, Moore et al. 1994, Mahony et
al. 1994).
Certainly the numerical methods proposed satisfy the broad requirements of simplicity,
global convergence and constraint stability discussed on page 2. Moreover, the numerical
methods described in each chapter have strong similarities, for example the Riemannian metrics
used are all of a similar form and the ‘standard’ curves used to generate the numerical methods
are all based on matrix exponentials. To develop a general understanding of these methods,
however, it is apparent that one must develop a better understanding of the Riemannian geometry
of the homogeneous constraint sets on which the algorithms are constructed.
In this chapter I attempt to provide a rigorous but brief review of the relevant theory
associated with developing numerical methods on homogeneous spaces. The focus of the
development is on the classes of homogeneous spaces encountered in engineering applications
and the simplest theoretical constructions which provide a rigorous basis for the numerical
methods developed. A careful development is given of the relationship between gradient flows
on Lie-groups and homogeneous spaces (related by a group action), which motivates the choice of
a particular Riemannian structure for a homogeneous space. Convergence behaviour of gradient
flows is also considered. The curves used in constructing numerical methods in Chapters 2
to 4 were all based on matrix exponentials, and the well-understood theory of the exponential
map as a Lie-group homomorphism is reviewed to provide a basis for this choice. Moreover,
the geodesic structure of the spaces considered (following from the Levi-Civita connection) is
developed, and conditions are given for when the matrix exponential maps to a geodesic curve
on a Lie-group. Finally, an explicit discussion of the relationship between geodesics on Lie-
groups and homogeneous spaces is given. The conclusion is that the algorithms proposed in
Chapters 2 to 4 are modified gradient descent algorithms with geodesic curves used to replace
the straight line interpolation of the classical gradient descent algorithm.
Much of the material presented is standard or at least accessible to people working in the
fields of Riemannian geometry and Lie-groups, however, this material would not be standard
knowledge for researchers in an engineering field. Moreover, the development strongly emphasizes
the aspects of the general theory that are relevant to problems in linear systems theory.
Due to the focus of the work, explicit proofs are given for a number of results which do
not appear to be standard in the literature. In particular, I have not seen the results concerning
the interrelation of gradient flows on Lie-groups and homogeneous spaces nor a careful pre-
sentation of the relationship between geodesics on Lie-groups and homogeneous spaces in any
existing reference.
The chapter is divided into nine sections. Section 5.1 presents a brief review of Lie-groups
and homogeneous spaces while Section 5.2 considers a certain class of homogeneous space,
orbits of semi-algebraic Lie-groups, which includes all the constraint sets considered in this
thesis. Section 5.3 describes a natural choice of Riemannian metric for a given homogeneous
space while Section 5.4 discusses the derivation of gradient flows on Lie-groups and homoge-
neous spaces and shows why the choice of Riemannian metric made in Section 5.3 is the most
natural. Section 5.5 discusses the convergence properties of gradient flows. Sections 5.6 to
5.9 develop the geometry of Lie-groups and homogeneous spaces concentrating on providing
a basis for understanding the exponential map and geodesics.
5.1 Lie-groups and Homogeneous Spaces
In this section a brief review of Lie-groups and homogeneous spaces is presented. The reader
is referred to Helgason (1978) and Warner (1983) for further technical details.
A Lie-group $G$ is an abstract group which is also a smooth manifold, on which the operations
of group multiplication ($(\sigma, \tau) \mapsto \sigma\tau$, for $\sigma, \tau \in G$) and inversion ($\sigma \mapsto \sigma^{-1}$, for $\sigma \in G$) are
smooth. For $\tau \in G$ one defines diffeomorphisms of $G$ associated
with right and left multiplication by a constant,
$$r_\tau : G \to G, \qquad r_\tau(\sigma) := \sigma\tau, \tag{5.1.1}$$
$$l_\tau : G \to G, \qquad l_\tau(\sigma) := \tau\sigma.$$
Observe that $r_\tau$ and $l_\tau$ are diffeomorphisms of $G$ with smooth inverses given by $r_{\tau^{-1}}$ and $l_{\tau^{-1}}$
respectively.
Let $M$ be a manifold and $G$ be a Lie-group. A smooth group action of $G$ on $M$ is a smooth
mapping
$$\phi : G \times M \to M$$
which satisfies
$$\phi(\sigma\tau, q) = \phi(\sigma, \phi(\tau, q)), \qquad q \in M,\ \sigma, \tau \in G,$$
$$\phi(e, q) = q, \qquad q \in M,\ e \text{ the identity of } G.$$
The action is known as transitive if for any $q$ and $r$ in $M$ there exists $\sigma \in G$ such that
$\phi(\sigma, q) = r$. Observe that $\phi(\sigma, \cdot) : M \to M$ is a diffeomorphism of $M$ onto $M$, since
$\phi(\sigma, \cdot)$ is smooth, surjective (for any $q \in M$, $\phi(\sigma, \phi(\sigma^{-1}, q)) = q$) and
has smooth inverse $\phi(\sigma, \cdot)^{-1} = \phi(\sigma^{-1}, \cdot)$. A smooth manifold $M$ with a transitive group of
diffeomorphisms ($\phi(\sigma, \cdot) : M \to M$) is known as a smooth homogeneous space.
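A minimal numerical sketch of these axioms (my own example, with NumPy assumed): the rotation group $SO(2)$ acts on $\mathbb{R}^2$ by matrix multiplication, and its orbit through a unit vector is the circle, on which the action is transitive:

```python
import numpy as np

def rot(t):
    """Element of the Lie-group SO(2), parametrised by the angle t."""
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

def act(R, q):
    """Smooth group action phi(R, q) = R q of SO(2) on R^2."""
    return R @ q

q = np.array([1.0, 0.0])
s, t = 0.7, -1.3

# phi(sigma tau, q) = phi(sigma, phi(tau, q))  and  phi(e, q) = q
lhs = act(rot(s) @ rot(t), q)
rhs = act(rot(s), act(rot(t), q))
print(np.allclose(lhs, rhs), np.allclose(act(rot(0.0), q), q))
```

The action preserves the unit circle (each `act(rot(t), q)` has norm one), so the circle inherits the structure of a homogeneous space of $SO(2)$ in the sense just defined.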
Let $p \in M$ and define the stabiliser of $p$ by
$$\mathrm{stab}(p) = \{\sigma \in G \mid \phi(\sigma, p) = p\}.$$
By construction $\mathrm{stab}(p) \subset G$ is an abstract subgroup of $G$. By inspection the map
$$\phi_p : G \to M, \qquad \phi_p(\sigma) := \phi(\sigma, p) \tag{5.1.2}$$
is a smooth map which is onto if and only if $\phi$ is transitive. As a consequence, if $\phi$ is a
smooth transitive group action of $G$ on $M$ one has that $\dim G \geq \dim M$. The stabiliser,
$\mathrm{stab}(p) = \phi_p^{-1}(p)$, is the inverse image of a single point under a continuous map and is a
closed set in the manifold topology on $G$. Consequently, $\mathrm{stab}(p)$ is a closed abstract subgroup
of $G$ and is a Lie-subgroup of $G$ with the relative topology inherited from $G$ (Warner 1983,
pg. 110). The left coset space $G/\mathrm{stab}(p) = \{\sigma\,\mathrm{stab}(p) \mid \sigma \in G\}$ has a natural topology
such that the surjective mapping $\pi : G \to G/\mathrm{stab}(p)$, $\pi(\sigma) = \sigma\,\mathrm{stab}(p)$, is a continuous, open
mapping. Similarly, equipping $G/\mathrm{stab}(p)$ with the unique differential structure that makes $\pi$
smooth (Warner 1983, pg. 120), it is easily verified that $\pi$ is a submersion.
The coset space $G/\mathrm{stab}(p)$ is itself a homogeneous space under the group action
$$\lambda : G \times G/\mathrm{stab}(p) \to G/\mathrm{stab}(p), \qquad \lambda(\sigma, \tau\,\mathrm{stab}(p)) := \sigma\tau\,\mathrm{stab}(p).$$
Consider the smooth map
$$\bar{\phi}_p : G/\mathrm{stab}(p) \to M, \qquad \bar{\phi}_p(\sigma\,\mathrm{stab}(p)) := \phi(\sigma, p). \tag{5.1.3}$$
It is a standard result that $\bar{\phi}_p : G/\mathrm{stab}(p) \to M$ is a diffeomorphism (Helgason 1978,
Proposition 4.3, pg. 124). By construction, the following diagram commutes:
$$\begin{array}{ccc} G & \xrightarrow{\ \phi_p\ } & M \\[2pt] {\scriptstyle \pi} \searrow & & \nearrow {\scriptstyle \bar{\phi}_p} \\[2pt] & G/\mathrm{stab}(p) & \end{array}$$
In particular, $\phi_p = \bar{\phi}_p \circ \pi$ is the composition of a submersion and a diffeomorphism and is
itself a submersion.
5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits
A set $G \subset \mathbb{R}^s$ is known as semi-algebraic when it can be obtained by finitely many applications
of the operations of intersection, union and set difference, starting from sets of the form
$\{x \in \mathbb{R}^s \mid f(x) \geq 0\}$ with $f$ a polynomial function on $\mathbb{R}^s$. A semi-algebraic Lie-group is a
Lie-group which is also a semi-algebraic subset of $\mathbb{R}^s$. The following two sets are examples of
semi-algebraic Lie-groups.
Example 5.2.1 a) The general linear group
$$GL(N,\mathbb{R}) = \{T \in \mathbb{R}^{N\times N} \mid \det(T) \neq 0\}.$$
b) The orthogonal group
$$O(N) = O(N,\mathbb{R}) = \{T \in \mathbb{R}^{N\times N} \mid TT^T = I_N\},$$
where $I_N$ is the $N \times N$ identity matrix. $\square$
Let $G$ be a Lie-group and $\phi$ be a smooth group action $\phi : G \times \mathbb{R}^r \to \mathbb{R}^r$. Fix $p \in \mathbb{R}^r$ and
define the orbit of the action to be the set
$$\mathcal{O}(p) = \{\phi(\sigma, p) \mid \sigma \in G\}.$$
The set $\mathcal{O}(p)$ is an immersed¹ submanifold of $\mathbb{R}^r$ in the sense that it is a subset of $\mathbb{R}^r$ with a
differential structure given by that induced by the diffeomorphism $\bar{\phi}_p : G/\mathrm{stab}(p) \to \mathcal{O}(p)$
(cf. (5.1.3)). The map $\phi$ is a smooth transitive group action of $G$ acting on $\mathcal{O}(p)$ and thus
$\mathcal{O}(p)$ is given the structure of a homogeneous space. It is certainly not clear that the differential
structure induced by the immersion is compatible with the Euclidean differential structure on
$\mathbb{R}^r$. In the case where the two differential structures are compatible, $\mathcal{O}(p)$ is an embedded
submanifold of $\mathbb{R}^r$.
Let $G$ be a subset of $\mathbb{R}^s$. A map $f : G \to \mathbb{R}^r$ is semi-algebraic when the graph of $f$,
$\{(x, f(x)) \mid x \in G\} \subset \mathbb{R}^s \times \mathbb{R}^r$, is semi-algebraic. In particular, if $G$ is a semi-algebraic
subset of $\mathbb{R}^s$ and $f : \mathbb{R}^s \to \mathbb{R}^r$ is a rational map (i.e. the $i$'th component of $f$ is a ratio of two
polynomial maps) then the map $f : G \to \mathbb{R}^r$ is semi-algebraic (Gibson 1979, pg. 223).

¹An immersion is a one-to-one map $f : M \to N$ between two manifolds $M$ and $N$ for which the differential $df$ is full rank at all points. An immersed submanifold is a subset $U \subset N$ such that $U = f(M)$ is the image of some manifold $M$ via an immersion $f$. The set $U \subset N$ inherits the differential structure on $M$ via the map $f$; however, this need not correspond to the differential structure associated with the manifold $N$. An embedding is an immersion $f : M \to N$ such that the image $U = f(M)$ is a manifold with the subspace differential structure inherited from $N$ (Warner 1983, pg. 22).
The following result shows that for semi-algebraic Lie-groups and semi-algebraic group
actions the orbit of a point $p \in \mathbb{R}^r$ is always an embedded submanifold of $\mathbb{R}^r$. The result is
standard where $G$ is a compact Lie-group (Varadarajan 1984, pg. 81). For $G$ semi-algebraic
the reader is referred to Gibson (1979, pg. 224).

Proposition 5.2.2 Let $G$ be a Lie-group and $\phi : G \times \mathbb{R}^r \to \mathbb{R}^r$ be a smooth group action of $G$
on $\mathbb{R}^r$. Let $p \in \mathbb{R}^r$ be an arbitrary point and denote the orbit of $p$ by $\mathcal{O}(p) = \{\phi(\sigma, p) \mid \sigma \in G\}$.
Then $\mathcal{O}(p)$ is an embedded submanifold of $\mathbb{R}^r$, with the embedding
$$\bar{\phi}_p : G/\mathrm{stab}(p) \to \mathcal{O}(p)$$
given by (5.1.3), if either:

a) $G$ is a compact Lie-group; or

b) $G$ is a semi-algebraic Lie-group and $\phi : G \times \mathbb{R}^r \to \mathbb{R}^r$ is a semi-algebraic group
action.
5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces
Let $G$ be a Lie-group, $M$ be a smooth manifold and $\phi : G \times M \to M$ be a smooth transitive
group action of $G$ on $M$. Denote the tangent space of $G$ at the identity $e$ by $T_eG$. Let
$g_e : T_eG \times T_eG \to \mathbb{R}$ be an inner product on $T_eG$, i.e. a positive definite, bilinear map. The inner
product chosen in the sequel is always a Euclidean inner product computed by choosing an
arbitrary fixed basis $\{E_1, \ldots, E_n\}$ for $T_eG$, expressing given tangent vectors $X = \sum_{i=1}^n x_i E_i$
and $Y = \sum_{i=1}^n y_i E_i$ in terms of this basis, and setting
$$g_e(X, Y) = \sum_{i=1}^n x_i y_i.$$
Of course, this construction depends on the basis vectors used. One could also consider other
inner products; for example, when $G$ is compact and semi-simple the negative of the Killing form (Helgason
1978, pg. 131) is a positive definite inner product. A number of authors have used the Killing
form in related work (Faybusovich 1989, Bloch et al. 1992, Brockett 1993); however, the choice
of a particular inner product is immaterial to the following development.
Let $g_e$ be an inner product on $T_eG$ and let $r_\sigma$, as in (5.1.1), be right translation by $\sigma \in G$. Since
$r_\sigma$ is a diffeomorphism, its differential² at the identity $e$ of $G$, $T_e r_\sigma : T_eG \to T_\sigma G$, is a vector
space isomorphism. Using $T_e r_\sigma$ one can define an inner product on each tangent space of $G$,
$$g_\sigma : T_\sigma G \times T_\sigma G \to \mathbb{R}, \qquad g_\sigma(\xi, \eta) := g_e\big((T_e r_\sigma)^{-1}(\xi), (T_e r_\sigma)^{-1}(\eta)\big),$$
where $\xi$ and $\eta$ are elements of $T_\sigma G$. It is easily verified that $g_\sigma$ varies smoothly on $G$ and
consequently defines a Riemannian metric,
$$g(\xi, \eta) := g_\sigma(\xi, \eta), \tag{5.3.1}$$
for $\sigma \in G$, $\xi, \eta \in T_\sigma G$. This Riemannian metric is termed the right invariant group metric for
$G$. Observe that for any two smooth vector fields $X$ and $Y$ on $G$ one has
$$g(dr_\sigma X, dr_\sigma Y) = g(X, Y). \tag{5.3.2}$$
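A small numerical check of the invariance property (5.3.2) for $G = GL(n,\mathbb{R})$ (my own sketch, NumPy assumed): with $g_e(X,Y) = \mathrm{tr}(XY^T)$ and $(T_e r_\sigma)^{-1}(\xi) = \xi\sigma^{-1}$, the metric value is unchanged under right translation by $\tau$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ge(X, Y):
    """Euclidean inner product on T_I GL(n,R) in the standard matrix basis."""
    return np.trace(X @ Y.T)

def g(sigma, xi, eta):
    """Right invariant group metric: pull tangent vectors at sigma back to
    the identity via (T_e r_sigma)^{-1}(xi) = xi sigma^{-1}, then apply ge."""
    s_inv = np.linalg.inv(sigma)
    return ge(xi @ s_inv, eta @ s_inv)

n = 3
sigma = np.eye(n) + 0.3 * rng.standard_normal((n, n))
tau = np.eye(n) + 0.3 * rng.standard_normal((n, n))
xi, eta = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# r_tau maps sigma -> sigma tau and pushes xi -> xi tau; (5.3.2) says
# the metric is unchanged:
print(np.isclose(g(sigma, xi, eta), g(sigma @ tau, xi @ tau, eta @ tau)))
```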
Let $p \in M$ be arbitrary and recall that $\phi_p$ of (5.1.2) is a submersion of $G$ onto $M$ (since
the group action $\phi$ is transitive). Thus, the differential of $\phi_p$ at the identity, $T_e\phi_p : T_eG \to T_pM$,
is a linear surjection of vector spaces. Decompose $T_eG$ into the direct sum
$$T_eG = \ker T_e\phi_p \oplus \mathrm{dom}\, T_e\phi_p,$$
²Let $\psi : M \to N$ be a smooth map between smooth manifolds $M$ and $N$, and let $p \in M$ be an arbitrary point. The differential of $\psi$ at $p$ (or the tangent map of $\psi$ at $p$) is the linear map
$$T_p\psi : T_pM \to T_{\psi(p)}N, \qquad T_p\psi(X) := D\psi|_p(X),$$
where $D\psi|_p(X)$ is the Fréchet derivative (Helmke & Moore 1994b, pg. 334) of $\psi$ in direction $X \in T_pM$. The full differential of $\psi$ is a map from the tangent bundle of $M$, $TM = \bigcup_{p\in M} T_pM$, to the tangent bundle of $N$,
$$d\psi : TM \to TN, \qquad d\psi(X_p) := T_p\psi(X_p),$$
where $X_p$ is an element of the tangent space $T_pM$ for arbitrary $p \in M$.
where $\ker T_e\phi_p$ is the kernel of $T_e\phi_p$ and
$$\mathrm{dom}\, T_e\phi_p = \{X \in T_eG \mid g_e(X, Y) = 0,\ Y \in \ker T_e\phi_p\} \tag{5.3.3}$$
is the domain (the subspace orthogonal to $\ker T_e\phi_p$ with respect to the inner product provided on $T_eG$).
By construction, $T_e\phi_p$ restricts to a vector space isomorphism $T^\perp_e\phi_p$,
$$T^\perp_e\phi_p : \mathrm{dom}\, T_e\phi_p \to T_pM, \qquad T^\perp_e\phi_p(X) := T_e\phi_p(X).$$
Thus, one may define an inner product on the tangent space $T_pM$ by
$$g^M_p(X, Y) = g_e\big((T^\perp_e\phi_p)^{-1}(X), (T^\perp_e\phi_p)^{-1}(Y)\big),$$
where $(T^\perp_e\phi_p)^{-1}(X) \in T_eG$ via the natural inclusion $\mathrm{dom}\, T_e\phi_p \subset T_eG$. It is easily verified that
this construction defines a smooth inner product on the tangent bundle $TM$. Thus, one defines
a Riemannian metric,
$$g^M(X, Y) := g^M_q(X, Y), \tag{5.3.4}$$
for $q \in M$ and $X$, $Y$ in $T_qM$. This is termed the normal metric on $M$.
Let $q \in M$ be arbitrary; then the normal Riemannian metric on $M$ and the right invariant
group metric on $G$ are related by the differential of $\phi_p : G \to M$. To see this observe that for
any $q \in M$ there exists $\sigma \in G$ such that $\phi(\sigma, p) = q$. Thus,
$$\phi_q(\tau) = \phi(\tau, \phi(\sigma, p)) = \phi(\tau\sigma, p) = \phi_p \circ r_\sigma(\tau).$$
Differentiating at the identity gives the following commuting diagram of vector space homomorphisms:
$$\begin{array}{ccc} T_eG & \xrightarrow{\ dr_\sigma\ } & T_\sigma G \\[2pt] & {\scriptstyle d\phi_q} \searrow & \downarrow {\scriptstyle d\phi_p} \\[2pt] & & T_qM \end{array}$$
In particular, the normal Riemannian metric can also be defined by
$$g^M(X, Y) := g\big((T^\perp_\sigma\phi_p)^{-1}(X), (T^\perp_\sigma\phi_p)^{-1}(Y)\big), \tag{5.3.5}$$
where $g(\cdot, \cdot)$ is the right invariant group metric on $G$, $X$ and $Y$ are in $T_qM$, and $T^\perp_\sigma\phi_p$ is the
restriction of $T_\sigma\phi_p$ to
$$\mathrm{dom}\, T_\sigma\phi_p = \{Y \in T_\sigma G \mid g(Y, X) = 0,\ \text{for all } X \in \ker T_\sigma\phi_p\}, \tag{5.3.6}$$
the domain of $T_\sigma\phi_p$. Observe that $\mathrm{dom}\, T_\sigma\phi_p = dr_\sigma(\mathrm{dom}\, T_e\phi_q)$.
5.4 Gradient Flows
Let $M$ be a Riemannian manifold (with Riemannian metric $g^M$) and let $f : M \to \mathbb{R}$ be a smooth
potential function. The gradient of $f$ on $M$ is defined pointwise on $M$ by the relationships
$$Df|_p(\xi) = g^M(\mathrm{grad}\, f(p), \xi), \quad \text{for all } \xi \in T_pM, \tag{5.4.1}$$
$$\mathrm{grad}\, f(p) \in T_pM, \tag{5.4.2}$$
where $Df|_p(\xi)$ is the Fréchet derivative of $f$ in direction $\xi$ at the point $p \in M$ (Helmke &
Moore 1994b, pg. 334). Existence follows from the positive definiteness and bilinearity of the
inner product along with linearity of the Fréchet derivative.

Observe that $\mathrm{grad}\, f$ is a smooth vector field on $M$ which vanishes at local maxima and
minima of $f$. Consider the ordinary differential equation on $M$, termed the gradient flow of $f$,
$$\dot{p} = \mathrm{grad}\, f(p),$$
whose solutions are integral curves³ of $\mathrm{grad}\, f$. Let $p_0 \in M$ be some initial condition; then the
solution of the gradient flow with initial condition $p_0$ exists and is unique (apply classical
O.D.E. theory to the local co-ordinate representation of the differential equation).
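As a worked sketch (my own, not the thesis's), the gradient flow of the Rayleigh quotient $f(q) = q^T A q$ on the sphere $S^2$ can be integrated by an explicit Euler scheme, renormalising after each step to stay on the constraint set; the flow converges to the dominant eigenvector of $A$. NumPy is assumed:

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])          # maximise f(q) = q^T A q on the sphere
q = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)

def sphere_grad(q):
    """Gradient of f on the sphere with the induced metric: project the
    Euclidean gradient 2Aq onto the tangent space T_q S^2 = {v | q.v = 0}."""
    euclid = 2.0 * A @ q
    return euclid - (q @ euclid) * q

for _ in range(500):
    q = q + 0.05 * sphere_grad(q)     # explicit Euler step for qdot = grad f
    q = q / np.linalg.norm(q)         # project back onto the sphere
print(np.round(q, 4))
```

The renormalisation is the crude substitute for the 'standard curve' interpolation discussed earlier; the geodesic-based schemes of Chapters 2 to 4 avoid it by stepping along curves that never leave the constraint set.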
Let $G$ be a Lie-group and $\phi : G \times M \to M$ be a smooth transitive group action of $G$ on
$M$. Fix $p \in M$ and consider the 'lifted' potential $\hat{f} : G \to \mathbb{R}$,
$$\hat{f}(\sigma) := f \circ \phi_p(\sigma), \tag{5.4.3}$$
where $\phi_p$ is given by (5.1.2). Let $g_e(\cdot, \cdot)$ be an inner product on $T_eG$ and define the right
invariant group metric $g$ on $G$ and the normal metric $g^M$ on $M$ as described in Section 5.3. The
smooth potential $f : M \to \mathbb{R}$ and the 'lifted' potential $\hat{f}$ give rise to the two gradient flows
$$\dot{q} = \mathrm{grad}\, f(q), \quad q \in M, \tag{5.4.4}$$
$$\dot{\sigma} = \mathrm{grad}\, \hat{f}(\sigma), \quad \sigma \in G, \tag{5.4.5}$$
defined with respect to the normal metric and the right invariant group metric respectively.
Lemma 5.4.1 Let $p \in M$ be some fixed element of $M$. Let $q_0 = \phi_p(\sigma_0)$ (where $\phi_p$ is given
by (5.1.2)) be an arbitrary initial condition in $M$. Let $q(t)$ denote the solution of (5.4.4) with
initial condition $q_0$ and let $\sigma(t)$ denote a solution of (5.4.5) with initial condition $\sigma_0$. Then
$$q(t) = \phi_p(\sigma(t)).$$

Proof By construction $q_0 = \phi_p(\sigma_0)$. Consider the time derivative of $\phi_p(\sigma(t))$,
$$\frac{d}{dt}\phi_p(\sigma(t)) = d\phi_p\Big(\frac{d}{dt}\sigma(t)\Big) = T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}(\sigma(t)).$$
Thus, it is sufficient to show that
$$T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}(\sigma) = \mathrm{grad}\, f(\phi_p(\sigma)),$$
and use the uniqueness of solutions to (5.4.4) and (5.4.5) to complete the proof.

³Let $X$ be a smooth vector field on a manifold $M$. An integral curve of $X$ is a smooth map $\gamma : \mathbb{R} \to M$, $\dot{\gamma}(t) = X(\gamma(t))$, where $\dot{\gamma}(t) := d\gamma\big(\frac{d}{dt}\big|_t\big)$ and $\frac{d}{dt}\big|_t$ denotes the tangent vector to $\mathbb{R}$ at the point $t \in \mathbb{R}$.
Let $\mathrm{grad}\, \hat{f} = \mathrm{grad}\, \hat{f}^0 + \mathrm{grad}\, \hat{f}^\perp$ be the unique decomposition of $\mathrm{grad}\, \hat{f}$ into $\mathrm{grad}\, \hat{f}^0 \in \ker T_\sigma\phi_p$ and $\mathrm{grad}\, \hat{f}^\perp \in \mathrm{dom}\, T_\sigma\phi_p$ (cf. (5.3.6)). Observe that
$$g(\mathrm{grad}\, \hat{f}^0(\sigma), \mathrm{grad}\, \hat{f}^0(\sigma)) = g(\mathrm{grad}\, \hat{f}(\sigma), \mathrm{grad}\, \hat{f}^0(\sigma)) = D\hat{f}|_\sigma(\mathrm{grad}\, \hat{f}^0(\sigma)) = Df|_{\phi_p(\sigma)}\big(T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}^0(\sigma)\big) = 0,$$
since $T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}^0(\sigma) = 0$. Since the metric $g$ is positive definite, then $\mathrm{grad}\, \hat{f}^0 = 0$ and it
follows that $\mathrm{grad}\, \hat{f} \in \mathrm{dom}\,(T_\sigma\phi_p)$.

Let $q \in M$ be arbitrary and choose $\sigma \in G$ such that $\phi_p(\sigma) = q$. Let $X \in T_qM$ be an
arbitrary tangent vector and observe
$$g^M(T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}(\sigma), X) = g\big((T^\perp_\sigma\phi_p)^{-1} \cdot T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}(\sigma),\ (T^\perp_\sigma\phi_p)^{-1}(X)\big)$$
using (5.3.5). Of course $(T^\perp_\sigma\phi_p)^{-1} \cdot T_\sigma\phi_p(\mathrm{grad}\, \hat{f}(\sigma)) = \mathrm{grad}\, \hat{f}(\sigma)$ since $\mathrm{grad}\, \hat{f} \in \mathrm{dom}\,(T_\sigma\phi_p)$.
It follows that
$$g^M(T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}(\sigma), X) = g\big(\mathrm{grad}\, \hat{f}(\sigma), (T^\perp_\sigma\phi_p)^{-1}(X)\big) = D\hat{f}|_\sigma\big((T^\perp_\sigma\phi_p)^{-1}(X)\big) = Df|_q\big(T_\sigma\phi_p \cdot (T^\perp_\sigma\phi_p)^{-1}(X)\big) = Df|_q(X) = g^M(\mathrm{grad}\, f(q), X).$$
Since $X$ is arbitrary and $g^M$ is positive definite, then $T_\sigma\phi_p \cdot \mathrm{grad}\, \hat{f}(\sigma) = \mathrm{grad}\, f(\phi_p(\sigma))$ and
the proof is completed. $\square$
5.5 Convergence of Gradient Flows
Let $M$ be a Riemannian manifold and let $f : M \to \mathbb{R}$ be a smooth function. Let $\mathrm{grad}\, f$ denote
the gradient vector field with respect to the Riemannian metric on $M$. The critical points of
$f : M \to \mathbb{R}$ coincide with the equilibria of the gradient flow on $M$,
$$\dot{q}(t) = -\mathrm{grad}\, f(q(t)). \tag{5.5.1}$$
For any solution $q(t)$ of the gradient flow,
$$\frac{d}{dt}f(q(t)) = g(\mathrm{grad}\, f(q(t)), \dot{q}(t)) = -g(\mathrm{grad}\, f(q(t)), \mathrm{grad}\, f(q(t))) \leq 0,$$
and therefore $f(q(t))$ is a monotonically decreasing function of $t$. The following proposition
is discussed in Helmke and Moore (1994b, pg. 360).
Proposition 5.5.1 Let $f : M \to \mathbb{R}$ be a smooth function on a Riemannian manifold with
compact sublevel sets⁴. Then every solution $q(t) \in M$ of the gradient flow (5.5.1) on $M$ exists
for all $t \geq 0$. Furthermore, $q(t)$ converges to a connected component of the set of critical
points of $f$ as $t \to \infty$.
Note that the condition of the proposition is automatically satisfied if $M$ is compact.
Solutions of a gradient flow (5.5.1) display no periodic solutions or strange attractors and there
is no chaotic behaviour. If $f$ has isolated critical points in any level set $\{q \in M \mid f(q) = c\}$,
$c \in \mathbb{R}$, then every solution of the gradient flow (5.5.1) converges to one of these critical points
as $t \to \infty$. This is also the case where the critical level sets are smooth submanifolds of $M$.
In general, however, it is possible that the solution of a gradient flow converges to a connected
level set of critical points of the function $f$. Such 'non-generic' behaviour is undesirable when
gradient flows are being used as a numerical tool. For problems of the type considered in this
thesis the following lemma is generally applicable.
Lemma 5.5.2 Let $f : M \to \mathbb{R}$ be a smooth function with compact sublevel sets⁴, such that

(i) The set of critical points of $f$ is the union of closed, disjoint, critical level sets, each of
which is a submanifold of $M$.

(ii) The Hessian⁵ $\mathcal{H}f$ at a critical point degenerates exactly on the tangent space of the
critical level sets of $f$. Thus, for $q \in M$ a critical point of $f$ and $\xi \in T_qM$,
$\mathcal{H}f|_q(\xi, \xi) = 0$ if and only if $\xi$ is in the tangent space of the critical level set of $f$.

Then every solution of the gradient flow
$$\dot{q} = -\mathrm{grad}\, f(q)$$
converges exponentially fast to a critical point of $f$.

⁴Let $c \in \mathbb{R}$; then the sublevel set of $f$ associated with the value $c$ is $\{q \in M \mid f(q) \leq c\}$. If $f$ has compact sublevel sets then each such set (possibly empty) is a compact subset of $M$.
Proof Denote the separate connected components of the critical level sets of $f$ by $N_i$ for
$i = 1, 2, \ldots, K$, where $K$ is the number of disjoint critical level sets. Thus, the limit set of a
solution to the gradient flow $\dot{q} = -\mathrm{grad}\, f(q)$ is fully contained in some $N_j$ for $j \in \{1, 2, \ldots, K\}$. Let
$a \in N_j$ be an element of this limit set. Condition (ii) ensures that each $N_j$ is a non-degenerate
critical set. It may be assumed without loss of generality that the value of $f$ constrained to $N_j$
is zero. The generalised Morse lemma (Hirsch 1976, pg. 149) gives an open neighbourhood $U_a$
of $a$ in $M$ and a diffeomorphism $h : U_a \to \mathbb{R}^n$, $n = \dim M$, $n_j = \dim N_j$, such that

(i) $h(U_a \cap N_j) = \mathbb{R}^{n_j} \times \{0\}$,

(ii) $f \circ h^{-1}(x_1, x_2, x_3) = \frac{1}{2}\left(\|x_2\|^2 - \|x_3\|^2\right)$,

with $x_1 \in \mathbb{R}^{n_j}$, $x_2 \in \mathbb{R}^{n_+}$, $x_3 \in \mathbb{R}^{n_-}$ and $n_j + n_+ + n_- = n$. Let $W = h(U_a) \subset \mathbb{R}^n$; then the
gradient flow of $f \circ h^{-1}$ on $W$ is
$$\dot{x}_1 = 0, \quad \dot{x}_2 = -x_2, \quad \dot{x}_3 = x_3. \tag{5.5.2}$$

Figure 5.5.1: Flow around a saddle point.

Let $W_+ := \{(x_1, x_2, x_3) \mid \|x_2\| \geq \|x_3\|\}$ and $W_- := \{(x_1, x_2, x_3) \mid \|x_2\| < \|x_3\|\}$. Using
the convergence properties of (5.5.2) it follows that every solution of the original gradient flow
starting in $h^{-1}\big(W_+ - \{(x_1, x_2, x_3) \mid x_3 = 0\}\big)$ will enter the region $h^{-1}(W_-)$, for which $f < 0$
(cf. Figure 5.5.1). On the other hand, every solution starting in $\{h^{-1}(x_1, x_2, x_3) \mid x_3 = 0\}$ will
converge to a point $h^{-1}(x_1, 0, 0) \in N_j$, $x_1 \in \mathbb{R}^{n_j}$. As $f$ is strictly negative on $h^{-1}(W_-)$,
all solutions starting in $h^{-1}(W_+ \cup W_-) - \{h^{-1}(x_1, x_2, x_3) \mid x_3 = 0\}$ will eventually leave $U_a$ and
converge to some $N_i \neq N_j$. By repeating this analysis for each $N_i$, and recalling that any
solution must converge to a connected subset of some $N_i$ (Proposition 5.5.1), the proof is
completed. $\square$

⁵Let $M$ be a smooth manifold. The Hessian of a smooth function $f : M \to \mathbb{R}$ at a critical point $q$ is the symmetric bilinear map $\mathcal{H}f|_q : T_qM \times T_qM \to \mathbb{R}$ given by
$$\mathcal{H}f|_q(\xi, \eta) = \sum_{i,j} \frac{\partial^2 \bar{f}}{\partial x_i \partial x_j}\,\bar{\xi}_i\bar{\eta}_j,$$
where $x = \{x_1, \ldots, x_n\}$ is a local coordinate chart on $M$, $\bar{f}$ is the local coordinate representation of $f$, and $\bar{\xi}$, $\bar{\eta}$ are the local coordinate representations of $\xi, \eta \in T_qM$.
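The mechanics of the proof are visible directly in the normal form (5.5.2), whose solutions are available in closed form; a small sketch of my own (NumPy assumed):

```python
import numpy as np

def flow(x1, x2, x3, t):
    """Closed-form solution of the normal-form gradient flow (5.5.2):
    x1 is constant, x2 decays exponentially, x3 grows exponentially."""
    return x1, x2 * np.exp(-t), x3 * np.exp(t)

# A solution starting with x3 = 0 converges to a critical point (x1, 0, 0)
_, x2, x3 = flow(0.3, 1.0, 0.0, 10.0)
print(x2, x3)

# A solution starting with any x3 != 0 eventually leaves W+ = {|x2| >= |x3|}
_, y2, y3 = flow(0.3, 1.0, 1e-6, 20.0)
print(abs(y2) < abs(y3))
```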
5.6 Lie-Algebras, The Exponential Map and the General Linear
Group
Let $G$ be a Lie-group. The Lie-algebra of $G$, denoted $\mathfrak{g}$, is the set of all left invariant smooth
vector fields on $G$, i.e. smooth vector fields $X(\sigma) \in T_\sigma G$ such that
$$X \circ l_\tau(\sigma) = dl_\tau X(\sigma),$$
where $l_\tau(\sigma) := \tau\sigma$ is left multiplication by $\tau$. In particular,
$$X(\sigma) = dl_\sigma X(e). \tag{5.6.1}$$
Let $X(\sigma)$ and $Y(\sigma)$ be two smooth vector fields on $G$ and think of them as derivations⁶ of
$C^\infty(G)$ (the set of smooth functions which map $G \to \mathbb{R}$). The Lie-bracket of $X(\sigma)$ and $Y(\sigma)$
is defined with respect to the action of $X$ and $Y$ as derivations,
$$[X(\sigma), Y(\sigma)]f = X(\sigma)Y(\sigma)f - Y(\sigma)X(\sigma)f,$$
where $f \in C^\infty(G)$. By checking the linearity of this map it follows that the Lie-bracket of
two vector fields is itself a derivation and corresponds to a vector field, denoted $[X, Y](\sigma)$.
The set of smooth vector fields on $G$, denoted $D(G)$, is a vector space over $\mathbb{R}$ under pointwise
addition of vector fields and scalar multiplication. Considering the Lie-bracket operation as a
multiplication rule, $D(G)$ is given the structure of an algebra. Assume that $X(\sigma) = X$ and
$Y(\sigma) = Y$ are left invariant vector fields on $G$; then $[X, Y] = [X, Y](\sigma)$ is also a left invariant
vector field on $G$, since
$$(dl_\tau[X, Y])f = [X, Y](f \circ l_\tau) = X\, D(f \circ l_\tau)|_\sigma(Y) - Y\, D(f \circ l_\tau)|_\sigma(X) = X\, Df|_{l_\tau(\sigma)}(dl_\tau Y) - Y\, Df|_{l_\tau(\sigma)}(dl_\tau X) = X\, Df|_{l_\tau(\sigma)}(Y \circ l_\tau) - Y\, Df|_{l_\tau(\sigma)}(X \circ l_\tau) = ([X, Y] \circ l_\tau)f. \tag{5.6.3}$$
Thus $\mathfrak{g}$ forms a subalgebra of the algebra of derivations. Note that there is a one-to-one
correspondence between $\mathfrak{g}$ and $T_eG$, the tangent space of $G$ at the identity, given by (5.6.1).
Thus, $\mathfrak{g}$ is a finite dimensional algebra of the same dimension as the Lie-group $G$. Indeed,
an alternative way of thinking about $\mathfrak{g}$ is as the tangent space $T_eG$ equipped with the bracket
operation
$$[X(e), Y(e)] = [dl_\sigma X(e), dl_\sigma Y(e)](e).$$

⁶Let $C^\infty(G)$ be the set of all smooth maps from $G$ into $\mathbb{R}$. The set $C^\infty(G)$ acquires a vector space structure under scalar multiplication and pointwise addition of functions. A derivation of $C^\infty(G)$ is a linear map $C^\infty(G) \to C^\infty(G)$ satisfying the Leibniz rule. The set of all derivations of $C^\infty(G)$, denoted $D(G)$, itself forms a vector space under scalar multiplication and pointwise addition of functions. A smooth vector field is a smooth map $X : G \to TG$ which assigns a vector $X(\sigma) \in T_\sigma G$ to each element $\sigma \in G$. Any smooth vector field $X$ defines a derivation $X(f)(\sigma) := Df|_\sigma(X(\sigma))$, the Fréchet derivative of $f$ in direction $X$ at the point $\sigma$. Indeed, this correspondence is an isomorphism
$$D(G) \cong \{\text{the set of smooth vector fields on } G\} \tag{5.6.2}$$
between $D(G)$ and the vector space of smooth vector fields on $G$ (Varadarajan 1984, pg. 5).
Example 5.6.1 Let $N$ be a positive integer and consider the set of all real non-singular $N \times N$
matrices
$$GL(N,\mathbb{R}) = \{\theta \in \mathbb{R}^{N\times N} \mid \det(\theta) \neq 0\},$$
where $\det(\theta)$ is the determinant of $\theta$. The set $GL(N,\mathbb{R})$ is known as the general linear
group and is a Lie-group under the group operation of matrix multiplication. Since $GL(N,\mathbb{R})$
is an open subset of $\mathbb{R}^{N\times N}$ it inherits the relative Euclidean topology and differential structure.
The tangent space at the identity $I_N$ (the $N \times N$ identity matrix) of $GL(N,\mathbb{R})$ is
$T_{I_N}GL(N,\mathbb{R}) = \mathbb{R}^{N\times N}$, the set of all real $N \times N$ matrices. Consequently, the dimension of
the Lie-group $GL(N,\mathbb{R})$ is $n = N^2$. The tangent space of $GL(N,\mathbb{R})$ at a point $\theta \in GL(N,\mathbb{R})$
can be represented by the image of $T_{I_N}GL(N,\mathbb{R}) = \mathbb{R}^{N\times N}$ via the linearization of the
diffeomorphism generated by left multiplication $l_\theta$,
$$T_\theta GL(N,\mathbb{R}) = T_{I_N}l_\theta\big(T_{I_N}GL(N,\mathbb{R})\big) = \{\theta A \mid A \in \mathbb{R}^{N\times N}\}.$$
The Lie-algebra of $GL(N,\mathbb{R})$, denoted $\mathfrak{gl}(N,\mathbb{R})$, is the set of all left invariant vector fields on
$GL(N,\mathbb{R})$. From (5.6.1) it follows that
$$\mathfrak{gl}(N,\mathbb{R}) = \{X(\theta) = \theta A \mid \theta \in G,\ A \in \mathbb{R}^{N\times N}\}.$$
Let $f \in C^\infty(GL(N,\mathbb{R}))$ be any smooth real function; then the Lie-bracket of two elements $\theta A$,
$\theta B \in \mathfrak{gl}(N,\mathbb{R})$ acting on $f$ is
$$[\theta A, \theta B]f = (\theta A)(\theta B)f - (\theta B)(\theta A)f = D\big(Df|_\theta(\theta B)\big)\big|_\theta(\theta A) - D\big(Df|_\theta(\theta A)\big)\big|_\theta(\theta B).$$
Now since $GL(N,\mathbb{R})$ inherits the Euclidean differential structure from $\mathbb{R}^{N\times N}$, the Fréchet
derivative $Df|_\theta(X)$, $X \in \mathbb{R}^{N\times N}$, can be written
$$Df|_\theta(X) = \sum_{i,j=1}^{N} \frac{\partial f}{\partial \theta_{ij}}X_{ij},$$
where $X_{ij}$ is the $(i,j)$'th entry of $X \in \mathbb{R}^{N\times N}$. Writing $(\theta B)_{ij} = \sum_{s=1}^N \theta_{is}B_{sj}$ and applying
the product rule of differentiation gives
$$D\big(Df|_\theta(\theta B)\big)\big|_\theta(\theta A) = \sum_{i,j=1}^{N}\sum_{p,k=1}^{N} \frac{\partial^2 f}{\partial\theta_{ij}\partial\theta_{pk}}(\theta B)_{ij}(\theta A)_{pk} + \sum_{i,j=1}^{N} \frac{\partial f}{\partial\theta_{ij}} \sum_{p,k=1}^{N} \frac{\partial}{\partial\theta_{pk}}\Big(\sum_{s=1}^N \theta_{is}B_{sj}\Big)(\theta A)_{pk}$$
$$= \sum_{i,j=1}^{N}\sum_{p,k=1}^{N} \frac{\partial^2 f}{\partial\theta_{ij}\partial\theta_{pk}}(\theta B)_{ij}(\theta A)_{pk} + \sum_{i,j=1}^{N} \frac{\partial f}{\partial\theta_{ij}} \sum_{k=1}^{N} B_{kj}(\theta A)_{ik},$$
since $\frac{\partial}{\partial\theta_{pk}}\big(\sum_{s=1}^N \theta_{is}B_{sj}\big) = 0$ unless $p = i$ and $s = k$. It follows that
$$[\theta A, \theta B]f = \sum_{i,j=1}^{N} \frac{\partial f}{\partial\theta_{ij}}\Big(\sum_{k=1}^{N}(\theta A)_{ik}B_{kj} - \sum_{k=1}^{N}(\theta B)_{ik}A_{kj}\Big) = \sum_{i,j=1}^{N} \frac{\partial f}{\partial\theta_{ij}}\big(\theta(AB - BA)\big)_{ij} = \big(\theta(AB - BA)\big)f,$$
where $\theta(AB - BA)$ is a smooth left invariant vector field on $GL(N,\mathbb{R})$. For any two
matrices $A, B \in \mathbb{R}^{N\times N}$ define the matrix Lie-bracket by $[A, B] = AB - BA$. The bracket
operation on the Lie-algebra can now be written in terms of the matrix Lie-bracket operation
on $T_eGL(N,\mathbb{R}) = \mathbb{R}^{N\times N}$,
$$[\theta A, \theta B] = \theta[A, B]. \tag{5.6.4}$$
Indeed, it is usual to think of $\mathfrak{gl}(N,\mathbb{R})$ as the set
$$\mathfrak{gl}(N,\mathbb{R}) = \{A \mid A \in \mathbb{R}^{N\times N}\} \tag{5.6.5}$$
with the matrix Lie-bracket operation $[A, B] = AB - BA$. $\square$
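The matrix Lie-bracket just defined is antisymmetric and satisfies the Jacobi identity, which is easy to confirm numerically (my own sketch, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

def br(A, B):
    """Matrix Lie-bracket [A, B] = AB - BA on gl(N,R)."""
    return A @ B - B @ A

N = 4
A, B, C = (rng.standard_normal((N, N)) for _ in range(3))

print(np.allclose(br(A, B), -br(B, A)))                # antisymmetry
print(np.allclose(br(A, br(B, C)) + br(B, br(C, A))
                  + br(C, br(A, B)), np.zeros((N, N))))  # Jacobi identity
```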
Let $G$ and $H$ be two Lie-groups and let $\mathfrak{g}$ and $\mathfrak{h}$ be their associated Lie-algebras. A
map $\psi : G \to H$ is called a Lie-group homomorphism (or just homomorphism) from $G$ to
$H$ if $\psi$ is smooth and is a group homomorphism (i.e. $\psi(g_1g_2^{-1}) = \psi(g_1)\psi(g_2)^{-1}$). A map
$\chi : \mathfrak{g} \to \mathfrak{h}$ is called a Lie-algebra homomorphism (or just homomorphism) from $\mathfrak{g}$ to $\mathfrak{h}$ if $\chi$
is linear and preserves the bracket operation, $\chi([X, Y]) = [\chi(X), \chi(Y)]$. The tangent map
$T_e\psi : T_eG \to T_eH$ induces a map $\chi : \mathfrak{g} \to \mathfrak{h}$, $\chi(X) = dl_\sigma T_e\psi(X(e))$, which is a Lie-algebra
homomorphism (Warner 1983, pg. 90). Abusing notation slightly, it is standard to identify $\mathfrak{g}$
with $T_eG$ and $\mathfrak{h}$ with $T_eH$ (cf. (5.6.1)) and write $T_e\psi : \mathfrak{g} \to \mathfrak{h}$ as the Lie-algebra homomorphism
induced by a Lie-group homomorphism $\psi : G \to H$. The following result is fundamental in
the theory of Lie-groups. A typical proof is given in Warner (1983, Theorem 3.27).
Proposition 5.6.2 Let $G$ and $H$ be Lie-groups with Lie-algebras $\mathfrak{g}$ and $\mathfrak{h}$ respectively and assume that $G$ is simply connected. Let $\psi: \mathfrak{g} \to \mathfrak{h}$ be a Lie-algebra homomorphism, then there exists a unique Lie-group homomorphism $\phi: G \to H$ such that $T_e\phi = \psi$.
Let $G$ be any Lie-group and denote its Lie-algebra by $\mathfrak{g}$. Denote the identity component of $G$ by $G_e$, the set of all points in $G$ path connected to the identity $e$. Observe that $\mathbb{R}$ is a Lie-group under addition. The Lie-algebra of $\mathbb{R}$ is a one-dimensional vector space $\mathfrak{r} = \{\alpha\frac{d}{dr} \mid \alpha \in \mathbb{R}\}$, where $\frac{d}{dr}$ denotes the derivative in $\mathbb{R}$. Let $X \in \mathfrak{g}$ be arbitrary and consider the map
$$\psi: \mathfrak{r} \to \mathfrak{g}, \qquad \psi\Big(\alpha\frac{d}{dr}\Big) := \alpha X.$$
It is easily seen that $\psi$ is a Lie-algebra homomorphism and using Proposition 5.6.2 there exists a unique Lie-group homomorphism
$$\exp_X: \mathbb{R} \to G_e \qquad (5.6.6)$$
such that $T_e\exp_X = \psi$. Since $\exp_X$ is a Lie-group homomorphism then $\exp_X(t_1+t_2) = \exp_X(t_1)\exp_X(t_2)$ and the set
$$\bigcup_{t\in\mathbb{R}}\exp_X(t) \subset G$$
is known as a one-parameter subgroup of $G$.
One may define the full exponential by
$$\exp: \mathfrak{g} \to G, \qquad \exp(X) := \exp_X(1). \qquad (5.6.7)$$
The exponential map is a local diffeomorphism from an open neighbourhood $N_0$ of $0 \in \mathfrak{g}$ into an open neighbourhood $M_e$ of $e \in G$ (Helgason 1978, Proposition 1.6).
Lemma 5.6.3 Let $\exp: \mathfrak{gl}(N,\mathbb{R}) \to GL(N,\mathbb{R})$ denote the exponential map (5.6.7) and let
$$e^X = I_N + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \cdots$$
be the standard matrix exponential. Let $X \in \mathfrak{gl}(N,\mathbb{R}) = \mathbb{R}^{N\times N}$, then
$$\exp(X) = e^X.$$
Proof Recall that the matrix exponential $e^X$ is well defined for all $X \in \mathbb{R}^{N\times N}$ (Horn & Johnson 1985, pg. 300). Let $X \in \mathfrak{gl}(N,\mathbb{R})$ (thought of as the set of $N\times N$ matrices equipped with the matrix Lie-bracket) then define $\phi_X: \mathbb{R} \to GL(N,\mathbb{R})$ by
$$\phi_X(t) = e^{tX}.$$
Observe that $\phi_X$ is a well defined smooth map since the matrix exponential is itself smooth and always non-singular ($\det(e^X) = e^{\mathrm{tr}(X)} \neq 0$). Indeed, $\phi_X$ is a group homomorphism ($\mathbb{R}$ is a Lie-group under addition) since $\phi_X(t_1+t_2) = \phi_X(t_1)\phi_X(t_2)$ and $\phi_X(-t) = \phi_X(t)^{-1}$. The tangent space of $\mathbb{R}$ at $0$ is the set $\{\alpha\frac{d}{dr} \mid \alpha \in \mathbb{R}\}$ where $\frac{d}{dr}$ denotes normal derivation. Observe that
$$T_e\phi_X\Big(\alpha\frac{d}{dr}\Big) = D\phi_X\big|_0\Big(\alpha\frac{d}{dr}\Big) = \alpha\frac{d}{dt}e^{tX}\Big|_{t=0} = \alpha X.$$
But this is exactly the Lie-algebra homomorphism $\psi$ that induces the Lie-group homomorphism $\exp_X$ (5.6.6). Since $\exp_X$ is the unique Lie-group homomorphism that has the property $T_e\exp_X(\alpha\frac{d}{dr}) = \alpha X$ then it follows that $\phi_X(t) = \exp_X(t)$. The full result follows from the definition of $\exp$ (5.6.7). $\square$
5.7 Affine Connections and Covariant Differentiation
Let $G$ be a smooth manifold. An affine connection is a rule $\nabla$ which assigns to each smooth vector field $X \in D(G)$ a linear mapping $\nabla_X: D(G) \to D(G)$, $\nabla_X(Y) := \nabla_XY$, satisfying
$$\nabla_{fX+gY} = f\nabla_X + g\nabla_Y, \qquad (5.7.1)$$
$$\nabla_X(fY) = f\nabla_XY + (Xf)Y, \qquad (5.7.2)$$
where $f$, $g \in C^\infty(G)$ and $X$, $Y \in D(G)$.
An affine connection naturally defines parallel transport on a manifold. Let $X$ and $Y \in D(G)$ be smooth vector fields and let $\sigma(t)$ be a smooth integral curve of $X$ on some time interval $[0,\tau]$, $\tau > 0$, then the family of tangent vectors $t \mapsto Y(\sigma(t))$ is said to be transported parallel to $\sigma(t)$ if
$$(\nabla_XY)(\sigma(t)) = 0. \qquad (5.7.3)$$
Expressing (5.7.3) in local coordinates one can show that the relationship depends only on the values of the vector fields $X$ and $Y$ along the curve $\sigma(t)$ (Helgason 1978, pg. 29). Thus, given a curve $\sigma(t)$ on $[0,\tau]$ and a smooth assignment $t \mapsto Y(t) \in T_{\sigma(t)}G$ then $Y(t)$ is transported parallel to $\sigma$ if and only if any smooth extensions $X$, $Y \in D(G)$, $X(\sigma(t)) = \dot\sigma(t)$, $Y(\sigma(t)) = Y(t)$, satisfy (5.7.3).
A geodesic is a curve $\sigma(t)$ for which the family of tangent vectors $\dot\sigma(t)$ is transported parallel to the curve $\sigma(t)$. It is usual to write the parallel transport equation for a geodesic $\sigma: \mathbb{R} \to G$ as
$$\nabla_{\dot\sigma}\dot\sigma = 0, \qquad (5.7.4)$$
where by this one means that any smooth extension $X \in D(G)$ of $\dot\sigma$ satisfies $\nabla_XX(\sigma(t)) = 0$. Given a point $\sigma \in G$ and a tangent vector $X \in T_\sigma G$ there exists a maximal open interval $I \subset \mathbb{R}$ containing zero and a unique geodesic $\sigma_X: I \to G$ with $\sigma_X(0) = \sigma$ and $\dot\sigma_X(0) = X$ (Helgason 1978, pg. 30).
Given a fixed curve $\sigma: [0,\tau] \to G$ between two points $\sigma(0)$ and $\sigma(\tau)$ in $G$, there exists a set of $n$ linearly independent smooth assignments $t \mapsto Y_i(t) \in T_{\sigma(t)}G$, $i = 1,\ldots,n$, $t \in [0,\tau]$ (where each $Y_i(t)$ is transported parallel to $\sigma$) and which span the set of all smooth assignments $t \mapsto Y(t)$ ($Y(t)$ transported parallel to $\sigma$) (Helgason 1978, pg. 30). These solutions correspond to choosing $n$ linearly independent vectors in $T_{\sigma(0)}G$ as initial conditions and solving (5.7.3) for $Y_i(t)$. The construction induces an isomorphism
$$P_{\sigma(0)\sigma(\tau)}: T_{\sigma(0)}G \to T_{\sigma(\tau)}G, \qquad P_{\sigma(0)\sigma(\tau)}(Z) = \sum_{i=1}^nz_iY_i(\tau), \qquad (5.7.5)$$
where $Z = \sum_{i=1}^nz_iY_i(0) \in T_{\sigma(0)}G$. Of course, this isomorphism will normally depend on the curve $\sigma$.
Parallel transport of a smooth covector field $w: G \to T^*G$ is defined in terms of its action on an arbitrary vector field $X \in D(G)$,
$$\big(P_{\sigma(0)\sigma(\tau)}w\big)(X) = w\big(P_{\sigma(\tau)\sigma(0)}X\big),$$
where $P_{\sigma(\tau)\sigma(0)}$ is parallel transport from $\sigma(\tau)$ backwards to $\sigma(0)$ along the curve $\sigma(t)$. Parallel transport of an arbitrary tensor field $T: G \to T^*G \otimes \cdots \otimes T^*G \otimes TG \otimes \cdots \otimes TG$ of type $(r,s)$ is given by its action on arbitrary covector and vector fields
Since X , Y and Z are arbitrary, this equation uniquely determines the Levi-Civita connection
in terms of the metric g.
5.8 Right Invariant Affine Connections on Lie-Groups
Let $G$ be a smooth manifold and let $\phi: G \to G$ be a smooth map from $G$ into itself. An affine connection $\nabla$ on $G$ is invariant under $\phi$ if
$$d\phi\,\nabla_XY = \nabla_{d\phi X}\,d\phi Y.$$
If $G$ is a Lie-group then $\nabla$ is termed right invariant if $\nabla$ is invariant under each map $r_\theta(\gamma) := \gamma\theta$, $\theta \in G$.
Lemma 5.8.1 Let $G$ be a Lie-group. There is a one-to-one correspondence between right invariant affine connections on $G$ and bilinear maps
$$\alpha: T_eG \times T_eG \to T_eG,$$
given by
$$\alpha(Y,Z) = (\nabla_{dr\,Y}\,dr\,Z)(e), \qquad (5.8.1)$$
for $Y$, $Z \in T_eG$.
Proof If $\nabla$ is an affine connection then (5.8.1) certainly defines a bilinear map from $T_eG \times T_eG \to T_eG$.
Conversely, given a bilinear map $\alpha: T_eG \times T_eG \to T_eG$, let $\{E_1,\ldots,E_n\}$ be a linearly independent basis for $T_eG$. Define the $n$ smooth right invariant vector fields $\tilde E_i = dr\,E_i$, $i = 1,\ldots,n$. Thus, for arbitrary vector fields $Y$, $Z \in D(G)$ there exist functions $y_i \in C^\infty(G)$, for $i = 1,\ldots,n$, and $z_j \in C^\infty(G)$, for $j = 1,\ldots,n$, such that $Y = \sum_{i=1}^ny_i\tilde E_i$ and $Z = \sum_{j=1}^nz_j\tilde E_j$. One defines $\nabla_Y: D(G) \to D(G)$,
$$\nabla_YZ = \sum_{i=1}^ny_i\sum_{j=1}^n\Big(z_j\,dr\,\alpha(E_i,E_j) + (\tilde E_iz_j)\tilde E_j\Big). \qquad (5.8.2)$$
To see that $\nabla$ is well defined observe that both $(\tilde E_iz_j)\tilde E_j$ and $\alpha$ are bilinear in $\tilde E_i$ and $\tilde E_j$ and thus the definition is independent of the choice of $\{E_1,\ldots,E_n\}$. To see that $\nabla$ is an affine connection one observes that linearity in $\tilde E_i$ ensures that (5.7.1) holds; while for any $f \in C^\infty(G)$,
$$\nabla_Y(fZ) = \sum_{i=1}^ny_i\sum_{j=1}^n\Big(fz_j\,dr\,\alpha(E_i,E_j) + f(\tilde E_iz_j)\tilde E_j\Big) + \sum_{i=1}^ny_i\sum_{j=1}^nz_j(\tilde E_if)\tilde E_j = f\nabla_YZ + (Yf)Z,$$
and (5.7.2) also holds.
Consider two arbitrary vector fields $Y$ and $Z$ and observe that
$$\nabla_{dr_\theta Y}\,dr_\theta Z = \sum_{i=1}^ny_i\sum_{j=1}^n\Big(z_j\,dr\,\alpha(E_i,E_j) + \big((dr_\theta\tilde E_i)z_j\big)\,dr_\theta\tilde E_j\Big) = dr_\theta\Big(\sum_{i=1}^ny_i\sum_{j=1}^n\big(z_j\,dr\,\alpha(E_i,E_j) + (\tilde E_iz_j)\tilde E_j\big)\Big) = dr_\theta\,\nabla_YZ,$$
since for any $\gamma \in G$,
$$\big((dr_\theta\tilde E_i)z_j\big)(\gamma) = Dz_j\big|_\gamma(dr_\theta\tilde E_i) = D(z_j \circ r_\theta)\big|_{\gamma\theta^{-1}}(\tilde E_i) = Dz_j\big|_\gamma(\tilde E_i) = (\tilde E_iz_j)(\gamma). \qquad (5.8.3)$$
Thus, $\nabla$ is a right invariant affine connection. Moreover, for any two right invariant vector fields $Y$ and $Z$,
$$\nabla_YZ(e) = \nabla_{dr\,Y_e}\,dr\,Z_e(e) = dr_e\,\alpha(Y_e,Z_e) = \alpha(Y_e,Z_e),$$
and thus $\nabla$ satisfies (5.8.1). This completes the proof. $\square$
The following result provides an important relationship between the exponential map on
G (5.6.6), and geodesics with respect to right invariant affine connections. A proof for left
invariant connections is given in Helgason (1978, pg. 102).
Proposition 5.8.2 Let $\nabla$ be a right invariant affine connection and let $\alpha$ be given by (5.8.1), then for any $X \in T_eG$,
$$\alpha(X,X) = 0,$$
if and only if the geodesic $\sigma_X: \mathbb{R} \to G$ with $\dot\sigma_X(0) = X$ is an analytic Lie-group homomorphism of $\mathbb{R}$ into $G$.
In particular, if $\sigma_X$ is a group homomorphism then $\sigma_X$ must be the unique group homomorphism with $d\sigma_X(1) = X$ (cf. Proposition 5.6.2). Thus, if $\alpha(X,X) = 0$ then the geodesic $\sigma_X$ is just
$$\sigma_X = \exp_X, \qquad (5.8.4)$$
the exponential map (5.6.6).
Let $G$ be a Lie-group with an inner product $g_e: T_eG \times T_eG \to \mathbb{R}$ on the tangent space at the identity. Let $g$ be the right invariant group metric (cf. (5.3.1)), then the Levi-Civita connection defined by $g$ is also right invariant. To see this one computes $\nabla_{dr_\theta Z}\,dr_\theta Y$ for arbitrary vector fields $X$, $Y$, $Z \in D(G)$. Using (5.7.9) it follows that
$$2g(dr_\theta X, \nabla_{dr_\theta Z}\,dr_\theta Y) = dr_\theta Z\,g(X,Y) + g(dr_\theta Z, dr_\theta[X,Y]) + dr_\theta Y\,g(X,Z) + g(dr_\theta Y, dr_\theta[X,Z]) - dr_\theta X\,g(Y,Z) - g(dr_\theta X, dr_\theta[Y,Z]),$$
since $g$ is right invariant (cf. (5.3.2)) and $d\phi[X,Y] = [d\phi X, d\phi Y]$ (Helgason 1978, pg. 24). Recalling (5.8.3) one obtains
$$2g(dr_\theta X, \nabla_{dr_\theta Z}\,dr_\theta Y) = Zg(X,Y) + g(Z,[X,Y]) + Yg(X,Z) + g(Y,[X,Z]) - Xg(Y,Z) - g(X,[Y,Z]) = 2g(X, \nabla_ZY).$$
But $g$ is right invariant, and thus $2g(dr_\theta X, \nabla_{dr_\theta Z}\,dr_\theta Y) = 2g(dr_\theta X, dr_\theta\nabla_ZY)$ which shows that $dr_\theta\nabla_ZY = \nabla_{dr_\theta Z}\,dr_\theta Y$.
Example 5.8.3 Consider the general linear group $GL(N,\mathbb{R})$ (cf. the example of Section 5.6). The tangent space of $GL(N,\mathbb{R})$ at the identity is $T_{I_N}GL(N,\mathbb{R}) = \mathbb{R}^{N\times N}$ since $GL(N,\mathbb{R})$ is an open subset of $\mathbb{R}^{N\times N}$. Consider the Euclidean inner product on $T_{I_N}GL(N,\mathbb{R})$,
$$\langle X,Y\rangle = \mathrm{tr}(X^TY).$$
The tangent space of $GL(N,\mathbb{R})$ at a point $\gamma \in G$ is represented as $T_\gamma GL(N,\mathbb{R}) = \{X\gamma \mid X \in \mathbb{R}^{N\times N}\}$, the image of $T_{I_N}GL(N,\mathbb{R})$ via $dr_\gamma$. The right invariant metric for $GL(N,\mathbb{R})$ generated by $\langle\cdot,\cdot\rangle$ is just
$$g: T_\gamma GL(N,\mathbb{R}) \times T_\gamma GL(N,\mathbb{R}) \to \mathbb{R}, \qquad g(Y,Z) = \mathrm{tr}\big((\gamma^{-1})^TY^TZ\gamma^{-1}\big).$$
The Levi-Civita connection $\nabla$ associated with $g$ can be explicitly computed on the set of right invariant vector fields on $GL(N,\mathbb{R})$. Let $X$, $Y$, $Z \in \mathbb{R}^{N\times N}$, then $X\gamma$, $Y\gamma$ and $Z\gamma$ are the unique right invariant vector fields associated with $X$, $Y$ and $Z$. Using (5.7.9) one has
$$2g(X\gamma, \nabla_{Z\gamma}Y\gamma) = Z\gamma\,g(X,Y) + g(Z\gamma,[X\gamma,Y\gamma]) + Y\gamma\,g(X,Z) + g(Y\gamma,[X\gamma,Z\gamma]) - X\gamma\,g(Y,Z) - g(X\gamma,[Y\gamma,Z\gamma]).$$
Now $[Y\gamma, Z\gamma]$ is certainly right invariant (since $d\phi[X,Y] = [d\phi X, d\phi Y]$ (Helgason 1978, pg. 24)). In particular, observe that $Z\gamma\,g(X,Y) = 0 = Y\gamma\,g(X,Z) = X\gamma\,g(Y,Z)$ since in each case the metric computation is independent of $\gamma$. Paralleling the argument leading to (5.6.4) given in the example of Section 5.6 for right invariant vector fields one obtains
The map $x_2$ is just the canonical coordinates of the second kind for the embedded submanifold $r_\theta(H) = H\theta$. The relationship of these maps is shown in the commutative diagram, Figure 5.9.1.
Observe that the range of $dx_1$ is exactly $dr_\theta(\mathrm{sp}\{E_1,\ldots,E_m\})$ since the map $x_i \mapsto r_\theta \circ \exp(x_iE_i)$, which is exactly $x(0,\ldots,0,x_i,0,\ldots,0)$, has differential
$$dx\Big(\frac{\partial}{\partial x_i}\Big) = dr_\theta E_i = \tilde E_i, \qquad (5.9.1)$$
where $\tilde E_i$ is the unique right invariant vector field on $G$ associated with $E_i \in T_eG$. In addition, one has $d\pi_p\tilde E_i = 0$ for $i = m+1,\ldots,n$ since $H$ is a coset of the stabilizer $\mathrm{stab}(p)$. Recall the definition of $\mathrm{dom}\,d\pi_p$ (cf. (5.3.6)). It follows directly that $\mathrm{dom}\,d\pi_p = \mathrm{sp}\{\tilde E_1,\ldots,\tilde E_m\}$.
Consider the map
$$y: \mathbb{R}^m \to M, \qquad y(y_1,\ldots,y_m) := \pi_p \circ x \circ i_1(y_1,\ldots,y_m).$$
Observe from the above discussion that the differential $dy = d\pi_p \circ dx_1$ is a bijection. Thus, the map $y$ forms local coordinates for the manifold $M$ centred at $\pi_p(\theta)$. This completes the construction of the local coordinate charts shown in the commutative diagram, Figure 5.9.1.
[Figure 5.9.1: A commutative diagram showing the various coordinate charts and smooth curves on $G$ and $M$ constructed in the proof of Lemma 5.9.2, relating $\mathbb{R}^m$, $\mathbb{R}^n$ and $\mathbb{R}^{n-m}$ via the maps $x$, $x_1$, $x_2$, $y$, $i_1$, $i_2$ and $\pi_p$.]
Consider the local coordinate representation of the Riemannian metric $g$ in the coordinates $x$. Canonically associating tangent vectors of $\mathbb{R}^n$ at a point $x$, $Z \in T_x\mathbb{R}^n$, with the full space $\mathbb{R}^n$,
$$Z = \sum_{i=1}^nz_i\frac{\partial}{\partial x_i} \equiv (z_1,\ldots,z_n),$$
then the local coordinate representation of the metric $g$, denoted $g^x$, can be written in matrix form
$$g^x(Y,Z) = Y^TG(x)Z,$$
where $G(x) \in \mathbb{R}^{n\times n}$ is a positive definite, symmetric matrix. Now consider arbitrary vector fields $Y = (y_1,\ldots,y_m,0,\ldots,0)$ and $Z = (0,\ldots,0,z_{m+1},\ldots,z_n)$, then
$$g^x(Y,Z) = g(dx\,Y, dx\,Z) = \sum_{i=1}^m\sum_{j=m+1}^ny_iz_jg(\tilde E_i,\tilde E_j) = 0.$$
Thus, the matrix $G(x)$ is block diagonal of the form
$$G(x) = \begin{pmatrix}G_{11}(x) & 0 \\ 0 & G_{22}(x)\end{pmatrix}.$$
Moreover, since the maps shown in Figure 5.9.1 are commutative and the metric $g^M$ on $M$ is induced by the action of $g$ on $\mathrm{dom}\,d\pi_p = \mathrm{sp}\{\tilde E_1,\ldots,\tilde E_m\}$ it is easily shown that the local coordinate representation of $g^M$ on $\mathbb{R}^m$ is
$$g^M(Y,Z) = Y^TG_{11}(i_1(y))Z = (di_1Y)^TG(i_1(y))(di_1Z).$$
I proceed now to prove the main result. Let $\sigma: \mathbb{R} \to G$ be a geodesic and define
$$\rho: \mathbb{R} \to M, \qquad \rho(t) := \pi_p \circ \sigma(t).$$
Let $\epsilon \in \mathbb{R}$ be a parameter and consider any one parameter smooth variation $\rho_\epsilon$ of the curve $\rho$ on $M$. Assume that $\rho_0 = \rho$ and that $\rho_\epsilon(t)$ is a smooth map from $\mathbb{R} \to M$. Both $\rho$ and $\rho_\epsilon$ have local coordinate representations on $\mathbb{R}^m$ in the coordinates described above. Denote the local coordinate representations by $\bar\rho := y^{-1} \circ \rho$ and $\bar\rho_\epsilon := y^{-1} \circ \rho_\epsilon$. Let $\delta_\epsilon: \mathbb{R} \to \mathbb{R}^m$ be the smooth curve
$$\delta_\epsilon := \bar\rho_\epsilon - \bar\rho,$$
since subtraction of vectors is well defined in $\mathbb{R}^m$. The curves $\sigma$, $\rho$, $\rho_\epsilon$ and $\delta_\epsilon$ are shown on the commutative diagram, Figure 5.9.1. Denote the local coordinate representation of $\sigma$ by $\bar\sigma = x^{-1} \circ \sigma$. Observe that since $\sigma$ is a geodesic of $G$ then $\bar\sigma$ is a geodesic of $\mathbb{R}^n$ equipped with the metric $g^x$ (Mishchenko & Fomenko 1980, Lemma 3, pg. 345). Consider the following
156 Numerical Optimization on Lie-Groups and Homogeneous Spaces Chapter 6
article of this work is Sanz-Serna (1991). Following from this approach is a general concept
of numerical stability (Stuart & Humphries 1994) which is loosely defined as the ability of
a numerical integration method to reproduce the qualitative behaviour of the continuous-time
solution of a differential equation. The development given in Stuart and Humphries (1994) is
not directly applicable to the solution of optimization problems since it is primarily focussed on
integration methods and considers only a single qualitative behaviour at any one time, either the
preservation of an invariant of a flow (Hamiltonian systems) or the convergence of the solution
to a limit point (contractive problems). In contrast, the optimization problems discussed in
this thesis require two simultaneous forms of numerical stability, namely preservation of the
constraint relation and convergence to a limit point within the constraint set.
This leads one to consider what properties a numerical method for optimization on a
homogeneous space should display. In Chapter 1 the three properties of simplicity, global
convergence and constraint stability were defined (page 2) in the context of numerical methods
for on-line and adaptive processes. The modified gradient descent algorithms proposed in the
early part of this thesis all displayed these properties. It is natural to ask whether the proposed
algorithms are in fact closely related. In particular, since the only difference between the
proposed algorithms is in the curves used to interpolate the gradient flow it is important to
investigate the properties of these curves more carefully. Indeed, one may ask whether the
choice of curves can be justified or whether there may be more suitable choices available.
In this chapter I begin by reviewing the gradient descent algorithms proposed in Chapters 2
to 4 and using the theoretical results of Chapter 5 to develop a mathematical framework which
explains each algorithm as an example of the same concept. This provides a design procedure
for deriving numerical methods suitable for solving any constrained optimization problem on
a homogeneous space.
The remainder of the chapter is devoted to developing a more sophisticated constrained
optimization algorithm exploiting the general theoretical framework provided by Chapter 5.
The method considered is based on the Newton-Raphson method reformulated (in coordinate
free form) to evolve explicitly on a Lie-group. Local quadratic convergence behaviour is proved
though the method is not globally convergent. To provide an interesting example the symmetric
eigenvalue problem is considered (first discussed in Chapter 2) and a Newton-Raphson method
derived for this case. It is interesting to compare the behaviour of this example with the classical
shifted QR algorithm, however, it is not envisaged that the proposed method is competitive for
solving traditional problems. The interest in such methods is for solving numerical problems
for on-line and adaptive processes.
The chapter is divided into five sections. Section 6.1 discusses the theoretical foundation
of the modified gradient descent algorithms proposed in Chapters 2 to 4 and develops a general
template for generating such methods. Section 6.2 develops the general form of the Newton-
Raphson iteration on a Lie-group and proves quadratic convergence of the algorithm in a
neighbourhood of a given critical point. Section 6.3 provides a coordinate free formulation of
the Newton-Raphson algorithm. The theory is applied to the symmetric eigenvalue problem in
Section 6.4 and a comparison is made to the performance of the QR algorithm.
6.1 Gradient Descent Algorithms on Homogeneous Spaces
In this section the numerical algorithms proposed in Chapters 2 to 4 are discussed in the context
of the theoretical discussion of Chapter 5.
Recall the constrained optimization problem posed in Chapter 2 for computing the spectral decomposition of a matrix $H_0$. The algorithm proposed for this task was the double-bracket algorithm (2.1.4),
$$H_{k+1} = e^{-\alpha_k[H_k,D]}H_ke^{\alpha_k[H_k,D]},$$
where¹ $D = \mathrm{diag}(\mu_1,\ldots,\mu_N)$. The algorithm has the property of explicitly evolving on the set
$$M(H_0) = \{U^TH_0U \mid U \in O(N)\}$$
of all orthogonal congruency transformations of $H_0$. The set of orthogonal matrices $O(N)$ is certainly an abstract group and indeed is a Lie-subgroup of $GL(N,\mathbb{R})$ (Warner 1983, pg. 107).

¹In Chapter 2 the diagonal target matrix was denoted $N$; however, to avoid confusion with the notation of Chapter 5, the target matrix is now denoted $D$ and the dimension of the matrices is denoted $N$.

The orthogonal group $O(N)$ features in all of the numerical algorithms considered and it seems a good opportunity to review its geometric structure.

1. The identity tangent space of $O(N)$ is the set of skew symmetric matrices (Warner 1983, pg. 107),
$$T_{I_N}O(N) = Sk(N) = \{\Omega \in \mathbb{R}^{N\times N} \mid \Omega = -\Omega^T\}.$$

2. The tangent space at a point $U \in O(N)$ is given by the image of $T_{I_N}O(N)$ via the linearization of $r_U: O(N) \to O(N)$, $r_U(W) := WU$ (right translation by $U$),
$$T_UO(N) = \{\Omega U \in \mathbb{R}^{N\times N} \mid \Omega \in Sk(N)\}. \qquad (6.1.1)$$

3. By inclusion $Sk(N) \subset \mathbb{R}^{N\times N}$ is a Lie-subalgebra of the Lie-algebra $\mathfrak{gl}(N,\mathbb{R})$ of $GL(N,\mathbb{R})$. In particular, $Sk(N)$ is closed under the matrix Lie-bracket operation, $[X,Y] \in Sk(N)$ if $X$ and $Y$ are skew symmetric.

4. The scaled Euclidean inner product on $Sk(N)$,
$$\langle\Omega_1,\Omega_2\rangle = 2\mathrm{tr}(\Omega_1^T\Omega_2),$$
generates a right invariant group metric on $O(N)$,
$$g(\Omega_1U,\Omega_2U) = 2\mathrm{tr}(\Omega_1^T\Omega_2). \qquad (6.1.2)$$
Observe that $g(\Omega_1U,\Omega_2U) = 2\mathrm{tr}(U^T\Omega_1^T\Omega_2U) = \langle\Omega_1U,\Omega_2U\rangle$ since $U^TU = I_N$. Thus the right invariant group metric on $O(N)$ is the scaled Euclidean inner product restricted to each individual tangent space.

5. The Levi-Civita connection generated by the right invariant group metric (6.1.2) (cf. Example 5.8.3) is associated with the bilinear map $\alpha: Sk(N) \times Sk(N) \to Sk(N)$,
$$\alpha(\Omega_1,\Omega_2) = [\Omega_1,\Omega_2].$$
This follows directly from (5.8.5) while observing that $\Omega \in Sk(N)$ implies $\Omega^T + \Omega = 0$. The extra factor of 2 in (6.1.2) cancels the factor of 1/2 in (5.8.5).

6. The value of $\alpha(\Omega,\Omega) = 0$ for any $\Omega \in Sk(N)$ and thus all curves
$$\sigma(t) = \exp(t\Omega)$$
are geodesics on $O(N)$ passing through $I_N$ at time $t = 0$. By uniqueness this includes all the possible geodesics on $O(N)$ passing through $I_N$.

7. Geodesics on $O(N)$ passing through $U \in O(N)$ and with tangent vector $\dot\sigma(0) = \Omega U \in T_UO(N)$ at time $t = 0$ are given by (cf. Section 5.9)
$$\sigma(t) = \exp(t\Omega)U.$$
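Points 6 and 7 can be checked directly: since $\Omega$ is skew-symmetric, $\exp(t\Omega)$ is orthogonal, so the geodesic $\sigma(t) = \exp(t\Omega)U$ never leaves $O(N)$. A minimal sketch (numpy; `mexp` is a naive series exponential used only for illustration):

```python
import numpy as np

def mexp(X, terms=60):
    """Matrix exponential e^X via its power series (adequate for the norms here)."""
    out = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

def skew(A):
    """Project onto Sk(N): A - A^T is always skew-symmetric."""
    return A - A.T

rng = np.random.default_rng(2)
Omega = skew(rng.standard_normal((5, 5)))    # tangent direction in Sk(5)
U = mexp(skew(rng.standard_normal((5, 5))))  # a point on O(5)

# The geodesic sigma(t) = exp(t Omega) U stays orthogonal for all t:
orth_err = max(
    np.linalg.norm(mexp(t * Omega) @ U @ (mexp(t * Omega) @ U).T - np.eye(5))
    for t in (0.0, 0.25, 0.5, 1.0))
```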
Recall once more the double-bracket algorithm (2.1.4), $H_{k+1} = e^{-\alpha_k[H_k,D]}H_ke^{\alpha_k[H_k,D]}$, mentioned above. In Section 2.5 the associated orthogonal algorithm
$$U_{k+1} = U_ke^{\alpha_k[U_k^TH_0U_k,\,D]}$$
was discussed and shown to be related to the double-bracket equation via the algebraic relationship
$$H_k = U_k^TH_0U_k.$$
Unfortunately, $U_ke^{\alpha_k[U_k^TH_0U_k,D]}$ does not appear to be in the correct form for a geodesic $\exp(t\Omega)U$ on $O(N)$. The reason for this lies in the characterisation $M(H_0) = \{U^TH_0U \mid U \in O(N)\}$. In particular, $(U,H) \mapsto U^THU$ is not a group action of $O(N)$ on $M(H_0)$. The use of this awkward definition for $M(H_0)$ is historical (cf. Brockett (1988) and the development in Helmke and Moore (1994b, Chapter 2)). By considering the related characterisation
$$M(H_0) = \{WH_0W^T \mid W \in O(N)\},$$
$M(H_0)$ is seen to be a homogeneous space with transformation group $O(N)$ and group action $(W,H) \mapsto WHW^T$. Of course, all that has been done is to take the transpose of the orthogonal matrices. It is easily shown that the associated orthogonal iteration for the new characterisation of $M(H_0)$ is
$$W_{k+1} = e^{-\alpha_k[W_kH_0W_k^T,\,D]}W_k.$$
Observe that this iteration is constructed from geodesics on $O(N)$. Thus, the associated orthogonal iteration for the double-bracket algorithm is a geodesic interpolation of the flow
$$\dot W = -[WH_0W^T,\,D]W.$$
Using Lemma 5.9.2 geodesics on $O(N)$ will map to geodesics on $M(H_0)$ and one concludes that the double-bracket algorithm itself is a geodesic interpolation of the double-bracket flow
$$\dot H = [H,[H,D]].$$
Recall that geodesics are curves of minimum length between two points on a curved surface
and are the natural generalization of straight lines to non-Euclidean geometry. Then, at least
for the double-bracket algorithm, the question posed in the introduction to this chapter, whether
the choice of interpolating curves in the proposed numerical algorithms is justified, is answered
in the affirmative.
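The geodesic interpolation is straightforward to simulate. The sketch below runs the double-bracket iteration $H_{k+1} = e^{-\alpha[H_k,D]}H_ke^{\alpha[H_k,D]}$ with a small constant step $\alpha$ (my own illustrative choice, in place of the step-size selection schemes of Chapter 2). Each update is an orthogonal congruence, so the iterates stay exactly isospectral to $H_0$ while the off-diagonal part decays and the diagonal orders itself to match $D$:

```python
import numpy as np

def bracket(A, B):
    """Matrix Lie-bracket [A, B] = AB - BA."""
    return A @ B - B @ A

def mexp(X, terms=60):
    """Matrix exponential e^X via its power series (adequate here)."""
    out = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

H0 = np.array([[2., 1., 0., 0.],
               [1., 3., 1., 0.],
               [0., 1., 4., 1.],
               [0., 0., 1., 5.]])
D = np.diag([4., 3., 2., 1.])   # target: descending diagonal
alpha = 0.02                    # small constant step (illustrative only)

H = H0.copy()
for _ in range(2000):
    G = mexp(alpha * bracket(H, D))  # [H, D] is skew, so G is orthogonal
    H = G.T @ H @ G                  # H_{k+1} = e^{-a[H,D]} H e^{a[H,D]}

off_norm = np.linalg.norm(H - np.diag(np.diag(H)))
```

After the iteration `H` is close to diagonal, with its diagonal entries in descending order to match the ordering of $D$.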
It should not come as a surprise that the other algorithms proposed in Chapters 2 to 4 are
also geodesic interpolations of continuous-time flows. The algorithm proposed in Section 2.4 is
based directly on the double-bracket equation and can be analysed in exactly the same manner.
In Chapter 3 the Rayleigh gradient algorithm (3.2.1) is immediately in the correct form to
observe its geodesic nature. Indeed, for the rank-1 case (cf. Subsection 3.4.1) the geodesic
nature of the recursion has already been observed explicitly. Finally the pole placement
algorithm (4.6.1) proposed in Chapter 4,
$$\Theta_{i+1} = \Theta_ie^{-\alpha_i[\Theta_i^TF\Theta_i,\;Q(\Theta_i^TF\Theta_i - A)]} = e^{-\alpha_i[F,\;\Theta_iQ(\Theta_i^TF\Theta_i - A)\Theta_i^T]}\Theta_i,$$
is explicitly a geodesic interpolation of the gradient flow (4.4.7)
$$\dot\Theta = [F,\;\Theta Q(A - \Theta^TF\Theta)\Theta^T]\Theta,$$
evolving directly on the Lie-group $O(N)$.
Thus, the algorithms proposed in Chapters 2 to 4 form a template for a generic numerical
approach to solving optimization problems on homogeneous spaces associated with the orthog-
onal group. In every case considered exponential interpolation of the relevant continuous-time
flow is equivalent to geodesic interpolation of the flow due to the specific structure of O�N�.
Care should be taken before the same approach is used for more abstract Lie-groups (the easily
constructed exponential interpolation curves may no longer be geodesics), nevertheless, the
basic structure of the algorithms presented is extremely simple and could be applied to almost
any optimization problem on a homogeneous space. Of course, step-size selection schemes
must be determined for each new situation and the stability analysis depends on the step-size
selection. The basic properties of the algorithms will remain consistent, however, and provide
a useful technique for practical problems where the properties of constraint stability and global
convergence (cf. page 1) are more important than computational cost.
6.2 Newton-Raphson Algorithm on Lie-Groups
In this section a general formulation of the Newton-Raphson algorithm is proposed which
evolves explicitly on a Lie-group. Interestingly, the iteration can be expressed in terms of
Lie-derivatives and the exponential map. In practice, one still has to solve a linear system of
equations to determine the regression vector.
The Newton-Raphson algorithm is a classical (quadratically convergent) optimization technique for determining the stationary points of a smooth vector field (Kincaid & Cheney 1991, pg. 64). Given $Z: \mathbb{R}^n \to \mathbb{R}^n$ a smooth vector field² on $\mathbb{R}^n$, let $p \in \mathbb{R}^n$ be a stationary point of $Z$ (i.e. $Z(p) = 0$) and let $q \in \mathbb{R}^n$ be an estimate of the stationary point $p$. Let $k = (k_1,k_2,\ldots,k_n)$, with $k_1,\ldots,k_n$ non-negative integers, be a multi-index and denote its size by $|k| = k_1 + k_2 + \cdots + k_n$. Expanding $Z$ as a Taylor series around $q$ one obtains for each element of $Z = (Z_1,Z_2,\ldots,Z_n)$,
$$Z_i(x) = Z_i(q) + \sum_{|k|\geq 1}\frac{(h_1)^{k_1}\cdots(h_n)^{k_n}}{k_1!\cdots k_n!}\,\frac{\partial^{|k|}Z_i}{(\partial x_1)^{k_1}\cdots(\partial x_n)^{k_n}}(q),$$
where $h = x - q \in \mathbb{R}^n$ and $h_j$ is the $j$'th element of $h$, and the sum is taken over all multi-indices $k$ with $|k| = j$ for $j = 1,2,\ldots$. The Taylor series of an analytic³ function is
if q is a good estimate of p one expects that only the first few terms of the Taylor series are
sufficient to provide a good approximation of Zi. Assume that p is known and consider setting
²When dealing with Euclidean space one naturally associates the element $\frac{\partial}{\partial x_i}$ of the basis of $T_x\mathbb{R}^n$ with the basis element $e_i$ of $\mathbb{R}^n$ (the unit vector with a 1 in the $i$'th position). This induces an isomorphism $T_x\mathbb{R}^n \cong \mathbb{R}^n$ (Warner 1983, pg. 86) and one writes a vector field as a map $Z: \mathbb{R}^n \to \mathbb{R}^n$ rather than the technically more correct $Z: \mathbb{R}^n \to T\mathbb{R}^n$, $Z(x) \in T_x\mathbb{R}^n$.
³In fact a smooth function $f \in C^\infty(M)$ on a smooth manifold $M$ is defined to be analytic at a point $p \in M$ if the Taylor series of $\bar f$, the expression of $f$ in local coordinates centred at $p$, is uniformly and absolutely convergent in a neighbourhood of $0$.
$x = p$, so that $h = p - q$. Ignoring all terms with $|k| \geq 2$ one obtains the approximation
$$0 = Z_i(p) \approx Z_i(q) + \sum_{j=1}^n\frac{\partial Z_i}{\partial x_j}(q)h_j.$$
The Jacobi matrix is defined as the $n\times n$ matrix with $(i,j)$'th element $(J_qZ)_{ij} = \frac{\partial Z_i}{\partial x_j}(q)$ (Mishchenko & Fomenko 1980, pg. 16). Thus, the above equation can be rewritten in matrix form as $0 \approx Z(q) + J_qZ\,h$. When $J_qZ$ is non-singular one can solve this relation uniquely for $h$, an estimate of the residual error between $q$ and $p$. Thus, one obtains a new estimate $q'$ of $p$ based on the previous estimate $q$ and the correction $h$,
$$q' = q + h.$$
This estimate is the next estimate of the Newton-Raphson algorithm. Given an initial estimate
$q_0 \in \mathbb{R}^n$, the Newton-Raphson algorithm is:
Algorithm 6.2.1 [Newton-Raphson Algorithm on $\mathbb{R}^n$]
Given $q_k \in \mathbb{R}^n$ compute $Z(q_k)$.
Compute the Jacobi matrix $J_{q_k}Z$ given by $(J_{q_k}Z)_{ij} = \frac{\partial Z_i}{\partial x_j}(q_k)$.
Set $h = -(J_{q_k}Z)^{-1}Z(q_k)$.
Set $q_{k+1} = q_k + h$.
Set $k = k + 1$ and repeat. $\square$
The convergence properties of the Newton-Raphson algorithm are given by the following proposition (Kincaid & Cheney 1991, pg. 68).
Proposition 6.2.2 Let $Z: \mathbb{R}^n \to \mathbb{R}^n$ be an analytic vector field on $\mathbb{R}^n$ and $p \in \mathbb{R}^n$ be a stationary point of $Z$. Then there is a neighbourhood $U$ of $p$ and a constant $C$ such that the Newton-Raphson method (Algorithm 6.2.1) converges to $p$ for any initial estimate $q_0 \in U$ and the error decreases quadratically,
$$\|q_{k+1} - p\| \leq C\|q_k - p\|^2.$$
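As a concrete instance of Algorithm 6.2.1, the sketch below (numpy; the vector field is a made-up example of my own) finds a stationary point of $Z(x,y) = (x^2-2,\ y^2-3)$, for which the Jacobi matrix is diagonal and easy to write down:

```python
import numpy as np

def newton_raphson(Z, jacobi, q, iters=10):
    """Algorithm 6.2.1: q_{k+1} = q_k - (J_{q_k} Z)^{-1} Z(q_k)."""
    for _ in range(iters):
        q = q - np.linalg.solve(jacobi(q), Z(q))
    return q

# Stationary points of Z are (+-sqrt(2), +-sqrt(3)).
Z = lambda q: np.array([q[0] ** 2 - 2.0, q[1] ** 2 - 3.0])
jacobi = lambda q: np.array([[2.0 * q[0], 0.0],
                             [0.0, 2.0 * q[1]]])

p = newton_raphson(Z, jacobi, np.array([1.0, 1.0]))
```

Starting from $(1,1)$ the iteration converges to $(\sqrt{2}, \sqrt{3})$ to machine precision within a handful of steps, consistent with the quadratic rate of Proposition 6.2.2.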
It is not clear how best to go about reformulating the Newton-Raphson algorithm on an
arbitrary Lie-group G. One could use the Euclidean Newton-Raphson algorithm in separate
local coordinate charts on G. Care must be taken, however, since local coordinate charts may
display extreme sensitivity to perturbation in the Euclidean coordinates, leading to numerically
ill conditioned algorithms.
Given a Lie-group $G$, let $\phi \in C^\infty(G)$ be an analytic real function on $G$. Denote the identity element of $G$ by $e$ and associate the tangent space $T_eG$ with the Lie-algebra $\mathfrak{g}$ of $G$ in the canonical manner (cf. Section 5.6). For $X \in T_eG$ arbitrary define a right invariant vector field $\tilde X \in D(G)$ by $\tilde X = dr\,X$, where $r_\theta(\gamma) := \gamma\theta$ (cf. (5.1.1) and the analogous definition for left invariant vector fields (5.6.1)). Recall that the map $t \mapsto \exp(tX)$ (where the exponential is the unique Lie-group homomorphism associated with the Lie-algebra homomorphism $\psi(\alpha\frac{d}{dt}) = \alpha X$, cf. (5.6.7)) is an integral curve of $\tilde X$ passing through $e$ at time zero. Given $\theta \in G$ arbitrary, the map $t \mapsto \exp(tX)\theta$ is an integral curve of the right invariant vector field $\tilde X$ passing through the point $\theta \in G$ at time zero. It follows directly from this observation that
$$\tilde X\phi\big(\exp(\tau X)\theta\big)\Big|_{\tau=t} = \frac{d}{d\tau}\phi\big(\exp(\tau X)\theta\big)\Big|_{\tau=t}.$$
Indeed, there is a natural extension of this idea which generalizes to higher order derivatives. These derivatives can be combined into a Taylor theorem for analytic real functions on a Lie-group. Proposition 6.2.3 is proved in Varadarajan (1984, pg. 96) and formalises this concept. Before this result can be stated it is necessary to introduce some notation.
Notation: Let $k = (k_1,k_2,\ldots,k_n)$, with $k_1, k_2,\ldots$ non-negative integers, represent a multi-index and denote its size by $|k| = k_1 + k_2 + \cdots + k_n$. Let $Z_1,\ldots,Z_n$ be $n$ objects and let $t = (t_1,\ldots,t_n)$ be any set of $n$ real numbers. The set of objects (in Proposition 6.2.3 the objects will be vector fields) of the form $t_1Z_1 + \cdots + t_nZ_n$ forms a vector space under addition and scalar multiplication. One also considers formal products of elements, for example $(t_1Z_1)(t_2Z_2)(t_1Z_1) = t_1^2t_2(Z_1Z_2Z_1)$, where the scalar multiplication is commutative but multiplication between elements $Z_1$ and $Z_2$ is non-commutative. One defines an additional element $1 = Z^0$ which acts as a multiplicative identity, $Z^0(t_1Z_1) = (t_1Z_1) = (t_1Z_1)Z^0$. Given a multi-index $k = (k_1,k_2,\ldots,k_n)$ consider a second multi-index $(i_1,\ldots,i_{|k|})$ with $|k|$ entries $i_p \in \{1,\ldots,n\}$ such that the number of occurrences where $i_p = j$ for $1 \leq j \leq n$ is exactly $k_j$. Let $Z = t_1Z_1 + \cdots + t_nZ_n$, then the formal power $Z^k$ is defined by
$$(t_1Z_1 + \cdots + t_nZ_n)^k = \frac{1}{|k|!}\sum_{(i_1,i_2,\ldots,i_{|k|})}\big(t_1^{k_1}\cdots t_n^{k_n}\big)\big(Z_{i_1}Z_{i_2}\cdots Z_{i_{|k|}}\big).$$
In other words, the sum is taken over all permutations of elements of the form $(t_{i_1}Z_{i_1})(t_{i_2}Z_{i_2})\cdots(t_{i_{|k|}}Z_{i_{|k|}})$ such that there are exactly $k_1$ occurrences of $t_1Z_1$, $k_2$ occurrences of $t_2Z_2$, etc. Of course, if the size of $|k|$ is equal to either zero or one then the situation is particularly simple:
$$(t_1Z_1 + \cdots + t_nZ_n)^k = 1 \quad \text{for } |k| = 0,$$
$$(t_1Z_1 + \cdots + t_nZ_n)^k = t_jZ_j \quad \text{for } |k| = 1, \text{ where } k_j = 1 \text{ is the only nonzero element of } k.$$
Proposition 6.2.3 Given $G$ a Lie-group and $\phi \in C^\infty(G)$ an analytic real function in a neighbourhood of a point $\theta \in G$, let $X_1,\ldots,X_n \in T_eG$ be a basis for the identity tangent space of $G$. Define the associated right invariant vector fields $\tilde X_i = dr\,X_i$, for $i = 1,\ldots,n$, and let $k$ represent a multi-index with $n$ entries. The asymptotic expansion
$$\phi\big(\exp(t_1X_1 + \cdots + t_nX_n)\theta\big) = \sum_{|k|=0}^\infty\frac{t_1^{k_1}\cdots t_n^{k_n}}{k_1!\cdots k_n!}\big((\tilde X_1 + \cdots + \tilde X_n)^k\phi\big)(\theta) \qquad (6.2.1)$$
converges absolutely and uniformly in a neighbourhood of $\theta$.
Let $G$ be a Lie-group and $\phi \in C^\infty(G)$ be an analytic map on $G$. Choose a basis $X_1,\ldots,X_n \in T_eG$ for the identity tangent space of $G$ and define the associated right invariant vector fields $\tilde X_i = dr\,X_i$, for $i = 1,\ldots,n$. Expressing $\phi$ as a Taylor series around a point $\theta \in G$ one has
$$\phi\big(\exp(t_1X_1 + \cdots + t_nX_n)\theta\big) = \phi(\theta) + \sum_{j=1}^nt_j(\tilde X_j\phi)(\theta) + O(\|t\|^2), \qquad (6.2.2)$$
where $O(\|t\|^2)$ represents the remainder of the Taylor expansion, all terms for which $|k| \geq 2$. By taking the derivative of this relation with respect to the vector fields $\tilde X_i$ and discarding the higher order terms one obtains the approximation
$$\tilde X_i\phi\big(\exp(t_1X_1 + \cdots + t_nX_n)\theta\big) \approx \tilde X_i\phi(\theta) + \sum_{j=1}^n(\tilde X_i\tilde X_j\phi)(\theta)t_j. \qquad (6.2.3)$$
§6.2 Newton-Raphson Algorithm on Lie-Groups
Define the Jacobi matrix of $\phi$ to be the $n \times n$ matrix with $(i,j)$'th element
\[
(J_\phi^\sigma)_{ij} = \tilde X_i\tilde X_j\phi(\sigma), \qquad (6.2.4)
\]
which is dependent on the choice of basis $X_1, \ldots, X_n$ for $T_eG$. Define the two column vectors
$t = (t_1, \ldots, t_n)^T$ and $D\phi(\sigma) = (\tilde X_1\phi(\sigma), \ldots, \tilde X_n\phi(\sigma))^T$. Recalling the discussion of the
Newton-Raphson method on $\mathbb{R}^n$ it is natural to consider the following iteration defined for
$\sigma \in G$:
\[
t = -(J_\phi^\sigma)^{-1} D\phi(\sigma),
\]
\[
\sigma_+ = \exp(t_1X_1 + \cdots + t_nX_n)\sigma.
\]
The motivation for considering this algorithm parallels that given above for the Newton-
Raphson method on $\mathbb{R}^n$. If $p$ is a critical point of $\phi$ then $\tilde X_i\phi(p) = 0$ for each $\tilde X_i$. Thus,
assuming that $\exp(t_1X_1 + \cdots + t_nX_n)\sigma = p$ and then solving the approximate relation (6.2.3) for
$(t_1, \ldots, t_n)$ gives a new estimate $\sigma_+ = \exp(t_1X_1 + \cdots + t_nX_n)\sigma$. It follows that if $\sigma$ was a
good estimate of $p$ then the difference between $\phi$ and the approximate Taylor expansion should
be of order $O(\|t\|^2)$ and consequently the new estimate $\sigma_+$ will be a correspondingly better
estimate of $p$. Given an initial point $\sigma_0 \in G$ and a choice of $n$ basis elements $\{X_1, \ldots, X_n\}$
for $T_eG$ the Newton-Raphson algorithm on $G$ is:
Algorithm 6.2.4 [Newton-Raphson Algorithm on a Lie-group $G$]
Given $\sigma_k \in G$ compute $D\phi(\sigma_k)$.
Compute the Jacobi matrix $J_\phi^{\sigma_k}$ given by $(J_\phi^{\sigma_k})_{ij} = \tilde X_i\tilde X_j\phi(\sigma_k)$.
Set $t = -(J_\phi^{\sigma_k})^{-1} D\phi(\sigma_k)$.
Set $\sigma_{k+1} = \exp(t_1X_1 + \cdots + t_nX_n)\sigma_k$.
Set $k = k + 1$ and repeat. $\Box$
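As a concrete illustration, Algorithm 6.2.4 can be sketched numerically on the one-dimensional Lie-group $SO(2)$, where the exponential map has a closed form and the Jacobi matrix is a scalar. The cost function (the $N = 2$ potential of Problem A below), the basis element $J$, the finite-difference steps and the iteration count are all illustrative choices, not taken from the thesis; the derivatives $\tilde X\phi$ and $\tilde X\tilde X\phi$ are approximated by central differences rather than computed exactly.

```python
import numpy as np

def rot(a):
    # closed-form exponential exp(a*J) on SO(2), with J = [[0, -1], [1, 0]]
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# Illustrative cost phi(U) = -tr(D U H0 U^T), the potential of Problem A for N = 2
D = np.diag([2.0, 1.0])
H0 = np.array([[1.3, 0.4], [0.4, 1.7]])
phi = lambda U: -np.trace(D @ U @ H0 @ U.T)

def d1(U, h=1e-5):
    # right invariant first derivative: d/dt phi(exp(tJ) U) at t = 0 (central difference)
    return (phi(rot(h) @ U) - phi(rot(-h) @ U)) / (2.0 * h)

def d2(U, h=1e-4):
    # second derivative; this is the full 1x1 Jacobi matrix since dim SO(2) = 1
    return (d1(rot(h) @ U) - d1(rot(-h) @ U)) / (2.0 * h)

U = rot(0.3)                # initial estimate sigma_0
for _ in range(8):
    t = -d1(U) / d2(U)      # Newton step: t = -(J_phi)^{-1} D phi
    U = rot(t) @ U          # update: sigma_{k+1} = exp(t J) sigma_k
```

After a handful of iterations $U$ is a critical point of $\phi$ to numerical precision: the first derivative `d1(U)` vanishes up to finite-difference error, while $U$ remains exactly orthogonal since every update is a rotation.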
Lemma 6.2.5 Given $G$ a Lie-group and $\phi \in C^\omega(G)$ an analytic real function with a critical
point $p \in G$, let $\sigma \in G$ be arbitrary and define $f : \mathbb{R}^n \to G$, $f(t) := \exp(t_1X_1 + \cdots + t_nX_n)\sigma$,
to be canonical coordinates of the first kind on $G$ centred at $\sigma$ (Varadarajan 1984, pg. 88).
Define a smooth vector field $Z$ on $\mathbb{R}^n$ by $Z(t) = (\tilde X_1\phi(f(t)), \ldots, \tilde X_n\phi(f(t)))$. An iteration
of the Newton-Raphson algorithm (Algorithm 6.2.4) on $G$ with initial condition $\sigma$ is the image
of a single iteration of the Newton-Raphson algorithm (Algorithm 6.2.1) on $\mathbb{R}^n$ with initial
condition $0$ via the canonical coordinates $f$.
Proof Observe that
\[
Z(0) = (\tilde X_1\phi(f(0)), \ldots, \tilde X_n\phi(f(0))) = D\phi(\sigma).
\]
Also for $1 \leq i, j \leq n$ one finds
\[
\frac{\partial Z_j}{\partial t_i}\Big|_{t=0} = \frac{\partial}{\partial t_i} \tilde X_j\phi(f(t))\Big|_{t=0} = \frac{\partial}{\partial t_i} (\tilde X_j\phi \circ r_\sigma)(\exp(t_iX_i))\Big|_{t_i=0} = \tilde X_i\tilde X_j\phi(\sigma),
\]
since $\frac{d}{dr} g(\exp(rX))\big|_{r=0} = \tilde Xg(e)$ for any $g \in C^\infty(G)$ and $X \in \mathfrak{g}$. Thus, the two Jacobi
matrices $J_0Z = J_\phi^\sigma$ are equal. The Newton-Raphson algorithm on $\mathbb{R}^n$ is just
\[
t = 0 - (J_0Z)^{-1}Z(0) = -(J_\phi^\sigma)^{-1}D\phi(\sigma),
\]
and the image of $t$ is exactly $\sigma_+ = \exp(t_1X_1 + \cdots + t_nX_n)\sigma$, the Newton-Raphson algorithm
on $G$. $\Box$
It is desirable to prove a similar result to Proposition 6.2.2 for the Newton-Raphson method
on a Lie-group. To compute the rate of convergence one needs to define a measure of distance
in a neighbourhood of the critical point considered. Let $V \subset G$ be a neighbourhood of a critical
point $p \in G$ of an analytic function $\phi \in C^\omega(G)$ and let $\{X_1, \ldots, X_n\}$ be a basis for $T_eG$ as
above. There exists a subset $U \subset V$ such that the canonical coordinates of the first kind on
$G$ centred at $p$, $(t_1, \ldots, t_n) \mapsto \exp(t_1X_1 + \cdots + t_nX_n)p$, are a local diffeomorphism onto $U$
(Helgason 1978, pg. 104). One defines distance within $U$ by the distance induced on canonical
coordinates centred at $p$ by the Euclidean norm on $\mathbb{R}^n$,
\[
\|\exp(t_1X_1 + \cdots + t_nX_n)p\| := \Big(\sum_{i=1}^{n}(t_i)^2\Big)^{1/2}.
\]
Lemma 6.2.6 Given $\phi \in C^\omega(G)$ an analytic real function on a Lie-group $G$ let $p \in G$ be a
critical point of $\phi$. There exists a neighbourhood $W \subset G$ of $p$ and a constant $C > 0$ such that
the Newton-Raphson algorithm on $G$ (Algorithm 6.2.4) converges to $p$ for any initial estimate
$\sigma_0 \in W$ and the error, measured with respect to the distance induced by canonical coordinates of
the first kind, decreases quadratically,
\[
\|\sigma_{k+1} - p\| \leq C\|\sigma_k - p\|^2.
\]
Proof Let $U_1 \subset \mathbb{R}^n$ be an open neighbourhood of $0$ in $\mathbb{R}^n$ and define a smooth vector field by
$Z(x) = (\tilde X_1\phi(f(x)), \ldots, \tilde X_n\phi(f(x)))$, where $f : \mathbb{R}^n \to G$, $f(x) := \exp(x_1X_1 + \cdots + x_nX_n)p$ are
canonical coordinates of the first kind. Since $p$ is a critical point of $\phi$ then $0$ is a stationary
point of $Z$, i.e. $Z(0) = 0$. Applying Proposition 6.2.2 one obtains an open neighbourhood
$U_2 \subset U_1$ of $0$ and a constant $C_1$ such that the Newton-Raphson algorithm on $\mathbb{R}^n$ (Algorithm
6.2.1) converges quadratically to zero for any initial condition in $U_2$.
A standard result concerning the exponential of the sum of two elements of a Lie-algebra [...]
To see that the sequence $q_{k+1}$ does in fact converge to zero one observes that $\|q_{k+1}\| \leq \frac{1}{2}\|q_k\|$
since $\|q_k\| \leq \frac{1}{4(C_1 + C_2)}$. Observing that $q_{k+1}$ is just the representation of the next iterate $\sigma_{k+1}$
of the Newton-Raphson algorithm on $G$ (Algorithm 6.2.4) in local coordinates, one has
\[
\|\sigma_{k+1} - p\| \leq C\|\sigma_k - p\|^2,
\]
where $C = 2C_2 + C_1$, and the proof is complete. $\Box$
Remark 6.2.7 An interesting observation is that though each single iteration of the Newton-
Raphson algorithm (Algorithm 6.2.4) on $G$ is equivalent to an iteration of the Euclidean
Newton-Raphson algorithm (Algorithm 6.2.1) in a certain set of local coordinates, this is not
true of multiple iterations of the algorithm in the same coordinate chart. $\Box$
6.3 Coordinate Free Newton-Raphson Methods
The construction presented in the previous section for computing the Newton-Raphson method
on a Lie-group $G$ depends on the construction of the Jacobi matrix $J_\phi^\sigma$ (cf. (6.2.4)), which is
explicitly defined in terms of an arbitrary choice of $n$ basis vectors $\{X_1, \ldots, X_n\}$ for $T_eG$. In
this section the Newton-Raphson algorithm on an arbitrary Lie-group equipped with a right
invariant Riemannian metric is formulated in a coordinate free manner.
Let $G$ be a Lie-group with an inner product $g_e(\cdot, \cdot)$ defined on $T_eG$. Denote the right invari-
ant group metric that $g_e$ generates on $G$ by $g$ (cf. Section 5.3). Choose a basis $\{X_1, \ldots, X_n\}$
for $T_eG$ which is orthonormal with respect to the inner product $g_e(\cdot, \cdot)$ (i.e. $g_e(X_i, X_j) = \delta_{ij}$,
where $\delta_{ij}$ is the Kronecker delta, $\delta_{ij} = 0$ unless $i = j$ in which case $\delta_{ij} = 1$). Define
the right invariant vector fields
\[
\tilde X_i = \mathrm{d}r\,X_i,
\]
associated with the basis vectors $\{X_1, \ldots, X_n\}$. Since the basis $\{X_1, \ldots, X_n\}$ was chosen to
be orthonormal it follows that the decomposition of an arbitrary smooth vector field $Z \in \mathcal{D}(G)$
can be written
\[
Z = \sum_{j=1}^{n} z_j\tilde X_j = \sum_{j=1}^{n} g(\tilde X_j, Z)\tilde X_j.
\]
In particular, let $\phi \in C^\omega(G)$ be an analytic real map on $G$ and $\mathrm{grad}\,\phi$ be defined with respect
to the metric $g$ (cf. Section 5.4),
\[
\mathrm{grad}\,\phi = \sum_{j=1}^{n} g(\tilde X_j, \mathrm{grad}\,\phi)\tilde X_j = \sum_{j=1}^{n} (\tilde X_j\phi)\tilde X_j. \qquad (6.3.1)
\]
Let $t = (t_1, \ldots, t_n) \in \mathbb{R}^n$ and define the vector field $\tilde X \in \mathcal{D}(G)$ by $\tilde X = \sum_{j=1}^{n} t_j\tilde X_j$, which
is the right invariant vector field associated with the unique element $X = \sum_{j=1}^{n} t_jX_j \in T_eG$.
Observe that $\sum_{j=1}^{n} (\tilde X_j\phi(\sigma))t_j = \tilde X\phi(\sigma)$, and consequently post-multiplying (6.2.3) by $\tilde X_i$
and summing over $i = 1, \ldots, n$ one obtains the approximation
\[
\sum_{i=1}^{n} \big(\tilde X_i\phi(\exp(X)\sigma)\big)\tilde X_i \approx \sum_{i=1}^{n} \big(\tilde X_i\phi(\sigma)\big)\tilde X_i + \sum_{i=1}^{n} \Big(\tilde X_i\sum_{j=1}^{n} t_j\tilde X_j\phi(\sigma)\Big)\tilde X_i
= \mathrm{grad}\,\phi(\sigma) + \mathrm{grad}\big(\tilde X\phi\big)(\sigma).
\]
Now assuming that $\exp(X)\sigma$ is a critical point of $\phi$, computing the regression vector for
the Newton-Raphson algorithm is equivalent to solving the coordinate free equation
\[
0 = \mathrm{grad}\,\phi(\sigma) + \mathrm{grad}\big(\tilde X\phi\big)(\sigma) \qquad (6.3.2)
\]
for the vector field $\tilde X$ (or equivalently the tangent vector $X \in T_eG$ that uniquely defines $\tilde X$). In
Algorithm 6.2.4 the choice of $\{X_1, \ldots, X_n\}$ was arbitrary and it follows that solving directly
for $\tilde X$ using (6.3.2) is equivalent to setting $X = t_1X_1 + \cdots + t_nX_n$, where $t = (t_1, \ldots, t_n)$
is the error estimate $t = -(J_\phi^\sigma)^{-1}D\phi(\sigma)$. Given an initial point $\sigma_0 \in G$ the Newton-Raphson
algorithm on a Lie-group $G$ can be written in a coordinate free form as:
The value of $\nabla_{\tilde Y}\tilde X$ is given by the unique bilinear map associated with the right invari-
ant affine connection $\nabla$ (cf. Section 5.8). One has $\nabla_{\tilde X}\mathrm{grad}\,\phi = \mathrm{grad}(\tilde X\phi)$ if and only if
$g(\mathrm{grad}\,\phi, \nabla_{\tilde Y}\tilde X) = 0$ for all $\tilde Y$. The most likely situation for this to occur is when the bilinear
map associated with $\nabla$ is identically zero. For the examples considered in this thesis this will
not be true. $\Box$
6.4 Symmetric Eigenvalue Problem
In this section the general structure developed in the previous two sections is used to derive a
coordinate free Newton-Raphson method for the symmetric eigenvalue problem. An advantage
of considering the symmetric eigenvalue problem is that one can compare the Newton-Raphson
algorithm to classical methods such as the shifted QR algorithm. This provides a good
perspective on the performance of the Newton-Raphson algorithm; however, I stress that the
method is not proposed as competition to state of the art numerical linear algebra methods for
solving the classical symmetric eigenvalue problem. Rather, the focus is still on adaptive and
on-line applications.
Recall the constrained optimization problem that was posed in Chapter 2 for computing
the spectral decomposition of a matrix $H$. It was shown that minimising the functional
\[
\psi(H) := \|H - D\|^2 = \|H\|^2 + \|D\|^2 - 2\mathrm{tr}(DH),
\]
on the set⁴
\[
M(H_0) = \{UH_0U^T \mid U \in O(N)\}, \quad H_0 = H_0^T, \qquad (6.4.1)
\]
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ (a diagonal matrix with independent eigenvalues), is equivalent
⁴The original definition (2.1.2), $M(H_0) = \{U^TH_0U \mid U \in O(N)\}$, is slightly different to the definition used
here. The map $U \mapsto U^TH_0U$, however, is not a group action, and the definition given above is equivalent to (2.1.2).
to computing the eigenvalues of $H$ (Brockett 1988, Helmke & Moore 1994b). To apply the
theory developed in the previous section one must reformulate this optimization problem on
$O(N)$, the Lie-group associated with the homogeneous space $M(H_0)$. The new optimization
problem considered is:
Problem A Let $H_0 = H_0^T \in S(N)$ be a symmetric matrix and let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ be
a diagonal matrix with real eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. Consider the potential
\[
\phi : O(N) \to \mathbb{R}, \quad \phi(U) := -\mathrm{tr}(DUH_0U^T).
\]
Find an orthogonal matrix which minimises $\phi$ over $O(N)$. $\Box$
It is easily seen that if one computes a minimum $U_*$ of Problem A then $U_*H_0U_*^T$ is a
minimum of $\psi$. Recalling Section 5.4 one easily verifies that the minimising gradient flow
solutions to Problem A will map via the group action to the minimising gradient flow associated
with $\psi$ (Helmke & Moore 1994b, pg. 50).
Computing a single iteration of the Newton-Raphson method (Algorithm 6.3.1) relies on
computing both $\mathrm{grad}\,\phi$ and $\mathrm{grad}(\tilde X\phi)$ for an arbitrary right invariant vector field $\tilde X$. Recall the
discussion in Section 6.1 regarding the Riemannian geometry of $O(N)$.
Lemma 6.4.1 Let $H_0 = H_0^T \in S(N)$ be a symmetric matrix, let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$
be a diagonal matrix with real eigenvalues $\lambda_1 > \cdots > \lambda_N$, and let
\[
\phi : O(N) \to \mathbb{R}, \quad \phi(U) := -\mathrm{tr}(DUH_0U^T).
\]
Express the tangent spaces of $O(N)$ by (6.1.1) and consider the right invariant group metric
(6.1.2).
a) The gradient of $\phi$ on $O(N)$ is
\[
\mathrm{grad}\,\phi = [UH_0U^T, D]U.
\]
b) Let $X \in \mathrm{Sk}(N)$ be arbitrary and set $\tilde X = XU = \mathrm{d}r_UX$, the right invariant vector field
on $O(N)$ generated by $X$. The gradient of $\tilde X\phi$ on $O(N)$ is
\[
\mathrm{grad}(\tilde X\phi) = [[X, D], UH_0U^T]U.
\]
Proof Recall the definition of the gradient, (5.4.1) and (5.4.2). The Fréchet derivative of $\phi$ in a
direction $\Delta U \in T_UO(N)$ is
\[
D\phi\big|_U(\Delta U) = -\mathrm{tr}(D\,\Delta U H_0U^T) - \mathrm{tr}(DUH_0\,\Delta U^T)
= \mathrm{tr}\big(([UH_0U^T, D]U)\Delta U^T\big) = g([UH_0U^T, D]U, \Delta U).
\]
Observing that $[UH_0U^T, D]U \in T_UO(N)$ completes the proof of part a).
For part b) observe that
\[
\tilde X\phi = D\phi\big|_U(XU) = \mathrm{tr}([X, D]UH_0U^T).
\]
Taking a second derivative of this in an arbitrary direction $\Delta U$ one obtains
\[
D\,\mathrm{tr}([X, D]UH_0U^T)\big|_U(\Delta U) = \mathrm{tr}\big((\Delta U H_0U^T + UH_0\,\Delta U^T)[X, D]\big)
= g([[X, D], UH_0U^T]U, \Delta U),
\]
and thus $\mathrm{grad}(\tilde X\phi) = [[X, D], UH_0U^T]U$. $\Box$
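The gradient formula of Lemma 6.4.1 a) is easy to check numerically by comparing the pairing $g(\mathrm{grad}\,\phi, XU)$ against a finite-difference directional derivative of $\phi$ along the right invariant field generated by a skew matrix $X$ (part b) can be checked the same way with one more differentiation). The random matrices, the truncated-series matrix exponential and the step $h$ below are my own illustrative choices.

```python
import numpy as np

def expm_series(M, terms=12):
    # truncated-series matrix exponential; adequate here since ||M|| is tiny
    E = np.eye(M.shape[0]); T = np.eye(M.shape[0])
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))
H0 = A + A.T                                  # arbitrary symmetric matrix
D = np.diag([4.0, 3.0, 2.0, 1.0])
phi = lambda U: -np.trace(D @ U @ H0 @ U.T)

X = rng.standard_normal((N, N)); X = X - X.T  # arbitrary skew direction
U = np.eye(N)

# finite-difference directional derivative d/dt phi(exp(tX) U) at t = 0
h = 1e-5
fd = (phi(expm_series(h * X) @ U) - phi(expm_series(-h * X) @ U)) / (2 * h)

# Lemma 6.4.1 a): grad phi = [U H0 U^T, D] U, paired with XU in the metric
H = U @ H0 @ U.T
grad = (H @ D - D @ H) @ U
analytic = np.trace((grad @ U.T) @ X.T)       # g(grad phi, XU)
```

The two numbers agree to roughly the order of the finite-difference truncation error.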
Recall the equation for the coordinate free Newton-Raphson method (6.3.2). Rewriting
this in terms of the expressions derived in Lemma 6.4.1 gives the algebraic equation
\[
0 = [UH_0U^T, D]U + [[X, D], UH_0U^T]U,
\]
which one wishes to solve for $X \in \mathrm{Sk}(N)$.
Remark 6.4.2 To see that a solution to this equation exists, observe that given a general linear
solution $X \in \mathbb{R}^{N \times N}$ (which always exists since the equation is a linear system of $N^2$ equations
in $N^2$ unknowns) then
\[
[[-X^T, D], UH_0U^T] = [[X, D]^T, UH_0U^T]
= -[[X, D], UH_0U^T]^T
= [UH_0U^T, D]^T = -[UH_0U^T, D].
\]
Thus, $-X^T$ is also a solution and by linearity so is $\frac{X - X^T}{2}$. The question of uniqueness for
the solution $X \in \mathrm{Sk}(N)$ obtained is unclear. In the case where $UH_0U^T$ is diagonal
with distinct eigenvalues it is clear that $[[X, D], UH_0U^T] = 0 \iff [X, D] = 0 \iff X = 0$ and the
solution is unique. As a consequence a genericity assumption on the eigenvalues of $H_0$ would
need to be made to obtain a general uniqueness result. I expect that once such an assumption
is made on the eigenvalues of $H_0$ the skew symmetric solution of the linear system would be unique.
Unfortunately I have no proof of this result at the present time. $\Box$
Given an initial matrix $H_0$ and choosing $U_0 = I_N$, the Newton-Raphson solution to
Problem A is:
Algorithm 6.4.3 [Newton-Raphson Algorithm for Spectral Decomposition]
Find $X_k \in \mathrm{Sk}(N)$ such that
\[
[[X_k, D], U_kH_0U_k^T] = -[U_kH_0U_k^T, D]. \qquad (6.4.2)
\]
Set $U_{k+1} = e^{X_k}U_k$, where $e^{X_k}$ is the matrix exponential of $X_k$.
Set $k = k + 1$ and repeat. $\Box$
Remark 6.4.4 To solve (6.4.2) one can reformulate the matrix system of linear equations as a
constrained vector linear system. Denote by $\mathrm{vec}(A)$ the vector generated by taking the columns
of $A \in \mathbb{R}^{l \times m}$ (for $l$ and $m$ arbitrary integers) one on top of the other. Taking the vec of both
sides of (6.4.2) gives⁵
\[
\big((DUH_0U^T)^T \otimes I_N - (UH_0U^T) \otimes D - D \otimes (UH_0U^T) + I_N \otimes (UH_0U^TD)\big)\mathrm{vec}(X_k)
= -\mathrm{vec}([UH_0U^T, D]). \qquad (6.4.3)
\]
⁵Let $A$, $B$ and $C$ be real $N \times N$ matrices and let $A_{ij}$ denote the $ij$'th entry of the matrix $A$. The Kronecker
Figure 6.4.1: Plot of $\|H_k - D\|$ (on a logarithmic scale) against iteration $k$, where $H_k = U_kH_0U_k^T$
and $U_k$ is a solution to both (6.4.4) and Algorithm 6.4.3. The eigenvalues of $H_0$ are chosen to be
$(\lambda_1, \ldots, \lambda_N)$, the eigenvalues of $D$, though $H_0$ is not diagonal. Thus, the minimum Euclidean
distance between $H_k \in M(H_0)$ and $D$ is zero. By plotting the Euclidean norm distance $\|H_k - D\|$
on a logarithmic scale the quadratic convergence characteristics of Algorithm 6.4.3 are displayed.
The curve is labelled by its gradient descent and Newton-Raphson phases.
The constraint $X_k \in \mathrm{Sk}(N)$ can be written as a vector equation
\[
(I_{N^2} + P)\mathrm{vec}(X_k) = 0,
\]
where $P$ is the $N^2 \times N^2$ permutation matrix such that $\mathrm{vec}(A) = P\,\mathrm{vec}(A^T)$, $A \in \mathbb{R}^{N \times N}$.
In practice, it is known that a skew symmetric solution to (6.4.3) exists and one proceeds
by extracting the $\frac{1}{2}N(N-1) \times \frac{1}{2}N(N-1)$ submatrix of the $N^2 \times N^2$ Kronecker product and
using Gaussian elimination to solve for the free variables $X_{ij}$, $i > j$. $\Box$
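The solution procedure of Remark 6.4.4 can be sketched directly: instead of forming the full $N^2 \times N^2$ Kronecker matrix and then extracting the skew symmetric submatrix, the sketch below parameterises the unknown by its $\frac{1}{2}N(N-1)$ independent entries and solves the resulting (consistent, overdetermined) linear system by least squares. The near-diagonal test matrix, the random seed and the six-iteration count are my own choices, made so that the Newton iteration starts inside its local convergence neighbourhood; a genericity assumption on the eigenvalues of $H_0$ is implicit (cf. Remark 6.4.2).

```python
import numpy as np

def newton_step(H, D):
    """Solve [[X, D], H] = -[H, D] (equation (6.4.2)) for skew symmetric X,
    parameterised by its N(N-1)/2 independent entries X[i, j], i < j."""
    N = H.shape[0]
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    cols = []
    for (i, j) in pairs:
        E = np.zeros((N, N)); E[i, j] = 1.0; E[j, i] = -1.0
        B = E @ D - D @ E                        # [E, D]
        cols.append((B @ H - H @ B).ravel())     # [[E, D], H]
    A = np.column_stack(cols)
    b = -(H @ D - D @ H).ravel()                 # right hand side -[H, D]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    X = np.zeros((N, N))
    for (i, j), v in zip(pairs, x):
        X[i, j] = v; X[j, i] = -v
    return X

def expm_series(M, terms=14):
    # truncated-series matrix exponential (the Newton steps X_k are small)
    E = np.eye(M.shape[0]); T = np.eye(M.shape[0])
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

# Algorithm 6.4.3 run from a near-diagonal initial condition
D = np.diag([3.0, 2.0, 1.0])
rng = np.random.default_rng(1)
S = 0.05 * rng.standard_normal((3, 3))
H0 = D + S + S.T
U = np.eye(3)
for _ in range(6):
    H = U @ H0 @ U.T
    U = expm_series(newton_step(H, D)) @ U
H = U @ H0 @ U.T                                 # now diagonal to machine precision
```

After six Newton steps the iterate $H_k = U_kH_0U_k^T$ is diagonal to numerical precision, with diagonal entries the eigenvalues of $H_0$.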
Of course a Newton-Raphson algorithm cannot be expected to converge globally in O�N�
and for arbitrary choice of H0 one must couple the Newton-Raphson algorithm with some
other globally convergent method to obtain a practical numerical method. In the following
simulations the associated orthogonal iteration described in Section 2.5 is used. In fact the
product of two matrices is defined by
\[
A \otimes B = \begin{pmatrix} A_{11}B & \cdots & A_{1N}B \\ \vdots & & \vdots \\ A_{N1}B & \cdots & A_{NN}B \end{pmatrix} \in \mathbb{R}^{N^2 \times N^2}.
\]
A readily verified identity relating the vec operation and Kronecker products is (Helmke & Moore 1994b, pg. 314)
\[
\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B).
\]
algorithm implemented is a slight variation of (2.5.1),
\[
U_{k+1} = e^{-\alpha_k[U_kH_0U_k^T, D]}U_k, \qquad (6.4.4)
\]
where the modification is due to the new definition (6.4.1) of $M(H_0)$. The step size selection
method used is that given in Lemma 2.2.4,
\[
\alpha_k = \frac{1}{2\|[H_k, D]\|}\log\left(\frac{\|[H_k, D]\|^2}{\|H_0\|\,\|[D, [H_k, D]]\|} + 1\right),
\]
where $H_k = U_kH_0U_k^T$ and $U_k$ is a solution to (6.4.4). The minor difference between (6.4.4)
and the associated orthogonal double-bracket algorithm (2.5.1) does not affect the convergence
results proved in Chapter 2. It follows that (6.4.4) is globally convergent to an orthogonal
matrix $U_*$ such that $U_*H_0U_*^T$ is a diagonal matrix with diagonal entries in descending order.
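For illustration, the iteration (6.4.4) is straightforward to implement directly. The sketch below uses the $3 \times 3$ matrix $H_0$ of the simulation described next (eigenvalues approximately 1, 2 and 3) but, for simplicity, a small constant step-size in place of the Lemma 2.2.4 selection rule; the step-size value, iteration count and truncated-series matrix exponential are my own illustrative choices, not the thesis's.

```python
import numpy as np

def expm_series(M, terms=20):
    # truncated-series matrix exponential (adequate for the small steps taken here)
    E = np.eye(M.shape[0]); T = np.eye(M.shape[0])
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

D = np.diag([1.0, 2.0, 3.0])
H0 = np.array([[ 2.1974, -0.8465, -0.2401],
               [-0.8465,  2.0890, -0.4016],
               [-0.2401, -0.4016,  1.7136]])   # eigenvalues approximately 1, 2, 3

alpha = 0.05                                   # illustrative constant step-size
U = np.eye(3)
for _ in range(1000):
    H = U @ H0 @ U.T
    Omega = H @ D - D @ H                      # commutator [H_k, D]
    U = expm_series(-alpha * Omega) @ U        # U_{k+1} = exp(-alpha [H_k, D]) U_k
H = U @ H0 @ U.T
```

The iterates $H_k = U_kH_0U_k^T$ converge linearly to a diagonal matrix whose entries are the eigenvalues of $H_0$.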
Figure 6.4.1 is an example of (6.4.4) combined with the Newton-Raphson algorithm (Algorithm 6.4.3).
The aim of the simulation is to display the quadratic convergence behaviour of the Newton-
Raphson algorithm. The initial condition used was generated via a random orthogonal congru-
ency transformation of the matrix $D = \mathrm{diag}(1, 2, 3)$,
\[
H_0 = \begin{pmatrix} 2.1974 & -0.8465 & -0.2401 \\ -0.8465 & 2.0890 & -0.4016 \\ -0.2401 & -0.4016 & 1.7136 \end{pmatrix}.
\]
Thus, the eigenvalues of $H_0$ are 1, 2 and 3 and the minimum distance between $D$ and $M(H_0)$ is
zero. In Figure 6.4.1 the distance $\|H_k - D\|$ is plotted for $H_k = U_kH_0U_k^T$, where $U_k$ is a solution
to both (6.4.4) and Algorithm 6.4.3. In this example the modified gradient descent method
(6.4.4) was used for the first six iterations and the Newton-Raphson algorithm was used for the
remaining three iterations. The plot of $\|H_k - D\|$ measures the absolute Euclidean distance
between $H_k$ and $D$. Naturally, there is some distortion involved in measuring distance along the
surface of $M(H_0)$; however, for limiting behaviour $\|H_k - D\|$ is a reasonable approximation
of distance measured along $M(H_0)$. The distance $\|H_k - D\|$ is expressed on a log scale to
show the linear and quadratic convergence behaviour. In particular, the quadratic convergence
behaviour of the Newton-Raphson algorithm is displayed by iterations seven, eight and nine in
Figure 6.4.1.
k    |(H_k)_{21}|   |(H_k)_{32}|   |(H_k)_{43}|
0    2              4              6
1    1.6817         3.2344         0.8649
2    1.6142         2.5755         0.0006
3    1.6245         1.6965         10^{-13}
4    1.6245         0.0150         converged
5    1.5117         10^{-9}
6    1.1195         converged
7    0.7071
8    converged

Table 6.4.1: The evolution of the lower off-diagonal entries of the shifted QR method described
by Golub and Van Loan (1989, Algorithm 8.2.3, pg. 423). The initial condition used is $H_0'$ of
(6.4.5).
To provide a comparison of the coordinate free Newton-Raphson method with classical
algorithms, the following simulation is completed for both the Newton-Raphson algorithm and
the shifted QR algorithm (Golub & Van Loan 1989, Section 8.2). The example chosen is
taken from page 424 of Golub and Van Loan (1989) and, rather than simulate the symmetric
QR algorithm again, the results are taken directly from the book. The initial condition
considered is the tridiagonal matrix
\[
H_0' = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 2 & 3 & 4 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & 6 & 7 \end{pmatrix}. \qquad (6.4.5)
\]
To display the convergence properties of the QR algorithm, Golub and Van Loan (1989) give
a table in which they list the values of the off-diagonal elements of each iterate generated
for the example considered. This table is included (in a slightly modified format) as Table 6.4.1.
Each element $(H_k)_{ij}$ is said to have converged when it has norm of order $10^{-12}$ or smaller.
The initial condition $H_0'$ is tridiagonal and the QR algorithm preserves tridiagonal structure;
consequently the elements $(H_k)_{31}$, $(H_k)_{41}$ and $(H_k)_{42}$ remain zero for all iterates. The
convergence behaviour of the symmetric QR algorithm is cubic in successive off-diagonal
entries. Thus, $(H_k)_{43}$ converges cubically to zero, then $(H_k)_{32}$ converges cubically, and so on
(Wilkinson 1968). The algorithm as a whole, however, does not converge cubically since each
off-diagonal entry must converge in turn.
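For reference, the shifted QR iteration used in the comparison can be sketched as follows. This is a simplified explicit-shift version with Wilkinson shifts and deflation, not the implicit-shift implementation of Golub and Van Loan (1989, Algorithm 8.2.3); the convergence tolerance is an illustrative choice.

```python
import numpy as np

def shifted_qr_eigs(T, tol=1e-12):
    # explicit-shift symmetric QR with Wilkinson shifts and deflation (a sketch)
    A = np.array(T, dtype=float)
    m = A.shape[0]
    while m > 1:
        if abs(A[m - 1, m - 2]) < tol:
            m -= 1                     # subdiagonal entry converged: deflate
            continue
        a, b, c = A[m - 2, m - 2], A[m - 1, m - 2], A[m - 1, m - 1]
        d = (a - c) / 2.0
        s = 1.0 if d >= 0 else -1.0
        mu = c - s * b * b / (abs(d) + np.hypot(d, b))   # Wilkinson shift
        Q, R = np.linalg.qr(A[:m, :m] - mu * np.eye(m))
        A[:m, :m] = R @ Q + mu * np.eye(m)               # similarity step on active block
    return np.sort(np.diag(A))

H = np.array([[1., 2., 0., 0.],
              [2., 3., 4., 0.],
              [0., 4., 5., 6.],
              [0., 0., 6., 7.]])       # the matrix H0' of (6.4.5)
eigs = shifted_qr_eigs(H)
```

The successive subdiagonal entries converge (cubically) one at a time, exactly the behaviour recorded in Table 6.4.1, and the final diagonal holds the eigenvalues.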
It is interesting to also display the results in a graphical format (Figure 6.4.2). Here the
norm
\[
\|H_k - \mathrm{diag}(H_k)\| = \big((H_k)_{21}^2 + (H_k)_{31}^2 + (H_k)_{41}^2 + (H_k)_{32}^2 + (H_k)_{42}^2 + (H_k)_{43}^2\big)^{1/2}
\]
is plotted versus iteration. This would seem to be an important quantity, indicating
robustness and stability margins of the numerical methods considered when the values of $H_k$
are uncertain or subject to noise in an on-line or adaptive environment. The dotted line shows
the behaviour of the QR algorithm. The plot displays the property of the QR algorithm that it
must be run to completion to obtain a solution.
Figure 6.4.2 also shows the plot of $\|H_k - \mathrm{diag}(H_k)\|$ for a sequence generated initially by
the modified gradient descent algorithm (6.4.4) (the first five iterations) and then by the Newton-
Raphson algorithm (for the remaining three iterations). Since the aim of this simulation is to
show the potential of the Newton-Raphson algorithm, the parameters were optimized to provide
good convergence properties. The step-size for (6.4.4) was chosen as a constant $\alpha_k = 0.1$,
which is somewhat larger than the variable step-size used in the first simulation. This ensures
slightly faster convergence in this example, although in general there are initial conditions $H_0$
for which the modified gradient descent algorithm is unstable with the step-size selection fixed at
0.1. The point at which the modified gradient descent algorithm was halted and the Newton-
Raphson algorithm was begun was also chosen by experiment. Note that the Newton-Raphson
method acts directly to decrease the cost $\|H_k - \mathrm{diag}(H_k)\|$, at least in a local neighbourhood
of the critical point. It is this aspect of the algorithm that suggests it would be useful in an
on-line or adaptive environment.
Remark 6.4.5 It is interesting to note that in this example the combination of the modified gra-
dient descent algorithm (6.4.4) and the Newton-Raphson method (Algorithm 6.4.3) converges
in the same number of iterations as the QR algorithm. $\Box$
Figure 6.4.2: A comparison of $\|H_k - \mathrm{diag}(H_k)\|$ where $H_k$ is a solution to the symmetric QR
algorithm (dotted line) and $H_k = U_kH_0U$