Optimization Algorithms on Homogeneous Spacesusers.cecs.anu.edu.au/~john/studenttheses/robertmahony.pdf5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces 127 5.4 Gradient

Optimization Algorithmson

Homogeneous Spaces

WITH APPLICATIONS IN LINEAR SYSTEMS THEORY

Robert Mahony

March 1994

Presented in partial fulfilment of the requirements

for the degree of Doctor of Philosophy

at the Australian National University

Department of Systems Engineering

Research School of Information Sciences and Engineering

Australian National University

Acknowledgements

I would like to thank my supervisors John Moore and Iven Mareels for their support, insight,

technical help and for teaching me to enjoy research. Thanks to Uwe Helmke for his enthusiasm

and support and Wei-Yong Yan for many useful suggestions. I would also like to thank the

other staff and students of the department for providing an enjoyable and exciting environment

for work, especially the students from lakeview for not working too hard. I reserve a special

thanks for Peter Kootsookos because I owe him one.

I have been lucky enough to visit Unite Auto, Catholic University of Leuven, Louvain-la-Neuve

and the Department of Mathematics, University of Regensburg, for extended periods during

my studies and thank the staff and students of both institutions for their support.

A number of people have made helpful comments and contributions to the results contained in

this thesis. In particular, I would like to thank George Bastin, Guy Campion, Kenneth Driessel,

Ed Henrich, Ian Hiskens, David Hill and David Stewart as well several anonymous reviewers.

Apart from the support of the Australian National University I have also received additional

financial support from the following sources:

The Cooperative Research Centre for Robust and Adaptive Systems, funded by the Aus-

tralian Commonwealth Government under the Cooperative Research Centres Program.

Grant I-0184-078.06/91 from the G.I.F., the German-Israeli Foundation for Scientific

Research and Development

Boeing Commercial Aircraft Corporation.

Lastly I thank Pauline Allingham for her support and care throughout my doctorate.

i

Statement of Originality

The work presented in this thesis is the result of original research done by myself, in col-

laboration with others, while enrolled in the Department of Systems Engineering as a Doctor

of Philosophy student. It has not been submitted for any other degree or award in any other

university or educational institution.

Following is a list of publications in refereed journals and conference proceedings completed

while I was a Doctor of Philosophy student. Much of the technical discussion given in this

thesis is based on work described in the papers numbers [1,2,5,6,10,11] from the list below.

The remaining papers cover material I chose not to include in this thesis.

Journal Papers:

1. R. E. Mahony and U. Helmke. System assignment and pole placement for symmetric

realisations. Submitted to Journal of Mathematical Systems, Estimation and Control,

1994.

2. R. E. Mahony, U. Helmke, and J. B. Moore. Gradient algorithms for principal component

analysis. Submitted to Journal of the Australian Mathematical Society, 1994.

3. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and

implications for Lyapunov direct stability methods. To appear Journal of Mathematical

Systems, Estimation and Control, 1994.

4. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Non-linear feedback laws for

output regulation. Draft version, 1994.

5. J. B. Moore, R. E. Mahony, and U. Helmke. Numerical gradient algorithms for eigenvalue

and singular value calculations. SIAM Journal of Matrix Analysis, 15(3), 1994.

ii

Conference Papers:

6. R. E. Mahony, U. Helmke, and J. B. Moore. Pole placement algorithms for symmetric

realisations. In Proceedings of IEEE Conference on Decision and Control, San Antonio,

U.S.A., 1993.

7. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and

implications for Lyapunov stability methods. In Proceedings of the 12’th World Congress

of the International Federation of Automatic Control, Sydney, Australia, 1993.

8. R. E. Mahony and I. M. Mareels. Non-linear feedback laws for output stabilization.

Submitted to the IEEE Conference on Decision and Control, 1994.

9. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Output regulation for systems

linear in the input. In Conference on Mathematical Theory of Networks and Systems,

Regensburg, Germany, 1993.

10. R. E. Mahony and J. B. Moore. Recursive interior-point linear programming algo-

rithm based on Lie-Brockett flows. In Proceedings of the International Conference on

Optimisation: Techniques and Applications, Singapore, 1992.

11. J. B. Moore, R. E. Mahony, and U. Helmke. Recursive gradient algorithms for eigenvalue

and singular value decompositions. In Proceedings of the American Control Conference,

Chicago, U.S.A., 1992.

Robert Mahony

iii

iv

Abstract

Constrained optimization problems are commonplace in linear systems theory. In many cases

the constraint set is a homogeneous space and the additional geometric insight provided by

the Lie-group structure provides a framework in which to tackle the numerical optimization

task. The fundamental advantage of this approach is that algorithms designed and implemented

using the geometry of the homogeneous space explicitly preserve the constraint set.

In this thesis the numerical solution of a number of optimization problems constrained

to homogeneous spaces are considered. The first example studied is the task of determining

the eigenvalues of a symmetric matrix (or the singular values of an arbitrary matrix) by inter-

polating known gradient flow solutions using matrix exponentials. Next the related problem

of determining principal components of a symmetric matrix is discussed. A continuous-time

gradient flow is derived that leads to a discrete exponential interpolation of the continuous-time

flow which converges to the desired limit. A comparison to classical algorithms for the same

task is given. The third example discussed, this time drawn from the field of linear systems

theory, is the task of arbitrary pole placement using static feedback for a structured class of

linear systems.

The remainder of the thesis provides a review of the underlying theory relevant to the three

examples considered and develops a mathematical framework in which the proposed numerical

algorithms can be understood. This framework leads to a general form for a solution to any

optimization problem on a homogeneous space. An important consequence of the theoretical

review is that it develops the mathematical tools necessary to understand more sophisticated

numerical algorithms. The thesis concludes by proposing a quadratically convergent numerical

optimization method, based on the Newton-Raphson algorithm, which evolves explicitly on a

Lie-group.

v

vi

Contents

Acknowledgements � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � i

Statement of Originality � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ii

Abstract v

Glossary of Symbols � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � xi

1 Introduction 1

1.1 Historical Perspective � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 6

1.1.1 Dynamical Systems as Numerical Methods � � � � � � � � � � � � � � 6

1.1.2 Optimization Techniques and Numerical Solutions to Differential Equa-

tions � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 10

1.1.3 Linear Systems Theory and Pole Placement Results � � � � � � � � � � 17

1.2 Summary of Results � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 19

2 Numerical Gradient Algorithms for Eigenvalue Calculations 24

2.1 The Double-Bracket Algorithm � � � � � � � � � � � � � � � � � � � � � � � � 27

2.2 Step-Size Selection � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 32

2.3 Stability Analysis � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 37

2.4 Singular Value Computations � � � � � � � � � � � � � � � � � � � � � � � � � 41

vii

2.5 Associated Orthogonal Algorithms � � � � � � � � � � � � � � � � � � � � � � � 47

2.6 Computational Considerations � � � � � � � � � � � � � � � � � � � � � � � � � 51

2.7 Open Questions and Further Work � � � � � � � � � � � � � � � � � � � � � � � 52

2.7.1 Time-Varying Double-Bracket Algorithms � � � � � � � � � � � � � � 53

3 Gradient Algorithms for Principal Component Analysis 55

3.1 Continuous-Time Gradient Flow � � � � � � � � � � � � � � � � � � � � � � � � 57

3.2 A Gradient Descent Algorithm � � � � � � � � � � � � � � � � � � � � � � � � � 65

3.3 Computational Considerations � � � � � � � � � � � � � � � � � � � � � � � � � 69

3.3.1 An Equivalent Formulation � � � � � � � � � � � � � � � � � � � � � � 69

3.3.2 Pade Approximations of the Exponential � � � � � � � � � � � � � � � 70

3.4 Comparison with Classical Algorithms � � � � � � � � � � � � � � � � � � � � 71

3.4.1 The Power Method � � � � � � � � � � � � � � � � � � � � � � � � � � � 72

3.4.2 The Steepest Ascent Algorithm � � � � � � � � � � � � � � � � � � � � 75

3.4.3 The Generalised Power Method � � � � � � � � � � � � � � � � � � � � 76


4 Pole Placement for Symmetric Realisations 81

4.1 Statement of the Problem � � � � � � � � � � � � � � � � � � � � � � � � � � � 83

4.2 Geometry of Output Feedback Orbits � � � � � � � � � � � � � � � � � � � � � 90

4.3 Least Squares System Assignment � � � � � � � � � � � � � � � � � � � � � � � 93

4.4 Least Squares Pole placement and Simultaneous System Assignment � � � � � 102

4.5 Simulations � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 107

4.6 Numerical Methods for Symmetric Pole Placement � � � � � � � � � � � � � � 113

viii


5 Gradient Flows on Lie-Groups and Homogeneous Spaces 121

5.1 Lie-groups and Homogeneous Spaces � � � � � � � � � � � � � � � � � � � � � 123

5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits � � � � � � � � � � � � � 125

5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces � � � � � � � � � 127

5.4 Gradient Flows � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 130

5.5 Convergence of Gradient Flows � � � � � � � � � � � � � � � � � � � � � � � � 133

5.6 Lie-Algebras, The Exponential Map and the General Linear Group � � � � � � 135

5.7 Affine Connections and Covariant Differentiation � � � � � � � � � � � � � � � 141

5.8 Right Invariant Affine Connections on Lie-Groups � � � � � � � � � � � � � � � 144

5.9 Geodesics � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 148

6 Numerical Optimization on Lie-Groups and Homogeneous Spaces 155

6.1 Gradient Descent Algorithms on Homogeneous Spaces � � � � � � � � � � � � 157

6.2 Newton-Raphson Algorithm on Lie-Groups � � � � � � � � � � � � � � � � � � 161

6.3 Coordinate Free Newton-Raphson Methods � � � � � � � � � � � � � � � � � � 169

6.4 Symmetric Eigenvalue Problem � � � � � � � � � � � � � � � � � � � � � � � � 172


7 Conclusion 182

7.1 Overview � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 182

7.2 Conclusion � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 184

ix

x

Glossary of Symbols

Linear Algebra, Sets and Spaces:

R The real numbers.

C The complex numbers.

N The natural numbers.

RN N -dimensional Euclidean space.

RN�M The set of all real N �M matrices, MN -dimensional Euclidean space.

C N�M The set of all complex N �M matrices, 4MN -dimensional Euclidean space.

B��x�, B� The ball of radius � � 0 around a point x � RN or the origin, B��x� � fy �RN j jjx� yjj � �g, B� � B��0�.

Sk�N� The set of all skew symmetric matrices f� � RN�N j �T � ��g.

S�N� The set of all symmetric matrices f� � RN�N j �T � �g.

GL�N�R�, GL�N� The general linear group of all real invertible N �N matrices.

GL�N� C � The general linear group of all complex invertible N �N matrices.

O�N� The set of N �N orthogonal matrices fU � RN�N j UTU � INg�

St(p� n) The Stiefel manifold of n�p orthogonal matrices, fX � Rn�p j XTX � Ipg�

Differential Geometry, Sets and Spaces:

Ck�M� The set of all at least k times differentiable functions from a manifold M to

the real numbers.

C��G� The set of all smooth functions from a manifold G to the real numbers.

Ck� C� The set of at least k times differentiable respectively smooth functions from

an understood set (usuallyRn) to the real numbers.

xi

TxM The tangent space of a manifold M at the point x �M .

TM The tangent bundle of M , the union over all x �M of each TxM .

T �xM The cotangent space of a manifold M at the point x � M . The cotangent

space is the dual space of linear functionals on the vector space TxM .

T �M The cotangent bundle of M , the union over all x �M of each T�xM .

D�G� The algebra of all smooth vector fields on a smooth manifold G.

D��G� The set of all smooth 1-forms fields on a smooth manifold G.

Sn The n-dimensional sphere inRn�1, fx � Rn�1 j xTx � 1g.

RPn n-dimensional real projective space, the set of all vector directions inRn�1.

G�H The quotient space of a group G by a normal subgroup H .

stab�X� The subgroup associated with a group action that leaves X fixed.

M�H0� The set of orthogonally similar matrices to H0.

Grass(p,n) The Grassmannian manifold of p-dimensional subspaces inRn.

gl�N�R� The Lie algebra associated with GL�N�R�. One has that gl�N�R� � RN�N

equipped with the matrix Lie-bracket �A�B� � AB � BA.

g, h The Lie-algebra associated with arbitrary Lie-groups G or H respectively.

gl�N� C � The Lie algebra associated with GL�N� C �, the set of all N �N matrices.

Linear Algebra, Notation and Operators:

IN The N �N identity matrix.

0N�M The N �M matrix with each element zero.

�i Eigenvalues.

�i Singular values.

Re�a� The real part of a � C .

xii

hx� yi The Euclidean inner product of x and y inRn.

�ij The Kronecker delta function, �ij � 0 if i �� j, �ij � 1 if i � j.

AT The transpose of a matrix A.

jjAjj, jjAjjF The Frobenius norm of A � Rn�m. One has jjAjj2F �Pn

i�1Pm

j�1 A2ij .

jjxjj2 The Euclidean two-norm of x � R.

jjAjj2 Matrix two-norm, supremum of jjAxjj for jjxjj2 � 1.

�A�B� The Lie-bracket of two matrices, �A�B� � AB �BA.

adiAB The adjoint operator on matrices, adiAB � A�

adi�1A B

��

adi�1A B

�A where

ad0AB � A.

fA�Bg The generalised Lie-bracket of two matrices fA�Bg � ATB � BTA.

tr�A� The trace of A � Rn�n.

diag��1� � � � � �n� The matrix with diagonal elements ��1� � � � � �n� and all other elements

zero.

vec�A� The vector operation that stacks the columns of a matrix A � Rm�n into a

vector inRnm.

kerT The kernel of a linear operator T .

dom T The domain of a linear operator T , defined as the subspace orthogonal to the

kernel of T with respect to a given inner product.

spfv1� � � � � vng The subspace generated by the span of the vectors v1� � � � � vn.

sp�A� The subspace generated by the span of the columns of A � Rn�n .

dimV The dimension of a subspace V � Rn.

dist The distance between two subspaces, the Frobenius norm of the residual pro-

jection operator.

O�h� Big O notation. A function f�h� is order O�h� when there exists B � 0 and

� � 0 such that jf�h�jh � B for all 0 � h � �.

xiii

o�h� Little o notation. A function f�h� is order o�h� when f�h� is order O�h� and

limh�0jf�h�jh � 0.

Differential Geometry, Notation and Operators:

Djx �� The Frechet derivative of a scalar function � C1�M� evaluated at x � M a

smooth manifold, in direction .

Tpf The tangent map associated with a function f : M � N between two man-

ifolds at the point p � M . One has Tpf : TpM � Tf�p�N , Tpf�X� :�

Df jp �X� the Frechet derivative of f .

df The differential of a function f : M � N between two manifolds. One has

df : TM � TN , df�Xp� � Tpf�Xp� where Xp � TpM .

�jJ The restriction of a map � : M � N between two sets M and N to a map

� : J � N where J �M .

J The closure of J �M in some topology on M .

g�� General notation for a Riemannian metric operating on tangent vectors and

�.

hh� �ii An explicit Riemannian metric operating on tangent vectors and �.

grad The gradient of a potential � C��G� on a Riemannian manifold G.

Z The derivation of � C��M�with respect to a smooth vector fieldZ � D�M�

on a manifold M . One has that Z�x� � Djx �Z�x�� for x � M and

Z�x� � TxM the value of Z at x.

�Y� Z� The Lie-bracket of two smooth vector fields Y� Z � D�M� for M a manifold.

LY Z The Lie-derivative of two vector fields Y� Z � D�M� for M a manifold. One

has that LY Z � �Y� Z�.

r An affine connection.

rY Z The action of an affine connection on (or covariant derivation of) a smooth

vector field Z � D�M� with respect to Y � D�M� on a manifold M .

xiv

Jx The Jacobi matrix of � C2�Rn� evaluated at a point x � Rn.

H� The Hessian of � C2�M� on a manifold M evaluated at a critical point � of

.

xv

Chapter 1

Introduction

The aim of the present work is to investigate a particular class of constrained optimization

problems, those where the constraint set is a smooth homogeneous manifold (embedded in

Euclidean space). Rather than rely on standard numerical techniques the approach taken is to

exploit the geometry of Lie-groups and homogeneous spaces. The advantages of this approach

are considerable especially in the areas of stability, robustness and flexibility of the algorithms

developed.

Optimization problems of the class considered are important in many fields of study. The

two areas from which the principal examples in the body of the thesis are drawn are the

fields of numerical linear algebra and linear systems theory. An advantage of considering

questions drawn from the field of numerical linear algebra is the degree of expertise in solving

such problems using classical techniques. This provides an excellent foundation on which to

develop new results as well as ensuring that there is a battery of existing numerical methods to

which proposed algorithms may be compared. In contrast, the field of linear systems theory

contains many important optimization problems for which no satisfaction solution is known.

Presently accepted solution methods tend to be awkward adaptations of numerical linear algebra

methods which do not exploit the natural structure of the problem. Many of these optimization

problems are of a form for which the methods developed in the this work are applicable.

As a consequence of the ad-hoc development of many of the existing algorithms in linear

systems theory there has been little or no effort to understand the particular requirements of

1

2 Introduction Chapter 1

numerical methods for engineering problems. The neglect of this aspect of a proper numerical

treatment of optimization problems in engineering is especially important when on-line and

adaptive processes are considered. In such processes, conventional numerical methods must be

augmented with additional check procedures and tests to guarantee robustness of the process.

Indeed, one should even consider whether the principal goals of classical numerical methods

are appropriate for numerical methods in an adaptive or on-line engineering application.

Following this line of reasoning further, it is instructive to consider a set of priorities suitable

for numerical methods which solve on-line and adaptive engineering applications. I believe the

following properties in some sense describe the characteristics desirable for such algorithms. It

should be mentioned that the algorithms considered are all recursive iterations whose limiting

value yields the solution of the problem considered and the properties mentioned below are

phrased with this in mind.

Simplicity, The algorithm should be simple in concept and flexible in implementation. The

relationship between the task performed and the computational method employed should

be easily understood.

Global Convergence, The method should converge to the desired solution for almost any initial

condition. In a sense this can be considered as a robustness and stability requirement

on the algorithm. Thus, algorithms should be highly robust with respect to noisy data

sequences and large deviations. Interestingly, this point is often an argument against

using iterative numerical methods that converge too quickly.

Constraint stability, The method should explicitly preserve the constraint set on which the

optimization problem is posed. If a numerical algorithm is implemented on-line it will

be running for a considerable period of time and it is imperative that the constraint be

maintained exactly to preserve the qualitative properties of the system.

Classical numerical optimization methods do not necessarily have these properties as

primary goals. Indeed, most classical algorithms are designed primarily to obtain the best

absolute accuracy with the least computational cost (implemented on a digital computer). For

example standard error analysis of a numerical method ensures that the numerical solution

obtained satisfies some absolute error bound. In contrast, the numerical algorithms considered

in later chapters are designed to exactly preserve the constraint while solving an optimization

x1-0 Introduction 3

problem robustly. The questions of computational cost and absolute accuracy are of secondary

importance to the properties mentioned above.

An important aspect of the properties outlined above is that they do not demand fast

convergence or high accuracy (other than to preserve the constraint set). Certainly, constraint

stability requires that only certain errors may occur (those that preserve the structure of the

problem), however, high accuracy within the constraint set is often not a necessity. For

example, if a computed linear feedback gain that stabilises a given plant, then probably any

nearby feedback will also stabilise the plant. However, if that computation is then used to

initialise a further iterate and introduces modelling errors into a process, these modelling errors

could accumulate and eventually cause more significant problems. The bursting phenomena

observed in early adaptive feedback algorithms provide a useful analogy. Fast convergence

properties within the constraint set are not necessarily desirable either. Indeed, if the algorithm

converges too quickly then it will tend to track input noise disturbances, whereas a scheme that

converges slowly will act somewhat like a low pass filter. In a practise one would like to have a

“knob” which adjusts the rate of convergence of a given algorithm (analogous to adjusting the

pass band of a filter). This is impossible for most classical algorithms, however, the algorithms

proposed in this thesis can all be sped up and slowed down to a certain degree.

In this thesis I propose recursive algorithms for solving a number of constrained opt-

imization problems which satisfy the properties outlined above. The principal algorithms

considered are based on the classical gradient descent algorithm but modified so that they ex-

plicitly preserve that constraint set. This is achieved by exploiting the geometry of Lie-groups

and homogeneous spaces, though the algorithms proposed can be understood without resorting

to deep theoretical results. The algorithms are closely related to certain continuous-time gra-

dient flows and dynamical systems that have been proposed recently by a number of authors

as potential numerical methods for engineering problems (cf. the recent monograph (Helmke

& Moore 1994b) for an excellent review of these developments). By designing numerical

algorithms based on these methods one brings dynamical systems solutions to engineering

problems one step closer to applications.

The modified gradient descent algorithms proposed display all the properties mentioned

above. In the case where there is a unique local (and global) minimum then the gradient

descent gradient algorithms will always converge to the desired minima. Gradient flows (and


also gradient algorithms) are robust to variations in initial conditions (Smale 1961). Since

the algorithms proposed are designed to explicitly preserve the constraint then by definition

they satisfy the property of constraint stability. Finally, the basic gradient descent algorithm

is the simplest optimization method available and the modifications considered are relatively

straightforward changes. Of course, there are applications where the linear convergence rate

associated with gradient descent algorithms is not sufficient for the problem considered. In this

case one must look at more sophisticated methods. The most direct quadratically convergent

method is the Newton-Raphson algorithm for determining the zeros of a gradient vector

field. Exploiting the geometry of Lie-groups again one can formulate a Newton-Raphson

algorithm directly on the constraint set. Unfortunately the region in which the Newton-

Raphson method will converge to the desired minimum is only a local neighbourhood of that

point. Thus, the Newton-Raphson method by itself does not satisfy the global convergence

property. Nevertheless, the method is useful in certain situations and can be implemented in

parallel with a modified gradient descent algorithm to guarantee robustness.

The potential applications of the theory expounded in this work are far reaching and

varied. Originating from recent dynamical systems studies of eigenvalue problems (Brockett

1988, Brockett 1991b, Chu & Driessel 1990, Helmke & Moore 1990) one may design iterative

gradient descent algorithms for the symmetric eigenvalue and singular value problems (cf.

Moore et al. (1994) and Chapter 2). These new algorithms do not aim to compete with state

of the art solutions to these problems. Rather, the symmetric eigenvalue and singular value

problems provide an environment in which to understand the new approach used in the context

of a well understood problem and compare the algorithms generated to classical methods.

Having, understood and developed the theory necessary to implement these methods in a

simple case one is confident in tackling problems in linear systems theory which have proved

amenable to dynamical systems solutions.

For example, Helmke (1993) provided a variational characterisation of several different

classes of balanced realisations for linear systems. Dynamical systems which compute balanced

realisations were proposed several authors (Perkins, Helmke & Moore 1990, Helmke, Moore

& Perkins 1994). The problem of computing balanced realizations can be numerically ill-

conditioned, especially when a plant is nearly uncontrollable/unobservable, and calls for special

numerical methods (Laub, Heath, Paige & Ward 1987, Safanov & Chiang 1989, Tombs &

x1-0 Introduction 5

Postlethwaite 1987) with good numerical properties. Gradient descent methods have good

numerical properties for dealing with ill-conditioned problems and offer an attractive alternative

to modifications of existing numerical linear algebra methods. Yan and Moore (1991) and later

Helmke and Moore (1994a) developed gradient dynamical-systems solutions to computing

balanced realizations and minimising the L2-sensitivity of a state-space realization of a given

transfer function. Yan et al. (1994) has developed a number of recursive algorithms based on

these dynamical systems for L2 sensitivity optimization as well as Euclidean norm balancing.

A generalisation of balancing and sensitivity minimisation for time varying plants, termed

“�-balancing”, is discussed by Imae, Perkins and Moore (1992). An important application of

the dynamical systems approach for balancing and sensitivity minimisation is in the design

of finite word length implementations of controllers. Recent work on designing digital state-

space systems which draws from these ideas is outlined in the articles (Li, Anderson, Gevers

& Perkins 1992, Madievski, Anderson & Gevers 1994). A state of the art discussion of many

of these issues is contained in the recent monograph by Gevers and Li (1993).

The area of balanced realizations and sensitivity minimisation is only one facet of the po-

tential applications of dynamical systems concepts to control theory. Brockett’s (1988) original

work lead him to consider a number of applications related to analogue computing. Brockett

(1989b) went on to show that dynamical systems can be used to realize general arithmetical

and logical operations. Least squares matching problems (Brockett 1989a, Brockett 1991a)

are also a natural application of the original development with practical relevance to computer

vision and statistical principal component analysis. The geometry of least squares and prin-

cipal component analysis was developed by a number of authors in the mid eighties (Bloch

1985a, Bloch 1985b, Byrnes & Willems 1986). An interesting application of these ideas to

the dynamical theory of learning in neural-networks was discussed by Brockett (1991a). This

work was based on Brockett’s own research along with recent developments in using the sin-

gular value decomposition to understand learning procedures (Bouland & Karp 1989, Baldi

& Hornik 1989). The work ties in closely with Oja’s (1982, 1989) results on neural-network

learning. Recently, Yan, Helmke and Moore (1994) have provided a rigourous analysis of Oja’s

learning equations. Numerical methods related to these problems are presented in Mahony,

Helmke and Moore (1994) (cf. Chapter 3).

More generally, Faybusovich (1992a) has developed dynamical-system solutions for com-


puting Pisarenko frequencies, used in certain signal processing applications. Similar techniques

provide new approaches to realization theory (Brockett & Faybusovich 1991). Another po-

tential application in signal processing is the digital quantization of continuous-time signals

considered the articles (Brockett 1989b, Brockett & Wong 1991). Yan, Teo and Moore (n.d.)

have also investigated using dynamical systems for computing LQ optimal output feedback

gains. This has motivated a number of authors (Sreeram, Teo, Yan & Li 1994, Mahony &

Helmke 1993) to use similar methods for difficult simultaneous stabilization problems that

have no classical solution. Preliminary results by Ghosh (1988) have tackled the simultaneous

stabilization problem using algebraic geometric methods, though more recent results (Blondel

1992, Blondel, Campion & Gevers 1993) indicate that the problem can not be solved exactly us-

ing algebraic operations and consequently, recursive methods offer one of the better numerical

approaches to obtain an approximation.

1.1 Historical Perspective

The material presented in this thesis builds primarily on the recent development of dynamical

systems solutions to certain linear algebraic problems. There is also dependence on classical

optimization theory from the last fifty years or so and more recent concepts of numerical stability

for computational integration methods. The pole placement results presented in Chapter 4 relate

to a considerable body of knowledge in linear systems theory developed since the seventies. To

provide a historical background for the work presented in this thesis the present section is split

into three subsections covering the fields of dynamical systems theory, numerical optimization

theory and linear systems theory. There is some overlap between these topics, especially since

the focus is on those developments which lead to the new results presented in the body of this

thesis.

1.1.1 Dynamical Systems as Numerical Methods

Much of the work covered in the following subsection is relatively recent and I know of only

one book (Helmke & Moore 1994b) that is devoted to the study of this topic with applications

to engineering. Nevertheless, there are several good review articles available which cover the

x1-1 Historical Perspective 7

early application of the Toda lattice to eigenvalue problems (Chu 1984a, Watkins 1984) and an

overview of continuous realization methods for traditionally numerical linear algebra problems

(Chu 1988).

Historically, the idea of solving a numerical problem by computing the limiting solution

of a continuous-time differential equation is not new. The accelerating development of digital

computers in the mid twentieth century tended to obscure the potential of such methods though

the study of analogue circuit design was still of interest for practical applications. In the cases

where dynamical solutions to certain problems were considered (for example Rutishauser

(1954, 1958) proposed dynamical systems for solving the symmetric eigenvalue problem) the

classical algorithms known today developed so quickly that the dynamical systems approach

was forgotten. More recently, digital techniques have improved to the point where many

traditionally analogue tasks are being performed digitally. Interestingly, there is a renewed

interest recently in analogue techniques, brought about perhaps by a feeling that the limits of

digital technology may be approaching.

The particular historical development of dynamical system solutions for numerical linear

algebra problems on which my work is based began with the study of a differential equation

proposed by Toda (1970). Toda’s original idea was to study the evolution of point masses in one

dimension related by an exponential attractive force. The differential equation that he proposed

became known as the Toda lattice and was extensively studied by a number of authors (Henon

1974, Flashka 1974, Flashka 1975, Moser 1975, Kostant 1979, Symes 1980a, Symes 1980b).

In Flashka (1974) a representation of the Toda lattice as an isospectral differential equation

on the set of tridiagonal1 symmetric matrices was developed. By isospectral it is understood

that the eigenvalues of the matrix solution to the Toda lattice remain constant for all time.

Moser (1975) extended this to show that a solution of the Toda lattice converges to a diagonal

matrix and thus provides a way in which to compute the eigenvalues of a tridiagonal symmetric

1Tridiagonal symmetric matrices are matrices of the form�BBBBBB�

�1 �1 0 � � � 0

�1 �2 �2. . .

...

0. . .

. . .. . . 0

.... . . �n�2 �n�1 �n�1

0 � � � 0 �n�1 �n

�CCCCCCAfor real numbers �1� � � � � �n and �1� � � � � �n�1.


matrix. Symes (1982) showed that the Toda lattice was in fact related to the classical QR

algorithm for the symmetric eigenvalue problem. This paper generated considerable interest

in dynamical systems solutions of numerical linear algebra problems and was followed by

several papers (Deift, Nanda & Tomei 1983, Watkins 1984, Chu 1984a, Chu 1984b, Nanda

1985, Shub & Vasquez 1987, Watkins & Elsner 1988) which generalise the initial connection

seen by Symes. Present day interest in the Toda flow is considerable with recent work into

developing a VLSI (very large scale integrated circuit) type implementation of the Toda flow

by a nonlinear lossless electrical network (Paul, Hueper & Nossek 1992) as well as its close

connection to the double bracket equation discussed below.

Prompted in part by the potential of the Toda lattice as a theoretical (and potentially

practical) tool in numerical-linear algebra several authors undertook to investigate more general

numerical methods in the context of dynamical systems. Ammar and Martin (1986) investigated

other standard matrix eigenvalue methods and showed a strong connection between both the

discrete-time and continuous-time Riccati equation. Their results were based in part on a

Lie-theoretic interpretation of the Riccati flow developed by Hermann and Martin (1982). A

complete phase portrait of the Riccati equation was given by Shayman (1986) while Helmke

(1991) has related the Riccati flow to Brockett’s double-bracket flow (Brockett 1991b). Articles

by Riddle (1984) on minimax problems for sums of eigenvalues and by Duistermaat, Kolk and

Varadarajan (1983) on flows constrained to evolve on flag manifolds should be mentioned since

both articles have proved useful references for many of the works mentioned below.

The double bracket equation

�H�t� � �H�t�� H�t��N �� H�0� � H0 �1�1�1�

and its properties were first studied by Brockett (1988, 1991b) (see also independent work by

Chu and Driessel (1990) and Chu (1991b)). Here H � HT � Rn�n and N � NT � Rn�n are

symmetric matrices. When H is tridiagonal and N is diagonal then (1.1.1) reduces to the Toda

lattice. Brockett showed that (1.1.1) defines an isospectral flow whose solution H�t�, under

suitable conditions on N , converges to a diagonal matrix. Brockett spoke of using (1.1.1) to

solve various combinatorial optimization tasks such as linear programming problems and the

sorting of lists of real numbers.


The double bracket equation was seen to be a fundamental generalisation of the Toda lattice

with many practical applications. Among the the diverse fields to which the double-bracket

equation appears to be relevant one finds applications to the travelling salesman problem and

quantization of continuous signals (Brockett 1989b, Brockett & Wong 1991). Least squares

matching and applications in computer vision are discussed in the paper (Brockett 1989a).

Chu and Driessel (1990) considered continuous-time solutions to structured inverse eigenvalue

problems along with matrix least squares problems. For applications in subspace learning see

the papers (Brockett 1991a, Yan, Helmke & Moore 1994). Stochastic versions of the double

bracket equation are studied the report (Colonius & Klieman 1990). An important connection

has been recognised between the double-bracket flow and the modern geometric approach to

linear programming pioneered by Khachian (1979) and Karmarkar (1984, 1990). Fundamental

work in this area has been carried out by a number of authors (Bayer & Lagarias 1989, Lagarias

& Todd 1990, Bloch 1990b, Faybusovich 1991a, Faybusovich 1991b, Helmke 1992).

A deep understanding of the double-bracket equation has been developed over the last few

years. The connection between the Toda lattice and the double-bracket flow is thoroughly

described by a series of papers (Bloch 1990a, Bloch, Brockett & Ratiu 1990, Bloch, Flaschka

& Ratui 1990). Lagarias (1991) shows certain monotonicity properties of sums of eigenvalues

of solutions of the Toda lattice. It is not surprising to find that the double-bracket equation

can be interpreted as a gradient flow on adjoint orbits of a compact Lie-group (Bloch, Brockett

& Ratiu 1990, Bloch, Brockett & Ratiu 1992). Indeed there is now an emerging theory

of completely integrable gradient and Hamiltonian flows associated with the double-bracket

equation (Faybusovich 1989, Bloch et al. 1992, Bloch 1990b). The paper by Faybusovich

(1992b) gives a complete phase portrait of the Toda flow and QR algorithm including a

discussion of structural stability.

The development of the double-bracket equation has been parallelled by a number of papers

which investigate the potential of dynamical systems solutions to numerical linear algebra prob-

lems. Watkins and Elsner (1989a, 1989b) considered both the generalised eigenvalue problem

and the singular value decomposition. The symmetric eigenvalue problem is discussed the

articles (Brockett 1988, Chu & Driessel 1990, Brockett 1991b). The singular value decompo-

sition has also been studied in detail (Smith 1991, Helmke & Moore 1990, Helmke et al. 1994).

The Jacobi method for minimising the norm of off-diagonal entries of a matrix is discussed by


Chu (1991a) with application to simultaneous diagonalization of multiple matrices. Chu and

Driessel (1991) have also looked at inverse eigenvalue problems which are related to recent

work in pole placement for classes of structured linear systems by Mahony and Helmke (1993)

(cf. Chapter 4).

Numerical methods based on the double-bracket equation have been discussed by Brockett

(1993) and Smith (1993). Numerical methods with connections to the dynamical systems

solutions for inverse singular value problems are discussed by Chu (1992) while numerical

methods for feedback pole placement within a class of symmetric state-space systems is

discussed in the conference paper (Mahony, Helmke & Moore 1993) (cf. Section 4.6).

1.1.2 Optimization Techniques and Numerical Solutions to Differential Equa-

tions

An early reference for optimization techniques is monograph (Aoki 1971) or the book by Lu-

enburger (1973). More recent material can be obtained in Dennis and Schnabel (1983) and the

recent review of state of the art methods (Kumar 1991). For recent developments in numerical

Hamiltonian integration methods see the review article (Sanz-Serna 1991). Relationships of

these developments to classical numerical integration techniques is contained in the review

(Stuart & Humphries 1994).

The problems considered in this thesis are constrained scalar optimization problems on

smooth manifolds without boundary. That is the problem of minimising (or maximising) a

function f : M � R from the constraint set M to the real numbers. There are strong con-

nections, however, with classical numerical linear algebra problems such as that of computing

the eigenvalues of a symmetric matrix. The tools employed are derived from a geometric un-

derstanding of the problems considered combined with methods from classical unconstrained

optimization theory and Lie-theory. Of course, the geometry of most problems drawn from the

field of numerical linear algebra is well understood. For example, a geometric understanding

of the symmetric eigenvalue problems is not new. Parlett and Poole (1973) first rigourously

analysed the classical QR, LU and power iterations in a geometric framework, though Bu-

urema (1970) had done preliminary work and the geometric structure of the problem must have

been known to many. A recent survey article is Watkins (1982). A geometric understanding of


the problem of determining a single eigenvector of a symmetric matrix was known long before

the general QR algorithm was understood. Indeed, steepest descent optimization techniques

for dominant eigenvector determination were proposed by Hestenes and Karush (1951). An

excellent discussion of the early optimization techniques for such problems is contained in

Faddeev and Faddeeva (1963).

Far from being a closed field there is still much interest in methods similar in scope, though

far advanced in technique (Auchmity 1991, Batterson & Smillie 1989, Batterson & Smillie

1990). For more general numerical linear algebraic techniques such as the QR algorithm it is

necessary to use the language of Grassmannians and Flag manifolds to further develop the early

work of Parlett and Poole (1973). A lot was done to understand these problems in the early

eighties in connection with studying the Toda flow (Symes 1982, Deift et al. 1983, Watkins

1984, Chu 1984a). Later Ammar and Martin (1986) analysed a number of matrix eigenvalue

methods using flows on Grassmannians and Flag manifolds and showed strong connections to

both the discrete-time and the continuous-time Riccati equations. The developing geometric

understanding of classical numerical linear algebra techniques lead to minimax style results for

the eigenvalues of matrices (Riddell 1984). These developments have resulted in a number of

elegant new proofs of matrix eigenvalue inequalities, for example Wieland-Hoffman inequality

(Chu & Driessel 1990) the Courant-Fischer minimax principal (Helmke & Moore 1994b, pg.

14) and Eckart-Young theorem (Helmke & Shayman 1992).

Applications of the double-bracket equation and dynamical systems theory to numerical

linear algebraic problems (Brockett 1988, Watkins & Elsner 1989a, Watkins & Elsner 1989b,

Smith 1991, Helmke & Moore 1990, Brockett 1991b, Helmke et al. 1994) has lead to the

design of numerical algorithms based explicitly on the dynamical systems developed. Recent

advances in such techniques are discussed in the articles (Chu 1992, Brockett 1993, Moore,

Mahony & Helmke 1994). These methods are essentially based on classical unconstrained

optimization methodologies reformulated on the constraint set.

Unconstrained scalar optimization techniques fall into roughly three categories (Aoki 1971)

i) Methods that use only the cost-function values.

ii) Methods that use first order derivatives of the cost function.

iii) Methods that use second (and higher) order derivatives of the cost function.


Methods of the first type tend not to be useful for other than linear search and non-smooth

optimization problems due to computational cost. An excellent survey of early techniques such

as pattern searches, relaxation methods, Rosenbrock and Powell’s methods as well as random

search methods and some other variations of these ideas is contained in Aoki (1971, section

4.7). Other good references for these methods are the books (Luenburger 1973, Minoux 1986).

Recent developments are discussed in the collection of articles (Kumar 1991).

The fundamental method of type ii) is the gradient descent method. For a potential

f : Rn � Rwith the gradient denoted Df � � �f�x1 � � � � �

�f�xn �

T the method of gradient descent

is

xk�1 � xk � skDf�xk�

where sk � 0 is some pre-specified sequence of real positive integers known as step-sizes. Here

the integer k indexs the iterations of the numerical algorithm acting like a discrete-time variable

for the solution sequence fxkg�k�0. A suitable choice of step-size sk is any sequence such that

sk � 0 as k �� andP�

k�1 sk ��. Polyak (1966) showed that provided f satisfies certain

convexity assumptions then the solution sequence of the gradient descent algorithm converges

to the minimum of f . The optimal gradient descent method is known as the method of steepest

descent (Cauchy 1847, Curry 1944) where the step-size is chosen at each step by

sk � arg mins�0

f�xk � sDf�xk��

Here “arg min” means to find the value of s that minimises f�xk � sDf�xk��. The method of

steepest descent has the advantage of being associated with strong global convergence theory

(Minoux 1986, Theorem 4.4, pg. 86). The step-size selection procedure is usually completed

using a linear search algorithm or using some estimation technique based on approximations of

f�xk�skDf�xk��. Using a linear search technique generally provides a faster but less reliable

algorithm while a good approximation technique will inherit the strong global convergence

theory of the optimal method. The disadvantage of the overall approach is the linear rate

of convergence of the solution sequence fxkg to the desired limit (even for optimal step-size

selection). Nevertheless, when the reliability and not the rate of convergence of an optimization

problem is at issue the steepest descent method or an approximate suboptimal gradient descent

method remains a preferred numerical algorithm.


There are a number of algorithm which improve on the convergence properties of the

steepest descent method. Of these only the Newton-Raphson method is important for the se-

quel, however, it is worth mentioning that multi-step methods, combining a series of estimates

xk�1� � � � � xk�p and derivativesDf�xk�1�� Df�xk�p� can be devised which converge with

superlinear, quadratic and higher orders of convergence, but which have much weaker conver-

gence results associated with them than the steepest descent methods. The most prominent of

these methods are the accelerated steepest descent methods (Forsythe 1968) and the method of

conjugate gradients (Fletcher & Reeves 1964).

The Newton-Raphson method falls into the third category and relies on the idea of approx-

imating the scalar function f�x� by its truncated Taylor series

f�x� � f�xk� � �x� xk�TDf�xk� � �x� xk�

TD2f�xk��x� xk��

where D2f�xk� is the square matrix with ij’th entry �2f�xk��xi�xj

. If f�x� is quadratic then this

approximation is exact and the optimal minimum can be found in a single step

x� � xk � �D2f�xk��1 Df�xk��

Of course, in general this will not be true but if the approximation is fairly good, one would

expect the residual error jjx� � xk�1jj to be of order2 O�jj�x� � xk�jj3�. Indeed, the Newton-

Raphson algorithm is the most natural algorithm that displays quadratic convergence proper-

ties. A disadvantage of the Newton-Raphson algorithm is the cost of determining the inverse

�D2f�xk��1 and a number of methods have been devised to reduce the computational cost of

this calculation. The most common of these are the Davidon-Fletcher-Powell approach (Davi-

don 1959, Fletcher & Powell 1963) and a rank-2 correction formula independently derived by

2The big O order notation, jjx� � xk�1jj is of order O�jj�x� � xk�jj3� means that there exists real numbers

B � 0 and � � 0 such thatjj�x� � xk�1�jj

jj�x�� xk�jj3� B�

for all jj�x� � xk�jj � �. If jjx� � xk�1jj is of order O�jj�x� � xk�jj3� then it follows using the little o order

notation that jjx� � xk�1jj is of order o�jj�x� � xk�jj2�,

limk��

jj�x� � xk�1�jj

jj�x� � xk�jj2� 0�

Thus the error bound at each step decreases like a quadratic function around the limit point. Methods with thisconvergence behaviour are known as quadratically convergent.


Broyden (1970), Fletcher (1970), Goldfarb (1970) and Shanno (1970). An excellent review of

these methods is provided by Minoux (1986, Chapter 4).

The approach to optimization described above is closely related to the task of numerically

approximating the solution of an ordinary differential equation. Indeed, the gradient descent

method is just the Euler method (Butcher 1987, Section 20) applied to determine the solution

of the gradient differential equation

�x � �Df�x�� x�0� � x0� �1�1�2�

where f : Rn � R (Euler’s original work is republished in the monograph (Euler 1913)). The

Euler method is rarely used in modern numerical analysis since it is only a first order method.

That is, the error between xk�1 and x�h; xk� (the solution of (1.1.2) with x�0� � xk evaluated

at time h) is o�h�,

limh�0

jjxk�1 � x�h; xk�jjh

� 0�

(Naturally, this translates to a linear convergence rate for the gradient descent method.) More

advanced numerical integration methods exist, the most common of which in engineering

applications are the Runge-Kutta methods (Butcher 1987, Section 22) or linear multi-step

methods (Butcher 1987, Section 23).

The idea of stability for a numerical approximation of the solution to an initial value

problem is usually described in terms of the ability of the numerical method to accurately

reproduce the behaviour of the continuous-time solution. Thus, if one is considering the scalar

linear differential equation

�x � qx� x�0� � x0 � C �1�1�3�

where q � C is a fixed complex number with real part Re�q� � 0, then the solution x�t�� 0

as t � 0. A numerical approximation to this problem is loosely said to be stable if the

approximation also converges to zero. A Runge-Kutta method, with step size h, is said to be A-

stable if the numerical solution of the scalar linear differential equation given above converges

to zero for any z � hq lying in the complex left half plane. Thus, for any real positive step-size

selection h � 0 and any linear system with Re�q� � 0 an A-stable Runge-Kutta method

solution of (1.1.3) will converge to zero. The concept of AN -stability captures the same

qualitative behaviour for non-autonomous linear systems (Burrage 1978). A strengthening of


the concept of A-stability for contractive numerical problems (cf. the review article (Stuart

& Humphries 1994)) termed B-stability was proposed by Butcher (1975) which can also be

generalised to non-autonomous systems (BN -stability) (Burrage & Butcher 1979). In this

paper Burrage and Butcher also introduced the important concept of “algebraic stability”

which they showed implied bothB- and BN -stability. Algebraic stability is a condition on the

parameters that define a Runge-Kutta method which has relevance to many different stability

problems (Stuart & Humphries 1994) and even to question of existence and uniqueness of

solutions to implicit Runge-Kutta methods (Cooper 1986). For systems with Re�q� �� 0

the continuous-time solution to (1.1.3) will converge very quickly to zero and one would

like this behaviour to be replicated in the numerical solution. A definition of L-stability

due to Ehle (1973) which captures this idea is also a strengthening of standard A-stability.

There are a number of useful numerical schemes that do not satisfy A-stability and weaker

definitions of stability are available, the most common of which are “A� �-stability” (Widlund

1967, Dahlquist 1978) (the numerical solution of (1.1.3) must converge to zero for any z � hq

with Re�z� � � , where � 0 is a small real number) and “stiff stability” (Gear 1968) (the

method must be stable for all z � fz � C j j arg��z�j � g for some small real number

� 0).

The unifying idea behind each of these stability definitions is the ability of the numeri-

cal method to replicate the properties of the continuous-time solution that is being approxi-

mated. The classical definitions of stability discussed above consider only simple convergence

behaviour of systems (A- and AN -stability for linear decay problems, L-stability for fast

convergence rates, B- and BN -stability for contractive problems). Another important class

of differential equations are those which preserve certain quantities, for example energy or a

Hamiltonian. Numerical methods for these two classes of problems (conservative and Hamilto-

nian systems) have been the subject of considerable research recently. Methods for conservative

systems are discussed in the articles (Greenspan 1974, Greenspan 1984). Methods for Hamilto-

nian systems are of more relevance to the present work. These methods can be divided roughly

into two types (Sanz-Serna 1991), firstly methods that are classical numerical differential equa-

tion solvers which happen also to preserve a Hamiltonian, and secondly methods which are

constructed explicitly from generating functions for solving Hamiltonian systems. The earlier

methods were based on generating functions (Ruth 1983, Channell 1983, Menyuk 1984, Feng

1985, Zhong & Marsden 1988). When it was observed that expressing these methods could


often be interpreted as numerical Runge-Kutta methods with particular properties people be-

came interested in exactly which Runge-Kutta methods would have the property of preserving

a Hamiltonian. This question was answered independently by a number of authors (Sanz-Serna

1988, Suris 1989, Lasagni 1988). Application of these ideas to engineering problems associ-

ated with equations of motion of a rigid body has been undertaken by Crouch, Grossman and

Yan (1992). Crouch, Grossman and Yan are also working on related integration techniques

for engineering problems (Crouch & Grossman 1994, Crouch, Grossman & Yan 1994). A

recent review article for Hamiltonian integration methods is Sanz-Serna (1991). Interestingly,

the characterisation of Runge-Kutta methods that preserve Hamiltonians is related to the alge-

braic construction first described when defining algebraic stability (Burrage & Butcher 1979).

Indeed, Stuart and Humphries (1994), describe a number of connections between early sta-

bility theory and modern numerical methods for Hamiltonian and conservative systems. In

Stuart and Humphries (1994) the concept of numerical stability, the question of whether, and

in what sense the dynamical properties of a continuous-time flow are inherited by a discrete

numerical approximation, is defined. This concept is sometimes termed “practical stability”

and is closely related to the definition of constraint stability given on page 2. I have opted

not to use the term numerical stability to describe the algorithms proposed in the sequel since

the optimization tasks considered require two types of numerical stability, preservation of a

constraint and convergence to a limit.

In certain cases the Toda lattice, double-bracket flow and related dynamical systems can be

interpreted as a completely integrable Hamiltonian flow (Bloch 1985a, Bloch et al. 1992, Bloch

1990b). In these cases one could think to apply the modern Hamiltonian integration techniques

discussed by Sanz-Serna (1991). To do this however, one would have to consider the various

differential equations as Hamiltonian flows on Rn and the insight gained by considering the

solution in matrix space would be lost.

Several authors have looked directly at discretizing flows on Lie-groups and homogeneous

spaces. Moser and Veselov (1991) considered discrete versions of classical mechanical systems.

Chu (1992) considered discrete methods for inverse singular problems based on dynamical

systems insights while Brockett (1993), Smith (1993) and Moore et al. (1994) have studied

more deliberate discretizations of gradient flows on Lie-groups and homogeneous spaces.


1.1.3 Linear Systems Theory and Pole Placement Results

Textbooks on feedback control and linear systems theory are those of Kailath (1980), Wonham

(1985) and Sontag (1990). An excellent reference for classical linear quadratic methods is the

book (Anderson & Moore 1971) or the more recent book (Anderson & Moore 1990). A recent

review article on developments in pole placement theory is Byrnes (1989).

The field of systems engineering during the mid seventies was the scene of a developing

understanding of the mathematical and geometric foundation of linear systems theory. Sem-

inal work by Kalman (1963) among others set a foundation of mathematical systems theory

which lead people naturally to use algebraic geometric tools to solve some of the fundamental

questions that arose. This lead to a strong geometric framework for linear systems theory

being developed in the late seventies and early eighties (Bitmead & Anderson 1977, Martin

& Herman 1979, Hazewinkel 1979, Byrnes, Hazewinkel, Martin & Rouchaleau 1980, Helmke

1984, Falb 1990). See also the conference proceedings (Martin & Hermann 1977b, Byrnes &

Martin 1980). The development of the Toda lattice was of considerable interest to researchers

working in linear systems theory in the late seventies and lead to several new developments in

scaling actions on spaces of rational functions in system theory (Byrnes 1978, Krishnaprasad

1979, Brockett & Krishnaprasad 1980). More recently Nakamura (1989) has showed a con-

nection between the Toda lattice and the study of moduli spaces of controllable linear systems.

Also Brockett and Faybusovich (1991) have made connections with realization theory.

One of the principal questions in linear systems theory that remained unanswered until

recently was the question of how the natural frequencies or poles of a multi-input multi-output

system are effected by changing feedback gain. In the case where the full state of a multi-input

multi-output state space system is available as output, Wonham (1967) showed that arbitrary

pole placement is equivalent to complete controllability of the system. The case for output

feedback (when only part of the state is available directly from the output) was found to be

far more difficult. Indeed, even after the theory of optimal linear quadratic methods was far

advanced (Anderson & Moore 1971) an understanding of the output feedback pole placement

problem remained elusive. A few preliminary results on pole shifting were obtained in the

early seventies (for example Davison and Wang (1973)) which lead to the first important result,

obtained independently by Davison and Wang (1975) and Kimura (1975). Given a linear


system with n states, m inputs and p outputs, the result stated that for almost all controllable

and observable linear state-space systems for which

m� p� 1 n�

the poles of that system could be almost arbitrarily changed using output feedback.

In 1977 Herman and Martin published a pair of articles (Hermann & Martin 1977, Martin

& Hermann 1977a) which used the dominant morphism theorem to show that mp n is a

necessary and sufficient condition for output feedback pole placement if one allows complex

gain matrices. Observe that if m, p 1 then mp m� p � 1 and thus the results obtained

by Hermann and Martin are stronger than those obtained earlier apart from the disadvantage

of requiring complex feedback. Unfortunately, their results don’t generalise to real feedback

gains though it was hoped that the condition mp n would also be a necessary and sufficient

for real output feedback pole placement. However, Willems and Hesselink (1978) soon gave a

counter example (m � 2, p � 2, n � 4) showing that the strict inequality mp � n could not

be achieved for arbitrary pole placement using real feedback.

The case mp � n was studied by Brockett and Byrnes (1979, 1981) using tools from

algebraic geometry and constructions on Grassmannian manifolds. By using these ideas

Brockett and Byrnes generalised Nyquist and Root locus plots to multi-input and multi-output

systems, however though useful, their results only applied in the case mp � n and fell short

of completely characterising the pole placement map in this case also. In Byrnes (1983)

the Ljusternik-Snivel’mann category of real Grassmannians is used to improve on Kimura’s

original result. There have been no other significant advances in dealing with this problem

during the mid eighties. A recent review article (Byrnes 1989) outlines the early results as well

as describing the state of the art towards the end of the eighties.

Recently Wang and Rosenthal have made new contributions to the problem of output

feedback pole placement (Wang 1989, Rosenthal 1989, Rosenthal 1992). Most recently Wang

(1992) has given a necessary and sufficient condition for pole placement using the central

projection model. Given a linear system with n states,m inputs and p outputs Wang has shown

that arbitrary output feedback pole placement is possible for any strictly proper controllable

and observable plant with mp � n. If the plant is only proper then almost arbitrary pole

x1-2 Summary of Results 19

placement is possible. The case mp � n is still not fully understood.

Little has been done to study classes of linear systems and the pole placement map. In

Martin and Herman (1977a) pole placement for linear Hamiltonian systems was considered.

More recently Mahony et al. (1993) (cf. Chapter 4) studied pole placement for symmetric

state-space systems. Simultaneous pole placement for multiple systems is also a problem that

has had little study. Ghosh (1988) has written a paper on this topic using algebro-geometric

techniques and recently Blondel (1992) and Blondel, Campion and Gevers (1993) have also

contributed. Such problems can also be tackled using the ideas outlined by Mahony and Helmke

(1993) (cf. Chapter 4). The development of efficient numerical methods for pole placement by

output feedback is a challenge. Methods from matrix calculus have been applied by Godbout

and Jordan (1989) and more recently, gradient descent methods have been proposed (Mahony

et al. 1993) (cf. Section 4.6).

1.2 Summary of Results

The thesis is divided into seven chapters. Chapter 1 provides an overview of the subject matter

considered. Chapters 2 to 4 consider three example optimization problems in detail. The first

problem discussed is a smooth optimization problem which can be used to solve the symmetric

eigenvalue problem. A considerable amount is known about the continuous-time gradient

dynamical systems associated with this optimization problem and the development builds on

this knowledge to generate a recursive numerical algorithm. The next problem considered is an

optimization problem related to principal component analysis. A discussion of the continuous-

time gradient flow is given before a numerical algorithm is developed. The connection of

the numerical method proposed and classical numerical linear algebraic algorithms for the

same task is investigated. The third example, drawn from the field of linear systems theory,

is the task of pole placement for the class of symmetric linear systems. A discussion of the

geometry of the task is undertaken yielding results with the flavour of traditional pole placement

results. Continuous-time gradient flows are derived and used to investigate the structure of

the optimization problem. A numerical method is also proposed based on the continuous-time

gradient flow.

The latter chapters approach the subject from a theoretical perspective. In Chapter 5


a theoretical foundation is laid in which the algorithms proposed in Chapters 2 to 4 may be

understood. Chapter 6 goes on to consider the particular numerical algorithms proposed in detail

and provides a template for designing numerical optimization algorithms for any constrained

optimization problem on a homogeneous space. Later in Chapter 6 a more sophisticated

numerical algorithm based on the Newton-Raphson algorithm is developed in a general context.

The algorithm is applied to a specific problem (the symmetric eigenvalue problem) to provide

an example of how to use the theory in practise. Concluding remarks are contained in Chapter

7.

The principal results contained in Chapters 2 to 6 are summarised below.

Chapter 2: In this chapter a numerical algorithm, termed the the double-bracket algorithm

Hk�1 � e��k�Hk�N �Hke�k�Hk�N ��

is proposed for computing the eigenvalues of an arbitrary symmetric matrix. For suitably small

k , termed time-steps, the algorithm is an approximation of the solution to the continuous-

time double-bracket equation. Since the matrix exponential of a skew symmetric matrix is

orthogonal it follows that this iteration has the important property of preserving the spectrum

of the iterates. That is the eigenvalues of Hk remain constant for all k. By choosing a suitable

diagonal target matrix N the sequence Hk will converge to a diagonal matrix from which the

eigenvalues ofH0 can be directly determined. To ensure that the algorithm converges a suitable

step-size k must be chosen at each step. Two possible choices of schemes are presented

along with analysis showing that the algorithm converges to the desired matrix for almost all

initial conditions. A related algorithm for determining the singular values of an arbitrary (not

necessarily square) matrix is proposed and is shown to be equivalent to the double-bracket

equation applied to an augmented symmetric system. An analysis of convergence behaviour

showing linear convergence to the desired limit points is presented. Associated with the main

algorithms presented for the computation of the eigenvalues or singular values of matrices are

algorithms evolving on Lie-groups of orthogonal matrices which compute the full eigenspace

decompositions of given matrices.

The material presented in this chapter was first published in the conference article (Moore,

Mahony & Helmke 1992). A journal paper based on an expanded version of the conference


paper is to appear this year (Moore et al. 1994).

Chapter 3: In this chapter an investigation is undertaken of the properties of Oja’s learning

equation

�X � XXTNX �NX� N � NT � Rn�n �

evolving on the set of matrices fX � Rn�m j XTX � Img, the Stiefel manifold of real

n �m matrices where n m are integers. This differential equation was proposed by Oja

(1982, 1989) as a model for learning in certain neural networks. Explicit proofs of convergence

for the flow are presented which extend the results in Yan et al. (1994) so that no genericity

assumption is required on the eigenvalues of N . The homogeneous nature of the Stiefel

manifold allows one to develop an explicit numerical method (a discrete-time system evolving

on the Stiefel manifold) for principal component analysis. The method is based on a modified

gradient ascent algorithm for maximising the scalar potential

RN �X� � tr�XTNX�

known as the generalised Rayleigh quotient. Proofs of convergence for the numerical algorithm

proposed are given as well as some modifications and observations aimed at reducing the

computational cost of implementing the algorithm on a digital computer. The discrete method

proposed is similar to the classical power method and steepest ascent methods for determining

the dominant p-eigenspace of a matrix N . Indeed, in the case where p � 1 (for a particular

choice of time-step) the discretization is shown to be equivalent to the power method. When

p � 1, however, there are subtle differences between the power method and the proposed

method.

The chapter is based on the journal paper (Mahony, Helmke & Moore 1994). Applications

of the same ideas have also been considered in the field of linear programming (Mahony &

Moore 1992).

Chapter 4: In this chapter, the task of pole placement is considered for a structured class

of systems (those with symmetric state space realisations) for which, to my knowledge, no

previous pole placement results are available. The assumption of symmetry of the realisation,

besides having a natural network theoretic interpretation, simplifies the geometric analysis

considerably. It is shown that a symmetric state space realisation can be assigned arbitrary


(real) poles via symmetric output feedback if and only if there are at least as many system inputs

as states. This result is surprising since a naive counting argument (comparing the number of

free variables 12m�m � 1� of symmetric output feedback gain to the number of poles n of a

symmetric realization having m inputs and n states) would suggest that 12m�m � 1� n is

sufficient for pole placement. To investigate the problem further gradient flows of least squares

cost criteria (functions of the matrix entries of realisations) are derived on smooth manifolds

of output feedback equivalent symmetric realisations. Limiting solutions to these flows occur

at minima of the cost criteria and relate directly to finding optimal feedback gains for system

assignment and pole placement problems. Cost criteria are proposed for solving the tasks of

system assignment, pole placement, and simultaneous multiple system assignment.

The theoretical material contained in Sections 4.1 to 4.4 along with the simulations in

Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the numerical

method proposed in Section 4.6 was presented at the 1993 Conference on Decision and Control

(Mahony et al. 1993). Much of the material presented in this chapter was developed in

conjunction with the results contained the monograph (Helmke & Moore 1994b, Section 5.3),

which focusses on general linear systems.

Chapter 5: In this chapter a brief review of the relevant theory associated with developing

numerical methods on homogeneous spaces is presented. The focus of the development is

on classes of homogeneous spaces encountered in engineering applications and the simplest

theoretical constructions which provide a mathematical foundation for the numerical methods

proposed. A discussion is given of the relationship between gradient flows on Lie-groups and

homogeneous spaces (related by a group action) which motivates the choice of a particular

Riemannian structure for a homogeneous space. Convergence behaviour of gradient flows

is also considered. The curves used in constructing numerical methods in Chapters 2 to 4

are all based on matrix exponentials and the theory of the exponential map as a Lie-group

homomorphism is reviewed to provide a theoretical foundation for this choice. Moreover, a

characterisation of the geodesics associated with the Levi-Civita connection (derived from a

given Riemannian metric) is discussed and conditions are given on when the matrix exponential

maps to a geodesic curve on a Lie-group. Finally, an explicit discussion of the relationship

between geodesics on Lie-groups and homogeneous spaces is given.

Much of the material presented is standard or at least easily accessible to people working


in the fields of Riemannian geometry and Lie-groups. However, this material is not standard

knowledge for researchers in the field of systems engineering. Moreover, the development

strongly emphasizes the aspects of the general theory that is relevant to problems in linear

systems theory.

Chapter 6: In this chapter the gradient descent methods developed in Chapters 2 to 4 are

reviewed in the context of the theoretical developments of Chapter 5. The conclusion is that

the proposed algorithms are modified gradient descent algorithms where geodesics are used to

replace the straight line interpolation of the classical gradient descent method. This provides a

template for a simple numerical approach suitable for solving any scalar optimization problem

on a homogeneous space. Later in Chapter 6 a coordinate free Newton-Raphson method is

proposed which evolves explicitly on a Lie-group. This method is proposed in a general form

with convergence analysis and then used to generate a quadratically convergent numerical

method for the symmetric eigenvalue problem. A comparison is made to the QR algorithm

applied to an example taken from Golub and Van Loan (1989, pg. 424) which shows that the

Newton-Raphson method proposed converges in the same number of iterations as the classical

QR method.

Chapter 2

Numerical Gradient Algorithms for

Eigenvalue Calculations

A traditional algebraic approach to determining the eigenvalue and eigenvector structure of an

arbitrary matrix is theQR algorithm. In the early eighties it was observed that theQR algorithm

is closely related to a continuous-time differential equation which had become known through

study of the Toda lattice. Symes (1982), and Deift et al. (1983) showed that for tridiagonal

real symmetric matrices, the QR algorithm is a discrete-time sampling of the solution to a

continuous-time differential equation. This result was generalised to full complex matrices by

Chu (1984a), and Watkins and Elsner (1989b) provided further insight in the late eighties.

Brockett (1988) studied dynamical matrix flows generated by the double Lie-bracket1

equation,

�H � �H� �H�N �� H�0� � H0�

for constant symmetric matrices N and H0. This differential equation is termed the double-

bracket equation, and solutionsof this equation are termed double-bracket flows. Similar matrix

differential equations appear earlier than those references given above in Physics literature. An

1The Lie-bracket of two square matrices X , Y � Rn�n is

�X�Y � � XY � Y X�

If X � XT and Y � Y T are symmetric matrices then �X�Y �T � ��X�Y � is a skew symmetric matrix.

24

x2-0 Introduction 25

example, is the Landau-Lifschitz-Gilbert equation of micromagnetics

d �m

dt�

�

1 � 2 � �m�H � �m� � �m�H�� j �mj2 � 1�

as � � and �� k, a constant. In this equation �m�H � R3 and the cross-product

is equivalent to a Lie-bracket operation. The relationship between this type of differential

equation and certain problems in linear algebra, however, has only recently been investigated.

An important property of the double-bracket equation is that its solutions have constant

spectrum (i.e. the eigenvalues of a solution remain the same for all time) (Chu & Driessel

1990, Helmke & Moore 1994b). By suitable choice of the matrix parameter N Brockett (1988)

showed that the double-bracket flow can be used to diagonalise real symmetric matrices (and

hence compute their eigenvalues), sort lists, and even to solve linear programming problems.

In independent work by Driessel (1986), Chu and Driessel (1990), Smith (1991) and Helmke

and Moore (1990), a similar gradient flow approach was developed for the task of computing

the singular values of a general non-symmetric, non-square matrix. The differential equation

obtained in these approaches is almost identical to the double-bracket equation. In Helmke

and Moore (1990), it is shown thatthese flows can also be derived as special cases of the

double-bracket equation for a non-symmetric matrix, suitably augmented to be symmetric.

When the double-bracket equation is viewed as a dynamical solution to linear algebra

problems (Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b) one is lead naturally

to consider numerical methods based on the insight provided by the double-bracket flow. In

particular, the double-bracket flow evolves on a smooth submanifold of matrix space, the

set of all symmetric matrices with a given spectrum (Helmke & Moore 1994b, pg. 50). A

numerical method with such a property is termed constraint stable (cf. page 2) Such methods

are particularly of interest when accuracy or robustness of a given computation is an important

consideration. Robustness is of particular interest for engineering applications where input data

will usually come with added noise and uncertainty. As a consequence when one considers

numerical approximation of solutions to the double-bracket equation it is important to study

those methods which preserve the important structure of the double-bracket flow.

For the particular problem of determining the eigenvalues of a symmetric matrix, there

are many well tested and fast numerical methods available. It is not so much to challenge

26 Numerical Gradient Algorithms for Eigenvalue Calculations Chapter 2

established algorithms in speed or efficiency that one would study numerical methods based

on the double-bracket equation. Rather, with the developing theoretical understanding of a

number of related differential matrix equation (many of which with important applications in

linear systems theory, for example the area of balanced realizations (Imae, Perkins & Moore

1992, Perkins et al. 1990) one may look upon a detailed study of numerical methods based

on the double-bracket flow as providing a stepping stone to a new set of robust and adaptive

computational methods in linear systems theory.

The material presented in this chapter was first published in the conference article (Moore

et al. 1992). A journal paper based on an expanded version of the conference paper is to appear

this year (Moore et al. 1994).

In this Chapter, I propose a numerical algorithm, termed the the double-bracket algorithm,

for computing the eigenvalues of an arbitrary symmetric matrix,


For suitably small k, termed time-steps, the algorithm is an approximation of the solution to the

continuous-time double-bracket equation. Since the matrix exponential of a skew symmetric

matrix is orthogonal it seen that this iteration has the important property of preserving the

spectrum of the iterates. It is shown that for suitable choices of time-steps the double-bracket

algorithm inherits the same equilibria and limit points as the double-bracket flow and displays

linear convergence to its limit. A related algorithm for determining the singular values of an

arbitrary (not necessarily square) matrix is proposed and is shown to be equivalent to the double-

bracket equation applied to an augmented symmetric system. An analysis of convergence

behaviour showing linear convergence to the desired limit points is presented. Associated with

the main algorithms presented for the computation of the eigenvalues or singular values of

matrices are algorithms which compute the full eigenspace decompositions of given matrices.

These algorithms also display linear convergence to the desired limit points.

The chapter is divided into seven sections. In Section 2.1 the double-bracket algorithm is

introduced and the basic convergence results are presented. Section 2.2 deals with choosing

step-size selection schemes, and proposes two valid methods for generating the time-steps

k . Section 2.3 discusses the question of stability and proves that the double-bracket equation

x2-1 The Double-Bracket Algorithm 27

has a unique attractive fixed point under assumptions that both the step-size selection schemes

proposed in Section 2.2 satisfy. The remainder of the chapter deals with computing the singular

values of an arbitrary matrix, Section 2.4 and computing the full spectral decomposition of

symmetric (or arbitrary) matrices Section 2.5. A number of computational issues are briefly

mentioned in Section 2.6 and Section 2.7 considers some remaining open issues.

2.1 The Double-Bracket Algorithm

In this section a brief review of the continuous-time double-bracket equation is given with

emphasis on its interpretation as a gradient flow. The double-bracket algorithm is introduced

and conditions are given which guarantee convergence of the algorithm to the desired limit

point.

Let N and H be real symmetric matrices, and consider the potential function

��H� :� jjH �N jj2 (2.1.1)

� jjH jj2 � jjN jj2 � 2tr�NH��

where the norm used is the Frobenius norm

jjX jj2 :� tr�XTX� �X

x2ij �

with xij the elements of X . Note that ��H�measures the least squares difference between the

elements of H and the elements of N . Let M�H0� be the set of orthogonally similar matrices,

generated by some symmetric initial condition H0 � HT0 � Rn�n. Then

M�H0� � fUTH0U j U � O�n�g� �2�1�2�

where O�n� denotes the group of all n�n real orthogonal matrices. It is shown in Helmke and

Moore (1994b, pg. 48) that M�H0� is a smooth compact Riemannian manifold with explicit

forms given for its tangent space and Riemannian metric. Furthermore, in the articles (Bloch,

Brockett & Ratiu 1990, Chu & Driessel 1990) the gradient of ��H�, with the respect to the


normal Riemannian metric2 on M�H0� (Helmke & Moore 1994b, pg. 50), is shown to be

grad��H� � ��H� �H�N ��. Consider the gradient flow given by the solution of

�H � �grad��H� (2.1.3)

� �H� �H�N �� with H�0� � H0�

which is termed the double-bracket flow (Brockett 1988, Chu & Driessel 1990). Thus, the

double-bracket flow is a gradient flow which acts to decrease, or minimise, the least squares

potential �, on the manifold M�H0�. Note that from (2.1.1), this is equivalent to increasing,

or maximising, tr�NH�. The matrix H0 is termed the initial condition, and the matrix N is

referred to as the target matrix.

The double-bracket algorithm proposed in this chapter is,

Hk�1 � e��k �Hk�N �Hke�k�Hk�N �� 2�1�4�

for arbitrary symmetric n� n matrices H0 and N , and some suitably small scalars k , termed

time-steps. Consider the curve Hk�1�t� � e�t�Hk�N �Hket�Hk �N � where Hk�1�0� � Hk and

Hk�1 � Hk�1� k�, the �k � 1�’th iteration of (2.1.4). Observe that

d

dt�e�t�Hk�N �Hke

t�Hk�N ��

��t�0

� �Hk� �Hk� N ��

and thus, e�t�Hk�N �Hket�Hk�N � is a first approximation of the double-bracket flow at Hk �

M�H0�. It follows that for small k , the solution to (2.1.3) evaluated at time t � k with

H�0� � Hk, is approximately Hk�1 � Hk�1� k�.

It is easily seen from above that stationary points of (2.1.3) will be fixed points of (2.1.4). In

general, (2.1.4) may have more fixed points than just the stationary points of (2.1.3), however,

Proposition 2.1.5 shows that this is not the case for suitable choice of time-step k. The term

equilibrium point is used to refer to fixed points of the algorithm which are also stationary

points of (2.1.3).

To implement (2.1.4) it is necessary to specify the time-steps k . This is accomplished by

2A brief discussion of the derivation of gradient flows on Riemannian manifolds is given in Sections 5.3 and 5.4.


considering functions N : M�H0� � R� and setting k :� N �Hk�. The function N is

termed the step-size selection scheme.

Condition 2.1.1 Let N : M�H0� � R� be a step-size selection scheme for the double-

bracket algorithm on M�H0�. Then N is well defined and continuous on all of M�H0�,

except possibly those points H � M�H0� where HN � NH . Furthermore, there exist real

numbers B� � � 0, such that B � N �H� � for all H �M�H0� where N is well defined.

Remark 2.1.2 The variable step-size selection scheme proposed in this chapter is discontinu-

ous at all the points H �M�H0�, such that �H�N � � 0. �

Remark 2.1.3 Observe that the definition of a step-size selection scheme depends implicitly

on the matrix parameter N . Indeed, N may be thought of as a function in two matrix variables

N and H . �

Condition 2.1.4 Let N be a diagonal n� n matrix with distinct diagonal entries �1 � �2 �

� � � � �n.

Let �1 � �2 � � � � � �r be the eigenvalues of H0 with associated algebraic multiplicities

n1� � � � � nr satisfyingPr

i�1 ni � n. Since H0 is symmetric, the eigenvalues of H0 are all real

and the diagonalisation of H0 is

:�

��1In1 0

.... . .

...

0 �rInr

� � �2�1�5�

where Ini is the ni � ni identity matrix. For generic initial condition H0 and a target matrix

N that satisfies Condition 2.1.4, the continuous-time equation (2.1.3) converges exponentially

fast to (Brockett 1988, Helmke & Moore 1994b). Thus, the eigenvalues of H0 are the

diagonal entries of the limiting value of the solution to (2.1.3). The double-bracket algorithm

behaves similarly to (2.1.3) for small k and, given a suitable step-size selection scheme,

should converge to the same equilibrium as the continuous-time equation.


Proposition 2.1.5 LetH0 andN ben�n real symmetric matrices whereN satisfies Condition

2.1.4. Let��H� be given by (2.1.1) and let N : M�H0�� R� be a step-size selection scheme

that satisfies Condition 2.1.1. For Hk �M�H0�, let k � N�Hk� and define

��Hk� k� :� ��Hk�1�� Hk�� 2�1�6�

where Hk�1 is given by (2.1.4). Suppose

� ��Hk� k� � 0� when �Hk� N � �� 0� (2.1.7)

Then:

a) The iterative equation (2.1.4) defines an isospectral (eigenvalue preserving) recursion

on the manifold M�H0�.

b) The fixed points of (2.1.4) are characterised by matrices H �M�H0� satisfying

�H�N � � 0� �2�1�8�

c) Every solution Hk, for k � 1� 2� � � �, of (2.1.4), converges as k � �, to some H� �M�H0� where �H�� N � � 0.

Proof To prove part a), note that the Lie-bracket �H�N �T � ��H�N � is skew-symmetric. As

the exponential of a skew-symmetric matrix is orthogonal, (2.1.4) is an orthogonal similarity

transformation of Hk and hence is isospectral.

For part b) note that if �Hk� N � � 0, then by direct substitution into (2.1.4) thenHk�1 � Hk .

Thus Hk�l � Hk for l 1, and Hk is a fixed point of (2.1.4). Conversely if �Hk� N � �� 0,

then from (2.1.7), ��Hk� k� �� 0, and thus Hk�1 �� Hk. By inspection, points satisfying

(2.1.8) are stationary points of (2.1.3), and indeed are known to be the only stationary points

of (2.1.3) (Helmke & Moore 1994b, pg. 50). Thus, the fixed points of (2.1.4) are equilibrium

points, in the sense that they are all stationary points of (2.1.3). In order to prove part c) the

following lemma is required.


Lemma 2.1.6 Let N satisfy Condition 2.1.4 and N satisfy Condition 2.1.1 such that the

double-bracket algorithm satisfies (2.1.7). The double-bracket algorithm, (2.1.4), has exactly

n!�Qri�1�ni!� distinct equilibrium points in M�H0�. These equilibrium points are charac-

terised by the matrices �T�, where � is an n � n permutation matrix, a rearrangement of

the rows of the identity matrix, and is given by (2.1.5).

Proof Note that part b) of Proposition 2.1.5 characterises equilibrium points of (2.1.4) as

H �M�H0� such that �H�N � � 0. Evaluating this condition component wise, forH � fhijg,

gives

hij��j � �i� � 0�

and hence by Condition 2.1.4, hij � 0 for i �� j. Using the fact that (2.1.4) is isospectral, it

follows that equilibrium points are diagonal matrices which have the same eigenvalues as H0.

Such matrices are distinct, and can be written in the form �T�, for � an n � n permutation

matrix. A simple counting argument yields the number of matrices which satisfy this condition

to be n!�Qri�1�ni!�.

Consider for a fixed initial condition H0, the sequence fHkg generated by the double-

bracket algorithm. Observe that condition (2.1.7) implies that ��Hk� is strictly monotonic

decreasing for all k where �Hk� N � �� 0. Also, since � is a continuous function on the compact

set M�H0�, then � is bounded from below, and ��Hk� will converge to some non-negative

value ��. As ��Hk�� then ��Hk� k�� 0.

For an arbitrary positive number �, define the open set D� � M�H0�, consisting of all

points of M�H0�, within an � neighbourhood of some equilibrium point of (2.1.4). The set

M�H0��D� is a closed, compact subset ofM�H0� on which the matrix functionH � �H�N �

does not vanish. As a consequence, the difference function (2.1.6) is continuous and strictly

negative on M�H0� � D�, and thus, can be over bounded by some strictly negative number

�1 � 0. Moreover, as��Hk� k�� 0 then there exists a K � K��1� such that for all k � K

then 0 ��Hk� k� � �1. This ensures that Hk � D� for all k � K. In other words, Hk is

converging to some subset of possible equilibrium points.

Imposing the upper bound B on the step-size selection scheme N , Condition 2.1.4,

it follows that N �Hk��Hk� N � � 0 as k � �. Thus, e�N �Hk��Hk�N � � I , the identity

matrix, and hence, e��N �Hk��Hk �N �Hke�N �Hk��Hk�N � � Hk as k � �. As a consequence


jjHk�1 �Hkjj � 0 for k �� and this combined with the distinct nature of the fixed points,

Lemma 2.1.6, and the partial convergence already shown, completes the proof.

2.2 Step-Size Selection

The double-bracket algorithm (2.1.4) requires a suitable step-size selection scheme before it

can be implemented. To generate such a scheme, one can use the potential (2.1.1) as a measure

of the convergence of (2.1.4) at each iteration. Thus, one chooses each time-step to maximise

the absolute change in potential j��jof (2.1.6), such that�� 0. Optimal time-steps can be

determined at each step of the iteration by completing a line search to maximise the absolute

change in potential as the time-step is increased. Line search methods, however, involve high

computational overheads and it is preferable to to obtain a step-size selection scheme in the

form of a scalar relation depending on known values.

Using the Taylor expansion, ��Hk� �� is expressed as a linear term plus a higher order

error term in a general time-step � . By estimating the error term one obtains a mathe-

matically simple function ��U�Hk� �� which is an upper bound to ��Hk� �� for all � .

Choosing a suitable time-step, k, based on minimising��U the actual change in potential,

��Hk� k� � ��U�Hk� k� � 0, will satisfies (2.1.7). Due to the simple nature of the

function ��U , there is an explicit form for the time-step k depending only on Hk and N .

Lemma 2.2.1 For the k’th step of the recursion (2.1.4) the change in potential

��Hk� �� of (2.1.6), for a time-step � is

��Hk� �� 2� jj�Hk� N �jj2� 2� 2tr�NR2�� 2�2�1�

with

R2�� :�Z 1

0�1� s�H ��

k�1�s��ds� �2�2�2�

where H��k�1�� is the second derivative of Hk�1�� with respect to � .

Proof Let Hk�1�� be the �k � 1�’th recursive estimate for an arbitrary time-step � . Thus

Hk�1�� e�� Hk�N �Hke� �Hk�N �. It is easy to verify that the first and second time derivatives

x2-2 Step-Size Selection 33

of Hk�1 are exactly

H �k�1�� Hk�1�� Hk� N ��

H ��k�1�� Hk�1�� Hk� N �� Hk� N ��

Applying Taylor’s theorem, then

Hk�1�� Hk�1�0� � �d

d�Hk�1�0� � � 2

Z 1

0�1� s�H ��

k�1�s��ds� (2.2.3)

� Hk � � �Hk� �Hk� N �� 2R2��

Consider the change in the potential ��H� between the points Hk and Hk�1��,

��Hk� �� Hk�1�� Hk� (2.2.4)

� �2tr�N�Hk�1��Hk��

� �2tr�N�� Hk� �Hk� N �� 2R2��

� �2� jj�Hk� N �jj2� 2� 2tr�NR2��

Observe that for � � 0 then ��Hk� 0� � 0, and also that dd��Hk� ��

��0

�

�2jj�Hk� N �jj2. Thus, for sufficiently small � the error term �2tr�NR2�� becomes neg-

ligible, and ��Hk� �� is strictly negative. Let opt � 0 be the first time for which

dd��Hk� ��

��opt

� 0, then ��Hk� opt� � ��Hk� �� 0 for all strictly positive

� � opt. It is not possible, however, to estimate opt directly from (2.2.4) due to the transcen-

dental nature of the error termR2��. Approximating the error term by a quadratic function in

� allows one to compute an explicit step-size selection scheme based on this estimate.

Lemma 2.2.2 (Constant Step-Size Selection Scheme) The constant time-step

cN �1

4jjH0jj jjN jj �2�2�5�

satisfies Condition 2.1.1. Furthermore, the double-bracket algorithm, equipped with the step-

size selection scheme cN , satisfies (2.1.7).


α αk

∆ψ (Η,α) > ∆ψ(Η,α)

∆ψ (Η,α)

∆ψ (Η,α) < 0U

U

U

Figure 2.2.1: The upper bound on ��Hk� �� viz ��U �Hk� ��

Proof Recall that for the Frobenius norm jtr�XY �j � jjX jj jjY jj (follows from the Schwartz

inequality). Then

��Hk� �� 2� jj�Hk� N �jj2 � 2� 2jtr�NR2��j� �2� jj�Hk� N �jj2 � 2� 2jjN jj jjR2��jj� �2� jj�Hk� N �jj2 � 2� 2jjN jj Z 1

0�1� s�jj��Hk�1�s�� Hk� N �� Hk� N ��jjds

� �2� jj�Hk� N �jj2 � 4� 2jjN jj jjH0jj jj�Hk� N �jj2

�: ��U�Hk� �� (2.2.6)

Thus ��U�Hk� �� is an upper bound for ��Hk� �� and has the property that for sufficiently

small � , it is strictly negative, see Figure 2.2.1. Due to the quadratic form of ��U�Hk� �� in

� , it is immediately clear that ck � cN�Hk� � 1��4jjH0jjjjN jj� is the minimum of (2.2.6).

A direct norm bound of the integral error term is not likely to be a tight estimate of the

error and the function��U is a fairly crude bound for ��. The following more sophisticated

estimate results in a step-size selection scheme.

Lemma 2.2.3 (An Improved Bound for ��Hk� ��) The difference function ��Hk� �� can

be over bounded by

��Hk� �� 2� jj�Hk� N �jj2 �jjH0jj jj�N� �Hk� N ��jj

jj�Hk� N �jj�e2� jj�Hk�N �jj � 1� 2� jj�Hk� N �jj

��

�: ��U�Hk� �� (2.2.7)

x2-2 Step-Size Selection 35

Proof Consider the infinite series expansion for the matrix exponential

eA � I �A�12A2 �

13!A3 � �

It is easily verified that

eABe�A � B � �A�B� �12!�A� �A�B��

13!�A� �A� �A�B�� (2.2.8)

��Xi�0

1i!

adiAB�

Here adiAB � adA �adi�1A B�� ad0

AB � B where adA : Rn�n � Rn�n is the linear map

X � AX �XA. Substituting�� Hk� N � and Hk for A and B in (2.2.8) and comparing with

(2.2.3), gives

�2R2�� Xj�2

1j!

adj�� Hk�N ��Hk��

Considering jtr�NR2��j and using the readily established identity tr�Nadj�AB� �

tr��adjAN�B� gives

j� 2tr�NR2��j �

��Xj�2

1j!

tr�

adj� �Hk�N ��N�Hk

��

�Xj�2

1j!jjadj� �Hk�N ��N�jj jjH0jj

��Xj�2

1j!�2jj� �Hk� N �jj�j�1jjad� �Hk�N ��N�jj jjH0jj

�jjH0jj jjad� �Hk�N ��N�jj

2� jj�Hk� N �jj�Xj�2

1j!�2� jj�Hk� N �jj�j

�jjH0jj jj�N� �Hk� N ��jj

2jj�Hk� N �jj�e2� jj�Hk�N �jj � 1� 2� jj�Hk� N �jj

��

Thus combining this with the first line of (2.2.6) gives (2.2.7).

The variable step-size selection scheme is derived from this estimate of the error term in

the same manner the constant step-size selection scheme was derived in Lemma 2.2.2.


Lemma 2.2.4 ( Variable Step-Size Selection Scheme) The step-size selection scheme �N :

M�H0�� R�

�N�H� �1

2jj�H�N �jj log�jj�H�N �jj2

jjH0jj jj�N� �H�N ��jj � 1�� 2�2�9�

satisfies Condition 2.1.1. Furthermore, the double-bracket algorithm, equipped with the step-

size selection scheme �N , satisfies (2.1.7).

Proof I first show that �N satisfies the requirements of Condition 2.1.1. As the Frobenius

norm is a continuous function then �N is well defined and continuous at all pointsH �M�H0�

for which �H�N � �� 0. When �H�N � � 0 then �N is not well defined. To show that there

exists a positive constant �, such that �N�H� � �, consider the following lower bound,

LN :�1

2jj�Hk� N �jj log�jj�Hk� N �jj

2jj�H0jj jjN jj � 1� (2.2.10)

� 12jj�Hk� N �jj log�

jj�Hk� N �jj22jj�H0jj jjN jj jj�Hk� N �jj � 1�

� 12jj�Hk� N �jj log�

jj�Hk� N �jj2jj�H0jj jj�N� �Hk� N ��jj � 1��

which is just �N . Using L’Hopital’s rule it can be seen that the limit of LN at a point

H � M�H0� where �H�N � � 0 is 1��4jjH0jj jjN jj�. Including these points in the definition

of LN , gives that LN is a continuous, strictly positive, well defined function for allH �M�H0�.

Thus, since M�H0� is compact, there exists a real number � � 0 such that

�N LN � � 0�

on M�H0�� fH� j �H�� N � � 0g.

To show that there exists a real number B � 0, such that �N�H� � B, H � M�H0�, set

�H�N � � X � fxijg. For N given by Condition 2.1.4, then jj�N�X �jj�Pi��j��i � �j�

2x2ij ,

where xii � 0 since �H�N � is skew symmetric. Observe that

jjX jjjj�N�X �jj �

Pi��j x

2ijP

i��j��i � �j�2x2ij

� maxi��j��i � �j��2 �: b

x2-3 Stability Analysis 37

for all choices of X � �XT . It follows that

�N�H� �1

2jjX jj log�jjX jj2

jjH0jj jj�N�X �jj � 1�

� 12jjX jjlog�

jjX jjbjjH0jj � 1�

� b

2jjH0jj �: B

since log�x� 1� � x for x � 0.

Finally, for a matrix Hk � M�H0�, �Hk� N � �� 0, the time-step �N �Hk� � �k � 0

minimises (2.2.7), and from Lemma 2.2.3 it follows that 0 ��U�Hk� �� Hk� ��.

Thus, the double-bracket algorithm, equipped with the step-size selection scheme �N , satisfies

(2.1.7) and the proof is complete.

2.3 Stability Analysis

In this section the stability properties of equilibria of the double-bracket algorithm (2.1.4) are

investigated. It is shown that for generic initial conditions, and any step-size selection scheme

that satisfies Condition 2.1.1 and (2.1.7), a solution fHkg of the double-bracket algorithm

converges to the unique equilibrium point , given by (2.1.5). The algorithm is shown to

converge at least linearly in a neighbourhood of .

Lemma 2.3.1 Let N satisfy Condition (2.1.4) and N be some selection scheme that satisfies

Condition 2.1.1 and (2.1.7). The double-bracket algorithm (2.1.4) has a unique locally asymp-

totically stable equilibrium point , given by (2.1.5). All other equilibrium points of (2.1.4)

are unstable.

Proof It has been shown that the Hessian of the potential function � (at an equilibrium point

in M�H0�) is always non-singular and is only negative definite at the point (cf. Duistermaat

et al. (1983) or Helmke and Moore (1994b, pg. 53)). Since M�H0� is compact then the

local minimum at is also a global minimum. By assumptions on N and N , ��Hk� is

monotonically decreasing. Thus the domain of attraction of contains an open neighbourhood

of , and hence, is a locally asymptotically stable equilibrium point of (2.1.4).


All other equilibrium pointsH� are either saddle points or maxima of � (Helmke & Moore

1994b, pg. 53). Thus for any neighbourhood D of an equilibrium point H� �� , there exists

some H0 � D such that ��H0� � ��H��. It follows that the solution to the double-bracket

algorithm, with initial condition H0, will not converge to H� and thus H� is unstable.

Lemma 2.3.1 is sufficient to conclude that for generic initial conditions the double-bracket

algorithm will converge to the unique matrix . It is difficult to characterise the set of initial

conditions for which the algorithm converges to some unstable equilibrium point H� �� .

For the continuous-time double-bracket flow, however, it is known that the unstable basins of

attraction of such points are of zero measure in M�H0� (Helmke & Moore 1994b, pg. 53).

Lemma 2.3.2 Let N satisfy Condition 2.1.4. Let d � R� be a constant such that 0 � d �

1�2jjH0jj2jjN jj2 and consider the constant step-size selection scheme, dN : M�H0�� R�,

dN �H� � d�

The double-bracket algorithm (2.1.4), equipped with the step-size selection scheme dN , has

a unique locally asymptotically stable equilibrium point , given by (2.1.5). The rate of

convergence of the double-bracket algorithm converges at least linear in a neighbourhood.

Proof Since dN is a constant function, the time-step dk � dN �Hk� � d is constant. Thus,

the map

Hk � e�d�Hk �N �Hked�Hk�N �

is a differentiable map on all M�H0�, and one may consider the linearisation of this map at

the equilibrium point , given by (2.1.5). The tangent space T�M�H0� at consists of those

matrices � �� where � � Skew�n�, the class of skew symmetric matrices (Helmke &

Moore 1994b, pg. 49). It is easily verified that the matrices � T�M�H0� are independently

parameterised by their components ij , where i � j, and �i �� j . Thus, computing the

linearization of the double-bracket algorithm at the point one obtains

k�1 � k � d��kN �Nk�� kN �Nk�� 2�3�1�

x2-3 Stability Analysis 39

for k � T�M�H0�. Expressing this in terms of the linearly independent parameters ij , where

i � j, and �i �� j one has

�ij�k�1 � �1� d��i � �j��i � �j��ij�k� for i� j � 1� � n� �2�3�2�

The eigenvalues of the linearisation (2.3.1) can be read directly from (2.3.2) as 1 � d��i ��j��i � �j�, for i � j and �i �� j . Since �i �j when i � j then if d � 1�2jjH0jj2jjN jj2(where jjX jj2 is the induced matrix 2-norm, the maximum singular value of X) it is easily

verified that j1� d��i � �j��i � �j�j � 1. It follows that is asymptotically stable with

rate of convergence at least linear. The linear scaling factor for the convergence error is

maxi�j��i ��jf1� d��i � �j��i � �j�g.

Remark 2.3.3 As jjN jj2jjH0jj2 � 2jjN jj jjH0jj, the constant step-size selection scheme cN

is an example of such a selection scheme. �

Remark 2.3.4 Let N : M�H0�� R� be a step-size selection scheme that satisfies Condition

2.1.1 and (2.1.7), and is also continuous on all M�H0�. Let be the locally asymptotically

stable equilibrium point given by (2.1.5). Set � � N �� and observe that the linearisation

of the double-bracket algorithm at is given by (2.3.1) with d replaced by �. Recall that the

LN , scheme defined in (2.2.10), is continuous, with limit LN �H�� 1��4jjH0jj jjN jj�. Thus,

is an exponentially asymptotically stable equilibrium point for the double-bracket algorithm

equipped with the step-size selection scheme LN . �

To show that the double-bracket algorithm is exponentially stable at for the �N step-size

selection scheme is technically difficult due to the discontinuous nature of �N at equilibrium

points. A full proof of the following proposition is given in Moore et al. (1994).

Proposition 2.3.5 Let N satisfy Assumption (2.1.4) and �N be the step-size selection scheme

given by Lemma 2.2.4. The iterative algorithm (2.1.4), has a unique linearly attractive equi-

librium point given by (2.1.5).

To give an indication of the behaviour of the double-bracket algorithm two plots of a

simulation have been included, Figures 2.3.1 and 2.3.2. The simulation was run on a random


1

2

3

4

5

6

7

0 10 20 30 40 50 60 70 80

Simulation 1

Iterations

Dia

gona

l ele

men

ts o

f es

timat

e

Figure 2.3.1: A plot of the diagonal elements of each iteration, Hk, of the double-bracketalgorithm (2.1.4) run on a 7 � 7 initial conditionH 0 with eigenvalues �1� � � � � 7�. The targetmatrix N was chosen to be diag�1� � � � � 7�.

0

1

2

3

4

5

6

7

0 10 20 30 40 50 60 70 80

Simulation 1

Iterations

Pote

ntia

l

Figure 2.3.2: The potential��Hk� � jjHk �N jj2 for the double-bracket algorithm (2.1.4).

7� 7 symmetric initial value matrix with eigenvalues 1� � � � � 7. The target matrix N is chosen

as diag�1� � � � � 7� and as a consequence the minimum potential is �� 0. Figure 2.3.1 is a

plot of the diagonal entries of the recursive estimate Hk. The off-diagonal entries converge to

zero as the diagonal entries converge to the eigenvalues of Hk. Figure 2.3.2 is a plot of the

potential jjHk �N jj2 verses the iteration k. This plot clearly shows the monotonic decreasing

nature of the potential at each step of the algorithm.

The results of Sections 2.1, 2.2 and 2.3 are summarised in the following theorem.

Theorem 2.3.6 Let H0 � HT0 be a real symmetric n� n matrix with eigenvalues �1 � � �

�n. Let N � Rn�n satisfy Condition 2.1.4, and let N be either the constant step-size selection

x2-4 Singular Value Computations 41

(2.2.5) or the variable step-size selection (2.2.9). The double-bracket algorithm


k � N �Hk��

with initial condition H0, has the following properties:

i) The recursion is isospectral.

ii) IfHk is a solution of the double-bracket algorithm, then��Hk� � jjHk�N jj2 is strictly

monotonically decreasing for every k � N where �Hk� N � �� 0.

iii) Fixed points of the recursive equation are characterised by matrices H � M�H0� such

that

�H�N � � 0�

iv) Fixed points of the recursion are exactly the stationary points of the double-bracket

equation. These points are termed equilibrium points.

v) Let Hk, k � 1� 2� � � �, be a solution to the double-bracket algorithm, then Hk converges

to a matrix H� �M�H0�, �H�� N � � 0, an equilibrium point of the recursion.

vi) All equilibrium points of the double-bracket algorithm are strictly unstable, except

� diag��1� � � � � �n� which is locally asymptotically stable.

vii) The rate of convergence of the double-bracket algorithm to the unique asymptotically

stable equilibrium point is at least linear in a neighbourhood of .

2.4 Singular Value Computations

In this section the task of determining the the singular values of an arbitrary matrix is considered.

A singular value decomposition of a matrixH0 � Rm�n, m n, is a matrix decomposition

H0 � V T�U� �2�4�1�


where V � O�m�, U � O�n� and

� �

�BBBBBBB�

�1In1 0...

. . ....

0 �rInr

0�m�n��n

�CCCCCCCA � �2�4�2�

Here �1 � �2 � � � � � �r 0 are the distinct singular values of H0, occurring with

multiplicitiesn1� � � � � nr such thatPr

i�1 ni � n. By convention the singular values of a matrix

are chosen to be non-negative. It should be noted that though such a decomposition always

exists and � is unique, there is no unique choice of orthogonal matrices V and U .

Let S�H0� be the set of all orthogonally congruent matrices to H0,

S�H0� � fV TH0U � Rm�n j V � O�m�� U � O�n�g� �2�4�3�

It is shown Helmke and Moore (1994b, pg. 89) that S�H0� is a smooth compact Riemannian

manifold with explicit forms given for its tangent space and Riemannian metric. Following

the articles (Chu 1986, Chu & Driessel 1990, Helmke & Moore 1990, Helmke & Moore

1994b, Smith 1991) consider the task of calculating the singular values of a matrix H0, by

minimising the least squares cost function � : S�H0� � R�, ��H� � jjH � N jj2. It is

shown Helmke and Moore (1990, 1994b) that � achieves a unique local and global minimum

at the point � � S�H0�. Moreover, in the articles (Helmke & Moore 1990, Helmke & Moore

1994b, Smith 1991) the explicit form for the gradient grad� is calculated. The minimizing

gradient flow is

�H � �grad��H� (2.4.4)

� HfH�Ng� fHT � NTgH�

with H�0� � H0 the initial condition. Here the generalised Lie-bracket

fX� Y g :� XTY � Y TX � �fX� Y gT �

is used.


Condition 2.4.1 Let N be an m� n matrix, with m n,

N �

��

�1 0...

. . ....

0 �n

0�m�n��n

��

where �1 � �2 � � � � � �n � 0 are strictly positive, distinct real numbers.

For generic initial conditions, and a target matrix N that satisfies Condition 2.4.1, it is

known that (2.4.4) converges exponentially fast to � � S�H0� (Helmke & Moore 1990, Smith

1991). For H0 and N constant m� n matrices, the singular value algorithm proposed is

Hk�1 � e��kfHTk�NT gHke

�kfHk�Ng� �2�4�5�

This algorithm is analogous to the double-bracket algorithm eqLB:eq:DB3.

Lemma 2.4.2 Let H0, N be m � n matrices. For any H � Rm�n define a map H � bH �R�m�n��m�n� , where

bH �

�B� 0m�m H

HT 0n�n

�CA � �2�4�6�

For any sequence of real numbers k, k � 1� � � � �� the iterations,


�kfHk�Ng� �2�4�7�

with initial condition H0 and

bHk�1 � e��k�bHk� bN� bHke

�k�bHk� bN�� 2�4�8�

with initial condition bH0 are equivalent.

Proof Consider the iterative solution to (2.4.8), and evaluate the multiplication in the block

form of (2.4.6). This gives two equivalent iterative solutions, one the transpose of the other,

both of which are equivalent to the iterative solution to (2.4.7).


Remark 2.4.3 Note that bH0 and bN are symmetric �m� n�� m� n� matrices, and that as a

result, the iteration (2.4.8) is just the double-bracket algorithm (2.1.4). �

Remark 2.4.4 The equivalence given by this lemma is complete in every way. In particular,

H� is an equilibrium point of (2.4.7) if and only if bH� is an equilibrium point of (2.4.8).

Similarly, Hk � H� if and only if bHk � bH� as k ��. �

This leads one to consider step-size selection schemes for the singular value algorithm

induced by selection schemes which were derived in Section 2.2 for the double-bracket algo-

rithm. Indeed if bN : M�cH0� � R� is a step-size selection scheme for (2.1.4), on M� bH0�,

and Hk � S�H0�, then one can define a time-step k for the singular value algorithm by

k � bN � bHk�� 2�4�9�

Thus, if (2.4.8) equipped with a step-size selection scheme bN , satisfies Condition 2.1.1

and (2.1.7), then from Lemma 2.4.2, (2.4.7) will satisfy similar conditions. For the sake of

simplicity the following development considers only the constant step-size selection scheme

(2.2.5) and the variable step-size selection (2.2.9).

Theorem 2.4.5 Let H0, N be m � n matrices where m n and N satisfies Condition 2.4.1.

Let bN : M�cH0� � R� be either the constant step-size selection (2.2.5), or the variable

step-size selection (2.2.9). The singular value algorithm


�kfHk�Ng�

k � bN � bHk��

with initial condition H0, has the following properties:

i) The singular value algorithm is a self-equivalent (singular value preserving) recursion

on the manifold S�H0�.

ii) If Hk is a solution of the singular value algorithm, then ��Hk� � jjHk �N jj2 is strictly

monotonically decreasing for every k � N where fHk� Ng �� 0 and fHTk � N

Tg �� 0.


iii) Fixed points of the recursive equation are characterised by matrices H � S�H0� such

that

fHk� Ng � 0 and fHTk � N

Tg � 0� �2�4�10�

Fixed points of the recursion are exactly the stationary points of the singular value

gradient flow (2.4.4) and are termed equilibrium points.

iv) Let Hk, k � 1� 2� � � �, be a solution to the singular value algorithm, then Hk converges

to a matrix H� � S�H0�, an equilibrium point of the recursion.

v) All equilibrium points of the double-bracket algorithm are strictly unstable except �,

given by (2.4.2), which is locally asymptotically stable with at least linear rate of

convergence.

Proof To prove part i) note that the generalised Lie-bracket, fX� Y g � �fX� Y gT , is skew

symmetric, and thus (2.4.5) is an orthogonal congruence transformation and preserves the

singular values of Hk. Also note that the potential ��Hk� �12��

bHk�. Moreover, Lemma

2.4.2 shows that the sequence bHk is a solution to the double-bracket algorithm and thus,

from Proposition 2.1.5, 12��

bHk� must be monotonically decreasing for all k � N such that

� bHk� bN � �� 0, which is equivalent to (2.4.10). This proves part ii), and part iii) follows by noting

that if fHTk � N

Tg � 0 and fHk� Ng � 0, then Hk�l � Hk for l � 1� 2� � � �, and Hk is a fixed

point of (2.4.5). Moreover, since ��Hk� is strictly monotonic decreasing for all fHk� Ng �� 0

and fHTk � N

Tg �� 0, then these points can be the only fixed points. It is known that these are

the only stationary points of (2.4.4) (Helmke & Moore 1990, Helmke & Moore 1994b, Smith

1991).

In order to prove iv) one needs the following characterisation of equilibria of the singular

value algorithm.

Lemma 2.4.6 Let N satisfy Condition 2.4.1 and bN be either the constant step-size selec-

tion (2.2.5), or the variable step-size selection (2.2.9). The singular value algorithm (2.4.5)

equipped with time-steps k � bN � bHk�, has exactly 2nn!�Qri�1�ni!� distinct equilibrium


points in S�H0�. Furthermore, these equilibrium points are characterised by the matrices

�B� �T 0n��m�n�

0�m�n��n 0�m�n��m�n�

�CA�S��

where � is an n� n permutation matrix, and S � diag��1� � � � ��1� a sign matrix.

Proof Equilibrium points of (2.4.5) are characterised by the two conditions (2.4.10). Since N

satisfies Condition 2.4.1, then setting H � �hij� one has fH�Ng � 0 is equivalent to

�jhji � �ihij � 0� for i � 1� � � �n� j � 1� � � �n�

Similarly, the condition fHT � NTg � 0 is equivalent to

�jhij � �ihji � 0� for i � 1� � � �n� j � 1� � � �n�

hij�j � 0 for i � n� 1� � � �m� j � 1� � � �n�

By manipulating the relationships, and using the distinct, positive nature of the �i, it is

easily shown that hij � 0 for i �� j. Using the fact that (2.4.5) is self equivalent, the only

possible matrices of this form which have the same singular values as H0 are characterised

as above. A simple counting argument shows that the number distinct equilibrium points is

2nn!�Qri�1�ni!�.

The proof of part iv) is now directly analogous to the proof of part c) Proposition 2.1.5. It

remains only to prove part v), which involves the stability analysis of the equilibrium points

characterised by (2.4.10). It is not possible to directly apply the results obtained in Section 2.3

to the double-bracket algorithm bHk, since the bN does not satisfy Condition 2.1.4. However, for

the constant step-size selection scheme induced by (2.2.5), and using analogous arguments to

those used in Lemma 2.3.1 and 2.3.2, it follows that� is the unique locally attractive equilibrium

point of the singular value algorithm. Similarly, by linearizing (2.4.4) for continuous step-size

selection schemes at the point�, it can be shown that the rate of convergence is at least linear in

a neighbourhood of �. Thus, using Lemma 2.4.2 it follows that b� is the unique exponentially

attractive equilibrium point of the double-bracket algorithm on M�cH0�. To obtain the same

results for the variable step-size selection scheme (2.2.9) one applies Proposition 2.3.5 to the

x2-5 Associated Orthogonal Algorithms 47

double-bracket algorithm on M�cH0� and uses the equivalence given by Lemma 2.4.2 to obtain

the same result for the singular value algorithm (2.4.4). This completes the proof.

Remark 2.4.7 The above theorem holds true for any time-steps k � bN� bHk� induced by a

step-size selection scheme, bN , which satisfies Condition 2.1.1, such that Theorem 2.3.6 holds.

�

2.5 Associated Orthogonal Algorithms

In addition to finding eigenvalues or singular values of a matrix it is often desired to determine

the full eigen-decomposition of a matrix, i.e. the eigenvectors related to each eigenvalue. As-

sociated with the double-bracket algorithm and singular value algorithm there are algorithms

evolving on the set of orthogonal matrices which converge to the matrix of orthonormal eigen-

vectors (for the double-bracket algorithm) and separate matrices of left and right orthonormal

singular direction (for the singular value algorithm). To simplify the subsequent analysis one

imposes a genericity condition on the initial condition H0.

Condition 2.5.1 If H0 � HT0 � Rn�n is a real symmetric matrix then assume that H0 has

distinct eigenvalues �1 � � � � � �n. If H0 � Rm�n, where m n, then assume that H0 has

distinct singular values �1 � � � � � �n � 0.

For a sequence of positive real numbers k, for k � 1� 2� � � �, the associated orthogonal

double-bracket algorithm is

Uk�1 � Uke�k�U

TkH0Uk�N �� U0 � O�n�� 2�5�1�

where H0 � HT0 � Rn�n is symmetric. For an arbitrary initial condition H0 � Rm�n the

associated orthogonal singular value algorithm is

Vk�1 � Vke�kfU

TkHT

0 Vk�NT g� V0 � O�m� (2.5.2)

Uk�1 � Uke�kfV

TkH0Uk�Ng� U0 � O�n��


Note that in each case the exponents of the exponential terms are skew symmetric and thus, the

recursions will remain orthogonal.

Let H0 � HT0 � Rn�n and consider the map g : O�n�� M�H0�, U � UTH0U , which

is a smooth surjection. If Uk is a solution to (2.5.1) observe that

g�Uk�1� � e��k�g�Uk��N �g�Uk�e�k�g�Uk��N �

is the double-bracket algorithm (2.1.4). Thus, g maps the associated orthogonal double-bracket

algorithm with initial condition U0, to the double-bracket algorithm with initial condition

UT0 H0U0, on M�UT

0 H0U0� �M�H0�.

Remark 2.5.2 Consider the potential function : O�n� � R�, �U� � jjUTH0U � N jj2

on the set of orthogonal n� n matrices. Using the standard induced Riemannian metric from

Rn�n on O�n�, the associated orthogonal gradient flow is (Brockett 1988, Chu 1984a, Chu &

Driessel 1990, Helmke & Moore 1994b)

�U � �grad�U� � U �UTH0U�N ��

�

Theorem 2.5.3 Let H0 � HT0 be a real symmetric n�n matrix that satisfies Condition 2.5.1.

Let N � Rn�n satisfy Condition 2.1.4, and let N be either the constant step-size selection

(2.2.5) or the variable step-size selection (2.2.9). The recursion


TkH0Uk�N �� U0 � O�n��

k � N�Hk�

referred to as the associated orthogonal double-bracket algorithm, has the following properties:

i) A solution Uk, k � 1� 2� � � �, to the associated orthogonal double-bracket algorithm

remains orthogonal.

ii) Let �U� � jjUTH0U � N jj2 be a map from O�n� to the set of non-negative reals

R�. Let Uk , k � 1� 2� � � �, be a solution to the associated orthogonal double-bracket

x2-5 Associated Orthogonal Algorithms 49

algorithm. Then �Uk� is strictly monotonically decreasing for every k � N where

�UTk H0Uk � N � �� 0.

iii) Fixed points of the algorithm are characterised by matrices U � O�n� such that

�UTH0U�N � � 0�

There are exactly 2nn! distinct fixed points.

iv) Let Uk, k � 1� 2� � � �, be a solution to the associated orthogonal double-bracket algo-

rithm, then Uk converges to an orthogonal matrix U�, a fixed point of the algorithm.

v) All fixed points of the associated orthogonal double-bracket algorithm are strictly un-

stable, except those 2n points U� � O�n� such that

UT� H0U� � �

where � diag��1� � � � � �n�. Such points U� are locally asymptotically stable with at

least linear rate of convergence and H0 � U�UT� is an eigenspace decomposition of

H0.

Proof Part i) follows directly from the orthogonal nature of e�k �UTkH0Uk�N �. Note that in part

ii) the definition of can be expressed in terms of the map g�U� � UTH0U from O�n� to

M�H0� and the double-bracket potential ��H� � jjH �N jj2 of (2.1.1), i.e.

�Uk� � ��g�Uk��

Observe that g�U0� � UT0 H0U0, and thus, g�Uk� is the solution of the double-bracket algorithm

with initial conditionU T0 H0U0. As the step-size selection scheme N is either (2.2.5) or (2.2.9),

then g�Uk� satisfies (2.1.7). This ensures that part ii) holds.

If Uk is a fixed point of the associated orthogonal double-bracket algorithm with initial

condition UT0 H0U0, then g�Uk� is a fixed point of the double-bracket algorithm. Thus, from

Proposition 2.1.5, �g�Uk�� N � � �UTk H0Uk� N � � 0. Moreover, if �UT

k H0Uk � N � � 0 for

some given k � N, then by inspection Uk�l � Uk for l � 1� 2� � � �, and Uk is a fixed

point of the associated orthogonal double-bracket algorithm. From Lemma 2.1.6 it follows


that if U is a fixed point of the algorithm then UTH0U � �T� for some permutation

matrix �. By inspection any orthogonal matrix W � SU�T , where S is a sign matrix

S � diag��1� � � � ��1�, is also a fixed point of the recursion, and indeed, any two fixed points

are related in this manner. A simple counting argument shows that there are exactly 2nn!

distinct matrices of this form.

To prove iv), note that since g�Uk� is a solution to the double-bracket algorithm, it converges

to a limit point H� � M�H0�, �H�� N � � 0 (Proposition 2.1.5). Thus Uk must converge

to the preimage set of H� via the map g. Condition 2.5.1 ensures that set generated by the

preimage of H� is a finite distinct set, any two elements U 1� and U2

� of which, are related by

U1� � U2

�S, S � diag��1� � � � ��1�. Convergence to a particular element of this preimage

follows since k�UTk H0Uk� N �� 0 as in Proposition 2.1.5.

To prove part v), observe that the dimension of O�n� is the same as the dimension of

M�H0�, due to the genericity Condition 2.5.1. Thus g is locally a diffeomorphism on O�n�,

which forms an exact equivalence between the double-bracket algorithm and the associated

orthogonal double-bracket algorithm. Restricting g to a local region the stability structure of

equilibria are preserved under the map g�1. Thus, all fixed points of the associated orthogonal

double-bracket algorithm are locally unstable except those that map via g to the unique locally

asymptotically stable equilibrium of the double-bracket algorithm recursion. Observe that due

to the monotonicity of �Uk� a locally unstable equilibrium is also globally unstable.

The proof of the equivalent result for the singular value algorithm is completely analogous

to the above proof.

Theorem 2.5.4 Let H0 � Rm�n where m n satisfies Condition 2.5.1. Let N � Rm�n

satisfy Condition 2.4.1. Let the time-step k be given by

k � bN � bH��where bN is either the constant step-size selection (2.2.5), or the variable step-size selection

scheme (2.2.9), on M� bH0�. The recursion

Vk�1 � Vke�kfU

TkHT

0 Vk�NT g� V0 � O�m�

Uk�1 � Uke�kfV

TkH0Uk �Ng� U0 � O�n��

x2-6 Computational Considerations 51

referred to as the associated orthogonal singular value algorithm, has the following properties:

i) Let �Vk� Uk� be a solution to the associated orthogonal singular value algorithm, then

both Vk and Uk remain orthogonal.

ii) Let �V� U� � jjV TH0U �N jj2 be a map from O�m��O�n� to the set of non-negative

reals R�, then �Vk� Uk� is strictly monotonically decreasing for every k � N where

fV Tk H0Uk� Ng �� 0 and fUT

k HT0 Vk� N

Tg �� 0. Moreover, fixed points of the algorithm

are characterised by matrix pairs �V� U� � O�m��O�n� such that

fV TH0U�Ng � 0 and fUTHT0 V�N

Tg � 0�

iii) Let �Vk� Uk�, k � 1� 2� � � �, be a solution to the associated orthogonal singular value

algorithm, then �Vk� Uk� converges to a pair of orthogonal matrices �V�� U��, a fixed

point of the algorithm.

iv) All fixed points of the associated orthogonal singular value algorithm are strictly unsta-

ble, except those points �V�� U�� O�m�� O�n� such that

V T� H0U� � ��

where � � diag��1� � � � � �n� � Rm�n. Each such point �V�� U�� is locally exponentially

asymptotically stable and H0 � V T� �U� is a singular value decomposition of H0.

2.6 Computational Considerations

There are several issues involved in the implementation of the double-bracket algorithm as

a numerical tool which have not been dealt with in the body of this chapter. Design and

implementation of efficient code has not been considered and would depend heavily on the

nature of the hardware on which such a recursion would be run. As each iteration requires

the calculation of a time-step, an exponential and a k � 1 estimate it is likely that it would

be best to consider applications in parallel processing environments. Certainly in a standard

computational environment the exponential calculation would limit the possible areas of useful

application of the algorithms proposed.


It is also possible to consider approximations of the double-bracket algorithm which have

good computational properties. For example, consider a (1,1) Pade approximation to the matrix

exponential

e�k�Hk�N � � 2In � k �Hk� N �

2In � k �Hk� N ��

Such an approach has the advantage that, as �Hk� N � is skew symmetric, the Pade approximation

will be orthogonal, and will preserve the isospectral nature of the double-bracket algorithm.

Similarly an �n� n� Pade approximation of the exponential for any n will also be orthogonal.

There are difficulties involved in obtaining direct step-size selection schemes based on Pade

approximate double-bracket algorithms. Trying to guarantee that the potential � is monotonic

decreasing for such schemes by choosing step-size selection schemes based on the estimation

techniques developed in Section 2.2 yields time-steps which are prohibitively small. A good

heuristic choice of step-size selection scheme, however, can be made based on the selection

schemes given in this chapter and simulations indicate that the Pade approximate double-bracket

algorithm is viable when this is done.

2.7 Open Questions and Further Work

One of the fundamental problems tackled in this chapter is the task of step-size selection. The

best step-size selection scheme developed (2.2.9) is unsatisfactory in several ways; it is not

continuous at critical points of the cost function and it is computationally expensive to evaluate.

A better general understanding of the step-size selection task would be desirable. In particular,

it may be possible to develop linear search techniques that are guaranteed to converge to the

optimal step-size, obviating the need for approximations.

One of the primary motivations for studying the symmetric eigenvalue problem from a dy-

namical systems perspective is the potential for applications to on-line and adaptive processes.

It is instructive to consider how the double-bracket algorithm can be modified to deal with

time-varying data. Subsection 2.7.1 is by no means a comprehensive treatment of this issue,

nevertheless, it provides an indication of how such a task may be approached. To go beyond

the treatment of Subsection 2.7.1 it would be desirable to consider a particular application and

refine the algorithm to provide a useful numerical technique.

x2-7 Open Questions and Further Work 53

2.7.1 Time-Varying Double-Bracket Algorithms

Consider a sequence of ‘input’ matrices Ak � ATk for which an estimate of the eigenvalues

of Ak at each time k is required. One assumes that the spectrum of each Ak is related, for

example the sequence Ak is slowly varying with k. If the sequence Ak is a noisy observation

of some time-varying process or contains occasional large deviations then a sensible algorithm

for estimating the spectrum of Ak�1 would exploit the full data sequence A0� � � � � Ak along

with the new data Ak�1 to generate a new estimate. A gradient descent algorithm achieves this

in a fundamental manner since each new estimate is based on a small change in the previous

estimate which in turn is based on the data sequence up to time k. The issue of constraint

stability is of importance in such situations since the presence of small errors in the constraint

at each step may eventually lead the estimates to stray some distance from the true spectrum.

Given a symmetric matrix H0 � HT0 and a diagonal matrix N � diag��1� � � � � �N�

satisfying Condition 2.1.4 consider the potential

�U� � jjUTH0U �N jj2�

In Section 2.5 the relationship UTH0U � M�H0� was exploited to display the connections

between the double-bracket algorithm and the associated orthogonal algorithm. However, it is

also possible to rewrite the potential as

�U� � jjUNUT �H0jj2F �

Similarly, the associated orthogonal algorithm itself can be rewritten with the matrixNmodified

by an orthogonal congruency transformation


TkH0Uk�N �

� e�k�H0�UkNUTk�Uk �

The advantage of this formulation is the fact that matrixH0 appears explicitly in the algorithm.

The time-varying associated orthogonal double-bracket algorithm is defined to be

Uk�1 � e�k�Ak�UkNkUTk�Uk� U0 � I� �2�7�1�


An estimate of the spectral decomposition of Ak�1 is given by Hk�1 � UTk�1Ak�1Uk�1.

Observe that the eigenvector decomposition of Ak�1 is derived from the data sequence up to

timek and is applied toAk�1 to approximate a spectral decompositionAk�1 � Uk�1Hk�1UTk�1

where it is hoped that Hk�1 is nearly diagonal.

IfAk � H0 is constant it is easily seen that the time-varying associated orthogonal algorithm

reduces to the standard associated orthogonal algorithm. Also each step of the time-varying

algorithm will reduce the potential jjAk � Uk�1NUTk�1jj � jjAk � UkNUT

k jj. Thus, as long

as the sequence of matrices Ak does not vary too quickly with time the proposed algorithm

should converge to and track the spectral decomposition of Ak .

Remark 2.7.1 A time-varying dual singular value decomposition algorithm is fully analogous

to the development given above. �

Remark 2.7.2 If the sequence Ak is a stationary stochastic process it may be sensible to

replace the driving term Ak in (2.7.1) by Bk �1n

Pnk�1 Ak . �

Chapter 3

Gradient Algorithms for Principal

Component Analysis

The problem of principal component analysis of a symmetric matrixN � NT is that of finding

an eigenspace of specified dimension p 1 which corresponds to the maximal p eigenvalues of

N . There are a number of classical algorithms available for computing dominant eigenspaces

(principal components) of a symmetric matrix. A good reference for standard numerical

methods is Golub and Van Loan (1989).

There has been considerable interest in the last decade in using dynamical systems to solve

linear algebra problems (cf. the review (Chu 1988) and the recent monograph (Helmke &

Moore 1994b)). It is desirable to consider the relationship between such methods and classical

algebraic methods. For example, Deift et al. (1983) investigated a matrix differential equation

based on the Toda flow, the solution of which (evaluated at integer times) is exactly the sequence

of iterates generated by the standard QR algorithm. In general, dynamical system solutions of

linear algebra problems do not interpolate classical methods exactly. Discrete computational

methods based on dynamical system solutions to a given problem provide a way of comparing

classical algorithms with dynamical system methods. Recent work on developing numerical

methods based on dynamical systems insight is contained Brockett (1993) and Moore et al.

(1994).

Concentrating on the problem of principal component analysis, Ammar and Martin (1986)

55

56 Gradient Algorithms for Principal Component Analysis Chapter 3

have studied the power method (for determining the dominant p-dimensional eigenspace of

a symmetric matrix) as a recursion on the Grassmannian manifold Gp�Rn�, the set of all p-

dimensional subspaces of Rn. Using local coordinate charts on Gp�Rn� Ammar and Martin

(1986) show that the power method is closely related to the solution of a matrix Riccati

differential equation. Unfortunately, the solution to a matrix Riccati equation may diverge to

infinity in finite time. Such solutions correspond to solutions that do not remain in the original

local coordinate chart. Principal component analysis has also been studied by Oja (1982, 1989)

in relation to understanding the learning performance of a single layer neural network with n

inputs and p neurons. Oja’s analysis involves computing the limiting solution of an explicit

matrix differential equation evolving on Rn�p (there is no requirement for local coordinate

representations). The evolution of the system corresponds to the ‘learning’ procedure of the

neural network while the columns of the limiting solution span the principal component of the

covariance matrix N � EfukuTk g (where EfukuTk g is the expectation of ukuTk ) of the vector

random process uk � Rn, k � 1� 2� � � �, with which the network was ‘trained’. Recent work by

Yan et al. (1994) has provided a rigourous analysis of the learning equation proposed by Oja.

Not surprisingly, it is seen that the solution to Oja’s learning equations is closely related to the

solution of a Riccati differential equation.

In this chapter I investigate the properties of Oja’s learning equation restricted to the Stiefel

manifold (the set of all n � p real matrices with orthonormal columns). Explicit proofs of

convergence for the flow are presented which extend the results of Yan et al. (1994) and Helmke

and Moore (1994b, pg. 26) so that no genericity assumptionis required on the eigenvalues ofN .

The homogeneous nature of the Stiefel manifold is exploited to develop an explicit numerical

method (a discrete-time system evolving on the Stiefel manifold) for principal component

analysis. The method proposed is a gradient descent algorithm modified to evolve explicitly

on St(p� n). A step-size must be selected for each iteration and a suitable selection scheme is

proposed. Proofs of convergence for the proposed algorithm are given as well as modifications

and observations aimed at reducing the computational cost of implementing the algorithm on

a digital computer. The discrete method proposed is similar to the classical power method and

steepest ascent methods for determining the dominant p-eigenspace of a matrix N . Indeed, in

the case where p � 1 (for a particular choice of time-step) the discretization is shown to be the

power method. When p � 1, however, there are subtle differences between the methods.

x3-1 Continuous-Time Gradient Flow 57

This chapter is based on the journal paper (Mahony et al. 1994). Applications of the same

ideas have also been considered in the field of linear programming (Mahony & Moore 1992).

The chapter is organised into five sections including the introduction. Section 3.1 reviews

the derivation of the continuous-time matrix differential equation considered and gives a general

proof of convergence. In Section 3.2 a discrete-time iteration based on the results in Section 3.1

is proposed along with a suitable choice of time-step. Section 3.3 considers two modifications of

the scheme to reduce the computational cost of implementing the proposed numerical algorithm.

Section 3.4 considers the relationship of the proposed algorithm to classical methods while

Section 3.5 indicates further possibilities arising from this development.

3.1 Continuous-Time Gradient Flow

In this section a dynamical systems solution to the problem of finding the principal component

of a matrix is developed. The approach is based on computing the gradient flow associated

with a generalised Rayleigh quotient function. The reader is referred to Warner (1983) for

technical details on Lie-groups and homogeneous spaces.

LetN � NT be a real symmetricn�nmatrix with eigenvalues�1 �2 � � � �n and an

associated set of orthonormal eigenvectors v1� � � � � vn. A maximal p-dimensional eigenspace,

or maximal p-eigenspace of N is spfv1� � � � � vpg the subspace ofRn spanned by fv1� � � � � vpg.

If �p � �p�1 then the maximal p-eigenspace of N is unique. If �p � �p�1 � � �p�r,

for some r � 0, then any subspace spfv1� � � � � vp�1� wg where w � spfvp� vp�1� � � � � vp�rg is

a maximal p-eigenspace of N .

For p an integer with 1 � p � n, let

St(p� n) � fX � Rn�p j XTX � Ipg� �3�1�1�

where Ip is the p�p identity matrix, denote the Stiefel manifold of real orthogonaln�pmatrices.

ForX � St(p� n), the columns ofX are orthonormal basis vectors for a p-dimensional subspace

of Rn.

Lemma 3.1.1 The Stiefel manifold St(p� n) is a smooth compact np� 12p�p� 1�-dimensional


submanifold ofRn�p. The tangent space of St(p� n), at a point X is given by

TX St(p� n) � f�X �X� j � � Sk�n�� Sk�p�g� �3�1�2�

whereSk�n�, Sk�p� are the set ofn�n (respectively p�p) skew symmetric matricesA � �AT ,

A � Rn�n.

Proof It can be shown that St(p� n) is a regular1 level set of the function X � XTX � Ip

(Helmke & Moore 1994b, pg. 25). In this chapter, however, the homogeneous structure of

St(p� n) is important and it is best to introduce this structure here. Define G � O�n�� O�p�

to be the topological product of the set of n � n and p � p real orthogonal matrices O�n� �

fU � Rn j UTU � UUT � Ing. Observe that G is a compact Lie-group (Helmke & Moore

1994b, pg. 348) with group multiplication given by matrix multiplication �U1� V1� �U2� V2� �

�U1U2� V1V2�. Define a map � : G� St(p� n) � St(p� n)

��U� V �� X� :� UXV T � �3�1�3�

It is easily verified that � is a smooth, transitive, group action on St(p� n). Since G is compact

it follows that St(p� n) is a compact embedded submanifold ofRn�p (Helmke & Moore 1994b,

pg. 352). The tangent space of St(p� n) at a point X � St(p� n) is given by the image of the

linearization of �X : G� St(p� n), �X�U� :� ��U�X�, at the identity element of G (Gibson

1979, pg. 75). Recall that the tangent space of O�n� at the identity is TInO�n� � Sk�n�

(Helmke & Moore 1994b, pg. 349) and consequently that the tangent space at the identity of

G is T�In�Ip�G � Sk�n�� Sk�p�. Computing the linearization of �X gives

D�X j�In�Ip�� X �X��

where D�X j�In�Ip�� is the Frechet derivative of �X at �In� Ip� in direction �� T�In�Ip�G.

1Given a function f : M � N between two smooth manifolds, a regular point p � M is a point where thetangent map Tpf : TpM � Tf�p�N is surjective. Given q � N let U � fp � M j f�p� � qg then U is knownas a regular level set if for each p � U then Tpf is surjective. It can be shown (using the inverse function theorem)that regular level sets are embedded submanifolds of M (Hirsch 1976, pg. 22).


The Euclidean inner product onRn�n �Rp�p is

h��1��1�� 2��2�i � tr��T1 �2� � tr��T

1 �2��

This induces a non-degenerate inner product on T�In�Ip�G. Given X � St(p� n) then the

linearization T�In�Ip��X decomposes the identity tangent space into

T�In�Ip�G � kerT�In�Ip��X � dom T�In�Ip��X �

where kerT�In�Ip��X is the kernel of T�In�Ip��X and

dom T�In�Ip��X � f��1��1� � T�In�Ip�G j h��1��1�� i� 0� �� kerT�In�Ip��Xg

is the domain of T�In�Ip��X (the subspace orthogonal to kerT�In�Ip��X using the Euclidean

inner product provided on T�In�Ip�G). By construction, T�In�Ip��X restricts to a vector space

isomorphism T��In�Ip�

�X ,

T��In�Ip��X : dom T�In�Ip��X � TX St(p� n)

T��In�Ip��X�X� :� T�In�Ip��X�X��

The normal Riemannian metric (cf. Section 5.3 or Helmke and Moore (1994b, pg. 52)) on

St(p� n) is the non-degenerate bilinear map on each tangent space

hh�1X �X�1��2X �X�2ii � tr��1 �T��2 � � tr��1 �

T��2 � �3�1�4�

where �iX �X�i � TX St(p� n) for i � 1� 2 and

��i��i� � ��i�� i�� i ��i �

is the decomposition of ��i��i� into components in kerT�In�Ip��X and dom T�In�Ip��X re-

spectively. It is easily verified that hh� ii varies smoothly with X and defines a Riemannian

metric.

Remark 3.1.2 It can be shown that for p � 1, St�1� n� � fx � Rn j jjxjj � 1g � Sn�1, the

�n� 1�-dimensional sphere inRn. The tangent space of Sn�1 is TxSn�1 � f � Rn j Tx �


0g, and the normal metric is hh� �ii� T�, for , � in TxSn�1 (Helmke & Moore 1994b, pg.

25). �

The classical Rayleigh quotient is the map rN : Rn � f0g � R,

rN�x� �xTNx

xTx�

This is generalised to the Stiefel manifold St(p� n) as a function termed the generalised Rayleigh

quotient

RN : St(p� n) � R� RN�X� � tr�XTNX�� 3�1�5�

The Ky-Fan minimax Principle states (Horn & Johnson 1985, pg. 191)

maxX St(p� n)

RN�X� � �1 � � � �� p

minX St(p� n)

RN�X� � �n�1�p � � � �� n�

Moreover, if X � St(p� n), such that RN�X� �Pp

j�1 �i, then the columns of X will generate

a basis for a maximal p-dimensional eigenspace of N .

Theorem 3.1.3 Given N � NT a real symmetric n � n matrix and p be an integer with

1 � p � n. Denote the eigenvalues of N by �1 � � � �q with algebraic multiplicities

n1� � � � � nq such thatPq

i�1 ni � n. For X � St(p� n), define the generalised Rayleigh quotient

RN : St(p� n) � R, RN�X� � tr�XTNX�. Then,

i) The gradient of RN�X�, on the Stiefel manifold St(p� n), with respect to the normal

Riemannian metric (3.1.4), is

gradRN �X� � �In �XXT�NX � �XXT � N �X� �3�1�6�

ii) The critical points of RN�X� on St(p� n) are characterised by

�XXT � N � � 0

and correspond to pointsX � St(p� n), such that the columns ofX span a p-dimensional

eigenspace of N .


iii) For all initial conditions X0 � St(p� n), the solution X�t� � St(p� n) of

d

dtX � gradRN�X��

� �In �XXT�NX X�0� � X0 (3.1.7)

exists for all t � Rand converges to some matrix X� � St(p� n) as t��. For almost

all initial conditions the solutionX�t� of (3.1.7) converges exponentially fast to a matrix

whose columns form a basis for the maximal p-eigenspace of N .

iv) When p � 1 the exact solution to (3.1.7) is given by

x�t� �etNx0

jjetNx0jj �3�1�8�

where x0 � Sn�1 � St�1� n�.

Proof The gradient of RN is computed using the identities

�i�� DRN jX�� hhgradRN�X�� ii� � TX St(p� n)

�ii�� gradRN �X� � TX St(p� n)�

where DRN jX�� is the Frechet derivative of RN�X� in direction � TX St(p� n) evaluated

at the point X � St(p� n). Computing the Frechet derivative of RN in direction �X �X� �TX St(p� n) gives

DRN jX��X �X�� 2tr�XTN��X �X��

� 2tr�XXTN�� 2tr�XTNX��

Observe that tr�XTNX�� 0 since XTNX is symmetric and � is skew symmetric. Simi-

larly, only the skew symmetric part of XXTN contributes to tr�XXTN��. Thus,

DRN jX��X �X�� tr��XXT � N ��

� hh�N�XXT �X��X �X��ii�

using the Riemannian metric (3.1.4). The second line follows since any component of


�N�XXT � that lies in kerT�In�Ip��X does not contribute to the value of tr��XXT � N ��

since one may choose � � dom T�In�Ip��X and of course �N�XXT� � Sk�n� which ensures

�N�XXT �X � TX St(p� n). This proves part i).

At critical points of RN the gradient gradRN�X� is zero, �N�XXT �X � 0. Consider the

orthogonal change of coordinates U � O�n�

X� � UX �

�B� Ip

0�n�p�p�

�CAand

N� � UNUT �

�B� N�11 N�

12

N�21 N�

22

�CA �

where N�11 � Rp�p, N�

12 � �N�21�

T � Rp��n�p� and N�22 � R�n�p��n�p�. Observe that

U �N�XXT �X � �N�� X��X��T �X�

�

�B� 0 �N�12

N�21 0

�CAX� �

�B� 0

N�21

�CA� 0

since �N�XXT�X � 0. Consequently, N�21 � 0 and thus �N �� X��X��T � � 0. It follows that

�N�XXT � � 0. Observe that gradRN�X� � 0 �� NX � X�XTNX� and the columns of

X form a basis for a p-eigenspace of N .

Infinite time existence of solutions to (3.1.7) follows from the compact nature of St(p� n).

By applying La’Salles invariance principle, it is easily verified that X�t� converges to a level

set of RN for which gradRN�X� � 0. These sets are termed critical level sets and denoted

Lr.

Lemma 3.1.4 Given N andRN as above. The critical level sets of RN in St(p� n) are the sets

Lr � fX � St(p� n) j RN�X� �qXi�1

ri�i� gradRN�X� � 0g�

which are indexed by vectors r � �r1� r2� � � � � rq�, such that each ri is an integer between zero

and ni inclusive �0 � ri � ni� and the sumPq

i�1 ri � p. For any X � Lr then the columns


of X span an eigenspace of N associated with the union of ri eigenvalues �i for i � 1� � � � � q.

Each Lr is an embedded submanifold of St(p� n). The tangent space of Lr is given by

TXLr � f�X �X� j � � Sk�n�� Sk�p�� and �N� �� XXT �� 0g� �3�1�9�

Proof The columns of a critical point X � St(p� n) of RN span a p-eigenspace of N and

thus, the eigenvalues of XTNX are a subset of p eigenvalues of N . Index this subset by

a vector r � �r1� r2� � � � � rq�, such that each ri is an integer between zero and ni inclusive

�0 � ri � ni�, the sumPq

i�1 ri � p and each ri represents the algebraic multiplicity of �i as

an eigenvalue of XTNX . It follows directly that RN�X� �Pq

i�1 ri�i and thus the collection

of sets Lr are the critical level sets of RN .

Since N is symmetric there exists U� � O�n� such that N � UT� U� and

�

�BBBB��1In1 0 0

0. . . 0

0 0 �qInq

�CCCCA �

with Ini the ni � ni identity matrix. To show that the critical level sets Lr are embedded

submanifolds of St(p� n) it is convenient to consider the problem where N is replaced by

directly. In this case the critical level sets Lr� of R� on St(p� n) are exactly Lr� � UT� Lr.

The map X � U�X is a diffeomorphism of St(p� n) into itself which preserves submanifold

structure.

LetH � O�n1��O�n2�� O�nq��O�p� and observe thatH is a compact Lie group.

Given an arbitrary index r consider the map � : H � Lr� � Lr

�,

��U1� U2� � � �Uq� V �� X� � UXV T � where U �

�BBBB�U1 0 0

0. . . 0

0 0 Uq

�CCCCA �

Observe that UTU � In and consequently R��U� V �� X�� R��X�. Moreover,

gradR��U� V �� X�� UXXTUT �UXV T

� UT �� XXT �XV T � 0


since it is assumed that gradR��X� � �� XXT�X � 0. It follows that� is a group action ofH

onLr

�. IfX andY are both elements ofLr� thenXTX andY TY have the same eigenvalues

and are orthogonally similar, i.e. there exists V � O�p� such that V TXTXV � Y TY . By

inspection one can find U � �U1� U2� � � �Uq� such that UXV � Y which shows that � is a

transitive group action onLr

�. It follows thatLr

� is itself a homogeneous space (with compact

Lie transformation group) and hence is an embedded submanifold of St(p� n) (Helmke &

Moore 1994b, pg. 352). Since X � U�X is a diffeomorphism of St(p� n) into itself this

shows that Lr � U�Lr

� is also an embedded submanifold of St(p� n).

Observe that any curve Y �t� � U�t�XV T �t�, Y �0� � X , lying in Lr will satisfy

�N� Y �t�Y �t�T �Y �t� � 0. Similarly, it is easily verified that any curve (passing through

Y �0� � X � Lr) satisfying this equality must lie in Lr. Thus, the tangent space TXLr is

given by the equivalence classes of the derivatives at time t � 0, of curves Y �t� such that

�N� Y �t�Y �t�T �Y �t� � 0. Let �U�0� � � � Sk�n� and �V �0� � � � Sk�p� then

d

dt

��N� Y �t�Y �t�T �Y �t�

��t�o

� �N��XXT �XXT��X � �N�X�XT �X�XT �X

� �N�XXT ��X � �N�XXT �X�

� �N� �� XXT��

since �N�XXT � � 0 (cf. part ii) Theorem 3.1.3). But this is just the definition (3.1.9) and the

result is proved.

Now at a critical point of RN the Hessian HRN is a well defined bilinear map from

TX St(p� n) to the reals (Helmke & Moore 1994b, pg. 344). Let ��1X �X�1� � TX St(p� n)

and ��2X �X�2� � TX St(p� n) be arbitrary then

HRN ��1X �X�1��2X �X�2� � D�1X�X�1

�D�2X�X�2RN �X�

� D�1X�X�1tr��XXT � N ��2�

� tr��1� XXT �� N ��2��

Observe that ��1� XXT � is skew symmetric since XXT is symmetric and �1 is skew sym-

metric. Similarly, ��1� XXT �� N � is skew symmetric. Since �1 and �2 are arbitrary then

HRN is degenerate in exactly those tangent direction ��X � X�� TX St(p� n) for which

x3-2 A Gradient Descent Algorithm 65

�� XXT�� N � � 0. But this corresponds exactly to (3.1.9) and one concludes that the Hessian

HRN degenerates only on the tangent space of Lr. It is now possible to apply Lemma 5.5.2 to

complete the proof of part iii).

Part iv) of the theorem is verified by explicitly evaluating the derivative of (3.1.8).

Remark 3.1.5 In the case 1 � p � n no exact solution to (3.1.7) is known, however, for X�t�

a solution to (3.1.7) the solution for H�t� � X�t�X�t�T is known since

�H�t� � �XXT �X �XT

� NXXT �XXTN � 2XXTNXXT

� NH�t� �H�t�N � 2H�t�NH�t�� (3.1.10)

H�0� � X0XT0 and this equation is a Riccati differential equation (Yan, Helmke & Moore

1994). �

3.2 A Gradient Descent Algorithm

In this section a numerical algorithm for solving (3.1.7) is proposed. The algorithm is based

on a gradient descent algorithm modified to ensure that each iteration lies in St(p� n).

Let X0 � St(p� n) and consider the recursive algorithm generated by

Xk�1 � e��k�XkXTk�N �Xk� �3�2�1�

for a sequence of positive real numbers k, termed time-steps. The algorithm generated by

(3.2.1) is referred to as the Rayleigh gradient algorithm. The Lie-bracket �XkXTk � N � is skew

symmetric and consequently e��k�XkXTk�N � is orthogonal and Xk�1 � St(p� n). Observe also

thatd

d�e�� XkX

Tk�N �Xk

��0

� �In �XkXTk �NXk � gradRN�Xk��

the gradient of RN at Xk. Thus, e�� XkXTk�N �Xk represents a curve in St(p� n), passing

through Xk at time � � 0, and with first derivative equal to gradRN�X�. The linearization of


Xk�1�� e�� XkXTk�N �Xk around � � 0 is

Xk�1�� Xk � �gradRN�Xk� � �higher order terms��

The higher order terms modify the basic gradient descent algorithm onRn�p to ensure that the

interpolation occurs along curves in St(p� n). For suitably small time-steps k, it is clear that

(3.2.1) will closely approximate the gradient descent algorithm onRn�p.

To implement the Rayleigh gradient algorithm it is necessary to choose a time-step k ,

for each step of the recursion. A convenient criteria for determining suitable time-steps is to

maximise the change in potential

�RN�Xk� k� � RN�Xk�1��RN�Xk�� 3�2�2�

It is possible to use line search techniques to determine the optimal time-step for each iteration

of the algorithm. Completing a line search at each step of the iteration, however, is compu-

tationally expensive and often results in worse stability properties for the overall algorithm.

Instead, a simple deterministic formulae for the time-step based on maximising a lower bound

�RlN�Xk� �� for (3.2.2) is provided.

Lemma 3.2.1 For any Xk � St(p� n) such that gradRN�Xk� �� 0, the recursive estimate

Xk�1 � e��k�XkXTk�N �Xk, where

k �jj�XkX

Tk � N �jj2

2ppjjN �XkXT

k � N �2jj � �3�2�3�

satisfies�RN�Xk� k� � RN�Xk�1�� RN�Xk� � 0.

Proof Denote Xk�1�� e�� XkXTk�N �Xk for an arbitrary time-step � . Direct calculations

show

�R�N�Xk� �� 2tr�XT

k�1��N �XkXTk � N �Xk�1��

�R��N�Xk� �� 4tr�XT

k�1��N �XkXTk � N �2Xk�1��

x3-2 A Gradient Descent Algorithm 67

Taylor’s formula for �RN�Xk� �� gives

�RN�Xk� �� 2� tr�XTk N �XkX

Tk � N �Xk�

� 4� 2Z 1

0tr�XT

k�1��N �XkXTk � N �2Xk�1��1� s�ds

2� jj�XkXTk � N �jj2

� 4� 2Z 1

0jjXk�1��X

Tk�1��jjjjN �XkX

Tk � N �2jj�1� s�ds

� 2� jj�XkXTk � N �jj2� 2� 2ppjjN �XkX

Tk � N �2jj �: �Rl

N�Xk� ��

The quadratic nature of RlN�Xk� �� yields a unique maximum occurring at � � k given by

(3.2.3). Observe that if gradRN�Xk� �� 0 then jj�XkXTk � N �jj2 �� 0 and thus Rl

N�Xk� �� 0.

The result follows since �RN�Xk� �� RlN�Xk� �� 0.

Theorem 3.2.2 Given N � NT a real symmetric n � n matrix and p be an integer with

1 � p � n. Denote the eigenvalues of N by �1 � � � �n. For a given estimate

Xk � St(p� n), let k be given by (3.2.3). The Rayleigh gradient algorithm

Xk�1 � e��k�XkXTk�N �Xk�

has the following properties.

i) The algorithm defines an iteration on St(p� n).

ii) Fixed points of the algorithm are critical points of RN , X � St(p� n) such that

�XXT � N � � 0. The columns of a fixed point of (3.2.1) form a basis for a p-dimensional

eigenspace of N .

iii) If Xk, for k � 1� 2� � � �, is a solution to the algorithm then the real sequence RN�Xk�

is strictly monotonic increasing unless there is some k � N with Xk a fixed point of the

algorithm.

iv) Let Xk, for k � 1� 2� � � �, be a solution to the algorithm, then Xk converges to a critical

level set of RN on St(p� n).

v) All critical level sets of RN are unstable except the set for which the Rayleigh quotient

is maximised. The columns of an element of the maximal critical level set form a basis

for the maximal eigenspace of N .


Proof Part i) follows from the observation that e��k�XkXTk�N � is orthogonal. Part ii) is a direct

consequence of Lemma 3.2.1 (since �RN�Xk� k� � 0 if and only if Xk is a fixed point) and

Theorem 3.1.3. Part iii) also follows directly from Lemma 3.2.1.

To prove part iv) observe that since St(p� n) is a compact set, RN�Xk� is a bounded

monotonically increasing sequence which must converge. As a consequence Xk converges

to some level set of RN such that for any X in this set �RN�X� �X�� 0. Lemma 3.2.1

ensures that any X in this set is a fixed point of the recursion.

IfX is a fixed point of the recursion whose columns do not span the maximal p-dimensional

subspace of N then it is clear that there exists an orthogonal matrix U � O�n�, with jjU � Injjarbitrarily small and such that RN�UX� � RN�X�. As a consequence, the initial condition

X0 � UX (jjX0�X jj small) will give rise to a sequence of matricesXk that diverges from the

level set containing X , Lemma 3.2.1. This proves the first statement of v) while the attractive

nature of the remaining fixed points follows from La’Salle’s principle of invariance along with

the Lyapunov function V �X� ��Pp

i�1 �i �RN�X�.

Remark 3.2.3 It is difficult to characterise the exact basin of attraction for the set of matrices

whose columns span the maximal p-eigenspace of N . It is conjectured that the attractive basin

for this set is all of St(p� n) except for other critical points. �

Remark 3.2.4 For a fixed initial condition X0 � St(p� n) let Xk be the solution to (3.2.1).

Define Hk � XkXTk and observe

Hk�1 � e��k�Hk�N �Hke�k�Hk�N �� 3�2�4�

Thus, Hk can be written as a recursion on the set of symmetric rank p projection matrices

fH � Rn�n j H � HT � H2 � H� rank H � pg. The algorithm generated in this manner is

known as the double-bracket algorithm (cf. Chapter 2), a discretization of the continuous-time

double-bracket equation (3.1.10) �

x3-3 Computational Considerations 69

3.3 Computational Considerations

In this section two issues related to implementing (3.2.1) is a digital environment are discussed.

Results in both the following subsections are aimed at reducing the computational cost asso-

ciated with estimating the matrix exponential e��k �XkX�k�N �, a transcendental n � n matrix

function. The result presented in Subsection 3.3.1 is also important in Section 3.4.

3.3.1 An Equivalent Formulation

To implement (3.2.1) on conventional computer architecture the main computational cost for

each step of the algorithm lies in computing the n � n matrix exponential e��k�XkX�k�N �. The

following result provides an equivalent formulation of the algorithm which involves the related

p� p transcendental matrix functions “cos” and “sinc”.

Define the matrix function sinc : Rp�p � Rp�p by the convergent infinite sum

sinc�A� � Ip � A2

3!�A4

5!� A6

7!� �

Observe that Asinc�A� � sin�A� and thus, if A is invertible, sinc�A� � A�1 sin�A�. Define

the matrix function cos�A� by an analogous power series expansion. The matrix functions cos

and sinc are related by cos2�A� � Ip �A2sinc2�A�.

Lemma 3.3.1 GivenN � NT a real symmetric n�n matrix with eigenvalues �1 � � � �n,

let k, for k � 1� 2� � � �, be a sequence of real positive numbers. For X0 � St�p� n� an initial

condition that is not a critical point of RN�X�, then,

Xk�1 � e��k�XkXTk�N �Xk�

� Xk

�cos� kYk�� kX

Tk NXksinc� kYk�

�� kNXksinc� kYk�� (3.3.1)

where the power expansions for cos� kYk� and sinc� kYk� are determined by the positive

semi-definite matrix Y 2k � Rp�p

Y 2k � XT

k N2Xk � �XT

k NXk�2� �3�3�2�


Remark 3.3.2 The matrix Yk need not be explicitly calculated as the power series expansions

of sinc and cos depend only on Y 2k . �

Proof The proof follows from a power series expansion of e��k �XkXTk�N �Xk,

Xk�1 �

��Xl�0

1l!�� k�XkX

Tk � N ��l

�Xk �3�3�3�

Simple algebraic manipulations lead to the relation,

�XkXTk � N �2Xk � �XkY

2k �3�3�4�

where Y 2k is defined by (3.3.2). Pre-multiplying (3.3.4) by�XT

k provides an alternative formula

for Y 2k

Y 2k � XT

k �XkXTk � N �T �XkX

Tk � N �Xk�

which is positive semi-definite.

Using (3.3.4) it is possible to rewrite (3.3.3) as a power series in ��Y 2k �

Xk�1 ��Xm�0

�� k�2m

�2m�!Xk��Y 2

k �m �

�� k�2m�1

�2m� 1�!

�Xk�X

Tk NXk��NXk

��Y 2

k �m

��

�3�3�5�

where the first and second terms in the summation follow from the odd and the even power

powers of �XkXTk � N �lXk respectively. Rewriting this as two separate power series in ��Y 2

k �

Xk�1 � Xk

�Xm�0

�� k�2m

�2m�!��Y 2

k �m � k

�Xk�X

Tk NXk��NXk

� �Xm�0

�� k�2m

�2m� 1�!��Y 2

k �m

� Xk cos� kYk�� k�Xk�X

Tk NXk��NXk

�sinc� kYk��

and the result follows by rearranging terms.

3.3.2 Pade Approximations of the Exponential

It is also of interest to consider approximate methods for calculating matrix exponentials. In

particular, one is interested in methods that will not violate the constraint Xk�1 � St(p� n). A

standard approximation used for calculating the exponential function is a Pade approximation

x3-4 Comparison with Classical Algorithms 71

of order �n�m� where n � 0 and m � 0 are integers (Golub & Van Loan 1989, pg. 557). For

example, a (1,1) Pade approximation of the exponential is

e��k �XkX�k�N � � �In �

k2�XkX

�k� N ��1�In � k

2�XkX

�k� N ��

A key observation is that when n � m and the exponent is skew symmetric the resulting Pade

approximate is orthogonal. Thus,

Xk�1 � �In � k2�XkX

�k� N ��1�In � k

2�XkX

�k� N ��Xk� �3�3�6�

with initial condition X0 � St(p� n), defines an iteration on St(p� n) which approximates the

Rayleigh gradient algorithm (3.2.1). Of course, in practise one would use an algorithm such as

Gaussian elimination (Golub & Van Loan 1989, pg. 92) to solve the linear system equations

�In � k2�XkX

�k� N ��Xk�1 � �In � k

2�XkX

�k� N ��Xk

for Xk�1 rather than computing the inverse explicitly.

The algorithm defined by (3.3.6) can also be rewritten in a similar form to that obtained in

Lemma 3.3.1. Consider the power series expansion

�In � k2�XkX

�k� N ��1 �

�Xi�0

�� k

2�XkX

�k� N �

�i�

From here it is easily shown that

Xk�1 � �Xk ��2Xk � k�Xk�X

�kNXk��NXk�

�Ip �

2k

4Y 2k �

�1� �3�3�7�

where Y 2k � Rp�p is given by (3.3.2).

3.4 Comparison with Classical Algorithms

In this section the relationship between the Rayleigh gradient algorithm (3.2.1) and some classi-

cal algorithms for determining the maximal eigenspace of a symmetric matrix are investigated.

A good discussion of the power method and the steepest ascent method for determining a single


maximal eigenvalue of a symmetric matrix is given by Faddeev and Faddeeva (1963). Practical

issues arising in implementing these algorithms along with direct generalizations to eigenspace

methods are covered by Golub and Van Loan (1989).

3.4.1 The Power Method

In this subsection the algorithm (3.2.1) in the case where p � 1 is considered. It is shown that

for a certain choice of time-step k the algorithm (3.2.1) is the classical power method.

Recall that St�1� n� � Sn�1 the �n� 1�-dimensional sphere inRn.

Theorem 3.4.1 GivenN � NT a real symmetricn�nmatrix with eigenvalues�1 � � � �n.

For xk � Sn�1 let k be given by

k �y2

2p

2jjN �xxT � N �2jj � �3�4�1�

where yk � R is given by

yk ��xTkN

2xk � �xTkNxk�2� 1

2� �3�4�2�

For x0 � St�1� n� � Sn�1 an arbitrary initial condition then:

i) The formulae

xk�1 � e��k �xkxTk�N �xk�

defines a recursive algorithm on Sn�1.

ii) Fixed points of the rank-1 Rayleigh gradient algorithm are the critical points of rN on

Sn�1, and are exactly the eigenvectors of N .

iii) If xk , for k � 1� 2� � � � is a solution to the Rayleigh gradient algorithm, then the real

sequence rN�xk� is strictly monotonic increasing, unless xk is an eigenvector of N .

iv) For a given xk � Sn�1 which is not an eigenvector of N , then yk �� 0 and

xk�1 �

�cos� kyk�� xTkNxk

sin� kyk�yk

�xk �

sin� kyk�yk

Nxk� �3�4�3�


v) Let xk , for k � 1� 2� � � � be a solution to the rank-1 Rayleigh gradient algorithm, then xk

converges to an eigenvector of N .

vi) All eigenvectors of N , considered as fixed points of (3.4.3) are unstable, except the

eigenvector corresponding to the maximal eigenvalue �1, which is exponentially stable.

Proof Parts i)-iii) follow directly from Theorem 3.2.2. To see part iv) observe that yk �

jjgradrN�xk�jj and yk � 0 if and only if gradrN�xk� � 0 and xk is an eigenvector of N .

The recursive iteration (3.4.3) now follows directly from Lemma 3.3.1, with the substitution

sinc� kyk� �sin��kyk��kyk

. Parts v) and vi) again follow directly from Theorem 3.2.2.

Remark 3.4.2 Equation (3.4.3) involves only Nxk, xTkNxk and �Nxk�T �Nxk� vector com-

putations. This structure is especially of interest when sparse or structured matrices N are

considered. �

A geodesic (or great circle) on Sn�1, passing through x at time t � 0, can be written

��t� � cos�t�x� sin�t�V �3�4�4�

where V � ��0� is a unit vector orthogonal to x. Choosing V �xk� �gradrN�xk�jjgradrN�xk�jj

, x � xk

and evaluating ��t� at time t � k jjgradrN�xk�jj gives (3.4.3). Thus, (3.4.3) is a geodesic

interpolation of (3.1.8) the solution to the rank-1 Rayleigh gradient flow (3.1.7).

For a symmetric n� n matrix N � NT the classical power method is computed using the

recursive formula (Golub & Van Loan 1989, pg. 351)

zk � Nxk (3.4.5)

xk�1 �zkjjzkjj �

The renormalisation operation is necessary if the algorithm is to be numerically stable. The

following lemma shows that for N positive semi-definite and a particular choice of k the

rank-1 Rayleigh gradient algorithm (3.4.3) is exactly the power method (3.4.5).

Lemma 3.4.3 Given N � NT a positive semi-definite n � n matrix. For xk � Sn�1 (not an


kx

grad rN ( )xk

N kxk+1x

Sn-1

sp { kx N kx, }

0

Figure 3.4.1: The geometric relationship between the power method iterate and the the iterategenerated by (3.4.3).

eigenvector of N ) then jjgradrN�xk�jj � jjNxjj. Let k be given by

k �1

jjgradrN�xk�jj sin�1� jjgradrN�xk�jj

jjNxkjj�

�3�4�6�

where sin�1�jjgradrN �xk�jj

jjNxkjj

�� 0� ��2�. Then

NxkjjNxkjj �

�cos� kyk�� xTkNxk

sin� kyk�yk

�xk �

sin� kyk�yk

Nxk�

where yk is given by (3.4.2).

Proof Observe that jjgradrN�xk�jj2 � y2k � jjNxkjj2��xTkNx2�2 0 and thus jjgradrN�xk�jj

� jjNxkjj. Consider the 2-dimensional linear subspace spfxk� Nxkg ofRn. The new estimate

xk�1 generated using either (3.4.3) or (3.4.5) will lie in spfxk� Nxkg (cf. Figure 3.4.1). Setting

NxkjjNxkjj �

�cos��yk�� xTkNxk

sin��yk�yk

�xk �

sin��yk�yk

Nxk�

for � � 0 and observing that xk and Nxk are linearly independent then

cos��yk�� xTkNxksin��yk�

yk� 0

andsin��yk�

yk�

1jjNxkjj �

Since N � 0 is positive definite then a real solution to the first relation exists for which

�yk � �0� ��2�. The time-step value is now obtained by computing the smallest positive root

of the second relation.


Choosing N � 0 positive definite in Lemma 3.4.3 ensures that (3.4.3) and (3.4.5) con-

verge ‘generically’ to the same eigenvector. Conversely, if N is symmetric with eigenvalues

�1 � � �n, 0 � �n and j�nj � j�ij then the power method will converge to the eigen-

vector associated with �n while (3.4.3) (equipped with time-step (3.4.1) ) will converge to the

eigenvector associated with �1. Nevertheless, one may still choose k using (3.4.6), with the

inverse sin operation chosen to lie in the interval

sin�1� jjgradrN�xk�jj

jjNxkjj�� 2� ��

such that (3.4.3) and (3.4.5) are equivalent. In this case the geodesics corresponding to each

iteration of (3.4.3) are describing great circles travelling almost from pole to pole of the sphere.

3.4.2 The Steepest Ascent Algorithm

The gradient ascent algorithm for the Rayleigh quotient rN is the recursion (Faddeev &

Faddeeva 1963, pg. 430)

zk � xk � skgradrN�xk� (3.4.7)

xk�1 �zkjjzkjj

where sk � 0 is a real number termed the step-size. It is easily verified that the k� 1’th iterate

of (3.4.7) will also lie on the 2-dimensional linear subspace spfxk� Nxkg ofRn. Indeed for xk

not an eigenvector of N , (3.4.3) and (3.4.7) are equivalent when

sk �1y2k

�1

cos� kyk�� 1

�� 3�4�8�

The optimal step-size for the steepest ascent algorithm (i.e. rN�xk�1�soptk �� rN�xk�1�sk��

for any sk � R) is (Faddeev & Faddeeva 1963, pg. 433)

soptk � 2

�rN �xk�� rN�gradrN�xk�� f�rN�xk�� rN �gradrN �xk��

2 � 4jjrN�xk�jjg12

��1�

�3�4�9�


It follows directly that the optimal time-step selection for (3.4.3) is given by

k �1yk

cos�1

�1

1 � �soptk �2y2

k

��

Substituting directly into (3.4.3) and analytically computing the composition of cos and sin

with cos�1 gives

xk�1 �1

1 � �soptk �2y2

k

��1� s

optk xTkNxk

q2 � �s

optk �2y2

k

�xk �

�s

optk

q2 � �s

optk �2y2

k

�Nxk

��3�4�10�

with soptk given by (3.4.9). This recursion provides an optimal steepest ascent algorithm with

scaling factor 11��sopt

k�2y2

k

which converges to one as xk converges to an eigenvector of N .

3.4.3 The Generalised Power Method

In both the power method and the steepest ascent algorithm the rescaling operation preserves

the computational stability of the calculation. To generalise classical methods to the case where

p � 1, (i.e. Xk � St(p� n)), one must decide on a procedure to renormalise new estimates to

lie on St(p� n). Thus, a generalised power method may be written abstractly

Zk � NXk (3.4.11)

Xk�1 � rescale�Zk��

Since the span of the columns ofXk (denoted sp�Xk�) is the quantity in which one is interested

the rescaling operation is usually computed by generating an orthonormal basis for sp�Zk� (i.e.

using the Gram-Schmidt algorithm (Golub & Van Loan 1989, pg. 218))) Thus, Xk�1 � ZkG,

XTk�1Xk�1 � Ip and where G � Rp�p contains the coefficients which orthonormalise the

columns of Zk. When Zk is full rank thenG is invertible and the factorizationZk � Xk�1G�1

can be computed as a QR factorisation of Zk (Golub & Van Loan 1989, pg. 211). The matrix

G depends on the particular algorithm employed in computing an orthonormal basis for Zk.

When N � 0 is positive definite the power method will act to maximise the generalised

Rayleigh quotientRN (3.1.5). Different choices of G in the rescaling operation, however, will

affect the performance of the power method with respect to the relative change in RN at each


iteration. The optimal choice of G (for maximising the increase in Rayleigh quotient) for the

k’th step of (3.4.11) is given by a solution of the optimization problem

max

fG � Rp�p j GTZTk ZkG � Ipg

tr�GTZT

k NZkG��

where Zk � NXk. The cost criterion tr�GTZT

k NZkG�� RZT

kNZk

�G� is a Rayleigh

quotient while the constraint set is similar in structure to St(p� n). Indeed, it appears that this

optimization problem is qualitatively the same as explicitly solving for the principal components

of N .

One may still hope to obtain a similar result to Lemma 3.4.3 relating the generalised power

method to the Rayleigh gradient algorithm (3.2.1). Unfortunately, this not the case except in

non-generic cases.

Lemma 3.4.4 Given N � NT a symmetric n�n matrix. For any Xk � St(p� n) let Yk be the

unique symmetric, positive semi-definite square root of Y 2k � XT

k N2Xk� �XT

k NXk�2. There

exists a matrix G � Rp�p and scalar k � 0 such that

NXkG � e��k �XkXTk�N �Xk� �3�4�12�

if and only if one can solve

sin2� kYk�XTk NXk � cos� kYk� sin� kYk�Yk � �3�4�13�

for k.

Proof Assume that there exists a matrix G and a scalar k � 0 such that (3.4.12) holds.

Observe that rank�e��k�XkXTk�N �Xk� � p and thus rank�NXk� � p. Similarly G � Rp�p is

non-singular.

Pre-multiplying (3.4.12) by GTXTk N and using the constraint relation GTXT

k N2XkG �

Ip gives

Ip � GTXTk Ne��k�XkX

Tk�N �Xk�


Since one need only consider the case where G is invertible it follows that

G�1 � XTk e

�k�XkXTk�N �NXk�

Lengthy matrix manipulations yield

XTk �XkX

Tk � N �2lNXk � ��1�lY 2l

k XTk NXk� for l � 0� 1� � � � �

and

XTk �XkX

Tk � N �2l�1NXk � ��1�lY 2l�2

k for l � 0� 1� � � � �

Expanding e�k�XkXTk�N � as a power series inY 2

k and then grouping terms suitably (cf. Subsection

3.3.1) one obtains

G�1 � cos� kYk�XTk NXk � sin� kYk�Yk�

Using (3.3.1) for e��k�XkXTk�N �Xk then (3.4.12) becomes

NXk � e��k �XkXTk�N �XkG

�1

��Xk cos� Yk�� k �XkX

Tk � N �Xksinc� kYk�

��cos� kYk�XT

k NXk � sin� kYk�Yk��

Pre-multiplying this by XTk yields

XTk NXk � cos2� kYk�X

Tk NXT

k � cos� kYk� sin� kYk�Yk�

and thus

sin2� kYk�XTk NXT

k � cos� kYk� sin� kYk�Yk�

This shows that (3.4.12) implies (3.4.13). If k solves (3.4.13) then defining G�1 �

XTk e

�k�XkXTk�N �NXk ensures (3.4.12) also holds which completes the proof.

Writing Yk �Pp

i�1 �iyiyTi where fy1� � � � � ypg is a set of orthonormal eigenvectors for Yk ,

whose eigenvalues are denoted �i 0 for i � 1� � � � � p, then (3.4.13) becomes

pXi�1

sin2� k�i�yiyTi X

Tk NXk �

pXi�1

�i cos� k�i� sin� k�i�yiyTi �

Fixing i and pre-multiplying by yTi while also post-multiplying by yi gives the following p


equations for k

either sin� k�i� � 0 or cot� k�i� �1�iyTi X

Tk NXkyi�

for i � 1� � � � � p. It follows that either from the first relation k�i � m� for some integer m or

from the second relation

cot� k� ��i � cot��i�yTi X

Tk NXkyi

�i cot��i� � yTi XTk NXkyi

�

for each i � 1� � � � � p. One can easily confirm from this that the p equations will fail to have a

consistent solution for arbitrary choices of Xk and N . Thus, generically the Rayleigh gradient

algorithm (3.2.1) does not correspond to the power generalised method (3.4.11) for any choice

of rescaling operation or time-step selection.


There remains the issue of characterising the basin of attraction for the Rayleigh gradient

algorithm. Simulations indicate that the only points which are not contained in this set are

the non-minimal critical point of the generalised Rayleigh quotient, however, proving this is

likely to be very difficult. Another area where further insight would be desirable is in the

implementation of the (1,1) Pade approximate algorithm (3.3.6). It would seem likely that

for the time-steps generated by (3.2.3) the (1,1) Pade approximate algorithm would inherit all

the properties of the gradient descent algorithm. This appears to be the case in the simulation

studies undertaken.

In the earlier comparison between the Rayleigh gradient algorithm and classical numerical

linear algebra algorithms no account was taken of the various inverse shift algorithms which

tend to be the accepted computational methods. Incorporating the idea of origin shifts into

dynamical systems solutions of such linear algebra problems is an important question that has

not yet been satisfactorily understood.

In Subsection 3.4.1 it was shown that the rank-1 Rayleigh gradient algorithm is closely

related to the power method. Also related to the power method is an inverse shift algorithm


known as the Rayleigh iteration. Let N � NT � Rn�n be a symmetric matrix and xk � Rn

be some vector which is not an eigenvector of N then a single step of the Rayleigh iteration is

�k �xTkNxk

xTk ck

zk�1 � �N � �kIn��1xk

xk�1 �zk�1

jjzk�1jj �

The Rayleigh iteration converges cubically in a local neighbourhood of any eigenvector of N

(Parlett 1974). By comparing the Rayleigh iteration to the power method and the Rayleigh

gradient algorithm, one is lead to consider an ordinary differential equation of the form

�x � �N � rN�x�In��1x� xT �N � rN �x�In�

�1x x�

where rN�x� � xTNxxx is the Rayleigh quotient. In the vicinity of an eigenvector of N this

differential equation becomes singular and displays finite time convergence to the eigenvector

of N corresponding to the eigenvalue �i� which is the smallest eigenvalue of N such that

�i� � rN�x0�. The connection between singularly perturbed dynamical systems and shifted

numerical linear algebra methods is of considerable interest. There is also a connection to

the theory of differential/algebraic systems. For example the ordinary differential equation

mentioned above is equivalent to the differential/algebraic system

�x � z � �xTz�x

0 � x� �N � rN�x�In�z�

Chapter 4

Pole Placement for Symmetric

Realisations

A classical problem in systems theory is that of pole placement or eigenvalue assignment of

linear systems via constant gain output feedback. This is clearly a difficult task and despite a

number of important results, (cf. Byrnes (1989) for an excellent survey), a complete solution

giving necessary and sufficient conditions for a solution to exist has not been developed. It

has recently been shown that (strictly proper) linear systems with mp � n can be assigned

arbitrary poles using real output feedback (Wang 1992). Here n denotes the McMillan degree

of a system havingm inputs and p outputs. Of course if mp � n for a given linear system then

generic pole assignment is impossible, even when complex feedback gain is allowed (Hermann

& Martin 1977). The case mp � n remains unresolved, though a number of interesting results

are available (Hermann & Martin 1977, Willems & Hesselink 1978, Brockett & Byrnes 1981).

Present results do not apply to output feedback systems with symmetries or structured feedback

systems. More generally, one is also interested in situations where an optimal state feedback

gain is sought such that the closed loop response of the system is a best approximation of a

desired response, though the exact response may be unobtainable. In such cases one would

still hope to find a constructive method to compute the optimal feedback that achieves the

best approximation. The problem appears to be too difficult to tackle directly, however, and

algorithmic solutions are an attractive alternative.

81

82 Pole Placement for Symmetric Realisations Chapter 4

The development given in this chapter is loosely related to a number of recent articles. In

particular, Brockett (1989a) considers a least squares matching task, motivated by problems

in computer vision algorithms, that is related to the system approximation problem, though

his work does not include the effects of feedback. There is also an article by Chu (1992) in

which dynamical system methods are developed for solving inverse singular value problems,

a topic that is closely related to the pole placement question. The simultaneous multiple

system assignment problem considered is a generalisation of the single system problem and is

reminiscent of Chu’s (1991a) approach to simultaneous reduction of several real matrices.

In this chapter I consider a structured class of systems (those with symmetric state space

realisations) for which, to my knowledge, no previous pole placement results are available.

The assumption of symmetry of the realisation, besides having a natural network theoretic

interpretation, simplifies the geometric analysis considerably. It is shown that a symmetric

state space realisation can be assigned arbitrary (real) poles via output feedback if and only if

there are at least as many system inputs as states. This result is surprising since a naive counting

argument (comparing the number of free variables 12m�m� 1� of symmetric output feedback

gain to the number of poles n of a symmetric realization having m inputs and n states) would

suggest that 12m�m�1� n is sufficient for pole placement. To investigate the problem further

gradient flows of least squares cost criteria (functions of the matrix entries of realisations) are

derived on smooth manifolds of output feedback equivalent symmetric realisations. Limiting

solutions to these flows occur at minima of the cost criteria and relate directly to finding optimal

feedback gains for system assignment and pole placement problems. Cost criteria are proposed

for solving the tasks of system assignment, pole placement, and simultaneous multiple system

assignment.

The material presented in this chapter is based on the articles (Mahony & Helmke 1993,

Mahony et al. 1993). The theoretical material contained in Sections 4.1 to 4.4 along with the

simulations in Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the

numerical method proposed in Section 4.6 was presented at the 1993 Conference on Decision

and Control (Mahony et al. 1993). Much of the material presented in this chapter was developed

in conjunction with the results in the monograph (Helmke & Moore 1994b, Section 5.3), which

focusses on general linear systems.

The chapter is divided into seven sections. In Section 4.1 the specific problems considered

x4-1 Statement of the Problem 83

in the sequel are formulated and necessary conditions for generic pole placement and system

assignment are given. Section 4.2 develops the geometry of the set of symmetric state space

systems necessary for the development in later sections. In Section 4.3, a dynamical systems

approach to computing systems assignment problems for the class of symmetric state space

realizations is proposed while Section 4.4 applies the previous results to the pole placement and

the simultaneous multiple system assignment problems. A number of numerical investigations

are given in Section 4.4 which substantiate the theory presented in Sections 4.1 to 4.4. In Section

4.6 a numerical algorithm for computing feedback gains for the pole placement problem is

presented. The chapter concludes with a discussion of open questions and future work in

Section 4.7.

4.1 Statement of the Problem

In this section a brief review of symmetric systems is presented before the precise formulations

of the problems considered in the sequel are given and a pole placement result for symmetric

state space realizations is proved. The reader is referred to Anderson and Vongpanitlerd (1973)

for background material network theory.

A symmetric transfer function is a proper rational matrix function G�s� � Rm�m such that

G�s� � G�s�T �

For any such transfer function there exists a minimal signature symmetric realisation (Anderson

& Vongpanitlerd 1973, pg. 324)

�x � Ax �Bu�

y � Cx�

of G�s� such that �AIpq�T � AIpq and CT � IpqB, with Ipq � diag�Ip��Iq�, a diagonal

matrix with its first p diagonal entries 1 and the remaining diagonal entries -1. A signature sym-

metric realisation is a dynamical model of an electrical network constructed from p capacitors

and q inductors and any number of resistors.


Static linear symmetric output feedback is introduced to a state space model via a feedback

law

u � Ky � v� K � KT �

leading to the “closed loop” system

�x � �A�BKC�x� Bv�

y � BTx��4�1�1�

In particular, symmetric output feedback, where K � KT � Rm�m, preserves the structure of

signature symmetric realisations and is the only output feedback transformation that has this

property.

A symmetric state space system (also symmetric realisation) is a linear dynamical system

�x � Ax �Bu� A � AT (4.1.2)

y � BT x� (4.1.3)

with x � Rn, u� y � Rm, A � Rn�n, B � Rn�m . Without loss of generality assume

that m � n, B is full rank and BTB � Im the m � m identity matrix. Symmetric state

space systems correspond to linear models of electrical RC-networks, constructed entirely

of capacitors and resistors. The networks are characterised by the property that the Cauchy-

Maslov1 index coincides with the McMillan degree. The matrix pair �A�B� � S�n��O�n�m�,where S�n� � fX � Rn�n j X � XTg the set of symmetric n � n matrices and O�n�m� �

fY � Rn�m j XTX � Img is used to represent a linear system of the form (4.1.2) and (4.1.3).

The set O�n�m� is the Stiefel manifold (a smoothnm� 12m�m� 1� dimensional submanifold

of Rn�m) of n�m matrices with orthonormal columns (cf. Lemma 3.1.1).

Two symmetric state space systems �A1� B1� and �A2� B2� are called output feedback

equivalent if

�A2� B2� � � �A1 �B1KBT1 �

T � B1� �4�1�4�

1The Cauchy-Maslov index for a real rational transfer function z�s� is defined as the number of jumps of z�s�from �� to �� less the number from �� to��. Bitmead and Anderson (1977) generalise the Cauchy-Maslovindex to real symmetric rational matrix transfer functions and show that it is equal to p � q, the signature of Ipq(Bitmead & Anderson 1977, Corollary 3.3).


holds for � O�n� � fU � Rn�n j UTU � Ing the set of n � n orthogonal matrices and

K � S�m� the set of symmetric m�m matrices. Thus the system �A2� B2� is obtained from

�A1� B1� using an orthogonal change of basis � O�n� in the state spaceRn and a symmetric

feedback transformation K � S�m�. It is easily verified that output feedback equivalence is

an equivalence relation on the set of symmetric state space systems.

Consider the following problem for the class of symmetric state space systems.

Problem A Given �A�B� � S�n� � O�n�m� a symmetric state space system let �F�G� �S�n��O�n�m� be a symmetric state space system which possesses the desired system structure.

Consider the potential

� : Rn�n �O�n�m�� R�

��A�B� :� jjA� F jj2 � 2jjB � Gjj2�

where jjX jj2 � tr�XTX� is the Frobenius matrix norm. Find a symmetric state space system

�Amin� Bmin� which minimises � over the set of all output feedback equivalent systems to

�A�B�. Equivalently, find a pair of matrices � min� Kmin� � O�n�� S�m� such that

�� K� :� jj �A�BKBT � T � F jj2 � 2jj B � Gjj2�

is minimised over O�n�� S�m�. �

Such a formulation is particularly of interest when structural properties of the desired

realisations are specified. For example, one may wish to choose the “target system” �F�G�

with certain structural zeros. If an exact solution to the system assignment problem exists (i.e.

��Amin� Bmin� � 0) it is easily seen that �Amin� Bmin� will have the same structural zeros as

�F�G�. For general linear systems it is known that the system assignment problem (for general

feedback) is generically solvable only if there are as many inputs and as many outputs as states.

It is not surprising that this is the case for symmetric systems also.

Lemma 4.1.1 Let n and m be integers, n m, and let �F�G� � S�n��O�n�m�. Consider

matrix pairs �A�B� � S�n��O�n�m�.

a) If m � n then for any matrix pair �A�B� of the above form, there exist matrices


� O�n� and K � S�m� such that

�A �BKBT � T � F� B � G�

b) If m � n then the set of �A�B� � S�n� � O�n�m� for which an exact solution to the

system assignment problem exists is measure zero in S�n��O�n�m�. (I.e. for almost all

systems �A�B� � S�n�� O�n�m� no exact solution to the system assignment problem

exists.)

Proof If m � n then O�n�m� � O�n� and BT � B�1. For any �A�B� � S�n� � O�n�

choose � � K� � �GBT � GTFG �BTAB�. Thus,

�A�BKBT � T � GBTABGT �GBTB�GTFG� BTAB�BTBGT � F

and B � GBTB � G.

To prove part b) observe that since output feedback equivalence is an equivalence relation

the set of systems for which the system assignment problem is solvable are exactly those

systems which are output feedback equivalent to �F�G�. Consider the set

F�F�G� � f� �F � BGBT � T � G� j � � K� � O�n�� S�m�g � S�n��O�n�m��

It is shown in Section 4.2 (Lemma 4.2.1) that F�F�G� is a smooth submanifold of S�n� �O�n�m�. But F�F�G� is the image of O�n� � S�m� via the continuous map � � K� �� F �BGBT � T � G� and necessarily has dimension at most dimO�n��S�m� � 1

2n�n�1�� 1

2m�m�1�. The dimension ofS�n��O�n�m�however is 12n�n�1��nm� 1

2m�m�1��

(Helmke & Moore 1994b, pg. 24). Thus,

dimS�n��O�n�m�� dimO�n�� S�m� � �n�m��m� 1��

which is strictly positive for 0 � m � n. Thus, for m � n the set F�F�G� is a submanifold

of S�n�� O�n�m� of non-zero co-dimension and therefore has measure zero.

A similar task to Problem A is that of pole placement for symmetric state space realizations.

The pole placement task for symmetric systems is; given an arbitrary set of numbers s1


� � � sn in R and an initial m � m symmetric transfer function G�s� � GT �s� with a

symmetric realisation, find a symmetric matrix K � S�m� such that the poles of GK�s� �

�Im�G�s�K��1G�s� are exactly s1� � � � � sn. Rather than tackle this problem directly, consider

the following variant of the problem.

Problem B Given �A�B� � S�n�� O�n�m� a symmetric state space system let F � S�n�

be a symmetric matrix. Define

��A�B� :� jjA� F jj2�� K� :� jj �A�BKBT � T � F jj2�

Find a symmetric state space system �Amin� Bmin� which minimises� over the set of all output

feedback equivalent systems to �A�B�. Respectively, find a pair of matrices � min� Kmin� �O�n�� S�m� which minimises over O�n�� S�m�. �

Problem B minimises a cost criterion that assigns the full eigenstructure of the closed loop

system. Two symmetric matrices have the same eigenstructure (up to orthogonal similarity

transformation) if and only if they have the same eigenvalues (since any symmetric matrix may

be diagonalised via an orthogonal similarity transformation). Thus, Problem B is equivalent

to solving the pole placement problem for symmetric systems (assigning the eigenvalues of

the closed loop system). The advantage of considering Problem B rather than a standard

formulation of the pole placement task lies in the smooth nature of the optimization problem

obtained.

It is of interest to consider generic conditions on symmetric state space systems for the

existence of an exact solution to Problem B (i.e. the existence of � min� Kmin� such that

� min� Kmin� � 0). This is exactly the classical pole placement question about which much

is known for general linear systems (Byrnes 1989, Wang 1992). The following result answers

(at least in part) this question for symmetric state space systems. It is interesting to note that

the necessary conditions for “generic” pole placement for symmetric state space systems are

much stronger than those for general linear systems.

Lemma 4.1.2 Let n and m be integers, n m, and let F � S�n� be a real symmetric matrix.

Consider matrix pairs �A�B� � S�n��O�n�m�.


a) If m � n then for any matrix pair �A�B� of the above form, there exist matrices

� O�n� and K � KT � Rm�m such that

�A�BKBT � T � F� �4�1�5�

b) If m � n then there exists an open set of matrix pairs �A�B� � S�n�� O�n�m� of the

above form such that eigenstructure assignment (to the matrix F ) is impossible.

Proof Part a) follows directly from Lemma 4.1.1.

Observe that the set of matrix pairs f�A�B� j A � BBTABBT g �� S�n� � O�n�m� is

Zariski closed in S�n��O�n�m� and consequently of measure zero (cf. Martin and Hermann

(1977a) for a basic discussion Zariski closed sets). There exists a matrix pair �A�B� �S�n�� O�n�m� and matrices � O�n� and K � KT � Rm�m such that (4.1.5) is satisfied

and A �� BBTABBT or else part b) is trivially true. Direct manipulations of (4.1.5), recalling

that BTB � Im, yield

K � BT � TF �A�B�

Substituting this back into (4.1.5) gives

TF � �A� BBTABBT � �BBT TF BBT �

Observe that

tr��A�BBTABBT �TBBT TF BBT

�� tr

��BBT �A� BBTABBT �BBT � TF

�� 0�

and taking the squared Frobenius norm of TF gives

jjF jj2 � jj�A�BBTABBT �jj2 � jjBBT TF BBT jj2�

recalling that the Frobenius norm is invariant under orthogonal transformations. It follows

directly that jjF jj2 jj�A�BBTABBT �jj2.

Since �A�B� was chosen deliberately such that A �� BBTABBT one may consider the


related matrix pair �A�� B�� A�B�, where � ��

jjF jj2�1jjA�BBTABBT jj2

� 12 . By construction

jj�A� �B�B�TA�B�B�T �jj2 � jjF jj2 � 1 � jjF jj2

and no solution to the eigenstructure assignment problem exists for the system �A�� B��.

Moreover, the map �A�B� � jj�A � BBTABBT �jj2 is continuous and it follows that there

is an open neighbourhood of systems around �A�� B�� for which the eigenstructure assignment

task cannot be solved.

Remark 4.1.3 It follows directly from the proof of Lemma 4.1.2 that eigenstructure assignment

of a symmetric state space system �A�B� � S�n��O�n�m� to an arbitrary closed loop matrix

F � S�n� is possible only if

jjF jj2 jjA�BBTABBT jj2�

�

Remark 4.1.4 One may weaken the hypothesis of Lemma 4.1.2 considerably to deal with

matrix pairs �A�B� � S�n�� Rn�m, for which B is not constrained to satisfy BTB � Im

and for which m may be greater than n. The analogous statement is that eigenstructure

assignment is generically possible if and only if rankB n. The proof is similar to that given

above observing that the projection operator BBT (for BTB � Im) is related to the general

projection operator B�BTB�yBT , where y represents the pseudo-inverse of a matrix. For

example, the feedback matrix yielding exact system assignment for rankB n is

K � �BTB�yBT �F � A�B�BTB�y�

�

A further problem considered is that of simultaneous multiple system assignment. This is a

difficult problem about which very little is presently known. The approach taken is to consider

a generalisation of the cost criterion � for a single system.


Problem C For any integer N � N let �A1� B1�� AN � BN� and �F1� G1�� FN � GN�

be two sets of N symmetric state space systems. Define

�N� � K� :�NXi�1

jj �Ai � BiKBTi �

T � Fijj2 � 2NXi�1

jj Bi �Gijj2�

Find a pair of matrices � min� Kmin� � O�n��S�m�which minimises�N over O�n��S�m�.�

4.2 Geometry of Output Feedback Orbits

It is necessary to briefly review the Riemannian geometry of the spaces on which the optimiz-

ation problems stated in Section 4.1 are posed. The reader is referred to Helgason (1978) and

the development in Chapter 5 for technical details on Lie-groups and homogeneous spaces and,

Helmke and Moore (1994b) for a development of dynamical systems methods for optimization

along with applications in linear systems theory.

The set O�n�� S�m� forms a Lie group under the group operation � 1� K1� � 2� K2� �

� 1 2� K1�K2�. It is known as the output feedback group for symmetric state space systems.

The tangent spaces of O�n�� S�m� are

T��K��O�n�� S�m�� f�� j � � Sk�n�� S�m�g�

where Sk�n� � f� � Rn�n j � � ��T g the set of n � n skew symmetric matrices. The

Euclidean inner product onRn�n �Rn�m is given by

h�A�B�� X� Y �i � tr�ATX� � tr�BTY �� 4�2�1�

By restriction, this induces a non-degenerate inner product on the tangent space T�In�0��O�n��S�m�� Sk�n� � S�m�. The Riemannian metric considered on O�n� � S�m� is the right

invariant group metric

h��1 �1�� 2 �2�i � 2tr��T1 �2� � 2tr�T1 2��

x4-2 Geometry of Output Feedback Orbits 91

The right invariant group metric is generated by the induced inner product onT�In �0��O�n��S�m��, mapped to each tangent space by the linearization of the diffeomorphism �� k� �� k�K� for � � K� � �O�n��S�m��. It is readily verified that this defines a Riemannian

metric which corresponds, up to a scaling factor, to the induced Riemannian metric on O�n��S�m� considered as a submanifold of Rn�n �Rn�m. The scaling factor 2 serves to simplify

the algebraic expressions obtained in the sequel.

Let �A�B� � S�n�� O�n�m� be a symmetric state space system. The symmetric output

feedback orbit of �A�B� is the set

F�A�B� � f� �A�BTKB� T � B� j � O�n�� K � S�m�g� �4�2�2�

of all symmetric realisations that are output feedback equivalent to �A�B�. Observe that no

assumption on the controllability of the matrix pair �A�B� is made.

Lemma 4.2.1 The symmetric output feedback orbit F�A�B� is a smooth submanifold of

S�n�� O�n�m� with tangent space at a point �A�B� given by

T�A�B�F�A�B� � f�� A� �BBT ��B� j � � Sk�n�� S�m�g� �4�2�3�

Proof The set F�A�B� is an orbit of the smooth semi-algebraic group action

� : �O�n�� S�m�� S�n��O�n�m�� S�n��O�n�m��

�� K�� A�B�� :� � �A�BTKB� T � B�� (4.2.4)

It follows that F�A�B� is a smooth submanifold of S�n� � Rn�m (cf. Proposition 5.2.2 or

Gibson (1979, Appendix B)). For an arbitrary matrix pair �A�B� the map

f� � K� :� � �A� BKBT � T � B�

is a smooth submersion of O�n� � S�m� onto F�A�B� (Gibson 1979, pg. 74). The tangent

space of F�A�B� at �A�B� is the range of the linearization of f at �In� 0�


T�In�0�f : T�In�0��O�n�� S�m�� T�A�B�F�A�B�� A� � BBT ��B�� Sk�n�� S�m��

The spaceF�A�B� is also a Riemannian manifold when equipped with the so-called normal

metric (cf. Section 5.3). Fix �A�B� � S�n�� O�n�m� a symmetric state space system and

consider the map f� � K� :� � �A� BKBT � T � B�. The tangent map T�In�0�f induces

a decomposition

T�In�0��O�n�� S�m�� kerT�In�0�f � dom T�In�0�f�

where

kerT�In�0�f � f�� Sk�n�� S�m� j �A�� B�BT ��B � 0g

is the kernel of T�In�0�f and

dom T�In�0�f � f�� Sk�n�� S�m� j tr��T�� 0� tr��T�� 0�

for all �� kerT�In�0�fg

is the orthogonal complement of the kernel with respect to the Euclidean inner product

(4.2.1). Formally, the normal Riemannian metric on F�A�B� is the inner product (4.2.1) on

T�In�0��O�n��S�m�� restricted to dom T�In�0�f and induced onT�A�B�F�A�B� via the isomor-

phism T�In�0�f� : dom T�In�0�f � T�A�B�F�A�B�, the restriction of T�In�0�f to dom T�In�0�f .

Thus, for two tangent vectors ��i� A��BiBT ��iB� � T�A�B�F�A�B�, i � 1� 2, the normal

Riemannian metric is computed as

hh��1� A��B1BT ��1B�� 2� A��B2B

T ��2B�ii � 2tr��1 �T��2 � � 2tr��1 �

T�2 ��

Here ��i�i� � ��i�� i�� i ��i � � kerT�In�0�f � dom T�In�0�f for i � 1� 2. It is

readily verified that this construction defines a Riemannian metric on F�A�B�.

x4-3 Least Squares System Assignment 93

4.3 Least Squares System Assignment

In this section Problem A is considered, i.e. the question of computing a symmetric state space

linear system in a given orbit F�A�B� that most closely approximates a given “target” system

in a least squares sense. A brief analysis of the cost functions � and � is given leading

to existence results for global minima. Gradient flows of the cost functions are derived and

existence results for their solutions are given.

Lemma 4.3.1 Let �F�G�� A�B� � S�n��O�n�m� be symmetric state space linear systems.

a) The function � : O�n�� S�m�� R,

�� K� :� jj �A� BKBT � T � F jj2 � 2jj B � Gjj2�

has compact sublevel sets. I.e. the sets f� � K� � O�n� � S�m� j �� K� � g for

any 0, are compact subsets of O�n�� S�m�.

b) The function� : F�A�B�� R,

��A�B� :� jjA� F jj2 � 2jjB � Gjj2�

has compact sublevel sets.

Proof The triangle inequality yields both

jjKjj2 � jjBKBT jj2 � 2�jjA� BKBT jj2 � jjAjj2�

and

jjA�BKBT jj2 � jj �A� BKBT � T jj2 � 2�jj �A�BKBT � T � F jj2 � jjF jj2��

Thus, for � � K� � O�n�� S�m� one has

jjKjj2 � 2�2�jj �A� BKBT � T � F jj2 � jjF jj2� � jjAjj2�� 4�jj �A�BKBT � T � F jj2 � 2jj B �Gjj2� � 4jjF jj2 � 2jjAjj2�


� 4�� K� � 4jjF jj2 � 2jjAjj2�

where a factor of 8jj B�Gjj2 is added to the middle line to give the correct terms for the cost

�. Thus, for � � K� � O�n�� S�m�, satisfying �� K� � , one has

jj� � K�jj2 � jj jj2 � jjKjj2

� tr� T � � 4�� K�� 4jjF jj2 � 2jjAjj2

� n� 4 � 4jjF jj2 � 2jjAjj2�

and the sublevel sets of � are bounded. Since � is continuous the sublevel sets are closed

and compactness follows directly (Munkres 1975, pg. 174). Part b) follows by observing that

� � � � f , where f� � K� � � �A�BKBT � T � B� for given �A�B�. Thus, the sublevel

sets of� are exactly the images of the corresponding sublevel sets of� via the continuous map

f . Since continuous images of compact sets are themselves compact (Munkres 1975, pg. 167)

the proof is complete.

Corollary 4.3.2 Let �F�G�� A�B� � S�n� � O�n�m� be symmetric state space linear sys-

tems.

a) There exists a global minimum � min� Kmin� � O�n�� S�m� of �,

�� min� Kmin� � inff�� K� j � � K� � O�n�� S�m�g�

b) There exists a global minimum �Amin� Bmin� � F�A�B� of �,

��Amin� Bmin� � inff��A�B� j �A�B� � F�A�B�g�

c) The submanifoldF�A�B� � S�n��O�n�m� is closed in S�n��Rn�m.

Proof To prove part a), choose 0 such that the sublevel set J � f� � K� j �� K� � gis non empty. Then �jJ : J � �0� � is a continuous map2 from a compact space into the

2Let f : M � N be a map between two sets M and N . Let U �M be a subset of M , then f jU : U � N isthe restriction of f to the set U .


reals and the minimum value theorem (Munkres 1975, pg. 175) ensures the existence of

� min� Kmin�. The proof of part b) is analogous to that for part a).

To prove c) Assume that F�A�B� is not closed. Choose a boundary point �F�G� �F�A�B��F�A�B� in the closure3 ofF�A�B�. By part b) there exists a minimum �Amin� Bmin� �F�A�B� such that

��Amin� Bmin� � inff��A�B� j �A�B� � F�A�B�g�

� 0

since �F�G� is in the closure ofF�A�B�. But this implies jjAmin�F jj2�2jjBmin�Gjj2 � 0 and

consequently �Amin� Bmin� � �F�G�. This contradicts the assumption that �F�G� �� F�A�B�.

Having determined the existence of a solution to the system assignment problem one may

consider the problem of computing the global minima of the cost functions � and �.

Theorem 4.3.3 Let �A�B�� F�G� � S�n�� O�n�m� be symmetric state space systems. Let

� : F�A�B�� R� ��A�B� :� jjA� F jj2 � 2jjB �Gjj2� �4�3�1�

measure the Euclidean distance between two symmetric realisations. Then

a) The gradient of � with respect to the normal metric is

grad��A�B� �

�B� ��A� ��A� F � �BGT � GBT �� BBT �A� F �BBT

��A� F � �BGT � GBT �B

�CA �

�4�3�2�

b) The critical points of � are characterised by

�A� F � � GBT �BGT �

0 � BT �A� F �B��4�3�3�

3Let U �M be a subset of a topological spaceM . The closure of U , denotedU , is the intersection of all closedsets in the topology which contain the set U .


c) Solutions of the gradient flow � �A� �B� � �grad��A�B�,

�A � �A� ��A� F � �BGT � GBT �� BBT �A� F �BBT

�B � ��A� F � � BGT �GBT �B�4�3�4�

exist for all time t 0 and remain in F�A�B�.

d) Any solution to (4.3.4) converges as t�� to a connected set of matrix pairs �A�B� �F�A�B� which satisfy (4.3.3) and lie in a single level set of �.

Proof The gradient is computed using the identities4

�i� D�j�A�B� �� hhgrad��A�B�� ii� � � �� A� � BBT ��B� � T�A�B�F�A�B��ii� grad��A�B� � T�A�B�F�A�B��

Computing the Frechet derivative of � in direction �� A� �BBT ��B� gives

D�j�A�B� �� A� �BBT ��B�

� 2tr��A� F �T �� A� �BBT �� 4tr��B � G�T�B�

� 2tr��A� F�A� � 2B�B �G�T �� 2tr�BT �A� F �B� (4.3.5)

� hh��A� F � �BGT � GBT �� A� � BBT �A� F �BBT � ��A� F � �BGT � GBT �B

��

�� A� �BBT ��B�ii�

When deriving the above relations it is useful to recall that

�� A� �BBT ��B� � �� A� � B�BT ��B�

where �� , (cf. the discussion of normal metrics at the end of

Section 4.2). Observing that ��A� F � � BGT �GBT � � Sk�n� while BT �A� F �B � S�m�

completes the proof of part a).

4 D�j�A�B� �� is the Frechet derivative of � in direction � (Helmke & Moore 1994b, pg. 334).


To prove b), observe that the first identity ensures that the Frechet derivative at a critical

point (grad� � 0) is zero in all tangent directions. Setting (4.3.5) to zero, yields

2tr��F�A� � GBT � BGT �� 2tr�BT �A� F �B� � 0

for arbitrary �� Sk�n�� S�m� and �A�B� a critical point of �.

For given initial conditions �A�0�� B�0�� solutions of (4.3.4) will remain in the sublevel

set f�A�B� � F�A�B� j ��A�B� � ��A�0�� B�0��g. Since this set is compact, Lemma

4.3.1, infinite time existence of the solution follows. This proves c) while d) follows from an

application of La’Salle’s invariance principle.

Remark 4.3.4 Let N�s�D�s��1 be a coprime factorisation of the symmetric transfer function

G�s� � BT �sIn � A��1B. Then the coefficients of the polynomial matrix N�s� � R�s�m�m

are invariants of the flow (4.3.4). In particular the zeros of the system �A�B�BT � are invariant

under the flow (4.3.4) (Kailath 1980, Exercise 6.5-4, pg. 464). �

Remark 4.3.5 It would be desirable to interpret the equilibrium condition (4.3.3) in terms of

the properties of the linear system �A�B�. Unfortunately, this appears to be a difficult task. �

The above theorem provides a method of investigating best approximations to a given

“target system” within a symmetric output feedback orbit. However, it does not provide

any explicit information on the changing feedback transformations � �t�� K�t��. To obtain

such information a related flow on the output feedback group O�n� � S�m� is proposed.

The following result generalises work by Brockett (1989a) on matching problems. Brockett

considers similar cost functions but only allows state space transformations rather than output

feedback transformations.

Theorem 4.3.6 Let �A�B�� F�G� � S�n��O�n�m�be symmetric state space linear systems.

Define

� : O�n�� S�m�� R�

�� K� :� jj �A�BKBT � T � F jj2 � 2jj B �Gjj2 (4.3.6)

then:


a) The gradient of � with respect to the right invariant group metric is

grad�� K� �

�B� � �A� BKBT � T � F � � � BGT �GBT T �

BT �A�BKBT � TF �B

�CA � �4�3�7�

b) The critical points of � are characterised by

�F� �A�BKBT � T � � � BGT �GBT T ��

K � BT � TF �A�B��4�3�8�

c) Solutions of the gradient flow � � � �K� � �grad�� K�

� � �� A�BKBT � T � F � � � BGT � GBT T � �

�K � �BT �A� BKBT � TF �B��4�3�9�

exist for all time t 0 and remain in a bounded subset of O�n� � S�m�. Moreover,

as t � � any solution of (4.3.9) converges to a connected subset of critical points in

O�n�� S�m� which are contained in a single level set of �.

d) If � �t�� K�t�� is a solution to (4.3.9) then

�A�t��B�t�� t��A�BK�t�BT � �t�T � �t�TB�

is a solution of (4.3.4).

Proof The computation of the gradient is analogous to that undertaken in the proof of Theorem

4.3.3 while the characterisation of the critical points follows directly from setting (4.3.7) to

zero. The proof of c) is also analogous to the proof of parts c) and d) in Theorem 4.3.3.

The linearization of f� � K� :� � �A�BKBT � T � TB� is readily computed to be

T��K�f�� A� � BBT ��B�

where A � �A� BKBT � T and B � B. The image of � � � �K� via this linearization is


T��K�f� � � �K� ��A� �A� F � � �BGT �GB�� BBT �A� F �BBT �

� ��A� F � � BGT �GBT �B��

Consequently � �A� �B� � �grad��A�B�. Classical O.D.E. uniqueness results complete the

proof.

The following lemma provides an alternative approach (to that given in Lemma 4.3.1) for

determining a bound on the feedback gain K�t�. The method of proof for the following result

is of interest and the result obtained is somewhat tighter than that obtained in Lemma 4.3.1.

Lemma 4.3.7 Let � �t�� K�t�� be a solution of (4.3.9). Then the bound

jjK�t��K�0�jj2 � 12��T �0�� K�0��

holds for all time.

Proof Integrating out (4.3.9) for initial conditions � 0� K0� and then taking norms gives the

integral bound

jj �t�� 0jj2 � jjK�t��K0jj2 � jjZ t

0grad�� K��d� jj2

�Z t

0jjgrad�� K��jj2d�

�12

Z t

0hgrad�� K�� grad�� K��id��

Alsod

dt�� t�� K�t�� hgrad�� t�� K�t�� grad�� t�� K�t��i�

and thus, integrating between 0 and t and recalling that 0 � �� t�� K�t�� 0�� K�0��

for all t 0 one obtains

R t0 hgrad�� K�� grad�� K��id�

� �� 0�� K�0�� t�� K�t�� 0�� K�0��


and consequently

jj �t�� 0jj2 � jjK�t��K0jj2 � 12�� 0�� K�0��

The result follows directly.

It is advantageous to consider a closely related flow that evolves only on O�n� rather

than the full output feedback group O�n� � S�m�. The following development uses similar

techniques to those proposed by Chu (1992).

Let �A�B� � S�n�� O�n�m� be a given symmetric state space system and define

L � fBKBT j K � S�m�g

to be the linear subspace of S�n� corresponding to the range of the linear map K � BKBT .

Define L� to be the orthogonal complement of L with respect to the Euclidean inner product

onRn�n . The projection operators

P : S�n�� L� P�X� :� BBTXBBT �4�3�10�

and

Q : S�n�� L�� Q�X� :� �I�P��X� � X � BBTXBBT �4�3�11�

are well defined. Here Irepresents the identity operator and BTB � Im by assumption. The

tangent space of O�n� at a point is TO�n� � f� j � � Sk�n�g with Riemannian metric

h�1 ��2 i � 2tr��T1 �2�, corresponding to the right invariant group metric on O�n�.

Theorem 4.3.8 Let �A�B�� F�G� � S�n� � O�n�m� be symmetric state space systems.

Define

�� : O�n�� R

�� :� jjQ�A� TF �jj2 � 2jj B �Gjj2 (4.3.12)

then,


a) The gradient of �� with respect to the right invariant group metric is

grad�� Q�A� TF � T � F � � � BGT �GBT T � �

b) The critical points � O�n� of �� are characterised by

�F� Q�A� TF � T � � � BGT �GBT T ��

and correspond exactly to the orthogonal matrix component of the critical points (4.3.8)

of �.

c) The negative gradient flow minimising�� is

� � �F� Q�A� TF � T � � � BGT � GBT T � � �0� � 0� �4�3�13�

Solutions to this flow exist for all time t 0 and converge as t �� to a connected set

of critical points contained in a level set of ��.

Proof The gradient and the critical point characterisation are proved as for Theorem 4.3.3.

The equivalence of the critical points is easily seen by solving (4.3.8) for independently

of K. Part c) follows from the observation that (4.3.13) is a gradient flow on a compact

manifold.

Fixing constant in the second line of (4.3.9) yields a linear differential equation in K

with solution

K�t� � e�t�K�0� �BT �A� TF �B� �BT �A� TF �B�

It follows that K�t�� BT �A� TF �B as t��. Observe that

�� jjQ�A� TF �jj2 � 2jj B � Gjj2

� jj �A�B�BT �A� TF �B�BT � T � F jj2 � 2jj B �Gjj2

� �� BT �A� TF �B��


Recall also that for exact system assignment it has been shown that K � BT � F T �A�B, Lemma 4.1.2. Thus, it is reasonable to consider solutions �t� of (4.3.13) along with

continuously changing feedback gain

K�t� � BT � �t�TF �t�� A�B� �4�3�14�

as an approach to solving least squares system assignment problems. A numerical scheme

based on this approach is presented in Section 4.6.

4.4 Least Squares Pole placement and Simultaneous System Assign-

ment

Having developed the necessary tools it is a simple matter to derive gradient flow solutions to

Problem B and Problem C described in Section 4.1.

Corollary 4.4.1 Pole Placement Let �A�B� � S�n�� O�n�m� be a symmetric state space

system and let F � S�n� be a given symmetric matrix. Define

� : F�A�B�� R� �A�B� � jjA� F jj2� : O�n�� S�m�� R� � � K� � jj �A�BKBT � T � F jj2�

then

a) The gradient of � and with respect to the normal and the right invariant group metric

respectively are

grad��A�B� �

�B� ��A� �A� F �� BBT �A� F �BBT

�A� F �B

�CA � �4�4�1�

and

grad� � K� �

�B� � �A�BKBT � T � F �

BT �A� BKBT � TF �B

�CA � �4�4�2�

x4-4 Least Squares Pole placement and Simultaneous System Assignment 103

b) The critical points of � and are characterised by

�A� F � � 0

BT �A� F �B � 0� (4.4.3)

and

� �A�BKBT � T � F � � 0

BT � TF �A�B � K� (4.4.4)

respectively.

c) Solutions of the gradient flows � �A� �B� � �grad��A�B�

�A � �A� �A� F �� BBT �A� F �BBT

�B � ��A� F �B�4�4�5�

exist for all time t 0 and remain in F�A�B�. Moreover, any solution of (4.4.5)

converges as t �� to a connected set of matrix pairs �A�B� � F�A�B� which satisfy

(4.4.3) and lie in a single level set of �.

d) Solutions of the gradient flow � � � �K� � �grad� � K�

� � �� A�BKBT � T � F �

�K � �BT �A� BKBT � TF �B�4�4�6�

exist for all time t 0 and remain in a bounded subset of O�n� � S�m�. Moreover,

as t � � any solution of (4.4.6) converges to a connected subset of critical points in

O�n�� S�m� which are contained in a single level set of �.

e) If � �t�� K�t�� is a solution to (4.4.6) then� �t��A� BK�t�BT � T �t�� T �t�B

�is a

solution of (4.4.5).

Proof Consider the symmetric state space system �A�B� � S�n�� O�n�m� and the matrix

pair �F�G0� � S�n�� Rn�m where G0 is the n �m zero matrix. Observe that ��A�B� �

��A�B� � 2jjBjj2 and similarly �� K� � � � K� � 2jjBjj2, where � and � are given by


(4.3.1) and (4.3.6) respectively. Since the norm jjBjj2 is constant on F�A�B� the structure

of the above optimization problems is exactly that considered in Theorem 4.3.3 and Theorem

4.3.6. The results follow as direct corollaries.

Similar to the discussion at the end of Section 4.3 the pole placement problem can be solved

by a gradient flow evolving on the orthogonal group O�n� alone.

Corollary 4.4.2 Let �A�B� � S�n� � O�n�m� be a symmetric state space system and let

F � S�n� be a symmetric matrix. Define

�� : O�n�� R

�� :� jjQ�A� TF �jj2

where Q�X� � �I�P��X� � X � BBTXBBT (4.3.11). Then,

a) The gradient of �� with respect to the right invariant group metric is

grad�� Q�A� TF � T � F � �

b) The critical points � O�n� of �� are characterised by

�F� Q�A� TF � T � � 0�

and correspond exactly to the orthogonal matrix component of the critical points (4.4.4)

of �.

c) The negative gradient flow minimising �� is

� � �F� Q�A� TF � T � � �0� � 0� �4�4�7�

Solutions to this flow exist for all time t 0 and converge as t �� to a connected set

of critical points contained in a level set of ��.

Proof Consider the matrix pair �F�G0� � S�n��Rn�m where G0 is the n�m zero matrix.

It is easily verified that�� 2jjBjj2 where �� is given by (4.3.12). The corollary

follows as a direct consequence of Theorem 4.3.8.

x4-4 Least Squares Pole placement and Simultaneous System Assignment 105

Simultaneous system assignment is known to be a hard problem which generically does

not have an exact solution. The best that can be hoped for is an approximate solution provided

by a suitable numerical technique. The following discussion is a direct generalisation of the

development given in Section 4.3. The generalisation is similar to that employed by Chu

(1991a) when considering the simultaneous reduction of real matrices.

For any integerN � N let �A1� B1�� AN � BN� � S�n��O�n�m�be given symmetric

state space systems. The output feedback orbit for the multiple system case is

F��A1� B1�� AN � BN�� :�

f� �A1 � B1KBT1 �

T � B1�� AN � BNKBTN �

T � BN � j � O�n�� K � S�m�g�

An analogous argument to Lemma 4.2.1 shows that F��A1� B1�� AN � BN�� is a smooth

manifold. Moreover, the tangent space is given by

T��A1�B1��AN �BN��F��A1� B1�� AN � BN��

f�� A1� �B1BT1 ��B1�� AN � � BNB

TN ��BN � j � � Sk�n�� S�m�g�

Indeed, F��A1� B1�� AN � BN �� is a Riemannian manifold when equipped with the normal

metric, defined analogously to the normal metric on F�A�B�.

Corollary 4.4.3 For any integerN � N let �A1� B1�� AN � BN � and �F1� G1�� , �FN � GN�

be two sets of N symmetric state space systems. Define

�N � F��A1� B1�� AN � BN �� R

�N ��A1� B1�� AN � BN�� :�NXi�1

�jjAi � Fijj2 � 2jjBi �Gijj2

�

and

�N � O�n�� S�m�� R

�N� � K� :�NXi�1

�jj �Ai �BiKBT

i � T � Fijj2 � 2jj Bi �Gijj2

��

Then,


a) The negative gradient flows of �N and �N with respect to the normal and the right

invariant group metric are

�Ai � �Ai�NXj�1

��Aj � Fj� � BjG

Tj �GjB

Tj

��

NXj�1

BiBTj �Aj � Fj�B

Tj Bi�

�Bi � �NXj�1

��Aj � Fj � � BjGTj �GjB

Tj �Bi� (4.4.8)

for i � 1� � � �N , and

� �NXj�1

��Aj � Fj� �BjG

Tj � GjB

Tj

�

�K � �NXj�1

BTj �Aj � BjKBT

j � Fj T �Bj � (4.4.9)

respectively.

b) The critical points of �N and �N are characterised by

NXj�1

�Aj � Fj � �NXj�1

�GjB

Tj � BjG

Tj

�NXj�1

BTj �Aj � Fj�Bj � 0� (4.4.10)

and

NXj�1

� �Aj � BjKBTj �

T � Fj � �NXj�1

� BjG

Tj �GjB

Tj

T�

K �NXj�1

BTj � Fj

T �Aj�Bj � (4.4.11)

respectively.

c) Solutions of the gradient flow (4.4.8) exist for all time t 0 and remain in F��A1� B1�,

� � � � �AN � BN ��. Moreover, any solution of (4.4.8) converges as t � � to a connected

set of matrix pairs ��A1� B1�, � � �, �AN � BN�� F��A1� B1�, � � �, �AN � BN�� which

satisfy (4.4.10) and lie in a single level set of �N .

x4-5 Simulations 107

d) Solutions of the gradient flow (4.4.9) exist for all time t 0 and remain in a bounded

subset of O�n� � S�m�. Moreover, as t � � any solution of (4.4.9) converges to a

connected subset of critical points in O�n��S�m�which are contained in a single level

set of �N .

e) If � �t�� K�t�� is a solution to (4.4.6) then �Ai�t��Bi�t�� Ai�BiKBTi �

T � Bi�,

for i � 1� � � � � N , is a solution of (4.4.8).

Proof Observe that the potentials�N and �N are linear sums of potentials of the form � and

� considered in Theorem 4.3.6 and Theorem 4.3.3. The proof is then a simple generalisation

of the arguments employed in the proofs of these theorems.

4.5 Simulations

A number of simulations studies have been completed to investigate the properties of the

gradient flows presented and obtain general information about the system assignment and pole

placement problems5.

In the following simulations the solutions of the ordinary differential equations considered

were numerically estimated using the MATLAB function ODE45. This function integrates

ordinary differential equations using the Runge-Kutta-Fehlberg method with an automatic step

size selection. Numerical integration is undertaken using fourth order Runge-Kutta method

while the accuracy of each iteration over the step length is checked against a fifth order method.

At each step of the interpolation the step length is reduced until the error between the fourth

and fifth order method estimates is less than a pre-specified constantE � 0. In the simulations

undertaken the error bound was set to E � 1 � 10�7, this allowed for reasonable accuracy

without excessive computational cost.

Due to Lemma 4.1.1 one does not expect to see convergence of the solution of (4.3.4) to an

exact solution of the System Assignment problem for arbitrary initial condition (unless n � m

in which case a solution can be computed algebraically). The typical behaviour of solutions to

5Indeed, computing the gradient flows (4.3.4) and (4.4.1) has already helped in understanding of the poleplacement and system assignment tasks since it was the non-convergence of the original simulations that lead to afurther investigation of the existence of exact solutions to the problems, and eventually to Lemmas 4.1.1 and 4.1.2.


Time

Pote

ntia

l Ψ

Figure 4.5.1: Plot of ��A�t�� B�t�� verses t for �A�t�� B�t�� a typical solution to (4.3.4).

(4.3.4) is shown in Figure 4.5.1, where the potential,��A�t�� B�t��, for �A�t�� B�t�� a solution

to (4.3.4), is plotted verses time. The potential is plotted on log10 scaled axis for all the plots

presented to display the linear convergence properties of the solution. The initial conditions

�A0� B0� � S�5� � O�5� 4� and the target system �F�G� � S�5� � O�5� 4� were randomly

generated apart from symmetry and orthogonality requirements. The state dimension, n � 5,

and the input and output dimension, m � 4, were arbitrarily chosen. Similar behaviour is

obtained for all simulations for any choice of n and m for which m � n. In Figure 4.5.1,

observe that the potential converges to a non-zero constant limt��A�t�� B�t�� 9�3. For

the limiting value of the solution to be an exact solution to the system assignment problem one

would require limt��A�t�� B�t�� 0.

In contrast, Lemma 4.1.2 ensures only that the pole placement task is not solvable on some

open set of symmetric state space systems but leaves open the question of whether other open

sets of systems exists for which the pole placement problem is solvable. Simulations show that

the pole placement problem is indeed solvable for some open sets of symmetric state space

systems. Figure 4.5.2 shows a plot of the potential ��A�t�� B�t�� (cf. Corollary 4.4.1) verses

time for �A�t�� B�t�� a solution to (4.4.5). The initial conditions and target matrix here were

the initial conditions �A0� B0� and the state matrix F , from �F�G�, used to generate Figure

4.5.1. The plot clearly shows that the potential converges exponentially (linearly in the log10

verses unscaled plot) to zero. Consequently, the solution �A�t�� B�t�� converges to an exact

solution the pole placement problem, limt��A�t� � F . Comparing Figures 4.5.1 and 4.5.2

and recalling that they were generated using the same initial conditions, one sees explicitly that


Simulation ��A�40�� B�40��1 2�63� 10�10

2 2�09� 10�9

3 5�65� 10�9

5 3�35� 10�10

6 3�16� 10�11

7 1�62� 10�11

8 1�05� 10�10

9 3�68� 10�10

10 1�20� 10�8

11 2�72� 10�8

Table 4.5.1: Potentials��Ai�40�� Bi�40�� for experiments i � 1� � � � � 10 where �Ai�t�� Bi�t��is a solution to (4.3.4) with initial conditions �A i�0�� Bi�0�� A0 � Ni� UiB0� � S�n� �O�n�m�. Here Ni � NT

iis a randomly generated symmetric matrix with jjNijj � 0�25 and

Ui � O�n� is an randomly generated orthogonal matrix with jjUi � Injj � 0�25.

the system assignment problem is strictly more difficult than the pole placement problem.

One may ask does the particular initial condition �A0� B0� lie in an open set of initial

conditions for which the pole placement problem can be exactly solved. A series of ten

simulations was completed, integrating (4.4.5) for initial conditions �Ai� Bi� close to �A0� B0�,

jjA0 �Aijj� jjB0 �Bijj � 0�5. Each integration was carried out over a time interval of forty

seconds and the final potential��A�40�� B�40�� for each simulation is given in Table 4.5. The

plot of � verses time for each simulation was qualitatively the same as Figure 4.5.2. It is my

conclusion from this that the pole placement problem could be exactly solved for all initial

conditions in a neighbourhood of �A0� B0�.

Remark 4.5.1 It may appear reasonable that the pole placement problem could be solved for

all initial conditions with state matrix A0 in a neighbourhood of the desired structure F . In

fact simulations have shown this to be false.

LetC � O�n� n�m� be a matrix orthogonal toB, (i.e. BTC � 0). Observe that a solution

to the pole placement problem requires TF � A � BKBT and thus

TF C �AC � 0 �� F C � AC � 0�

Since A and C are specified by the initial condition (the span of C is the important object)


Time

Pote

ntia

l Φ

Figure 4.5.2: Plot of ��A�t�� B�t�� verses t for �A�t�� B�t�� a solution to (4.4.5) with initialconditions �A0� B0� for which the solution �A�t�� B�t�� converges to a global minimum of �.

then � Rn�n must lie in the linear subspace defined by the kernel of the linear map

� F C � AC. Of course must also lie in the set of orthogonal matrices and the

intersection of the kernel of � F C � AC with the set of orthogonal matrices provides

an exact criterion for the existence of a solution to the pole placement problem.

The difficulty for initial conditions where jjA0 � F jj is small is related to the fact that the

solution to the pole placement problem for initial conditions �A0� B0� � �F�B0�, (i.e. the state

matrix already has the desired structure), is given by the matrix pair �In� 0� � O�n� � S�m�

in the output feedback group. The matrix In lies at an extremity of O�n� in Rn�n and

it is reasonable that small perturbations of �A0� B0� may shift the kernel of the linear map

� F C � A0C such that it no longer intersects with O�n�. �

An advantage, mentioned in Section 4.3, in computing the limiting solution of (4.4.7)

(Figure 4.5.3) compared to computing the full gradient flow (4.4.5) (Figure 4.5.2) is the

associated reduction in the order of the O.D.E. that must be solved. Interestingly, it appears

that the solutions of the projected flow (4.4.7) will also converge more quickly than those

of (4.4.5). Figure 4.5.3 shows the potential �� t�� (cf. Corollary 4.4.2) verses time for

�t� a solution to (4.4.7). The initial conditions for this simulation were 0 � In while the

specified symmetric state space system used for computing the norm �� was �A0� B0� the

initial conditions for Figures 4.5.1 and 4.5.2. Observe that from time t � 1�2 to t � 2, Figure

4.5.3 displays unexpected behaviour which I interpret to be numerical error. The presence

of this error is not surprising since the potential (and consequently the gradient) is of order


Simulation � ��

1 2.05 53 25.852 1.73 43.5 25.143 2.03 27.75 13.664 0.52 20 38.465 1.6 44 27.5

Table 4.5.2: Linear rate of convergence for the solution of (4.4.5), given by �, and (4.4.7)given by ��. The final column shows the ratio between the rates of convergence for the twodifferential equations.

10�12 � E2, where E is the error bound chosen for the ODE routine in MATLAB. The

relationship of numerical error to order of the potential was checked by adjusting the error

bound E in a number of early simulations.

The exponential (linear) convergence rates of the solution to (4.4.7) and to (4.4.5) are

computed by reading off the slope of the linear sections of plots 4.5.2 and 4.5.3. For the

example shown in Figures 4.5.2 and 4.5.3 convergence of the solutions is characterised by

��A�t�� B�t�� e��t� � � 2�05

�� t�� e��t� �� 53

where �A�t�� B�t�� is a solution to (4.4.5) and �t� is a solution to (4.4.7). Five separate

experiments were completed in which the two flows were computed for randomly generated

target matrices and initial conditions with n � 5 and m � 4. The linear convergence rates

computed from these five experiments are given in Table 4.5. I deduce that solutions of (4.4.7)

converge around twenty times faster than solutions to (4.4.5) when the systems considered

have five states and four inputs and outputs. A brief study of the behaviour of systems with

other numbers of states and inputs indicate that the ratio between convergence rates is around

an order of magnetude.

In the system assignment problem Lemma 4.1.1 ensures that an exact solution to the

system assignment problem does not generically exist. The gradient flow (4.3.4), however, will

certainly converge to a connected set of local minima of the potential �, Theorem 4.3.3. An

important question to ask concerns the structure the critical level set associated with the local

minima of � may have. In particular, one may ask whether the level set is a single point or is


Time

φ∗Po

tent

ial

Figure 4.5.3: Plot of ��t�� verses t for ��t� a solution to (4.4.5) with initial conditions��0� � In the identity matrix. The potential �� :� jj�A0 � �TF�� B0B

T

0 �A0 ��TF��B0B

T

0 jj2 is computed with respect to the initial conditions �A0� B0� used in Figures

4.5.1 and 4.5.2.

it a submanifold (at least locally) of F�A�B�.

Remark 4.5.2 Observe that critical level sets of � are given by two algebraic conditions

jjgrad��A�B�jj � 0 and ��A�B� � �0, for some fixed �0, thus they are algebraic varieties

of the closed submanifold F�A�B� � Rn�n �Rn�m. It follows, apart from a set of measure

zero in F�A�B� (singularities of the algebraic conditions), that the critical sets will locally

have submanifold structure in F�A�B�. �

Rather than consider the computationally huge task of mapping out the local minima of

� by integrating out (4.3.4) for many different initial conditions in F�A�B�, it is possible

to obtain some qualitative information in the vicinity of a given local minima). Choosing

any initial condition and integrating (4.3.4) for a suitable time interval an estimate of a local

minima �A�� B�� is obtained. If this point is an isolated minima then it should be locally

attractive. By choosing a number of initial conditions �Ai� Bi� in the vicinity of �A�� B��

and integrating (4.3.4) a second time one obtains new estimates of local minima �A�i � B�i �.

If �A�� B�� approximates an isolated local minima then the ratio

ri �jj�A�i � B�

i �� A�� B��jjjj�Ai� Bi�� A�� B��jj �4�5�1�

should be approximately zero. If �A�� B�� is not isolated then one expects the ratio ri to

be significantly non-zero. Of course ri should be less than one on average since the flow is

x4-6 Numerical Methods for Symmetric Pole Placement 113

Ratio r i

Freq

uenc

y

Figure 4.5.4: Plot of frequency distribution of r i given by (4.5.1) computed for the limitingvalues of 100 simulations with initial conditions close to �A�� B��.

convergent. The difficulty in this approach is deciding on suitable time intervals for the various

integrations. The first time interval was determined by repeatedly integrating over longer and

longer time intervals (for the same initial conditions) until the norm difference between the

final values was less than 1 � 10�8. An initial time interval of two hundred seconds was

found to be suitable. Each subsequent simulation was integrated over a time interval of fifty

seconds. The results of one hundred measurements of the ratio ri for a given estimated local

minima �A�� B�� are plotted as a frequency plot, Figure 4.5.4. The frequency divisions for

this plot are 0�05, thus in the one hundred experiments undertaken eleven experiments yielded

an estimate of ri between 0�325 and 0�375. It is obvious from Figure 4.5.4 that the probability

of ri being zero is small and one concludes that the critical sublevel sets of � have a local

submanifold structure of non-zero dimension. In particular, the local minima of � are not

isolated.

4.6 Numerical Methods for Symmetric Pole Placement

In this section a numerical algorithm, based on the continuous-time flow (4.4.7) coupled with

the feedback gain (4.3.14) is proposed. The algorithm is analogous to those discussed in

Chapters 2 and 3.

Let �A�B� be a symmetric output feedback system and let F � S�n� posses the desired


closed loop eigenstructure. For 0 � O�n� consider the iterative algorithm generated by

i�1 � ie��i�T

iFi�Q�T

iFi�A�� (4.6.1)

Ki � BT � Ti F i �A�B� (4.6.2)

for i � N and i a sequence of positive real numbers termed time-steps. Observe that the

Lie-bracket � Ti F i, Q� T

i F i � A�� is skew symmetric, hence, e��i�Ti Fi�Q�T

i Fi�A��

is orthogonal and i�1 lies in O�n�.

To motivate the algorithm observe that

d

d� ie

�� TiFi�Q�T

iFi�A��

��0

� i� Ti F i�Q�

Ti F i �A��

� �F� iQ�A� Ti F i�

Ti � i

the negative gradient of �� at i (cf. Corollary 4.4.2). Thus, ie�� Ti Fi�Q�T

i Fi�A��

represents a curve in O�n�, passing through i at time � � 0, and with first derivative equal to

�grad�� i�. Indeed, the algorithm proposed can be thought of as a modified gradient descent

algorithm where instead of straight line interpolation the curves ie�� T

i Fi�Q�Ti Fi�A�� are

used.

To implement (4.6.1) it is necessary to choose a time-step i for each step of the recursion.

A convenient criteria for determining a suitable time-step is to minimise the smooth function

�� i� i� � �� i�1�� i�� 4�6�3�

In particular, one would like to ensure that �� i� �� is strictly negative unless i is a

equilibrium point of ��. The following argument is analogous to the derivation of step-size

selection schemes given in Section 2.2.

Lemma 4.6.1 Let �A�B� be a controllable6 symmetric output feedback system and F �S�m�, F �� 0, posses the desired closed loop eigenstructure. For any i � O�n� such that

6I.e. the controllability matrix �B AB A2B � � �An�1B� is full rank. It is easily shown that controllability of�A�B� ensures thatQ�A� �� 0.


grad�� i� �� 0, the recursive estimate i�1 � ie��i�T

iFi�Q�T

iFi�A��, where

i �1

4jjF jj�jjP� Ti F i�jj� jjQ�A�jj�

� �4�6�4�

satisfies�� i� i� � �� i�1�� i� � 0.

Proof Let i�1�� ie�� T

i Fi�Q�Ti Fi�A�� for an arbitrary time-step � and define

Xi � Ti F i and Xi�1�� T

i�1��F i�1��. The Taylor expansion for Xi�1�� is

Xi�1�� Xi � � �Xi� �Xi�Q�Xi� A�� 2R2��

where

R2�� Z 1

0��Xi�1�s�� Xi�Q�Xi� A�� Xi�Q�Xi� A��1� s�ds�

Substituting into (4.6.3) and bounding yields (after some algebraic manipulations)

�� i� �� jjQ�Xi�1�Xi�jj2 � 2� tr�Q�Xi�A��Xi� �Xi�Q�Xi�A��

� 2tr�Q�Xi� A�R2��

� �2� jj�Xi�Q�Xi�A��jj2 � 4� 2�jjP�Xi�jj� jjQ�A�jj� jj�Xi�Q�Xi�A��jj2jjF jj

:� ��u� i� ��

The controllability of �A�B� along with the assumptions grad�� i� �� 0 and F �� 0 ensures

that the quadratic coefficient of��u� i� �� does not vanish and it is easily seen that its unique

minimum is strictly negative and occurs for � � i of (4.6.4). The result follows since

0 � ��u� i� i� �� i� i�.

Theorem 4.6.2 Let �A�B� be a controllable symmetric output feedback system and let F �S�n�, F �� 0, posses the desired eigenstructure. For a given estimate i � O�n�, let i be

given by (4.6.4). The algorithm (4.6.1)

i�1 � ie��i�T

iFi�Q�T

iFi�A�� 4�6�5�


has the following properties.

a) The algorithm defines an iteration on O�n�.

b) Fixed points of the algorithm are the equilibrium points of (4.4.7).

c) If i is a solution to (4.6.5) then the real sequence �� i� is strictly monotonic decreasing

unless there is some i � N with i a fixed point of the algorithm.

d) Any solution i to (4.6.5) converges as i�� to a set of equilibrium points contained in a

level set of ��.

Proof Part a) follows from the observation that e��i�TiFi�Q�T

iFi�A�� is orthogonal. Fixed

points of the recursion are those for which the first derivative of � i�1�� vanishes (Lemma

4.6.1) and correspond exactly to the equilibrium points of (4.4.7). This proves part b) while

part c) is a corollary of Lemma 4.6.1.

To prove part d) observe that O�n� is a compact set, and thus�� i�, a bounded monoton-

ically decreasing sequence, must converge. This implies that �� i� i�� 0 as i� �. It

follows that i converges to a level set of�� such that for any in this set�� 0.

Lemma 4.6.1 ensures that any point in this set is an equilibrium point of (4.4.7).

Remark 4.6.3 Observe that there is an associated sequence of realisations

Ai � i�A�BKiBT � T

i

Bi � iB

for any solution � i� Ki� of (4.6.1) and (4.6.2). �

A primary aim in developing the algorithm (4.6.1) is to provide a reliable numerical tool

with which to investigate the structure of the pole placement (system assignment) problem

for symmetric realisations. Figure 4.6.1 is a simulation for a fifth order symmetric state space

system with four inputs. The initial condition is 0 � In, the identity matrix, and the algorithm

is run for 1000 steps. The linear convergence properties of the algorithm are shown by the

linear appearance of the log verses iteration plot, Figure 4.6.1. The time-step selection for this

simulation is displayed in Figure 4.6.2 and indicates both the non-linear nature of the selection


IterationL

og o

f po

tent

ial

Figure 4.6.1: Iteration verses ��i� showing linear convergence properties.

Iteration

Tim

e–st

ep s

elec

tion

Figure 4.6.2: Iteration verses time-step selection �i.

scheme as well as its limiting behaviour. The existence of a limit to the time-step selection

scheme (4.6.4) as i��, ensures that the linearization of (4.6.1) around a critical point exists.

By computing this linearization the linear convergence properties displayed in Figure 4.6.1 can

be confirmed theoretically.

Simulation studies have shown the presence of many local minima in the cost potential��.

Figure 4.6.3 is a plot of both the cost �� and the norm of the gradient jjgrad�� i�jj2 for a

simulation of a seventh order symmetric state space system with four inputs. The system was

chosen such that an exact solution to the pole placement problem existed. Thus, the global

minimum of �� was known to be zero, however, Figure 4.6.3 shows the cost �� converging

to a constant while the gradient converges to zero. The algorithm (4.6.1) provides a reliable

numerical method to investigate the presence and position of such local minima.


Iteration

Pote

ntia

l and

nor

m o

f gr

adie

ntNorm of gradient

Potential

Figure 4.6.3: Iteration verses both potential��i� and the norm of the gradient jj�F��Q�A��TF��T �jj2.


An important question that has not been addressed in this chapter is that of understanding the

equilibrium conditions for the various dynamical systems in the context of classical systems

theory. It would be nice to relate conditions such as (4.4.5) to properties such as the frequency

response of the achieved system. Unfortunately, even finding a relationship between the desired

and the achieved pole positions appears to be difficult. The discussion of Problem C, multiple

systems assignment, is another area that would benefit from further work. The results presented

in this chapter are far from comprehensive.

A natural extension of the theory presented in this chapter is to consider more general

systems. For example a class of systems �A�B�C� with a given Cauchy-Maslov index (i.e.

�AIpq�T � AIpq and CT � IpqB where Ipq � diag�Ip��Iq�) could be approached using the

same techniques developed earlier. The Lie transformation group associated with the set of

such systems is

G � fT � Rn�n j TTIpqT � Ipq� det�T � �� 0g

which has identity tangent space (or Lie-algebra)

g � f� � Rn�n j ��Ipq�T � ��Ipqg�

the set of signature skew symmetric matrices.


Related to the general construction for systems with an arbitrary Cauchy-Maslov index is

the problem for Hamiltonian linear systems. These are systems �A�B�C�where �AJ�T � AJ

and CT � JB where

J �

�B� 0 �InIn 0

�CA �

The set of Hamiltonian linear systems is a homogeneous space with Lie transformation group

Sp�n�R� � fT � R2n�2n j TTJT � J� det�T � �� 0g�

termed the symplectic group. The Lie-algebra associated with the symplectic group is the set

of 2n� 2n Hamiltonian matrices

Ham�n�R� � f� � R2n�2n j ��J�T ��J � 0g�

Hamiltonian systems are important for modelling mechanical systems.

One may also consider pole placement problems on the set of general linear systems. A

discussion of some basic results is contained in the monograph (Helmke & Moore 1994b,

Section 5.3). One area in which these results could be extended is to consider dynamic output

feedback. Assume that one knows the degree d of a dynamic compensator applied to a given

linear state space system. The dynamics of the closed loop system can be modelled by the

differential equation

�x � Ax� Bu

y � Cx

�w � Gw � Cx

u � Fw �Ky�

where the feedback law u is allowed to depend both on the dynamic compensator state w and

the direct output y. This system can be rewritten as an augmented system with static feedback

d

dt

�B� x

w

�CA �

�B� A 0

C G

�CA�B� x

w

�CA �

�B� B

0

�CA u


�B� y

w

�CA �

�B� C 0

0 Id

�CA�B� x

w

�CAu �

�K F

� �B� y

w

�CA �

Once the system is written in this form it is amenable to analysis via the general linear theory

presented in Helmke and Moore (1994b, Section 5.3). Of course, one could also exploit the

structure of the augmented problem itself to reduce computational cost and ensure that the roles

of system and compensator states do not become confused.

Gradient descent methods could also be used to compute canonical forms for system

realizations. For example, to compute the companion form of a given state matrix A consider

the smooth cost function

�A� �nXi�2

Xj ��i�1

A2ij �

nXi�2

�Ai�i�1� � 1�2

on the homogeneous space

S�A� � fTAT�1 j T � Rn�n� det�T � �� 0g�

Given that computating canonical forms is often an ill conditioned numerical problem, dy-

namical system techniques and related numerical gradient descent algorithms with their strong

stability properties may prove to be an important numerical tool in certain situations.

Chapter 5

Gradient Flows on Lie-Groups and

Homogeneous Spaces

The optimization problems considered in Chapters 2, 3 and 4 are all problems where the

constraint set is a homogeneous space. In each case the approach taken is to consider a suitable

Riemannian metric on the homogeneous space and compute the maximising (or minimising)

gradient flow. The limiting value of a solution of the gradient flow (for arbitrary initial condition)

then provides an estimate of the maximum (or minimum). The numerical methods discussed

in Chapters 2 to 4 are closely related to each other. They each rely on using a ‘standard’ curve

lying within the homogeneous space, which can be assigned an arbitrary initial condition and

arbitrary initial tangent vector, to interpolate the solution of the continuous-time gradient flow.

Thus, for an arbitrary point in the constraint set one estimates the solution of the gradient flow

by travelling a short distance along the ‘standard’ curve starting from the present estimate with

initial tangent vector equal to the gradient at that point. It is natural to ask whether there is an

underlying structure on which the numerical solutions proposed in Chapters 2 to 4 are based

and, if there is, to what degree can such an approach be applied to any generic optimization

problem on a homogeneous space.

With the developing interest in dynamical system solutions to linear algebraic problems

(Symes 1982, Deift et al. 1983, Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b)

during the eighties there came an interest in the potential of continuous realizations of classical

121

122 Gradient Flows on Lie-Groups and Homogeneous Spaces Chapter 5

problems as efficient numerical methods (Chu 1988). Interestingly, it has taken several years

before the connection between dynamical systems and linear algebraic problems is examined

in the other direction, namely, can one use insights and understanding developed by studying

problems using the dynamical systems approach to design efficient numerical algorithms for

problems in linear algebraic. Recently Chu (1992) has shown that by using the insight provided

by a geometric understanding of a structured inverse eigenvalue problem a better understanding

of a quadratically convergent algorithm first proposed by Friedland et. al. (1987) is obtained.

Perhaps more directly based on the dynamical systems literature is the work by Brockett (1993)

that looks at the design of gradient algorithms on the adjoint orbits of compact Lie-groups. The

methods proposed in Chapters 2 to 4 are gradient descent algorithms constructed explicitly on

the homogeneous space (Moore et al. 1992, Mahony et al. 1993, Moore et al. 1994, Mahony et

al. 1994).

Certainly the numerical methods proposed satisfy the broad requirements of simplicity,

global convergence and constraint stability discussed on page 2. Moreover, the numerical

methods described in each chapter have strong similarities, for example the Riemannian metrics

used are all of a similar form and the ‘standard’ curves used to generate the numerical methods

are all based on matrix exponentials. To develop a general understanding of these methods,

however, it is apparent that one must develop a better understanding of the Riemannian geometry

of the homogeneous constraint sets on which the algorithms are constructed.

In this chapter I attempt to provide a rigourous but brief review of the relevant theory

associated with developing numerical methods on homogeneous spaces. The focus of the

development is on the classes of homogeneous spaces encountered in engineering applications

and the simplest theoretical constructions which provide a rigourous basis for the numerical

methods developed. A careful development is given of the relationship between gradient flows

on Lie-groups and homogeneous space (related by a group action) which motivates the choice of

a particular Riemannian structure for a homogeneous space. Convergence behaviour of gradient

flows is also considered. The curves used in constructing numerical methods in Chapters 2

to 4 were all based on matrix exponentials and the well understood theory of the exponential

map as a Lie-group homomorphism is reviewed to provide a basis for this choice. Moreover,

the geodesic structure of the spaces considered (following from Levi-Civita connection) is

developed and conditions are given on when the matrix exponential maps to a geodesic curve

x5-1 Lie-groups and Homogeneous Spaces 123

on a Lie-group. Finally, an explicit discussion of the relationship between geodesics on Lie-

groups and homogeneous spaces is given. The conclusion is that the algorithms proposed in

Chapters 2 to 4 are modified gradient descent algorithms with geodesic curves used to replace

the straight line interpolation of the classical gradient descent algorithm.

Much of the material presented is standard or at least accessible to people working in the

fields of Riemannian geometry and Lie-groups, however, this material would not be standard

knowledge for researchers in an engineering field. Moreover, the development strongly em-

phasizes the aspects of the general theory that is relevant to problems in linear systems theory.

Due to to the focus of the work, explicit proofs are given for a number of results which do

not appear to be standard in the literature. In particular, I have not seen the results concerning

the interrelation of gradient flows on Lie-groups and homogeneous spaces nor a careful pre-

sentation of the relationship between geodesics on Lie-groups and homogeneous spaces in any

existing reference.

The chapter is divided into nine sections. Section 5.1 presents a brief review of Lie-groups

and homogeneous spaces while Section 5.2 considers a certain class of homogeneous space,

orbits of semi-algebraic Lie-groups, which includes all the constraint sets considered in this

thesis. Section 5.3 describes a natural choice of Riemannian metric for a given homogeneous

space while Section 5.4 discusses the derivation of gradient flows on Lie-groups and homoge-

neous spaces and shows why the choice of Riemannian metric made in Section 5.3 is the most

natural. Section 5.5 discusses the convergence properties of gradient flows. Sections 5.6 to

5.9 develop the geometry of Lie-groups and homogeneous spaces concentrating on providing

a basis for understanding the exponential map and geodesics.

5.1 Lie-groups and Homogeneous Spaces

In this section a brief review of Lie-groups and homogeneous spaces is presented. The reader

is referred to Helgason (1978) and Warner (1983) for further technical details.

A Lie-groupG is an abstract group which is also a smooth manifold on which the operations

of group multiplication(� � �� , for �, � � G) and taking the inverse (� � ��1, for � � G) are

smooth diffeomorphisms of G onto G. For � � G one defines automorphisms of G associated


with right and left multiplications by a constant

r� : G� G� r�� :� �� (5.1.1)

l� : G� G� l�� :� ��

Observe that r� and l� are diffeomorphisms of G with smooth inverse given by r��1 and l��1

respectively.

Let M be a manifold and G be a Lie-group. A smooth group action of G on M is a smooth

mapping

: G�M �M

which satisfies

�� q� � �� q�� q �M� �� G�

�e� q� � q� q �M� e is the identity of G�

The action is known as transitive if for any q and r in M there exists � � G such that

�� q� � r. Observe that �� : M � M is a diffeomorphism of M into M since

�� is smooth, surjective (let � � G, then for any q � M , �� 1� q�� q� and

has smooth inverse �1�� 1� �. A smooth manifold M with a transitive group of

diffeomorphisms ( �� : M �M ) is known as a smooth homogeneous space.

Let p �M and define the stabiliser of p by

stab�p� � f� � G j �� p� � pg�

By construction stab�p� � G is an abstract subgroup of G. By inspection the map

p : G�M

p�� :� �� p� (5.1.2)

is a smooth map which is onto if and only if is transitive. As a consequence, if is a

smooth transitive group action of G onto M one has that dimG dimM . The stabilizer,

stab�p� � �1p �p�, is the inverse image of a single point under a continuous map and is a

x5-2 Semi-Algebraic Lie-Groups, Actions and their Orbits 125

closed set in the manifold topology on G. Consequently, stab�p� is a closed abstract subgroup

of G and is a Lie-subgroup of G with the relative topology inherited from G (Warner 1983,

pg. 110). The left coset space G�stab�p� � f�stab�p� j � � Gg has a natural topology

such that the surjective mapping � : G � G�stab�p�, � � stab�p��, is a continuous, open

mapping. Similarly, equipping G�stab�p� with the unique differential structure that makes �

smooth (Warner 1983, pg. 120), it is easily verified that � is a submersion.

The right coset space is itself a homogeneous space under the group action

� : G� G�stab�p�� G�stab�p�

�� stab�p�� :� ��stab�p��

Consider the smooth map

�p : G�stab�p��M (5.1.3)

�p ��stab�p�� :� �� p��

It is a standard result that �p : G�stab�p� � M is a diffeomorphism (Helgason 1978,

Proposition 4.3, pg. 124). By construction, the following diagram commutes

�p p

�

�

HHHHHHHHHHj

M

G�stab�p��G

In particular, p � �p � � is the composition of a submersion and a diffeomorphism and is

itself a submersion.

5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits

A setG � Rs is known as semi-algebraic when it can be obtained by finitely many applications

of the operations of intersection, union and set difference starting from sets of the form


fx � Rs j f�x� � 0g with f a polynomial function on Rs. A semi-algebraic Lie-group is a

Lie-group which is also a semi-algebraic subset ofRs. The following two sets are examples of

semi-algebraic Lie-groups.

Example 5.2.1 a) The general linear group

GL�N�R� � fT � RN�N j det�T � �� 0g

b) The orthogonal group

O�n� � O�n�R� � fT � RN�N j TT � � INg�

where IN is the N �N identity matrix.

�

Let G be a Lie-group and be a smooth group action : G�Rr � Rr. Fix p � Rr and

define the orbit of the action to be the set

O�p� � f �� p� j � � Gg�

The set O�p� is an immersed1 submanifold of Rr in the sense that it is a subset of Rr with a

differential structure given by that induced by the diffeomorphism �p : G�stab�p� � O�p�,(cf. (5.1.3)). The map is a smooth transitive group action of G acting on O�p� and thus,

O�p� is given the structure of a homogeneous space. It is certainly not clear that the differential

structure induced by the immersion is compatible with the Euclidean differential structure on

Rr. In the case where the two differential structures are compatible then O�p� is an embedded

submanifold of Rr.

Let G be a subset of Rs. A map f : G � Rr is semi-algebraic when the graph of f ,

f�x� f�x�� j x � Gg � Rs � Rr is semi-algebraic. In particular, if G is a semi-algebraic

1An immersion is a one-to-one map f : M � N between two manifolds M and N which for which thedifferential df is full rank at all points. An immersed submanifold is a subset U � N such that U � f�M� is theimage of some manifoldM via an immersion f . The setU � N inherits the differential structure onM via the mapf , however, this need not correspond to the differential structure associated with the manifold N . An embeddingis an immersion f : M � N such that the image U � f�M� is a manifold with subspace differential structureinherited from N (Warner 1983, pg. 22).

x5-3 Riemannian Metrics on Lie-groups and Homogeneous Spaces 127

subset of Rs and f : Rs � Rr is a rational map (i.e. the i’th component of f is a ratio of two

polynomial maps) then the map f : G� Rr is semi-algebraic (Gibson 1979, pg. 223).

The following result shows that for semi-algebraic Lie-groups and semi-algebraic group

actions the orbit of a point p � Rr is always an embedded submanifold of Rr. The result is

standard where G is a compact Lie-group (Varadarajan 1984, pg. 81). For G semi-algebraic

the reader is referred to Gibson (1979, pg. 224).

Proposition 5.2.2 LetG be a Lie-group and : G�RN � Rr be a smooth group action ofG

onRr. Let p � Rr be an arbitrarypoint and denote the orbit of p byO�p� � f �� p� j � � Gg.Then, O�p� is an embedded submanifold ofRr with the embedding

�p : G�stab�p�� O�p�

given by (5.1.3), if either:

a) G is a compact Lie-group.

b) G is a semi-algebraic Lie-group and : G � RN � RN is a semi-algebraic group

action.

5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces

Let G be a Lie-group, M be a smooth manifold and : G�M � M be a smooth transitive

group action of G on M . Denote the tangent space of G at the identity e by TeG. Let

ge : TeG � TeG be an inner product on TeG, i.e. a positive definite, bilinear map. The inner

product chosen in the sequel is always a Euclidean inner product computed by choosing an

arbitrary fixed basis fE1� � � � � Eng for TeG, expressing given tangent vectors X �Pn

i�1 xiEi

and Y �Pn

i�1 yiEi in terms of this basis and setting

ge�X� Y � �nXi�1

xiyi�

Of course, this construction depends on the basis vectors used. One could also consider other

inner products, for example when G is semi-simple the negative of the killing form (Helgason


1978, pg. 131) is a positive definite inner product. A number of authors have used the killing

form in related work (Faybusovich 1989, Bloch et al. 1992, Brockett 1993), however, the choice

of a particular inner product is immaterial to the following development.

Let ge be an inner product on TeG. Let r� of (5.1.1), be right translation by � � G. Since

r� is a diffeomorphism, the differential2 at the identity e of G, Ter� : TeG� T�G, is a vector

space isomorphism. Using Ter� one can define an inner product on each tangent space of G

g� : T�G� T�G� R

g�� :� ge�Ter

�1� �� Ter

�1� ��

��

where � and � are elements of T�G. It is easily verified that g� varies smoothly on G and

consequently defines a Riemannian metric,

g�� :� g�� 5�3�1�

for � � G, �, � in T�G. This Riemannian metric is termed the right invariant group metric for

G. Observe that for any two smooth vector fields X and Y on G one has

g�dr�X� dr�Y � � g�X� Y �� 5�3�2�

Let p � M be arbitrary and recall that p of (5.1.2) is a submersion of G onto M (since

the group action is transitive). Thus, the differential of p at the identity Te p : TeG� TpM

is a linear surjection of vector spaces. Decompose TeG into the topological product

TeG � kerTe p � dom Te p�

2Let � : M � N be a smooth map between smooth manifolds M and N . Let p �M be an arbitrary point thenthe differential of � at p (or the tangent map of � at p) is the linear map

Tp� : TpM � T��p�N�

Tp��X� :� D�jp �X�

where D�jp �X� is the Frechet derivative (Helmke & Moore 1994b, pg. 334) of � in direction X � TpM . Thefull differential of � is a map from the tangent bundle of M , TM � �p�MTpM to the tangent bundle of N

d� : TM � TN�

d��Xp� :� Tp��Xp��

where Xp is an element of the tangent space TpM for arbitrary p �M .

x5-3 Riemannian Metrics on Lie-groups and Homogeneous Spaces 129

where kerTe p is the kernel of Te p and

dom Te p � fX � TeG j ge�X� Y � � 0� Y � kerTe pg �5�3�3�

is the domain (the subspace orthogonal to kerTe p using the inner product provided on TeG).

By construction, Te p restricts to a vector space isomorphism T�e p,

T�e p : dom Te p � TpM

T�e p�X� :� Te p�X��

Thus, one may define an inner product on the tangent space TpM by

gMp �X� Y � � ge�T�e

�1p �X�� T�e

�1p �Y �

��

where T�e �1p �X� � TeG via the natural inclusion dom Te p � TeG. It is easily verified that

this construction defines a smooth inner product on the tangent bundle TM . Thus, one defines

a Riemannian metric,

gM�X� Y � :� gMq �� 5�3�4�

for q �M and X , Y in TqM . This is termed the normal metric on M .

Let q � M be arbitrary then the normal Riemannian metric on M and the right invariant

group metric on G are related by the differential of p : G� M . To see this observe that for

any p �M there exists � � G such that �� p� � q. Thus,

q�� p�� p�

� p � r��

Differentiating at the identity gives the following commutating diagram of vector space homo-

morphisms.


d pd q

dr�

TqM

T�GTeG�

HHHHHHHHHHj �

In particular, the normal Riemannian metric can also be defined by

gM�X� Y � :� g�T�� 1p �� T��

�1p �� 5�3�5�

where g�� is the right invariant group metric on G, X and Y in TqM and T�� p is the

restriction of T� p to

dom T� p � fY � T�G j g�Y�X� � 0� for all X � kerT� pg� �5�3�6�

the domain of T� p. Observe that dom T� p � dr��dom Te q�.

5.4 Gradient Flows

LetM be a Riemannian manifold (with Riemannian metric gM ) and let : M � Rbe a smooth

potential function. The gradient of on M is defined pointwise on M by the relationships

Djp �� gM�grad�p��

� � � TpM (5.4.1)

grad�p� � TpM� (5.4.2)

where Djp �� is the Frechet derivative of in direction � at the point p � M (Helmke &

Moore 1994b, pg. 334). Existence follows from the positive definiteness and bilinearity of the

inner product along with linearity of the Frechet derivative.

Observe that grad is a smooth vector field on M which vanishes at local maxima and

minima of . Consider the ordinary differential equation on M , termed the gradient flow of ,

�p � grad�p�

x5-4 Gradient Flows 131

whose solutions are integral curves3 of grad. Let p0 � M be some initial condition then

solutions of the gradient flow with initial condition p0 exists and are unique (apply classical

O.D.E. theory to the local co-ordinate representation of the differential equation).

Let G be a Lie-group and : G�M � M be a smooth transitive group action of G on

M . Fix p �M and consider the ‘lifted’ potential � : G� R,

�� :� � p�� 5�4�3�

where p is given by (5.1.2). Let ge�� be an inner product on TeG and define the right

invariant group metric g on G and the normal metric gM onM as described in Section 5.3. The

smooth potential : M � Rand the ‘lifted’ potential � give rise to the two gradient flows

�q � grad�q�� q �M� (5.4.4)

�� grad�� G� (5.4.5)

defined with respect to the normal metric and the right invariant group metric respectively.

Lemma 5.4.1 Let p � M be some fixed element of M . Let q0 � p��0� (where p is given

by (5.1.2)) be an arbitrary initial condition in M . Let q�t� denote the solution of (5.4.4) with

initial condition q0 and let ��t� denote a solution of (5.4.5) with initial condition �0 then

q�t� � p��t��

Proof By construction q0 � p��0�. Consider the time derivative of p��t��

d

dt p��t�� d p

�d

dt��t�

�� T� p � grad��t��

3Let X be a smooth vector field on a manifold M . An integral curve of X is a smooth map

� : R�M�

��t� � X��t��

where �� :� d��

ddt

��

and d

dt

��

denotes the tangent vector toRat the point �R.


Thus, it is sufficient to show that

T� p � grad�� grad� p��

and use the uniqueness of solutions to (5.4.4) and (5.4.5) to complete the proof.

Let grad� � grad�0 � grad�� be the unique decomposition of grad� into grad�0 �kerT� p and grad�� dom T� p (cf. (5.3.6)). Observe that

g�grad�0�� grad�0�� g�grad�� grad�0��

� D�j� �grad�0��

� Dj�p�� T� p � grad�0�� 0�

since T� p � grad�0�� 0. Since the metric g is positive definite then grad�0 � 0 and it

follows that grad� � dom �T� p�.

Let q � M be arbitrary and choose � � G such that p�� q. Let X � TqM be an

arbitrary tangent vector and observe

gM�T� p � grad�� X� � g�T��

�1p � T� p � grad�� T��

�1p �X�

�

using (5.3.5). Of course T�� 1p � T� p�grad�� grad�� since grad� � dom �T� p�.

It follows that

gM�T� p � grad�� X

� g

�grad�� T��

�1p �X�

�� D�j� �T�� 1

p �X��

� Djq �T� p � T�� 1p �X��

� Djq �X� � gM�grad�q�� X

�

Since X is arbitrary and gM is positive definite then T� p � grad�� grad� p�� and

the proof is completed.

x5-5 Convergence of Gradient Flows 133

5.5 Convergence of Gradient Flows

Let M be a Riemannian manifold and let : M � R be smooth function. Let grad denote

the gradient vector field with respect to the Riemannian metric on M . The critical points of

: M � Rcoincide with the equilibria of the gradient flow on M .

�q�t� � �grad�q�t�� 5�5�1�

For any solution x�t� of the gradient flow

d

dt�q�t�� g�grad�q�t�� q�t��

� �g�grad�q�t�� grad�q�t�� 0

and therefore �q�t�� is a monotonically decreasing function of t. The following proposition

is discussed in Helmke and Moore (Helmke & Moore 1994b, pg. 360).

Proposition 5.5.1 Let : M � R be a smooth function on a Riemannian manifold with

compact sublevel sets4. Then every solution q�t� �M of the gradient flow (5.5.1) on M exists

for all t 0. Furthermore, x�t� converges to a connected component of the set of critical

points of as t� ��.

Note that the condition of the proposition is automatically satisfied if M is compact.

Solutions of a gradient flow (5.5.1) display no periodic solutions or strange attractors and there

is no chaotic behaviour. If has isolated critical points in any level set fq � M j �q� � cg,

c � R, then every solution of the gradient flow (5.5.1) converges to one of these critical points

as t � ��. This is also the case where the critical level sets are smooth submanifolds ofM .

In general, however, it is possible that the solution of a gradient flow converges to a connected

level set of critical points of the function . Such ‘non-generic’ behaviour is undesirable when

gradient flows are being used as a numerical tool. For problems of the type considered in this

thesis the following lemma is generally applicable.

Lemma 5.5.2 Let : M � Rbe a smooth function compact sublevel sets, such that

4Let c � R then the sublevel set of � associated with the value c is fq � M j ��q� � cg. If � has compactsublevel sets then each such set (possibly empty) is a compact subset of M .


(i) The set of critical points of is the union of closed, disjoint, critical level sets, each of

which is a submanifold of M .

(ii) The Hessian5 H at a critical point degenerates exactly on the tangent space of the

critical level sets of . Thus, for q � M a critical point of and � TqM then

Hjq �� 0 if and only if is in the tangent space of the critical level set of .

Then every solution of the gradient flow

�q � �grad�q�

converges exponentially fast to a critical point of .

Proof Denote the separate connected components of the critical level sets of by Ni for

i � 1� 2� � � � � K where K is the number of disjoint critical level sets. Thus, the limit set of a

solution to the gradient flow �x � grad is fully contained in someNj for j � �1� 2� � � � � K�. Let

a � Nj be an element of this limit set. Condition ii) ensures that each Nj is a non-degenerate

critical set. It may be assumed without loss of generality that the value of constrained to Nj

is zero. The generalised Morse lemma (Hirsch 1976, pg. 149) gives a open neighbourhoodUa

of a in M and a diffeomorphism f : Ua � Rn, n � dimM , nj � dimNj , such that

�i� f�Ua �Nj� � Rnj � f0g�ii� � f�1�x1� x2� x3� �

12�jjx2jj2 � jjx3jj2��

with x1 � Rnj , x2 � Rn� , x3 � Rn� and nj � n� � n� � n. Let W � f�Ua� � Rn then the

gradient flow of � f�1 on W is

�x1 � 0� �x2 � �x2� �x3 � x3� �5�5�2�

5Let M be a smooth manifold. The Hessian of a smooth function � : M � R is a symmetric bilinear mapH�jq : TqM TqM � Rgiven by

H�jq�� i

�2 ��

�xi�xj�j�

where x � fx1� � � � � xng is a local coordinate chart on M and ��, � are the local coordinate representations of � and � TqM while �� is a local coordinate representation of �.

x5-6 Lie-Algebras, The Exponential Map and the General Linear Group 135

W +

W-

W-

x2

x3

W +

Figure 5.5.1: Flow around a saddle point

Let W� :� f�x1� x2� x3� j jjx2jj jjx3jjg and W� :� f�x1� x2� x3� j jjx2jj � jjx3jjg. Using

the convergence properties of (5.5.2) it follows that every solution of original gradient flow

starting in f�1�W��f�x1� x2� x3� j x3 � 0g�will enter the region f�1�W�� for which � 0

(cf. Figure 5.5.1). On the other hand, every solution starting in ff�1�x1� x2� x3� j x3 � 0g will

converge to the point f�1�x1� 0� 0� � Nj � x1 � Rnj . As is strictly negative on f�1�W��,

all solutions starting in f�1 �W� SW� � ff�1�x1� x2� x3� j x3 � 0g will eventually and

converge to some Ni �� Nj. By repeating this analysis for each Ni and recalling that any

solution must converge to a connected subset of some Ni (Proposition 5.5.1) the proof is

completed.

5.6 Lie-Algebras, The Exponential Map and the General Linear

Group

Let G be a Lie-group. The Lie-algebra of G, denoted g, is the set of all left invariant smooth

vector fields on G, i.e. smooth vector fields X�� T�G such that

X � l�� dl�X��

where l�� :� �� is left multiplication by �. In particular,

X�� dl�X�e�� 5�6�1�


Let X�� and Y �� be two smooth vector fields on G and think of them as derivations6 of

C��G� (the set of smooth functions which map G� R). The Lie-bracket of X�� and Y ��

is defined with respect to the action of X and Y as derivations

�X�� Y ��f � X��Y ��f � Y ��X��f�

where f � C��G�. By checking the linearity of this map it follows that the Lie-bracket of

two vector fields is itself a derivation and corresponds to a vector field, denoted �X� Y ��.

The set of smooth vector fields on G, denoted D�G�, is a vector space over Runder pointwise

addition of vector fields and scalar multiplication. Considering the Lie-bracket operation as a

multiplication rule, D�G� is given the structure of an algebra. Assume that X�� X and

Y �� Y are left invariant vector fields on G, then �X� Y � � �X� Y �� is also a left invariant

vector field on G, since

�dl��X� Y ��f � �X� Y ��f � l�� (5.6.3)

� X D�f � l��j� �Y �� Y D�f � l��j� �X�

� X Df jl�� dl�Y �� Y Df jl�� dl�X�

� X Df jl�� Y � l�� Y Df jl�� X � l�� X� Y � � l��f�

Thus g forms a subalgebra of the algebra of derivations. Note that there is a one-to-one

correspondence between g and TeG the tangent space of G at the identity given by (5.6.1).

Thus, g is a finite dimensional algebra of the same dimension as the Lie-group G. Indeed,

an alternative way of thinking about g is as the tangent space TeG equipped with the bracket

operation

�X�e�� Y �e�� dl��X�e�� dl��Y �e��e��

6Let C��G� be the set of all smooth maps from G intoR. The setC��G� acquires a vector space structure underscalar multiplication and pointwise addition of functions. A derivation of C��G� is a linear mapC��G�� C��G�.The set of all derivations of C��G�, denoted D�G�, itself forms a vector space under scalar multiplication andpointwise addition of functions. A smooth vector field is a smooth mapX : G� TGwhich assigns a vectorX�� T�G to each element � � G. Any smooth vector field X defines a derivation X�f� � Xf�� :� Df j� �X��,the Frechet derivative of f in direction X at the point . Indeed, this correspondence is an isomorphism

D�G� � fthe set of smooth vector fields on Gg� �5�6�2�

betweenD�G� and the vector space of smooth vector fields on G (Varadarajan 1984, pg. 5).


Example 5.6.1 Let N be a positive integer and consider the set of all real non-singularN �N

matrices

GL�N�R� � f� � RN�N j det�� 0g�

where det�� is the determinant of �. The set GL�N�R� is known as the general linear

group and is a Lie-group under the group operation of matrix multiplication. Since GL�N�R�

is an open subset of RN�N it inherits the relative Euclidean topology and differential struc-

ture. The tangent space at the identity IN (the N � N identity matrix) of GL�N�R� is

TINGL�N�R� � RN�N the set of all real N �N matrices. Consequently, the dimension of

the Lie-groupGL�N�R� is n � N 2. The tangent space ofGL�N�R� at a point � � GL�N�R�

can be represented by the image of TINGL�N�R� � RN�N via the linearization of the

diffeomorphism generated by left multiplication l�,

T�GL�N�R� � TIN l��TINGL�N�R��

� f�A j A � RN�Ng�

The Lie-algebra of GL�N�R�, denoted gl�N�R�, is the set of all left invariant vector fields of

GL�N�R�. From (5.6.1) it follows that

gl�N�R� � fX�� A j � � G� A � RN�Ng�

Let f � C��GL�N�R�� be any smooth real function then the Lie-bracket of two elements �A,

�B � gl�N�R� acting on f is

��A� �B�f � ��A��B�f � ��B��A�f

� ��A�Df j� ��B�� B� Df j� ��B�� D

�Df j� ��B�

j� ��A�� D�Df j� ��A�

j� ��B�Now since GL�N�R� inherits the Euclidean differential structure from RN�N the Frechet

derivative Df j� ��X�, X � RN�N , can be written

Df j� ��X� �NX

i�j�1

df

d�ij��X�ij


where Xij is the �i� j�’th entry of X � RN�N . Writing ��B�ij �PN

s�1 �isBsj and applying

the product rule of differentiation gives

D�Df j� ��B�

j� ��A� �NX

i�j�1

NXp�k�1

d2f

d�ijd�pk��B�ij��A�pk

�NX

i�j�1

df

d�ij

NXp�k�1

d

d�pk

�NXs�1

�isBsj

��A�pk

�NX

i�j�1

NXp�k�1

d2f

d�ijd�pk��B�ij��A�pk �

NXi�j�1

df

d�ij

NXk�1

Bkj��A�ik

since dfd�pk

�PNs�1 �isBsj

�� 0 unless p � i and s � k. It follows that

��A� �B�f �NX

i�j�1

df

d�ij

�NXk�1

��A�ikBkj �NXk�1

��B�ikAkj

�

�NX

i�j�1

df

d�ij��AB � BA��ij � ��AB �BA��f�

where ��AB � BA� is a smooth left invariant vector field on GL�N�R�. For any two

matrices A�B � RN�N define the matrix Lie-bracket by �A�B� � AB � BA. The bracket

operation on the Lie-algebra can now be written in terms of the matrix Lie-bracket operation

on TeGL�N�R� � RN�N

��A� �B� � ��A�B�� 5�6�4�

Indeed, it is usual to think of gl�N�R� as the set

gl�N�R� � fA j A � RN�Ng �5�6�5�

with the matrix Lie-bracket operation �A�B� � AB �BA. �

Let G and H be two Lie-groups and let g and h be their associated Lie-algebras. A

map : G � H is called a Lie-group homomorphism (or just homomorphism) from G to

H if is smooth and is a group homomorphism (i.e. �g1g�12 � � �g1��g2��1). A map

� : g � h is called a Lie-algebra homomorphism (or just homomorphism) from g to h if �

is linear and preserves the bracket operation ��X� Y �� X�� Y ��. The tangent map

Te : TeG� TeH induces a map � : g� h, ��X� � dl�Te�X�e��, which is a Lie-algebra

homomorphism (Warner 1983, pg. 90). Abusing notation slightly it is standard to identify g


with TeG, h with TeH (cf. (5.6.1)) and write Te : g � h as the Lie-algebra homomorphism

induced by a Lie-group homomorphism : G � H . The following result is fundamental in

the theory of Lie-groups. A typical proof is given in Warner (1983, Theorem 3.27).

Proposition 5.6.2 Let G and H be Lie-groups with Lie-algebras g and h respectively and

assume thatG is simply connected. Let � : g� h be a Lie-algebra homomorphism, then there

exists a unique Lie-group homomorphism : G� H such that Te � �.

Let G be any Lie-group and denote its Lie-algebra by g. Denote the identity component

of G by Ge, the set of all points in G path connected to the identity e. Observe that R is a

Lie-group under addition. The Lie-algebra of R is a one-dimensional vector space r � � ddr ,

� � R, where ddr denotes the derivative inR. Let X � g be arbitrary and consider the map

� : r� g

��d

dr� :� �X�

It is easily seen that� is a Lie-algebra homomorphism and using Proposition 5.6.2 there exists

a unique Lie-group homomorphism

expX : R� Ge �5�6�6�

such that Te � �. Since expX is a Lie-group homomorphism then expX�t1 � t2� �

expX�t1� expX�t2� and the set

�tRexpX�t� � G

is known as a one-parameter subgroup of G.

One may define the full exponential by

exp : g� G (5.6.7)

exp�X� :� expX�1��

The exponential map is a local diffeomorphism from an open neighbourhoodN0 of 0 � g into

an open neighbourhood Me of e � G (Helgason 1978, Proposition 1.6).


Lemma 5.6.3 Let exp : gl�N�R�� GL�N�R� denote the exponential map (5.6.7) and let

eX � IN �X �X2

2!�X3

3!

be the standard matrix exponential. Let X � gl�N�R� � RN�N , then

exp�X� � eX �

Proof Recall that the matrix exponential eX is well defined for all X � RN�N (Horn &

Johnson 1985, pg. 300). Let X � gl�N�R� (thought of as the set of N �N matrices equipped

with the matrix Lie-bracket) then define X : R� GL�N�R� by

X�t� � etX �

Observe that X is well defined smooth map since the matrix exponential is itself smooth and

always non-singular (det�eX� � etr�X� �� 0). Indeed, X is a group homomorphism (R is a

Lie-group under addition) since X�t1 � t2� � X�t1�X�t2� and X��t� � X�t��1. The

tangent space of Rat 0 is the set f� ddr j � � Rg where d

dr denotes normal derivation. Observe

that

TeX��d

dr� � DX j0 ��

d

dr�

� �d

dtetX

��t�0

� �X�

But this is exactly the Lie-algebra homomorphism� that induces the Lie-group homomorphism

expX (5.6.6). Since expX is the unique Lie-group homomorphism that has the property

Te expX��ddr� � �X then it follows that X�t� � expX�t�. The full result follows from the

definition of exp (5.6.7).

x5-7 Affine Connections and Covariant Differentiation 141

5.7 Affine Connections and Covariant Differentiation

Let G be a smooth manifold. An affine connection is a rule r which assigns to each smooth

vector field X � D�G� a linear mapping rX : D�G�� D�G�, rX�Y � :� rXY , satisfying

rfX�gY � frX � grY (5.7.1)

rX�fY � � frXY � �Xf�Y� (5.7.2)

where f , g � C��G� and X , Y � D�G�.

An affine connection naturally defines parallel transport on a manifold. Let X and Y �D�G� be smooth vector fields and let ��t� be a smooth integral curve of X on some time

interval �0� ��, � � 0, then the family of tangent vectors t � Y ��t�� is said to be transported

parallel to ��t� if

�rXY ��t�� 0� �5�7�3�

Expressing (5.7.3) in local coordinates one can show that the relationship depends only on the

values of the vector fields X and Y along the curve ��t� (Helgason 1978, pg. 29). Thus,

given a curve ��t� on �0� �� and a smooth assignment �0� �� Y �t� � T�t�G then Y �t� is

transported parallel to � if and only if any smooth extensions X , Y � D�G�, X��t�� t�,

Y ��t�� Y �t� satisfy (5.7.3).

A geodesic is a curve ��t� for which the family of tangent vectors ��t� is transported

parallel to the curve ��t�. It is usual to write the parallel transport equation for a geodesic

� : R� G as

r �� 0� �5�7�4�

where by this one means that any smooth extensionX � D�G� of �� satisfiesrXX��t�� 0.

Given a point � � G and a tangent vector X � T�G there exists a maximal open interval I� Rcontaining zero and a unique geodesic �X : I� G with ��0� � � and ��0� � X (Helgason

1978, pg. 30).

Given a fixed curve between two points �, � � G, (� : �0� �� G, ��0� � �, �� )

there exists a set of n linearly independent smooth assignments t � Yi�t� � T�t�G, i �

1� � � � � n, t � �0� �� (where each Yi�t� is transported parallel to �) and which span the set of all


smooth assignments t � Y �t� (Y �t� transported parallel to �) (Helgason 1978, pg. 30). These

solutions correspond to choosing n linearly independent vectors in T�G as initial conditions

and solving (5.7.3) for Yi�t�. The construction induces an isomorphism

P�0�� : T�G� T�G� P�0��Z� �nXi�1

ziYi�� 5�7�5�

where Z �Pn

i�1 ziYi�0� � T�G. Of course, this isomorphism will normally depend on the

on the curve �.

Parallel transport of a smooth covector field w : G� T�G, is defined in terms of its action

on an arbitrary vector field X � D�G�

�P�0��w��X� � w�P��0�X��

where P��0� is parallel transport from �� backwards to ��0� along the curve ��t�. Parallel

transport of an arbitrary tensor field T : G � T�G � � T �G � TG � � TG of type

�r� s� is given by its action on arbitrary covector and vector fields

�P�0��T ��w1� � � � � wr�W1� � � � �Ws� � T �P��0�w1� � � � � P��0�wr� P��0�W1� � � � � P��0�Ws��

Parallel transport of a function f � C��G� is just

P�0��f � f��

An affine connection on a manifold G induces a unique differentiation on tensor fields

known as covariant differentiation (Helgason 1978, pg. 40). It is usual to denote the covariant

differentiation associated with a given affine connection by the same symbolr. One may think

of covariant differentiation of a tensor T (with respect to a vector field X � D�G�) evaluated

at a point � � G, as the limit

�rXT �� lims�0

1s�P��0�T ��s�� T �� 5�7�6�

where ��t� is the integral curve associated with X , ��0� � �, (Helgason 1978, pg. 42). In

particular, if T is a tensor of type �r� s� then rXT is also a tensor of type �r� s�. Considering

x5-7 Affine Connections and Covariant Differentiation 143

the above definition applied to a function f � C��G� one has

�rXf�� lims�0

1sf��s�� f��

� Df j� �X�� Xf��

Thus as expected, covariant differentiation on C��G� corresponds to derivation with respect

to the vector field X .

It is easily seen that covariant differentiation inherits property (5.7.1) from the affine

connection. To see that it satisfies the Leibniz formulae (rZ (T � R) � rZT�R�R�rZR)

one observes that any operation defined by a limit of the form (5.7.6) has the properties of a

classical derivative. A rigourous proof is given in Mishchenko and Fomenko (1980, pg. 329).

In particular, given a �0� 2� tensor g�� contracted with two vector fields X and Y then

rZ�g�X� Y �� rZg�X� Y � � g�rZX� Y � � g�X�rZY ��

GivenG a Riemannian manifold (with Riemannian metric g : TG�TG� R) there exists

a unique covariant differentiation satisfying

rXY �rYX � �X� Y � (5.7.7)

rZg � 0� (5.7.8)

for any smooth vector fields X , Y , Z � D�G� (Helgason 1978, pg. 48). The affine connection

associated with this covariant differentiation is known as the Levi-Civita connection. Consider

the action of the Levi-Civita connection on g�X� Y � for arbitrary vector fieldsX , Y ,Z � D�G�,

rZg�X� Y � � rZg�X� Y � � g�rZX� Y � � g�X�rZY �

� g�rXZ� Y � � g�X�rZY � � g��Z�X�� Y ��

By permuting the vector fieldsX , Y andZ, and then eliminatingrX andrY from the resulting

equations one obtains

2g�X�rZY � � Zg�X� Y ��g�Z� �X�Y ��Y g�X�Z��g�Y� �X�Z��Xg�Y�Z��g�X� �Y�Z��5�7�9�


Since X , Y and Z are arbitrary, this equation uniquely determines the Levi-Civita connection

in terms of the metric g.

5.8 Right Invariant Affine Connections on Lie-Groups

Let G be a smooth manifold and let : G� G be a smooth map from G into itself. An affine

connection r on G is invariant under if

drXY � rd�XdY�

IfG is a Lie-group thenr is termed right invariant ifr is invariant under each map r�� :� ��,

� � G.

Lemma 5.8.1 Let G be a Lie-group. There is a one-to-one correspondence between right

invariant affine connections on G and bilinear maps

� : TeG� TeG� TeG�

given by

��Y� Z� � �rdr�Y dr�Z��e�� 5�8�1�

for Y , Z � TeG.

Proof If r is an affine connection then (5.8.1) certainly defines a bilinear map from TeG �TeG� TeG.

Conversely, given a bilinear map � : TeG� TeG � TeG, let fE1� � � � � Eng be a linearly

independent basis for TeG. Define the n smooth right invariant vector fields �Ei � dr�Ei,

i � 1� � � � � n. Thus, for arbitrary vector fields Y , Z � D�G� there exist functions yi � C��G�,for i � 1� � � � � n and zj � C��G�, for j � 1� � � � � n such that Y �

Pni�1 yi

�Ei and Z �Pnj�1 zj

�Ej . One defines rY : D�G�� D�G�

rY Z �nXi�1

yi

nXj�1

zjdr��Ei� Ej� � � �Eizj� �E

j� �5�8�2�

x5-8 Right Invariant Affine Connections on Lie-Groups 145

To see that r is well defined observe that both � �Eizj� �Ej and � are bilinear in �Ei and �Ej

and thus the definition is independent of the choice of fE1� � � � � Eng. To see that r is an

affine connection one observes that linearity in �Ei ensures that (5.7.1) holds; while for any

f � C��G�

rY �fZ� �

�� nXi�1

yi

nXj�1

fzjdr��Ei� Ej� � f� �Eizj� �E

j

�A �nXi�1

yi

nXj�1

zj� �Eif� �Ej

� frY Z � �Y f�Z�

and (5.7.2) also holds.

Consider two arbitrary vector fields Y and Z and observe that

rdr�Y dr�Z �nXi�1

yi

nXj�1

zjdr��Ei� Ej� � ��dr� �E

i�zj�dr� �Ej�

� dr�

�� nXi�1

yi

nXj�1

zj��Ei� Ej� � � �Eizj� �E

j

�A �

� dr�rY Z

since for any � � G

�dr� �Ei�zj�� Dzj j� �dr� �Ei� (5.8.3)

� D�zj � r��j��1 � �Ei�

� Dzj j� � �Ei� � �Eizj��

Thus, r is a right invariant affine connection. Moreover, for any two right invariant vector

fields Y and Z

rY Z�e� � rdr�Yedr�Ze�e� � dre��Ye� Ze� � ��Ye� Ze��

and thusr satisfies (5.8.1). This completes the proof.

The following result provides an important relationship between the exponential map on

G (5.6.6), and geodesics with respect to right invariant affine connections. A proof for left

invariant connections is given in Helgason (1978, pg. 102).


Proposition 5.8.2 Let r be a right invariant affine connection and let � be given by (5.8.1)

then for any X � TeG

��X�X� � 0�

if and only if the geodesic �X : R� Gwith ��0� � X is an analytic Lie-group homomorphism

ofR into G.

In particular, if �X is a group homomorphism then then �X must be the unique group

homomorphism with d�X�1� � X (cf. Proposition 5.6.2). Thus, if ��X�X� � 0 then the

geodesic �X is just

�X � expX � �5�8�4�

the exponential map (5.6.6).

LetG be a Lie-group with an inner product ge : TeG�TeG� Ron the tangent space at the

identity. Let g be the right invariant group metric (cf. (5.3.1)), then the Levi-Civita connection

defined by g is also right invariant. To see this one computes rdr�Zdr�Y for arbitrary vector

fields X , Y , Z � D�G�. Using (5.7.9) it follows that

2g�dr�X�rdr�Zdr�Y � � dr�Zg�X� Y � � g�dr�Z� dr��X� Y �� dr�Y g�X�Z�

� g�dr�Y� dr��X�Z�� dr�Xg�Y� Z�� g�dr�X� dr��Y� Z��

since g is right invariant (cf. (5.3.2)) and d�X� Y � � �dX� dY � (Helgason 1978, pg. 24).

Recalling (5.8.3) one obtains

2g�dr�X�rdr�Zdr�Y � � Zg�X� Y � � g�Z� �X�Y �� Y g�X�Z� � g�Y� �X�Z��

�Xg�Y� Z�� g�X� �Y�Z��

� 2g�X�rZY ��

But g is right invariant, and thus 2g�dr�X�rdr�Zdr�Y � � 2g�dr�X� dr�rZY � which shows

that dr�rZY � rdr�Zdr�Y .

Example 5.8.3 Consider the general linear group GL�N�R� (cf. example Section 5.6). The

tangent space of GL�N�R� at the identity is TINGL�N�R� � RN�N since GL�N�R� is an

x5-8 Right Invariant Affine Connections on Lie-Groups 147

open subset ofRN�N . Consider the Euclidean inner product on TINGL�N�R�

hX� Y i � tr�XTY ��

The tangent space ofGL�N�R� at a point � � G is represented as T�GL�N�R� � fX� j X �RN�Ng the image ofTINGL�N�R� viadr�. The right invariant metric forGL�N�R�generated

by h� i is just

g : T�GL�N�R�� T�GL�N�R�� R�

g�Y� Z� � tr��1�TY TX��1��

The Levi-Civita connection r associated with g can be explicitly computed on the set of

right invariant vector fields on GL�N�R�. Let X , Y , Z � RN�N then X�, Y � and Z� are

the unique right invariant vector fields associated with X , Y and Z. Using (5.7.9) one has

2g�X��rZ�Y �� Z�g�X� Y � � g�Z�� X�� Y�� Y �g�X�Z� � g�Y� �X��Z��

�X�g�Y� Z�� g�X�� Y�� Z��

Now �Y �� Z�� is certainly right invariant (since d�X� Y � � �dX� dY � (Helgason 1978, pg.

24)). In particular, observe that Z�g�X� Y � � 0 � Y �g�X�Z� � X�g�Y� Z� since in each

case the metric computation is independent of �. Parallelling the argument leading to (5.6.4)

given in the example Section 5.6 for right invariant vector fields one obtains

�A��B�� BA �AB�� A�B��

Using this it follows that

2g�X��rZ�Y �� g�Z� �X� Y �� g�Y� �X�Z�� g�X� �Y� Z��

� tr�ZT �Y�X �� Y T �Z�X ��XT �Y� Z��


Evaluating the left hand side of this equation at � � IN and writing rZ�Y ��e� � ��Z� Y �

one obtains7

tr�XT��Z� Y �� 12

tr��ZT � Y �X � �Y T � Z�X �XT �Y� Z��

�12

tr�XT �Y T � Z� �XT �ZT � Y � �XT �Y� Z��

Consequently the bilinear map � for the Levi-Civita connection is given by

��Y� Z� �12

��Y T � Z� � �ZT � Y � � �Y� Z�

�� 5�8�5�

Note that ��X�X� � �XT � X � is zero if and only if X is a normal matrix, (i.e. commutes with

its transpose). Consequently, the exponential exp�tX� � etX on GL�N�R� is a geodesic if

and only if X is normal. �

5.9 Geodesics

In this section the relationship between geodesics on a Lie-group to geodesics on a homogeneous

space equipped with the normal metric is outlined. Though intuitively natural the result

is difficult to prove. The approach taken is to construct a coordinate basis which block

decomposes the Riemannian metric on the Lie-group into two parts, one of which is related

to the homogeneous space and the other of which lies in the kernel of the group action. This

construction is of interest in itself and justifies the somewhat long proof.

Let G be a Riemannian Lie-group with metric denoted g. Let r denote the Levi-Civita

connection. If � � G is arbitrary then the geodesics through � are just the curves ��t� :�

r� � �X�t� � �X�t�� where �X is a geodesic of G passing through e, the identity of G. To see

this one computes (cf. (5.7.4))

r �� rdr� dr� ��

� dr�r �� 0�

7One also needs the easily verified results tr�A�B�C�� tr��A�B�C�, �AT �B�T � �BT �A� and tr�A� � tr�AT �for arbitrary matrices A, B, C �RN�N .

x5-9 Geodesics 149

When dealing with a Riemannian manifold (equipped with the Levi-Civita connection)

there is an equivalent characterisation of geodesics using variational arguments. Loosely,

geodesics are curves of minimal (or maximal) length between two given points on a manifold.

The following result is proved in Mishchenko and Fomenko (1980, pg. 417).

Proposition 5.9.1 Let G be a Riemannian manifold with metric denoted g. Consider the cost

functional

E�� Z 1

0g� �� d��

on the set of all smooth curves � : �0� 1�� G. Then the extremals of E�� are geodesics on

M .

The cost functional E�� measures the action of a curve �. The length of a curve � is

measured by the related cost functional

L�� Z 1

0

qg� �� d��

Extremals of L�� correspond to curves that minimise (or maximise) the curve length between

��0� and ��1� on M . Extremals of E�� are also extremals of L�� (Mishchenko & Fomenko

1980, Theorem 2, pg. 417), however, the converse is not true. The reason for this is that the

uniqueness of geodesics ensures that a geodesic � : �0� 1� � G is uniquely parametrized by

t � �0� 1� whereas any curve ��t� � � � T , for T : �0� 1� � �0� 1� a smooth map, will have

the same length and consequently is an extremal of L��.

Theorem 5.9.2 Let G be a Lie-group, M be a smooth manifold and : G �M � M be a

smooth transitive group action of G on M . Let g denote a right invariant Riemannian metric

on G and gM denote the induced normal metric on M . If � : R� G is a geodesic (with

respect to the Levi-Civita connection) on G then the curve � : R�M

��t� :� ��t�� p��

is a geodesic (with respect to the Levi-Civita connection generated by the induced normal

metric) on M .

Proof It is necessary to develop some preliminary theory before proving the main result.


Denote the dimension of G by n and the dimension of M by m. Let p � M be arbitrary and

recall that the stabilizer of p, H � stab�p� � f� � G j �� p� � pg is a Lie-subgroup of

of dimension n �m of G. In particular, h the Lie-algebra of H is a Lie-subalgebra of g the

Lie-algebra of G. Let X � h and let exp be the exponential map on G, then t � exp�tX� is

a smooth curve lying in H (Warner 1983, pg. 104). Moreover, let fEm�1� � � � � Eng be a basis

for TeH then one can choose local coordinates for H around e

�xm�1� � � � � xn� � exp�xm�1Em�1� exp�xm�2Em�2� exp�xnEn��

These coordinates are known as canonical coordinates of the second kind for H and are

described in Varadarajan (Varadarajan 1984, pg. 89).

Extend the partial basis fEm�1� � � � � Eng of TeG to a full basis of TeG choosing the

remaining tangent vectors fE1� � � � � Emg to satisfy

g�Ei� Ej� � 0� i � 1� � � � � m� j � m� 1� � � � � n�

Let � � G be an arbitrary point in G and define canonical coordinates of the second kind on

G, centred at �,

x : Rn � G�

x�x1� � � � � xn� :� exp�x1E1� exp�x2E2� exp�xnEn��

Identify Rn � Rm �Rn�m as a canonical decomposition into the first m coordinates and the

remaining n�m coordinates. Define the two inclusion maps

i1 : Rm � Rn� i1�x1� � � � � xm� :� �x1� � � � � xm� 0� � � � � 0�

i2 : Rn�m � Rn� i2�xm�1� � � � � xn� :� �0� � � � � 0� xm�1� � � � � xn��

One now has maps

x1 : Rm � G� x1 :� x � i1 � exp�x1E1� exp�xmEm��

x5-9 Geodesics 151

and

x2 : Rn�m � G� x2 :� x � i2 � exp�xm�1Em�1� exp�xnEn��

The map x2 is just the canonical coordinates of the second kind for the embedded submanifold

r��H� � H�. The relationship of these maps is shown in the commutative diagram, Figure

5.9.

Observe that the range of dx1 is exactly dr��spfE1� � � � � Emg� since the map xi � r� �exp�xiEi�, which is exactly x�0� � � � � 0� xi� 0� � � � � 0� has differential

dx��

�xi� � dr�Ei � �Ei� �5�9�1�

where �Ei is the unique right invariant vector field on G associated withEi � TeG. In addition,

one has d p �Ei � 0 for i � m � 1� � � � � n since H is a coset of the stabilizer stab�p�. Recall

the definition of dom d p (cf. (5.3.6)). It follows directly that dom d p � spf �E1� � � � � �Emg.

Consider the map

y : Rm �M�

y�y1� � � � � ym� :� p � x � i1�y1� � � � � ym��

Observe from the above discussion that the differential dy � d p � dx1 is bijection. Thus,

the map y forms local coordinates for the manifold M centred at p��. This completes the

construction of the local coordinate charts shown in the commutative diagram, Figure 5.9.

��

PPPP

PPPP

PPPPi

R � M

�R G

��

�

Rm

Rn

�

�

�

��

p

y

i1

i2

x�

� R��

�

x1

x2

Rn�m

Figure 5.9.1: A commutative diagram showing the various coordinate charts and smooth

curves on G and M constructed in the proof of Lemma 5.9.2.


Consider the local coordinate representation of the Riemannian metric g in the coordinates

x. Canonically associating tangent vectors of Rn at a point x, Z � TxRn with the full space

Rn

Z �nXi�1

zi�

�xi � �z1� � � � � zn��

then the local coordinates representation of the metric g, denoted g� can be written in matrix

form

g��Y� Z� � Y TG�x�Z�

where G�x� � Rn�n is a positive definite, symmetric matrix. Now consider arbitrary vector

fields Y � �y1� � � � � ym� 0� � � � � 0� and Z � �0� � � � � zm�1� � � � � zn� then

g��Y� Z� � g�dxY� dxZ�

�mXi�1

nXj�m�1

yizjg�Ei� Ej� � 0�

Thus, the matrix G�x� is block diagonal of form

G�x� �

�B� G11�x� 0

0 G22�x�

�CA �

Moreover, since the maps shown in Figure 5.9 are commutative and the metric gM on M is

induced by the action of g on dom d p � spf �E1� � � � � �Emg it is easily shown that the local

coordinate representation of gM onRm is

g�M�Y� Z� � Y TG11�i1�y��Z � �di1Y �TG�i1�y��di1Z��

I proceed now to prove the main result. Let � : R� G be a geodesic and define

� : R� M�

��t� :� p � ��t��

Let � � R be a parameter and consider any one parameter smooth variation �� of the curve �

on M . Assume that �0 � � and ��t� is a smooth map from R� M . Both � and �� have

local coordinate representations on Rm in the coordinates described above. Denote the local

x5-9 Geodesics 153

coordinate representations by �� :� y�1 � � and �� :� y�1 � ��. Let �� : R� Rm be the

smooth curve

�� :� ��

since subtraction of vectors is well defined in Rm. The curves �, �, �� and �� are shown

on the commutative diagram, Figure 5.9. Denote the local coordinate representation of � by

�� x�1 � �. Observe that since � is a geodesic of G then �� is a geodesic of Rn equipped

with the metric g� (Mishchenko & Fomenko 1980, Lemma 3, pg. 345). Consider the following

one parameter smooth variation of ��

�� :� � � � i1 � �� 1 � ��1� � � � � ��m � ��m� �

�m�1� � � � � �

�n��

The action E�� onRn with respect to the Riemannian metric g� is

E��

Z 1

0g��

��d�

�Z 1

0� ��1� � � � � ��

��m� 0� � � � � 0�

TG�x�� 1� � � � � ��m� 0� � � � � 0�

�2� �� 1� � � � � ��m� 0� � � � � 0�

TG�x��0� � � � � 0� �� m�1� � � � � ��n�

� �0� � � � � 0� �� m�1� � � � � ��n�

TG�x��0� � � � � 0� �� m�1� � � � � ��n�d��

The middle term of this expansion is zero due to the block diagonal structure of G�x�while the

last term is independent of � since the perturbation i1 �� only enters in the firstm coordinates.

Thus, recalling the construction of ��, one has

d

d�E��

��0

�d

d�

Z 1

0� �� 1� � � � � ��

��m� 0� � � � � 0�TG�x�� 1� � � � � ��m� 0� � � � � 0�

�� 1� � � � � ��m� 0� � � � � 0�TG�x�� 1� � � � � ��m� 0� � � � � 0�d�

��0

�d

d�

Z 1

0� ��1� � � � � ��

��m�

TG11�x�� 1� � � � � ��

��m�d�

��0

�

�d

d�

Z 1

0g�M� ��

��

��d�

��0

�d

d�E��d�

��0

However, since �� is a geodesic it follows that � � is extremal and

d

d�E��

��0

� 0�


which means that the derivative dd�E��

��0

� 0. Thus, dd�E��

��0

� 0 on M and since

�� is an arbitrary smooth one parameter perturbation it follows that � is an extremal of the

action E on M . From Proposition 5.9.1 one now concludes that � is a geodesic and the proof

is complete.

Remark 5.9.3 It is also possible to construct geodesics on G from geodesics on M . Let

� : R � M be a geodesic and define � : R � G, by � :� x � i1 � y�1 � �. As above,

define �� :� y�1 � � and �� :� x�1 � � � i1 � ��. Then let �� : R � Rn be any one

parameter perturbation of �, �� :� x�1 �� and define �� : � ��. This construction induces a

perturbation in �� given by �� :� ��1� � � � � ��m

��1� � � � � ��m

. Furthermore,

one has by construction that the i’th component of �� is zero for i � m � 1� � � � � n. It follows

that

d

d�E��

��0

�d

d�E��

��0

�d

d�

Z 1

0� ��1� � � � � ��

��m� 0� � � � � 0�

TG�x�� 1� � � � � ��m� 0� � � � � 0�

��0� � � � � 0� ��m�1� � � � � ��n�TG�x��0� � � � � 0� ��m�1� � � � � ��n�d�

��0

�d

d�E��

��0

�d

d�E��2

��

��0

�

Here �2� denotes the curve in Rn with first m components zero and the remaining n � m

components given by the corresponding components of ��. Observe that �20 � 0 since �0 � 0

and thus � � 0 is a local minimum of E��2�� since E is positive definite. It follows that

dd�E��

2��0

� 0 while the first relationship dd�E��

��0

� 0 since �� is a geodesic. This

shows thatd

d�E��

��0

� 0

for any one-parameter perturbation �� of � which proves that � is a geodesic. �

Chapter 6

Numerical Optimization on

Lie-Groups and Homogeneous Spaces

The numerical algorithms proposed in Chapters 2 to 4 are based on a single idea, that of

interpolating the integral solution of a gradient flow via a series of curves lying wholly within

the constraint set. For each iteration, the particular curve chosen is tangent to the gradient flow at

the present estimate and the next estimate is evaluated using a time-step chosen to ensure the cost

function is monotonic decreasing (for minimisation problems) on the sequence of estimates

generated. Algorithms of this type are related to the classical gradient descent algorithms

on Euclidean space, for which the interpolation curves are straight lines. Consequently, the

algorithms proposed in the preceding chapters are termed modified gradient descent algorithms

where the modification is the use of a curve rather that a straight line to interpolate the gradient

flow.

The property of preserving the constraint while solving the optimization problem is a

fundamental property of the algorithms proposed. This property, termed constraint stability

(cf. page 2) is conceptually related to recent work in developing numerical integration schemes

that preserve invariants of the solution of an ordinary differential equation. Results on numerical

integration methods for Hamiltonian systems are particularly relevant to the general class of

problems considered in this thesis. Early work in this area is contained in the articles (Ruth

1983, Channell 1983, Menyuk 1984, Feng 1985, Zhong & Marsden 1988). A recent review

155

156 Numerical Optimization on Lie-Groups and Homogeneous Spaces Chapter 6

article of this work is Sanz-Serna (1991). Following from this approach is a general concept

of numerical stability (Stuart & Humphries 1994) which is loosely defined as the ability of

a numerical integration method to reproduce the qualitative behaviour of the continuous-time

solution of a differential equation. The development given in Stuart and Humphries (1994) is

not directly applicable to the solution of optimization problems since it is primarily focussed on

integration methods and considers only a single qualitative behaviour at any one time, either the

preservation of an invariant of a flow (Hamiltonian systems) or the convergence of the solution

to a limit point (contractive problems). In contrast, the optimization problems discussed in

this thesis require two simultaneous forms of numerical stability, namely preservation of the

constraint relation and convergence to a limit point within the constraint set.

This leads one to consider what properties a numerical method for optimization on a

homogeneous space should display. In Chapter 1 the three properties of simplicity, global

convergence and constraint stability were defined (page 2) in the context of numerical methods

for on-line and adaptive processes. The modified gradient descent algorithms proposed in the

early part of this thesis all displayed these properties. It is natural to ask whether the proposed

algorithms are in fact closely related. In particular, since the the only difference between the

proposed algorithms is in the curves used to interpolate the gradient flow it is important to

investigate the properties of these curves more carefully. Indeed, one may ask whether the

choice of curves can be justified or whether there may be more suitable choices available.

In this chapter I begin by reviewing the gradient descent algorithms proposed in Chapters 2

to 4 and using the theoretical results of Chapter 5 to develop a mathematical framework which

explains each algorithm as an example of the same concept. This provides a design procedure

for a deriving numerical methods suitable for solving any constrained optimization problem on

a homogeneous space.

The remainder of the chapter is devoted to developing a more sophisticated constrained

optimization algorithm exploiting the general theoretical framework provided by Chapter 5.

The method considered is based on the Newton-Raphson method reformulated (in coordinate

free form) to evolve explicitly on a Lie-group. Local quadratic convergence behaviour is proved

though the method is not globally convergent. To provide an interesting example the symmetric

eigenvalue problem is considered (first discussed in Chapter 2) and a Newton-Raphson method

derived for this case. It is interesting to compare the behaviour of this example with the classical

x6-1 Gradient Descent Algorithms on Homogeneous Spaces 157

shifted QR algorithm, however, it is not envisaged that the proposed method is competitive for

solving traditional problems. The interest in such methods is for solving numerical problems

for on-line and adaptive processes.

The chapter is divided into five sections. Section 6.1 discusses the theoretical foundation

of the modified gradient descent algorithms proposed in Chapters 2 to 4 and develops a general

template for generating such methods. Section 6.2 develops the general form of the Newton-

Raphson iteration on a Lie-group and proves quadratic convergence of the algorithm in a

neighbourhood of a given critical point. Section 6.3 provides a coordinate free formulation of

the Newton-Raphson algorithm. The theory is applied to the symmetric eigenvalue problem in

Section 6.4 and a comparison is made to the performance of the QR algorithm.

6.1 Gradient Descent Algorithms on Homogeneous Spaces

In this section the numerical algorithms proposed in Chapters 2 to 4 are discussed in the context

of the theoretical discussion of Chapter 5.

Recall the constrained optimization problem posed in Chapter 2 for computing the spectral

decomposition of a matrix H0. The algorithm proposed for this task was the double-bracket

algorithm (2.1.4)

Hk�1 � e��k �Hk�D�Hke�k�Hk�D��

where1 D � diag��1� � � � � �N�. The algorithm has the property of explicitly evolving on the

set

M�H0� � fUTH0U j U � O�N�g

of all orthogonal congruency transformations of H0. The set of orthogonal matrices O�N� is

certainly an abstract group and indeed is a Lie-subgroup ofGL�N�R� (Warner 1983, pg. 107).

The orthogonal group O�N� features in all of the numerical algorithms considered and it

seems a good opportunity to review its geometric structure.

1. The identity tangent space of O�N� is the set of skew symmetric matrices (Warner 1983,

1In Chapter 2 the diagonal target matrix was denoted N , however, to avoid confusion with the notation ofChapter 5 the target matrix is now denoted D and the dimension of the matrices is denoted N .


pg. 107)

TINO�N� � Sk�N� � f� � RN�N j � � ��T g�

2. The tangent space at a point U � O�N� is given by the image TINO�N� via the

linearization of rU : O�N�� O�N�, rU�W � :� WU (right translation by U )

TUO�N� � f�U � RN�N j � � Sk�N�g� �6�1�1�

3. By inclusion Sk�N� � RN�N is a Lie-subalgebra of the Lie-algebra gl�N�R� of

GL�N�R�. In particular, Sk�N� is closed under the matrix Lie-bracket operation

�X� Y � � Sk�N� if X and Y are skew symmetric.

4. The scaled Euclidean inner product on Sk�N�

h�1��2i � 2tr��T1 �2�

generates a right invariant group metric on O�N�,

g��1U��2U� � 2tr��T1 �2�� 6�1�2�

Observe that g��1U��2U� � 2tr�UT�T1 �2U� � h�1U��2Ui since UTU � IN . Thus

the right invariant group metric onO�N� is the scaled Euclidean inner product restricted

to each individual tangent space.

5. The Levi-Civita connection generated by the right invariant group metric (6.1.2) (cf.

Example 5.8.3) is associated with the bilinear map � : Sk�N�� Sk�N�� Sk�N�,

��1��2� � ��1��2��

This follows directly from (5.8.5) while observing that� � Sk�N� implies ��T �� 0.

The extra factor of 2 in(6.1.2) cancels the factor of 1�2 in (5.8.5).

6. The value of �� 0 for any � � Sk�N� and thus all curves

��t� � exp�t��

x6-1 Gradient Descent Algorithms on Homogeneous Spaces 159

are geodesics on O�N� passing through IN at time t � 0. By uniqueness this includes

all the possible geodesics on O�N� passing through IN .

7. Geodesics on O�N� passing through U � O�N� and with tangent vector ��0� � �U �TUO�N� at time t � 0 are given by (cf. Section 5.9)

��t� � exp�t��U�

Recall once more the double-bracket algorithm (2.1.4) Hk�1 � e��k�Hk�D�Hke�k�Hk�D�,

mentioned above. In Section 2.5 the associated orthogonal algorithm

Uk�1 � Uke�k �U

TkH0Uk�D�

was discussed and shown to be related to the double-bracket equation via the algebraic rela-

tionship

Hk � UTk H0Uk�

Unfortunately, Uke�k�UTkH0Uk�D� does not appear to be in the correct form for a geodesic

exp�t��U onO�N�. The reason for this lies in the characterisation ofM�H0� � fUTH0U j U �O�N�g. In particular, �U�H� � UTHU is not a group action of O�N� on M�H0�. The use

of this awkward definition for M�H0� is historical (cf. Brockett (1988) and the development

in Helmke and Moore (1994b, Chapter 2)). By considering the related characterisation

M�H0� � fWH0WT jW � O�N�g�

M�H0� is seen to be a homogeneous space with transformation group O�N� and group action

�W�H� � WHWT . Of course, all that has been done is to take the transpose of the orthogonal

matrices. It is easily shown that the associated orthogonal iteration for the new characterisation

of M�H0� is

Wk�1 � e��k �WkH0WTk�D�Wk�

Observe that this iteration is constructed from geodesics on O�N�. Thus, the associated

orthogonal iteration for the double-bracket algorithm is a geodesic interpolation of the flow

�W � �WH0WT � D�W . Using Lemma 5.9.2 geodesics on O�N� will map to geodesics on

M�H0� and one concludes that the the double-bracket algorithm itself is a geodesic interpolation


of the double-bracket flow

�H � �H� �H�D��

Recall that geodesics are curves of minimum length between two points on a curved surface

and are the natural generalization of straight lines to non-Euclidean geometry. Then, at least

for the double-bracket algorithm, the question posed in the introduction to this chapter, whether

the choice of interpolating curves in the proposed numerical algorithms is justified, is answered

in the affirmative.

It should not come as a surprise that the other algorithms proposed in Chapters 2 to 4 are

also geodesic interpolations of continuous-time flows. The algorithm proposed in Section 2.4 is

based directly on the double-bracket equation and can be analysed in exactly the same manner.

In Chapter 3 the Rayleigh gradient algorithm (3.2.1) is immediately in the correct form to

observe its geodesic nature. Indeed, for the rank-1 case (cf. Subsection 3.4.1) the geodesic

nature of the recursion has already been observed explicitly. Finally the pole placement

algorithm (4.6.1) proposed in Chapter 4

i�1 � ie��i�T

iFi�Q�T

iFi�A��

� e��i�F�iQ�TiFi�A�T

i� i

is explicitly a geodesic interpolation of the gradient flow (4.4.7)

� � �F� Q�A� TF � T � �

evolving directly on the Lie-group O�N�.

Thus, the algorithms proposed in Chapters 2 to 4 form a template for a generic numerical

approach to solving optimization problems on homogeneous spaces associated with the orthog-

onal group. In every case considered exponential interpolation of the relevant continuous-time

flow is equivalent to geodesic interpolation of the flow due to the specific structure of O�N�.

Care should be taken before the same approach is used for more abstract Lie-groups (the easily

constructed exponential interpolation curves may no longer be geodesics), nevertheless, the

basic structure of the algorithms presented is extremely simple and could be applied to almost

any optimization problem on a homogeneous space. Of course, step-size selection schemes

x6-2 Newton-Raphson Algorithm on Lie-Groups 161

must be determined for each new situation and the stability analysis depends on the step-size

selection. The basic properties of the algorithms will remain consistent, however, and provide

a useful technique for practical problems where the properties of constraint stability and global

convergence (cf. page 1) are more important than those of computation cost.

6.2 Newton-Raphson Algorithm on Lie-Groups

In this section a general formulation of the Newton-Raphson algorithm is proposed which

evolves explicitly on a Lie-group. Interestingly, the iteration can be expressed in terms of

Lie-derivatives and the exponential map. In practise, one still has to solve a linear system of

equations to determine the regression vector.

The Newton-Raphson algorithm is a classical (quadratically convergent) optimization tech-

nique for determining the stationary points of a smooth vector field (Kincaid & Cheney 1991,

pg. 64). Given Z : Rn � Rn a smooth vector field2 on Rn, let p � Rn be a stationary

point of Z (i.e. Z�p� � 0) and let q � Rn be an estimate of the stationary point p. Let

k � �k1� k2� � � � � kn�, with k1� � � � � kn non-negative integers be a multi-index and denote its

size by jkj � k1 � k2 � � kn. Expanding Z as a Taylor series around q one obtains for

each element of Z � �Z1� Z2� � � � � Zn�,

Zi�x� � Zi�q� ��X

jkj�1

�h1�k1 �hn�knk1! kn!

�jkjZi

��x1�k1 ��xn�kn �q�

where h � x � q � Rn and hj is the j’th element of h, and the sum is taken over all

multi-indices k with jkj � j for j � 1� 2� � � �. The Taylor series of an analytic3 function is

uniformly and absolutely convergent in a neighbourhood of q (Fleming 1977, pg. 97). Indeed,

if q is a good estimate of p one expects that only the first few terms of the Taylor series are

sufficient to provide a good approximation of Zi. Assume that p is known and consider setting

2When dealing with Euclidean space one naturally associates the element � �x i of the basis of TxRn with thebasis element ei of Rn (the unit vector with a 1 in the i’th position). This induces an isomorphism TxR

n � Rn

(Warner 1983, pg. 86) and one writes a vector field as map Z : Rn � Rn rather than the technically more correctZ : Rn � TRn , Z�x� � TxR

n .3In fact a smooth function f � C�M� on a smooth manifold M is defined to be analytic at a point p �M if the

Taylor series of f �, the expression of f in local coordinates centred at p, is uniformly and absolutely convergent ina neighbourhood of 0.


x � p � h � q. Ignoring all terms with jkj 2 one obtains the approximation

0 � Zi�p� � Zi�q� �nXj�1

�Zi

�xj�q�hj �

The Jacobi matrix is defined as the n � n matrix with �i� j�’th element �JqZ�ij � �Zi

�xj�q�

(Mishchenko & Fomenko 1980, pg. 16). Thus, the above equation can be rewritten in matrix

form as 0 � Z�q� � JqZ h. When JqZ is non-singular one can solve this relation uniquely

for h, an estimate of the residule error between q and p. Thus, one obtains a new estimate q� of

p based on the previous estimate q and the correction h

q� � q � h�

This estimate is the next estimate of the Newton-Raphson algorithm. Given an initial estimate

q0 � Rn, the Newton-Raphson algorithm is:

Algorithm 6.2.1 [Newton-Raphson Algorithm onRn]

Given qk � Rn compute Z�qk�.

Compute the Jacobi matrix JqkZ given by �JqkZ�ij ��Zi

�xj�qk�.

Set h � ��JqkZ��1Z�qk�.

Set qk�1 � qk � h.

Set k � k � 1 and repeat.�

The convergence properties of the Newton-Raphson algorithm are given by the following

proposition (Kincaid & Cheney 1991, pg. 68).

Proposition 6.2.2 Let Z : Rn � Rn be an analytic vector field on Rn and p � Rn be a

stationary point of Z. Then there is a neighbourhood U of p and a constant C such that the

Newton-Raphson method (Algorithm 6.2.1) converges to p for any initial estimate q0 � U and

the error decreases quadratically

jjqk�1 � pjj � Cjjqk � pjj2�

It is not clear how best to go about reformulating the Newton-Raphson algorithm on an


arbitrary Lie-group G. One could use the Euclidean Newton-Raphson algorithm in separate

local coordinate charts on G. Care must be taken, however, since local coordinate charts may

display extreme sensitivity to perturbation in the Euclidean coordinates, leading to numerically

ill conditioned algorithms.

Given a Lie-groupG, let � C��G� be an analytic real function onG. Denote the identity

element of G by e and associate the tangent space TeG with the Lie-algebra g of G in the

canonical manner (cf. Section 5.6). For X � TeG arbitrary define a right invariant vector

field �X � D�G� by �X � dr�X , for i � 1� � � � � n, where r�� :� �� (cf. (5.1.1) and the

analogous definition for left invariant vector fields (5.6.1)). Recall that the map t � exp�tX�

(where the exponential is the unique Lie-group homomorphism associated with the Lie-algebra

homomorphism ��d�dt� � �X , cf. (5.6.7)) is an integral curve of �X passing through e at

time zero. Given � � G arbitrary, the map t � exp�tX�� is an integral curve of the right

invariant vector field �X passing through the point � � G at time zero. It follows directly from

this observation that

� �X��exp��X��t

�d

dt�exp��X��

��t

�

Indeed, there is a natural extension of this idea which generalizes to higher order derivatives.

These derivatives can be combined into a Taylor theorem for analytic real functions on a Lie-

group. Proposition 6.2.3 is proved in Varadarajan (Varadarajan 1984, pg. 96) and formalises

this concept. Before this result can be stated it is necessary to introduce some notation.

Notation: Let k � �k1� k2� � � � � kn�, with k1, k2� � � � non-negative integers, represent a multi-

index and denote its size by jkj � k1 � k2 � � kn. Let Z1� � � � � Zn be n objects and

let t � �t1� � � � � tn� be any set of n real numbers. The set of objects (in Proposition 6.2.3

the objects will be vector fields) of the form t1Z1 � � tnZn forms a vector space under

addition and scalar multiplication. One also considers formal products of elements, for example

�t1Z1��t2Z2��t1Z1� � t21t2�Z1Z2Z1�, where the scalar multiplication is commutative but

multiplication between elements Z1 and Z2 is non-commutative. One defines an additional

element 1 � Z0 which acts as a multiplicative identity,Z0�t1Z1� � �t1Z1� � �t1Z1�Z0. Given

a multi-index k � �k1� k2� � � � � kn� consider a second multi-index �i1� � � � � ijkj� with jkj entries

ip � �0� 1� � � � � n� such that the number of occurrences where ip � j for 1 � j � n is exactly


kj . Let Z � t1Z1 � � tnZn then the formal power Zk is defined by

�t1Z1 � � tnZn�k �

1jkj!

X�i1�i2��ijkj�

�tk11 tknn ��Zi1Zi2 Zijkj��

In other words, the sum is taken over all permutations of elements of the form �ti1Zi1��ti2Zi2�

�tijkjZijkj� such that there are exactly k1 occurrences of t1Z1, k2 occurrences of t2Z2 etc.

Of course, if the size of jkj is equal to either zero or one then the situation is particularly simple

�t1Z1 � � tnZn�k � 1� for jkj � 0

�t1Z1 � � tnZn�k � tkjZkj � for jkj � 1�where kj � 1 is the only

nonzero element of k�

Proposition 6.2.3 Given G a Lie-group and � C��G� an analytic real function in a neigh-

bourhood of a point � � G, let X1� � � � � Xn � TeG be a basis for the identity tangent space of

G. Define the associated right invariant vector fields �Xi � dr�Xi, for i � 1� � � � � n and let k

represent a multi-index with n entries. The asymptotic expansion

�exp�t1X1 � � tnXn��

�

�Xjkj�0

tk11 tknnk1! kn!

�� X1 � � �Xn�k�� 6�2�1�

converges absolutely and uniformly in a neighbourhood of �.

Let G be a Lie-group and � C��G� be an analytic map on G. Choose a basis

X1� � � � � Xn � TeG for the identity tangent space of G and define the associated right in-

variant vector fields �Xi � dr�Xi, for i � 1� � � � � n. Expressing as a Taylor series around a

point � � G one has

�exp�t1X1 � � tnXn��

� ��

nXj�1

tj� �Xj�� O�jjtjj2� �6�2�2�

where O�jjtjj2� represents the remainder of the Taylor expansion, all terms for which jkj 2.

By taking the derivative of this relation with respect to the vector fields �Xi and discarding the

higher order terms one obtains the approximation

�Xi�exp�t1X1 � � tnXn��

� �Xi�� nXj�1

� �Xi�Xj��tj� �6�2�3�


Define the Jacobi matrix of to be the n� n matrix with �i� j�’th element

�J��ij � � �Xi�Xj�� 6�2�4�

which is dependent on the choice of basisX1� � � � � Xn for TeG. Define the two column vectors

t � �t1� � � � � tn�T and �� X1�� Xn��

T . Recalling the discussion of the

Newton-Raphson method on Rn it is natural to consider the following iteration defined for

� � G

t � ��J��1��

�� exp�t1X1 � � tnXn��

The motivation for considering this algorithm parallels that given above for the Newton-

Raphson method on Rn. If � is a critical point of then �Xi�� 0 for each �Xi. Thus

assuming that exp�t1X1 � � tnXn�� and then solving the approximate relation for

�t1� � � � � tn� gives a new estimate �� exp�t1X1 � � tnXn��. It follows that if � was a

good estimate of � then the difference between and the approximate Taylor expansion should

be of order O�jjtjj2� and consequently the new estimate �� will be a correspondingly better

estimate of �. Given an initial point �0 � G and a choice of n basis elements fX1� � � � � Xngfor TeG the Newton-Raphson algorithm on G is:

Algorithm 6.2.4 [Newton-Raphson Algorithm on a Lie-group G]

Given �k � G compute ��k�.

Compute the Jacobi matrix �J�k� given by �J�k�ij � � �Xi�Xj��k�.

Set t � ��J�k��1��k�.

Set �k�1 � exp�t1X1 � � tnXn��k.


Lemma 6.2.5 Given G a Lie-group and � C��G� an analytic real function with a critical

point� � G, let � � G be arbitrary and define f : Rn � G, f�t� :� exp�t1X1� � tnXn��

to be canonical coordinates of the first kind on G centred at � (Varadarajan 1984, pg. 88).

Define a smooth vector field Z on Rn by Z�t� � � �X1�f�t�� Xn�f�t��. An iteration

of the Newton-Raphson algorithm (Algorithm 6.2.4) on G with initial condition � is the image


of a single iteration of the Newton-Raphson algorithm (Algorithm 6.2.1) on Rn with initial

condition 0 via the canonical coordinates f .

Proof Observe that

Z�0� � � �X1�f�0�� Xn�f�0��

� ��

Also for 1 � i� j � n one finds

�

�tiZj

��t�0

��

�ti�Xj�f�t��

��t�0

��

�ti� �Xj � r��exp�tiXi��

��ti�0

� �Xi�Xj��

since ddrg�exp�rX��

��r�0

� �Xg�e� for any g � C��G� and X � g. Thus, the two Jacobi

matrices J0Z � J�� are equal. The Newton-Raphson algorithm onRn is just

t � 0� �J0Z��1Z�0� � ��J��1��

and the image of t is exactly �� exp�t1X1 � � tnXn�� the Newton-Raphson algorithm

on G.

It is desirable to prove a similar result to Proposition 6.2.2 for the Newton-Raphson method

on a Lie-group. To compute the rate of convergence one needs to define a measure of distance

in a neighbourhood of the critical point considered. Let� � G be a neighbourhood of a critical

point � � G of an analytic function � C��G� and let fX1� � � � � Xng be a basis for TeG as

above. There exists a subset U � � such that the canonical coordinates of the first kind on

G centred at �, �t1� � � � � tn� � exp�t1X1 � � tnXn�� are a local diffeomorphism onto U

(Helgason 1978, pg. 104). One defines distance withinU by the distance induced on canonical

coordinates centred at � by the Euclidean norm inRn,

jj exp�t1X1 � � tnXn��jj ��

nXi�1

�ti�2

� 12

�


Lemma 6.2.6 Given � C��G� an analytic real function on a Lie-group G let � � G be a

critical point of . There exists a neighbourhoodW � G of p and a constantC � 0 such that

the Newton-Raphson algorithm on G (Algorithm 6.2.4) converges to � for any initial estimate

�0 � W and the error, measured with respect to distance induced by canonical coordinates of

the first kind, decreases quadratically

jj�k�1 � �jj � Cjj�k � �jj2�

Proof LetU1 � Rn be an open neighbourhood of 0 inRn and define a smooth vector byZ�x� �

� �X1�f�x�� Xn�f�x��, where f : Rn � G, f�x� :� exp�x1X1 � � xnXn�� are

canonical coordinates of the first kind. Since � is a critical point of then 0 is a stationary

point of Z, i.e. Z�0� � 0. Applying Proposition 6.2.2 one obtains an open neighbourhood

U2 � U1 of 0 and a constant C1 such that the Newton-Raphson algorithm on Rn (Algorithm

6.2.1) converges quadratically to zero for any initial condition in U2.

A standard result concerning the exponential of the sum of two elements of a Lie-algebra

is (Helgason 1978, pg. 106)

exp�X� exp�Y � � exp��X � Y � � O�jjX jj jjY jj��

for X and Y sufficiently small. By this one means there exists an open set � � g containing

0 and a real number C2 � 0 such that for X , Y � � � g there exists Z� � g such that

exp�Z� � exp�X� exp�Y � and jjZjj � C2jjX jj jjY jj. Of course Rn � g via the isomorphism

x � x1X1 � � xnXn and � corresponds to an open set U3 � Rn. Let r � 0 such that the

open ball Br � fx � Rn j jjxjj � rg is fully contained in U3 � U2.

Let

U �

�x � Rn j x � B� � � � minfr�2�

14�C1 � C2�

g��

and define W � f�U� � G as the image of U via the canonical coordinates of the first kind

centred at �.

The proof proceeds by induction. Assume �k � W and qk � U such that f�qk� � �k. Let

t�k�1 denote the next iterate of the Newton-Raphson algorithm on Rn (Algorithm 6.2.1) with


initial condition qk . From above one has

jjt�k�1jj � C1jjqkjj2 � 12jjqkjj�

where the first inequality follows from the fact that U � U1 and the second follows from

the fact that jjqkjj � 14�C1�C2�

. Define tk�1 � t�k�1 � qk and observe that the affine change

of basis x � x � qk � x� preserves the form of the Newton-Raphson algorithm (Algorithm

6.2.1) applied to the transformed vector field Z��x�� Z�x� � qk� � Z�x�. Thus, tk�1 is the

next iterate of the Newton-Raphson algorithm (Algorithm 6.2.1) for the vector field Z� and the

initial condition 0. Moreover

x� � exp��x��1X1 � � �x��nXn��k�

are the canonical coordinates of the first kind centred at �k. In particular, applying Lemma

6.2.5, it follows that the next iterate of the Newton-Raphson algorithm on G (Algorithm 6.2.4)

�k�1 is

�k�1 � exp�t1k�1X1 � � tnk�1Xn��k�

Substituting �k � exp�q1kX1 � � qnkXn�� one has

�k�1 � exp�nXi�1

tik�1Xi� exp�nXi�1

qikXi��

But

jjtk�1jj � jjt�k�1jj� jjqkjj � 2jjqkjj � 2Br 2 � U3

and thus there exists qk�1 such that

exp�nXi�1

qik�1Xi� � exp�nXi�1

tik�1Xi� exp�nXi�1

qikXi��

jjqk�1 � �tk�1 � qk�jj � jjqk�1 � t�k�1jj� C2jjqkjjjjtk�1jj � 2C2jjqkjj2�

By construction �k�1 � exp�Pn

i�1 qik�1Xi�� and

x6-3 Coordinate Free Newton-Raphson Methods 169

jjqk�1jj � jjqk�1 � t�k�1jj� jjt�k�1jj� 2C2jjqkjj2 � C1jjqkjj2�

To see that the sequence qk�1 does in fact converge to zero one observes that jjqk�1jj � 12 jjqkjj

since jjqkjj � 14�C1�C2�

. Observing that qk�1 is just the representation of the next iterate �k�1

of the Newton-Raphson algorithm on G (Algorithm 6.2.4) in local coordinates, one has

jj�k�1 � �jj � Cjj�k � �jj2�

where C � 2C2 � C1 and the proof is complete.

Remark 6.2.7 An interesting observation is that though each single iteration of the Newton-

Raphson algorithm (Algorithm 6.2.4) on G is equivalent to an iteration of the Euclidean

Newton-Raphson algorithm (Algorithm 6.2.1) in a certain set of local coordinates this is not

true of multiple iterations of the algorithm and the same coordinate chart. �

6.3 Coordinate Free Newton-Raphson Methods

The construction presented in the previous section for computing the Newton-Raphson method

on a Lie-group G depends on the construction of the Jacobi matrix J� (cf. (6.2.4)) which is

explicitly defined in terms of an arbitrary choice of n basis vectors fX1� � � � � Xng for TeG. In

this section the Newton-Raphson algorithm on an arbitrary Lie-group equipped with a right

invariant Riemannian metric is formulated as a coordinate free manner iteration.

LetG be a Lie-group with an inner product ge�� defined on TeG. Denote the right invari-

ant group metric that ge generates on G by g (cf. Section 5.3). Choose a basis fX1� � � � � Xngfor TeG which is orthonormal with respect to the inner product ge�� , (i.e. ge�Xi� Xj� � �ij ,

where �ij is the Kronecker delta function, �ij � 0 unless i � j in which case �ij � 1). Define

the right invariant vector fields

�Xi � dr�Xi�


associated with the basis vectors fX1� � � � � Xng. Since the basis fX1� � � � � Xng was chosen to

be orthonormal it follows that the decomposition of an arbitrary smooth vector fieldZ � D�G�,can be written

Z �nXj�1

zj �Xj �nXj�1

g� �Xj� Z� �Xj�

In particular, let � C��G� be an analytic real map on G and grad be defined with respect

to the metric g (cf. Section 5.4)

grad �nXj�1

g� �Xj� grad� �Xj �nXj�1

� �Xj� �Xj � �6�3�1�

Let t � �t1� � � � � tn� � Rn and define the vector field �X � D�G� by �X �Pn

j�1 tj�Xj which

is the right invariant vector field associated with the unique element X �Pn

j�1 tjXj � TeG.

Observe thatPn

j�1� �Xj��tj � � �X�� and consequently post-multiplying (6.2.3) by �Xi

and summing over i � 1� � � � � n one obtains the approximation

nXi�1

� �Xi��exp�X�� Xi �nXi�1

� �Xi�� Xi�nXi�1

�� Xi

nXj�1

�tj �Xj��

�A �Xi

� grad�� grad� �X��

Now assuming that exp�X�� is a critical point of then computing the regression vector for

the Newton-Raphson algorithm is equivalent to solving the coordinate free equation

0 � grad�� grad� �X�� 6�3�2�

for the vector field �X (or equivalently the tangent vector X � TeG that uniquely defines �X). In

Algorithm 6.2.4 the choice of fX1� � � � � Xng was arbitrary and it follows that solving directly

for �X using (6.3.2) is equivalent to setting X � t1X1 � � tnXn, where t � �t1� � � � � tn�

is the error estimate t � ��J��. Given an initial point �0 � G the Newton-Raphson

algorithm on a Lie-group G can be written in a coordinate free form as:

x6-3 Coordinate Free Newton-Raphson Methods 171

Algorithm 6.3.1 [Coordinate Free Newton-Raphson Algorithm]

Find Xk � TeG such that �Xk � dr�Xk solves

0 � grad��k� � grad� �Xk��k��

Set �k�1 � exp�Xk��k.


To compute grad� �X� one may use the identity

g�grad� �X�� Y �� Y �X��

��2

�t1�t2�exp�t1Y ��exp�t2X��

��t1�t2�0

�

where �Y is an arbitrary right invariant vector field. Explicitly computing the derivatives on the

right hand side for arbitrary �Y completely determines grad� �X� since the metric g is positive

definite. An example of the nature of this computation is given in the next section.

Remark 6.3.2 Without the insight provided by the Taylor expansion (Proposition 6.2.3) one

may guess the Newton-Raphson algorithm would be given by solving

0 � grad�r �Xgrad

for the right invariant vector field �X, where r is the Levi-Civita connection. However,

r �Xgrad �� grad� �X� except in particular cases. Let �Y � D�G� be an arbitrary right invariant

vector field, then from (5.7.9), one hasr �Xg�grad� �Y � � g�r �Xgrad� �Y �� g�grad�r �X�Y �.

Nowr �Xg�grad� �Y � � r �X�Y � �X �Y whiler �X

�Y � r�Y�X�� X� �Y � since the Levi-Civita

connection is symmetric. Thus, one obtains

0 � � �X �Y � g�r �Xgrad� �Y � � g�grad�r�Y�X� � g�grad� � �X� �Y ��

� � �Y �X� g�r �Xgrad� �Y � � g�grad�r�Y�X�


and consequently

g�grad� �X�� Y � � g�r �Xgrad� �Y � � g�grad�r�Y�X��

The value of r�Y�X is given by the unique bilinear map associated with the right invari-

ant affine connection r (cf. Section 5.8). One has r�Xgrad � grad� �X� if and only if

g�grad�r�Y�X� � 0 for all �Y . The most likely situation for this to occur is when the bilinear

map associated with r is identically zero. For the examples considered in this thesis this will

not be true. �

6.4 Symmetric Eigenvalue Problem

In this section the general structure developed in the previous two sections is used to derive a

coordinate free Newton-Raphson method for the symmetric eigenvalue problem. An advantage

of considering the symmetric eigenvalue problem is that one can compare the Newton-Raphson

algorithm to classical methods such as the shifted QR algorithm. This provides a good

perspective on the performance of the Newton-Raphson algorithm, however, I stress that the

method is not proposed as competition to state of the art numerical linear algebra methods for

solving the classical symmetric eigenvalue problem. Rather, the focus is still on adaptive and

on-line applications.

Recall the constrained optimization problem that was posed in Chapter 2 for computing

the spectral decomposition of a matrix H . It was shown that minimising the functional

��H� :� jjH �Djj2

� jjH jj2 � jjDjj2 � 2tr�DH��

on the set4

M�H0� � fUH0UT j U � O�N�g� H0 � HT

0 � �6�4�1�

where D � diag��1� � � � � �N� (a diagonal matrix with independent eigenvalues) is equivalent

4The original definition (2.1.2) is slightly different M�H0� � fUTH0U j U � O�N�g to the definition usedhere. The mapU � UTH0U , however, is not a group action and the definition given above is equivalent to (2.1.2).

x6-4 Symmetric Eigenvalue Problem 173

to computing the eigenvalues of H (Brockett 1988, Helmke & Moore 1994b). To apply the

theory developed in the previous section one must reformulate this optimization problem on

O�N� the Lie-group associated with the homogeneous space M�H0�. The new optimization

problem that considered is:

Problem A Let H0 � HT0 � S�N� be a symmetric matrix and let D � diag��1� � � � � �N � be

a diagonal matrix with real eigenvalues �1 � � � � � �N . Consider the potential

: O�N�� R�

�U� :� �tr�DUH0UT ��

Find an orthogonal matrix which minimises over O�N�. �

It is easily seen that if one computes a minimum U� of Problem A then U�H0UT� is a

minimum of �. Recalling Section 5.4 one easily verifies that the minimising gradient flow

solutions to Problem A will map via the group action to the minimisinggradient flow associated

with � (Helmke & Moore 1994b, pg. 50).

Computing a single iteration of the Newton-Raphson method (Algorithm 6.3.1) relies on

computing both grad and grad� �X� for an arbitrary right invariant vector field �X . Recall the

discussion in Section 6.1 regarding the Riemannian geometry of O�N�.

Lemma 6.4.1 Let H0 � HT0 � S�N� be a symmetric matrix and let D � diag��1� � � � � �N�

be a diagonal matrix with real eigenvalues �1 � � � � � �N and let

: O�N�� R�

�U� :� �tr�DUH0UT ��

Express the tangent spaces of O�N� by (6.1.1) and consider the right invariant group metric

(6.1.2).

a) The gradient of on O�N� is

grad � ��UH0UT � D�U�


b) Let X � Sk�N� be arbitrary and set �X � XU � drUX the right invariant vector field

on O�N� generated by X . The gradient of �X on O�N� is

grad� �X� � ��X�D�� UH0UT �U�

Proof Recall the definition of gradient (5.4.1) and (5.4.2). The Frechet derivative of in a

direction �U � TUO�N� is

DjU ��U� � �tr�D�UH0UT �� tr�DUH0��U�

T�

� tr��D�UH0UT �T�� g��D�UH0U

T �U��U��

Observing that ��D�UH0UT �U � TUO�N� completes the proof of part a).

For part b) observe that

�X � DjU �XU�

� tr��X�D�UH0UT ��

Taking a second derivative of this in an arbitrary direction �U one obtains

Dtr��X�D�UH0UT ��U��U� � tr��UH0U

T � �X�D��

� g��X�D�� UH0UT �U��U��

and thus grad� �X� � ��X�D�� UH0UT �U .

Recall the equation for the coordinate free Newton-Raphson method (6.3.2). Rewriting

this in terms of the expression derived in Lemma 6.4.1 gives the algebraic equation

0 � ��UH0UT � D�U � ��X�D�� UH0U

T �U

which one wishes to solve for X � Sk�N�.

Remark 6.4.2 To see that a solution to this equation exists observe that given a general linear

solutionX � Rn�n (which always exists since the equation is a linear systems of N 2 equations


in N 2 unknowns) then

��XT�� D�� UH0UT � � ��D�X �T � UH0U

T �

� ��X�D�� UH0UT �T

� ��UH0UT � D�T � �UH0U

T � D��

Thus, �XT is also a solution and by linearity so is �X�XT �2 . The question of uniqueness for

the solution X � Sk�N� obtained is unclear. In the case where UH0UT � is diagonal

with distinct eigenvalues it is clear that ��X�D�� 0 �� X�D� � 0 �� X � 0 and the

solution is unique. As a consequence a genericity assumption on the eigenvalues of H0 would

need to be made to obtain a general uniqueness result. I expect that once such an assumption

is made on the eigenvalues of H0 the skew solution of the linear system would be unique.

Unfortunately I have no proof for this result at the present time. �

Given an initial matrix H0 and choosing U0 � In then the Newton-Raphson solution to

Problem A is:

Algorithm 6.4.3 [Newton-Raphson Algorithm for Spectral Decomposition]

Find Xk � Sk�N� such that

��Xk� D�� UkH0UTk � � �UkH0U

Tk � D�� 6�4�2�

Set Uk�1 � eXkUk , where eXk is the matrix exponential of Xk.


Remark 6.4.4 To solve (6.4.2) one can reformulate the matrix system of linear equations as a

constrained vector linear system. Denote by vec�A� the vector generated by taking the columns

of A � Rl�m (for l and m arbitrary integers) one on top of the other. Taking the vec of both

sides of (6.4.2) gives5

��DUH0U

T �T � IN � �UH0UT ��D �D � �UH0U

T � � IN � �UH0UTD�

�vec�Xk�

� vec��UH0UT � D�� (6.4.3)

5Let A, B and C be real N N matrices and let Aij denote the ij’th entry of the matrix A. The Kronecker


Iteration

kD

ista

nce

|| H

– D

||

Gradient Descent

Newton–Raphson

Figure 6.4.1: Plot of jjHk �Djj where Hk � UkH0UT

kand Uk is a solution to both (6.4.4)

and Algorithm 6.4.2. The eigenvalues ofH0 are chosen to be ��1� � � � � �N � the eigenvalues ofD thoughH0 is not diagonal. Thus, the minimum Euclidean distance between Hk �M �H0�and D is zero. By plotting the Euclidean norm distance jjHk�Djj on a logarithmic scale thequadratic convergence characteristics of Algorithm 6.4.2 are displayed.

The constraint Xk � Sk�N� can be written as a vector equation

�IN 2 � P �vec�Xk� � 0�

where P is the N2 �N2 permutation matrix such that vec�A� � Pvec�AT �, A � RN�N .

In practice, it is known that a skew symmetric solution to (6.4.3) exists and one proceeds

by extracting the 12N�N� 1�� 1

2N�N� 1� submatrix of the N 2�N2 Kronecker product and

using Gaussian elimination to solve for the free variables Xij , i � j. �

Of course a Newton-Raphson algorithm cannot be expected to converge globally in O�N�

and for arbitrary choice of H0 one must couple the Newton-Raphson algorithm with some

other globally convergent method to obtain a practical numerical method. In the following

simulations the associated orthogonal iteration described in Section 2.5 is used. In fact the

product of two matrices is defined by

A�B �

�B� A11B � � � A1nB...

...An1B � � � AnnB

�CA � RN2�N2

�

A readily verified identity relating the vec operation and Kronecker products is (Helmke & Moore 1994b, pg. 314)

vec�ABC� � �CT � A�vecB�


algorithm implemented is a slight variation of (2.5.1)

Uk�1 � e��k�UkH0UTk�D�Uk � �6�4�4�

where the modification is due to the new definition (6.4.1) of M�H0�. The step size selection

method used is that given in Lemma 2.2.4

k �1

2jj�Hk� D�jj log�jj�Hk� D�jj2

jjH0jj jj�D� �Hk� D��jj � 1��

where Hk � UkH0UTk and Uk is a solution to (6.4.4). The minor difference between (6.4.4)

and the associated orthogonal double-bracket algorithm (2.5.1) does not affect the convergence

results proved in Chapter 1. It follows that (6.4.4) is globally convergent to an orthogonal

matrix U� such that U�H0UT� is a diagonal matrix with diagonal entries in descending order.

Figure 6.4.1 is an example of (6.4.4) combined with the Newton-Raphson algorithm 6.4.3.

The aim of the simulation is to display the quadratic convergence behaviour of the Newton-

Raphson algorithm. The initial condition used was generated via a random orthogonal congru-

ency transformation of the matrix D � diag�1� 2� 3�,

H0 �

�BBBB�2�1974 �0�8465 �0�2401

�0�8465 2�0890 �0�4016

�0�2401 �0�4016 1�7136

�CCCCA �

Thus, the eigenvalues ofH0 are 1, 2 and 3 and the minimum distance between D andM�H0� is

zero. In Figure 6.4.1 the distance jjHk�Djj is plotted for Hk � UkH0UTk and Uk is a solution

to both (6.4.4) and Algorithm 6.4.2. In this example the modified gradient descent method

(6.4.4) was used for the first six iterations and the Newton-Raphson algorithm was used for the

remaining three iterations. The plot of jjHk � Djj measures the absolute Euclidean distance

betweenHk andD. Naturally, there is some distortion involved in measuring distance along the

surface of M�H0�, however for limiting behaviour, jjHk �Djj is a reasonable approximation

of distance measured along M�H0�. The distance jjHk � Djj is expressed on a log scale to

show the linear and quadratic convergence behaviour. In particular, the quadratic convergence

behaviour of the Newton-Raphson algorithm is displayed by iterations seven, eight and nine in

Figure 6.4.1.


Iteration �Hk�21 �Hk�31 �Hk�41 �Hk�32 �Hk�42 �Hk�43

0 2 zero zero 4 zero 61 1.6817 3.2344 0.86492 1.6142 2.5755 0.00063 1.6245 1.6965 10�13

4 1.6245 0.0150 converg.5 1.5117 10�9

6 1.1195 converg.7 0.70718 converg.

Table 6.4.1: The evolution of the lower off-diagonal entries of the shiftedQRmethod describedby Golub and Van Loan (1989, Algorithm 8.2.3., pg. 423). The initial condition used is H �

0(6.4.5).

To provide a comparison of the coordinate free Newton-Raphson method to classical

algorithms the following simulation is completed for both the Newton-Raphson algorithm and

the shifted QR algorithm (Golub & Van Loan 1989, Section 8.2). The example chosen is

taken from page 424 of Golub and Van Loan (1989) and rather than simulate the symmetric

QR algorithm again the results used are taken directly from the book. The initial condition

considered is the tridiagonal matrix

H�0 �

�BBBBBBB�

1 2 0 0

2 3 4 0

0 4 5 6

0 0 6 7

�CCCCCCCA� �6�4�5�

To display the convergence properties of the QR algorithm (Golub & Van Loan 1989) give

a table in which they list the the values of the off-diagonal elements of each iterate generated

for the example considered. This table is included (in a slightly modified format) as Table 6.4.

Each element �Hk�ij is said to have converged when it has norm of order 10�12 or smaller.

The initial condition H �0 is tridiagonal and the QR algorithm preserves tridiagonal structure

and consequently the elements �Hk�31, �Hk�41 and �Hk�42 remain zero for all iterates. The

convergence behaviour of the symmetric QR algorithm is cubic in successive off-diagonal

entries. Thus, �Hk�43 converges cubically to zero, then �Hk�32 converges cubically and so on

(Wilkinson 1968). The algorithm as a whole, however, does not converge cubically since each


off-diagonal entry must converge in turn.

It is interesting to also display the results in a graphical format (Figure 6.4.2). Here the

norm jjHk � diag�Hk�jj

jjHk � diag�Hk�jj ��Hk�

221 � �Hk�

231 � �Hk�

241 � �Hk�

232 � �Hk�

242 � �Hk�

243

� 12�

is plotted verses iteration. This would seem to be an important quantity which indicates

robustness and stability margins of the numerical methods considered when the values of Hk

are uncertain or subject to noise in an on-line or adaptive environment. The dotted line show

the behaviour of the QR algorithm. The plot displays the property of the QR algorithm that it

must be run to completion to obtain a solution.

Figure 6.4.2 also shows the plot of jjHk� diag�Hk�jj for a sequence generated initially by

the modified gradient descent algorithm (6.4.4) (the first five iterations) and then the Newton-

Raphson algorithm (for the remaining three iterations). Since the aim of this simulation is to

show the potential of Newton-Raphson algorithm the parameters were optimized to provide

good convergence properties. The step-size for (6.4.4) was chosen as a constant k � 0�1

which is somewhat larger than the variable step-size used in the first simulation. This ensures

slightly faster convergence in this example, although in general there are initial conditions H0

for which the modified gradient descent algorithm is unstable with step-size selection fixed at

0.1. The point at which the modified gradient descent algorithm was halted and the Newton-

Raphson algorithm was begun was also chosen by experiment. Note that the Newton-Raphson

method acts directly to decrease the cost jjHk � diag�Hk�jj, at least in a local neighbourhood

of the critical point. It is this aspect of the algorithm that suggests it would be useful in an

on-line or adaptive environment.

Remark 6.4.5 It is interesting to note that in this example the combination of the modified gra-

dient descent algorithm (6.4.4) and the Newton-Raphson method (Algorithm 6.4.2) converges

in the same number of iterations as the QR algorithm. �


Iteration

kk

|| H

–

diag

( H

)

||

Gradient descentNewton–Raphson

QR algortihm

Figure 6.4.2: A comparison of jjHk�diag�Hk�jjwhereHk is a solution to the symmetricQRalgorithm (dotted line) and Hk � UkH0U

T

kfor Uk a solution to both (6.4.4) and Algorithm

6.4.2 (solid line). The initial condition is H �

0 (6.4.5).

Iteration �Hk�21 �Hk�31 �Hk�41 �Hk�32 �Hk�42 �Hk�43

0 2 0 0 4 0 61 2.5709 -0.0117 -0.0233 4.9252 -0.4733 4.07172 3.7163 -0.2994 0.2498 4.3369 -0.2838 1.47983 4.7566 -0.7252 -0.1088 2.5257 -0.0176 0.86434 1.1572 -0.2222 -0.8584 1.1514 -0.1216 0.28225 -0.0690 -0.0362 0.0199 -0.1112 0.0649 0.00756 0.0011 10�6 10�5 10�5 10�6 0.00117 converg. 10�9 10�10 10�10 10�9 10�11

8 converg. converg. converg. converg. converg.

Table 6.4.2: The evolution of the lower off-diagonal entries of Hk � UkH0UT

kwhere Uk is a

solution to Algorithm 6.4.2. The initial condition is H �

0 (6.4.5).


There are several issues that have not been resolved in the present chapter. In Section 6.1 it

is concluded that the modified gradient descent algorithms proposed in Chapters 2 to 4 can

be interpreted as geodesic interpolations of gradient algorithms. This provides a template for

generating numerical algorithms that solve optimization problems on homogeneous spaces

which have Lie transformation group O�N�, however things are somewhat more complicated

if one considers general matrix Lie-groups. Certainly, it is a simple matter to derive exponential

interpolation algorithms based on the same ideas and it would be interesting to investigate the

relationship between exponential and geodesic interpolations for GL�N�R�.

The full Newton-Raphson algorithm could also benefit from further study. In particular,


issues relating to rank degeneracy in the Jacobi matrix need to be addressed. These issues are

important since many relevant optimization problems are defined on a homogeneous space of

lower dimension than its Lie transformation group. In this situation the lifted potential on the

Lie-group will certainly have level sets of non-zero dimension and there will be directions in

which the Jacobi matrix is degenerate. This issue is related to the difficulties encountered in

determining whether a unique solution exists (6.4.2).

Once the Newton-Raphson method on a Lie-group is fully understood it should be a simple

matter to generalize the theory to an arbitrary homogeneous space. If there is an associated drop

in dimension this may result in computational advantages and the development of algorithms

that do not suffer from the degeneracy problems discussed above.

It is also interesting to consider the computational cost of the Newton-Raphson method

relative to classical algorithms such as the QR method. One would hope that the total

computational cost of a single step of the Newton-Raphson method would be comparable

to taht of a step of the QR method, especially if the matrix linear systems can be solved using

parallel algorithms.

The relationship between the modified gradient descent algorithms, the Newton-Raphson

algorithm and modern integration techniques that preserve a Hamiltonian function (Sanz-Serna

1991, Stuart & Humphries 1994) is worth investigating. It is hoped that the insights provided

by Hamiltonian integration techniques along with the perspective given by the present work can

be combined to design efficient optimization methods that preserve homogeneous constraints.

Chapter 7

Conclusion

7.1 Overview

The following summary outlines the contributions of this thesis.

Chapter 2: Two numerical algorithms are proposed for the related tasks of estimating the

eigenvalues of a symmetric matrix and estimating the singular values of an arbitrary matrix.

Associated algorithms which compute the eigenvectors and singular vectors associated with

the spectral decomposition of a matrix are also presented. The algorithms are based on gradient

descent methods and evolve explicitly on a homogeneous constraint set. Step-size selection

criteria are developed which ensure good numerical properties and strong stability results are

proved for the proposed algorithms. To reduce computational cost on conventional machines

a Pade approximation of the matrix exponential is proposed which also explicitly preserves

the homogeneous constraint. An indication is given of the manner in which a time-varying

symmetric eigenvalue problem could be solved using the proposed algorithms.

Chapter 3: The problem of principal component analysis of a symmetric matrix is considered

as a smooth optimization problem on a homogeneous space. A solution in terms of the limiting

solution of a gradient dynamical system is proposed. It is shown that solutions to the dynamical

system considered do indeed converge to the desired limit for almost all initial conditions.

A modified gradient descent algorithm, based on the gradient dynamical system solution,

182

x7-1 Overview 183

is proposed which explicitly preserves the homogeneous constraint set. A step-size selection

scheme is given along with a stability analysis that shows the numerical algorithm proposed

converges for almost all initial conditions.

Comparisons are made between the proposed algorithm and classical methods. It is shown

that in the rank-1 case the modified gradient descent algorithm is equivalent to the classical

power method and steepest ascent method for computing a single dominant eigenvector of

a symmetric matrix. However this equivalence does not hold for higher dimension power

methods and orthogonal iterations.

Chapter 4: The problems of system assignment and pole placement are considered for the set

of symmetric linear state space systems. A major contribution of the chapter is the observation

that the additional structure inherent in symmetric linear systems forces the solution to the

“classical” pole placement question to be considerably different to that expected based on

intuition obtained from the general linear case. In particular, generic pole placement can not

be achieved unless the system considered has as many inputs (and outputs) as states.

To compute feedback gains which assign poles as close as possible to desired poles (in a

least squares sense) a number of ordinary differential equations are proposed. By computing

the limiting solution to these equations for arbitrary initial conditions, estimates of locally

optimal feedback gains are obtained. A gradient descent numerical method, based on the

dynamical systems developed, is presented along with a step-size selection scheme and full

stability analysis.

Chapter 5: A review of the mathematical theory underlying the numerical methods proposed

in Chapters 2 to 4 is given. A brief review of Lie-groups and homogeneous spaces is given,

especially the class of homogeneous space which is most common in linear systems theory,

orbits of semi-algebraic Lie-groups. A detailed discussion of Riemannian metrics on Lie-

groups and homogeneous spaces is provided along with the motivation for the choice of the

metrics used elsewhere in this thesis. The derivation of gradient flows and the relationship

between gradient flows on a homogeneous space and its Lie transformation group is covered.

The convergence properties of gradient flows are discussed and a theorem is proved which is

useful for proving convergence of gradient flows in many practical situations.

The remainder of the chapter works towards developing a practical understanding of geo-

184 Conclusion Chapter 7

desics on Lie-groups and homogeneous spaces. The theory of Lie-algebras is discussed and

the exponential map is introduced. Affine connections are discussed and the the Levi-Civita

connection associated with a given Riemannian metric is introduced. Geodesics are defined

and the theory of right invariant affine connections is used to derive conditions under which the

exponential map generates a geodesic curve on a Lie-group. Finally, it shown that geodesics

on a Lie-group G are related to geodesics on a homogeneous space (with Lie transformation

group G) via the group action.

Chapter 6: The numerical algorithms proposed in Chapters 2 to 4 are reconsidered in the

context of the theory developed in Chapter 5. The proposed algorithms are seen to be specific

examples of general gradient descent methods using geodesic interpolation.

The remainder of the chapter is devoted to developing a Newton-Raphson algorithm which

evolves explicitly on an arbitrary Lie-group. The iteration is derived in canonical coordinates

of the first kind and then generalised into a coordinate free form. A theorem is given proving

quadratic convergence in a local neighbourhood of a critical point. An explicit Newton-

Raphson algorithm, based on the general theory developed, is derived for the symmetric

eigenvalue problem.

7.2 Conclusion

A primary motivation for considering the problems posed in this thesis is the recognition of

the advantages of numerical algorithms that exploit the natural geometry of the constrained

optimization problem that they attempt to solve. This idea is especially important for on-line

and adaptive engineering applications where the properties of simplicity, global convergence

and constraint stability (cf. page 2) become the principal goals.

The starting point for the new results proposed in this work is the consideration of con-

strained optimization problems on homogeneous spaces and Lie-groups. The regular geometry

associated with these sets is suitable for the constructions necessary to develop practical numer-

ical methods. Moreover, there are numerous examples of constrained optimization problems

arising in linear systems theory where the constraint set is a homogeneous space or Lie-group.

The early results presented in Chapters 2 to 4 of this thesis do not rely heavily on abstract

x7-2 Conclusion 185

Lie theory. Nevertheless, the algorithms proposed are specific examples of a more general

construction outlined in Chapter 6. For any smooth optimization problem on an orbit (cf.

Section 5.2) of the orthogonal group O�N� this construction can be summarised as follows:

1. Given a smooth homogeneous spaceM embedded in Euclidean space with transitive Lie

transformation group O�N� and group action : O�N��M �M , let : M � R be

a smooth cost function.

2. Equip O�N� with the right invariant group metric induced by the Euclidean metric

acting on the identity tangent space. I.e. for �1, �2 � Sk�N�, U � O�N� then �1U ,

�2U � TUO�N� and

h�1U��2Ui � tr��T1 �2��

3. Fix q �M and define the lifted potential

� : O�N�� R

��U� :� � �U� q��

4. Compute the gradient descent flow

�U � �grad��U��

on O�N� with respect to the metric h� i.

5. The modified gradient descent algorithm for � using geodesic interpolation on O�N� is

Uk�1 � e��skgrad��Uk�UTk�Uk�

where sk � 0 is a small positive number.

6. The modified gradient descent algorithm using geodesic interpolation for on M is

pk � �Uk� q��

186 Conclusion Chapter 7

7. Determine a step-size selection scheme f : M � R,

sk :� f�pk��

which guarantees �pk�1� � �pk� except in the case where pk is a critical point of .

In Chapter 6 a general construction is outlined for computing a Newton-Raphson algorithm

on a Lie-group (Algorithm 6.3.1). The advantage of this construction is that the algorithm

generated converges quadratically in a neighbourhood of the desired equilibrium. There are

two main disadvantages; firstly, convergence can only be guaranteed in a local neighbourhood

of the equilibrium and secondly, the theoretical construction is complicated and relies on

abstract geometric construction.

A comparison between the gradient descent algorithm and the Newton-Raphson algorithm

nicely displays the trade off between a simple linearly convergent numerical method associated

with strong convergence theory and a numerical method designed to converge quadratically

(or better) and associated with weaker convergence theory. The stability and robustness of the

first approach suggests that it would be of use in on-line and adaptive engineering applications

where reliability is more important than computational cost of implementation. The second

approach may also have applications in adaptive processes where accurate estimates are needed

and the uncertainties are small.

Bibliography

Ammar, G. & Martin, C. (1986). The geometry of matrix eigenvalue methods,Acta Applicandae

Mathematicae 5: 239–278.

Anderson, B. D. O. & Moore, J. B. (1971). Linear Optimal Control, Electrical Engineering

Network Series, Prentice-Hall Inc., Englewood Cliffs, N.J., U.S.A.

Anderson, B. D. O. & Moore, J. B. (1990). Optimal Control: Linear Qudratic Methods,

Prentice-Hall Inc., Englewood Cliffs, N.J., U.S.A.

Anderson, B. D. O. & Vongpanitlerd, S. (1973). Network Analysis and Synthesis: A Modern

Systems Theory Approach, Electrical Engineering, Prentice-Hall, Englewood Cliffs, N.J.,

U.S.A.

Aoki, M. (1971). Introduction to Optimization Techniques: Fundamentals and Applications of

Nonlinear Programming, Macmillan Co., New York, U.S.A.

Auchmity, G. (1991). Globally and rapidly convergent algorithms for symmetric eigenprob-

lems, SIAM Journal of Matrix Analysis and Applications 12(4): 690–706.

Baldi, P. & Hornik, K. (1989). Neural networks and principal component analysis: Learning

from examples without local minima, Neural Networks 2: 53–58.

Batterson, S. & Smillie, J. (1989). The dynamics of the Rayleigh quotient iteration, SIAM

Journal of Numerical Analysis 26(3): 624–636.

Batterson, S. & Smillie, J. (1990). Rayleigh quotient iteration for nonsymmetric matrices,

Math. Comp. 55(191): 169–178.

187

188 Bibliography

Bayer, D. A. & Lagarias, J. C. (1989). The nonlinear geometry of linear programming I, II,

Transactions of the American Mathematical Society 314: 499–580.

Bitmead, R. R. & Anderson, B. D. O. (1977). The matrix Cauchy index: properties and

applications, SIAM Journal of Applied Mathematics 33(4): 655–672.

Bloch, A. M. (1985a). A completely integrable Hamiltonian system associated with line fitting

in complex vector spaces, Bulletin of the American Mathematical Society 12(2): 250–254.

Bloch, A. M. (1985b). Estimation, principal components and Hamiltonian systems, Systems

and Control Letters 6: 103–108.

Bloch, A. M. (1990a). The Kaehler structure of the total least squares problem, Brockett’s

steepest descent equations and constrained flows, in A. C. M. R. M. A. Kaashoek, J.

H. van Schuppen (ed.), Realization and Modelling in System Theory, Birkhauser Verlag,

Boston.

Bloch, A. M. (1990b). Steepest descent, linear programming and Hamiltonian flows, Contem-

porary Math. 114: 77–88.

Bloch, A. M., Brockett, R. W. & Ratiu, T. (1990). A new formulation of the generalised Toda

lattice equations and their fixed point analysis via the momentum map, Bulletin American

Mathematical Society 23(2): 477–485.

Bloch, A. M., Brockett, R. W. & Ratiu, T. (1992). Completely integrable gradient flows,

Communications in Mathematical Physics 23: 447–456.

Bloch, A. M., Flaschka, H. & Ratui, T. (1990). A convexity theorem for isospectral sets of

Jacobi matrices in a compact Lie algebra, Duke Math. J. 61: 41–65.

Blondel, V. (1992). Simultaneous Stabilization of Linear Systems, PhD thesis, Faculte des

Sciences Appliquee, Universite Catholique de Louvain.

Blondel, V., Campion, G. & Gevers, M. (1993). A sufficient condition for simultaneous

stabilization, IEEE Transactions on Automatic Control 38: 1264–1266.

Bouland, H. & Karp, Y. (1989). Auto-association by multilayer perceptrons and the singular

value decomposition, Biological Cybernetics 59: 291–294.

Bibliography 189

Brockett, R. W. (1988). Dynamical systems that sort lists, diagonalise matrices and solve

linear programming problems, Proceedings IEEE Conference on Decision and Control,

pp. 799–803.

Brockett, R. W. (1989a). Least squares matching problems, Linear Algebra and its Applications

122-124: 761–777.

Brockett, R. W. (1989b). Smooth dynamical systems which realize arithmetical and logical

operations, Three Decades of Mathical Systems Theory, number 135 in Lecture Notes in

Control and Information Sciences, Springer-Verlag, pp. 19–30.

Brockett, R. W. (1991a). Dynamical systems that learn subspaces, in A. C. Antoulas (ed.),

Mathematical sytems theory - The Influence of Kalman.

Brockett, R. W. (1991b). Dynamical systems that sort lists, diagonalise matrices and solve

linear programming problems, Linear Algebra and its Applications 146: 79–91.

Brockett, R. W. (1993). Differential geometry and the design of gradient algorithms, Proceed-

ings of Symposia in Pure Mathematics, Vol. 54, pp. 69–92.

Brockett, R. W. & Byrnes, C. B. (1979). On the algebraic geometry of the output feedback pole

placement map, IEEE Conference on Decisions and Control, Fort Lauderdale, Florida,

U.S.A., pp. 754–757.

Brockett, R. W. & Byrnes, C. I. (1981). Multivariable Nyquist criteria, root loci and pole

placement: A geometric viewpoint, IEEE Transactions on Automatic Control 26(1): 271–

283.

Brockett, R. W. & Faybusovich, L. E. (1991). Toda flows, inverse spectral transform and

realisation theory, Systems and Control Letters 16: 79–88.

Brockett, R. W. & Krishnaprasad, P. S. (1980). A scaling theory for linear systems, IEEE

Transactions of Automatic Control 25: 197–207.

Brockett, R. W. & Wong, W. S. (1991). A gradient flow for the assignment problem, in

G. Conte, A. M. Perdon & B. Wyman (eds), Progress in System and Control Theory,

Birkhauser, pp. 170–177.

190 Bibliography

Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms,

II: The new algorithm, Journal Institute of Mathematics and its Applications 6: 222–231.

Burrage, K. (1978). High order algebraically stable Runge-Kutta methods, B.I.T. 18: 373–383.

Burrage, K. & Butcher, J. C. (1979). Stability criteria for implicit Runge-Kutta methods, SIAM

Journal of Numerical Analysis 16(1): 46–57.

Butcher, J. (1987). The Numerical Analysis of Ordinary differential equations: Runge Kutter

and general linear methods, John Wiley and Sons, Chichester, U.K.

Butcher, J. C. (1975). A stability property of implicit Runge-Kutta methods, B.I.T. 15: 358–361.

Buurema, H. J. (1970). A geometric proof of convergence for the QR method, Phd. thesis,

Rijksuniversiteit Te Groningen.

Byrnes, C. I. (1978). On certain families of rational functions arising in dynamics, Proceedings

of the IEEE pp. 1002–1006.

Byrnes, C. I. (1983). On the stability of multivariable systems and the Ljusternik-snivel’mann

category of real Grassmanians, Systems and Control Letters 3: 255–262.

Byrnes, C. I. (1989). Pole placement by output feedback, Three Decades of Mathematical

Systems Theory, Vol. 135 of Lecture Notes in Control and Information Sciences, Springer-

Verlag, pp. 31–78.

Byrnes, C. I. & Martin, C. F. (eds) (1980). Algebraic and geometric methods in linear systems

theory, Vol. 18 of Lectures in Applied Mathematics, American Mathematical Society,

Providence, Rhode Island, U.S.A.

Byrnes, C. I. & Willems, J. C. (1986). Least-squares estimation, linear programming and

momentum: A geometric parametrization of local minima, IMA Journal of Mathematical

Control and Information 3: 103–118.

Byrnes, C. I., Hazewinkel, M., Martin, C. & Rouchaleau, Y. (1980). Introduction to geometrical

methods for the theory of linear systems, Geometrical Methods for the Theory of Linear

Systems, D. Reidel Publ. Comp. see also Reprint Series, 273, Erasmus University,

Rotterdam.

Bibliography 191

Cauchy, A. L. (1847). Methode generale pour la resolution des systems d’equationssimultanees,

Comptes Rendus Academie Science Paris XXV: 536–538.

Channell (1983). Symplectic integration algorithms, Technical Report 83-9, Los Alamos

National Laboratory.

Chu, M. T. (1984a). The generalized Toda flow, the QR-algorithm and the center manifold

theory, SIAM Journal Discrete Mathematics 5(2): 187–201.

Chu, M. T. (1984b). On the global convergence of the Toda lattice for real normal matrices

and its application to the eigenvalue problems, SIAM Journal Mathematical Analysis

15: 98–104.

Chu, M. T. (1986). A differential equation approach to the singular value decomposition of

bidiagonal matrices, Linear Algebra and its Applications 80: 71–80.

Chu, M. T. (1988). On the continuous realization of iterative processes, SIAM Review

30(3): 375–387.

Chu, M. T. (1991a). A continuous Jacobi-like approach to the simultaneous reduction of real

matrices, Linear Algebra and its Applications 147: 75–96.

Chu, M. T. (1991b). Least squares approximation by real normal matrices with specified

spectrum, SIAM Journal of Matrix Analysis and Applications 12(1): 115–127.

Chu, M. T. (1992). Numerical methods for inverse singular value problems, SIAM Journal on

Numerical Analysis 29(3): 885.

Chu, M. T. & Driessel, K. R. (1990). The projected gradient method for least squares matrix ap-

proximations with spectral constraints, SIAM Journal of Numerical Analysis 27(4): 1050–

1060.

Chu, M. T. & Driessel, K. R. (1991). Constructing symmetric non-negative matrices with

prescribed eigenvalues by differential equations, SIAM Journal on Mathematical Analysis

22(5): 1372–1387.

Colonius, F. & Klieman, W. (1990). Linear control semigroup acting on projective space,

Technical Report 224, Universitat Augsburg, Germany.

192 Bibliography

Cooper, G. J. (1986). On the existence of solution for algebraically stable Runge-Kutta methods,

IMA Journal of Numerical Analysis 6: 325–330.

Crouch, P. E., Grossman, R. & Yan, Y. (1992). On the numerical integration of the dynamic

attitude equations, Proceedings of the IEEE Conference on Decision and Control, Tucson,

Arizona, pp. 1497–1501.

Crouch, P. E., Grossman, R. & Yan, Y. (1994). A third order Runge-Kutta algorithm on a

manifold, preprint.

Crouch, P. E. & Grossman, R. (1994). Numerical integrations of ordinary differential equations

on manifolds, to appear in Journal of Nonlinear Science.

Curry, H. (1944). The method of steepest descent for nonlinear minimization problems,

Quarterly Applied Mathematics 2: 258–261.

Dahlquist, G. (1978). G-stability is equivalent to A-stability, B.I.T. 18: 384–401.

Davidon, W. C. (1959). Variable metric methods for minimization,Technical Report ANL-5990,

Argonne National Laboratory.

Davison, E. J. & Wang, S.-H. (1973). Properties of linear time-invariant multivariable systems

subject to arbitrary output and state feedback, IEEE Transactions on Automatic Control

pp. 24–32.

Davison, E. J. & Wang, S.-H. (1975). On pole assingment in linear multivariable systems using

output feedback, IEEE Transactions on Automatic Control pp. 516–518.

Deift, P., Nanda, T. & Tomei, C. (1983). Ordinary differential equations for the symmetric

eigenvalue problem, SIAM Journal of Numerical Analysis 20(1): 1–22.

Driessel, K. R. (1986). On isospectral gradient flows - solving matrix eigenproblems using

differential equations, in J. R. Cannon & U. Hornung (eds), Inverse Problems, Birkhauser

Verlag, pp. 69–90.

Duistermaat, J. J., Kolk, J. A. C. & Varadarajan, V. S. (1983). Functions, flows and oscilla-

tory integrals on flag manifolds and conjugacy classes in real semi-simple Lie groups,

Composito Mathematica 49: 309–398.

Bibliography 193

Ehle, B. L. (1973). A-stable methods and pade approximations to the exponential, SIAM

Journal of Mathematical Analysis 4: 671–680.

Euler, L. (1913). De integratione aequationum differentialium per approximationem, Opera

Omnia, 1’st series, Vol. 11, Institutiones Calculi Integralis, Teubner, Leipzig, Germany,

pp. 424–434.

Faddeev, D. K. & Faddeeva, V. N. (1963). Computational Methods of Linear Algebra, W. H.

Freeman and Co., San Francisco.

Falb, P. (1990). Methods of Algebraic Geometry in Control Theory: Part I, Vol. 4 of Systems

and Control: Foundations and Applications, Birkhauser, Boston, U.S.A.

Faybusovich, L. (1992a). Inverse problems for orthogonal matrices, Toda flows, and signal

processing, Proceedings of the IEEE Conference on Decision and Control.

Faybusovich, L. E. (1989). QR-type factorisations, the Yang-Baxter equation and eigenvalue

problem of control theory, Linear Algebra and its Applications 122-124: 943–971.

Faybusovich, L. E. (1991a). Dynamical systems which solve optimisation problems with linear

constraints, IMA Journal of Information and Control 8: 135–149.

Faybusovich, L. E. (1991b). Hamiltonian structure of dynamical systems which solve linear

programming problems, Physica D .

Faybusovich, L. E. (1992b). Toda flows and isospectral manifolds, Proceedings of the American

Mathematical Society 115(3): 837–847.

Feng (1985). Difference schemes for Hamiltonian formalism and symplectic geometry, Journal

of Computational Mathematics 4: 279–289.

Flashka, H. (1974). The Toda lattice, II. Existence of integrals, Physical Review B 9(4): 1924–

1925.

Flashka, H. (1975). Discrete and periodic illustrations of some aspects of the inverse methods,

in J. Moser (ed.), Dynamical System, Theory and Applications, Vol. 38 of Lecture Notes

in Physics, Springer-Verlag, Berlin.

Fleming, W. (1977). Functions of Several Variables, Undergraduate texts in Mathematics,

Springer-Verlag, New York, U.S.A.

194 Bibliography

Fletcher, R. (1970). A new approach to variable metric algorithms, Computer Journal

13(3): 317–322.

Fletcher, R. & Powell, M. J. D. (1963). A rapidly convergent descent method for minimization,

Computer Journal 6: 163–168.

Fletcher, R. & Reeves, C. M. (1964). Function minimization by conjugate gradients, Computer

Journal 7: 149–154.

Forsythe, G. E. (1968). On the asymptotic directions of the s-dimensional optimum gradient

method, Numerische Mathematik 11: 57–76.

Friedland, S., Nocedal, J. & Overton, M. L. (1987). The formulation and analysis of numerical

methods for inverse eigenvalue problems, SIAM Journal of Numerical Analysis 24: 634–

667.

Gear, C. W. (1968). The automatic integration of stiff ordinary differential equations, in

A. J. Morrell (ed.), Information Processing 68: Proceedings IFIP Congress, Edinburgh,

pp. 187–193.

Gevers, M. R. & Li, G. (1993). Parametrizations in Control,Estimation and FilteringProblems:

Accuracy Aspects, Communications in Control Engineering, Springer Verlag, London,

United Kingdom.

Ghosh, B. K. (1988). An approach to simultaneous system design, part II: Nonswitching gain

and dynamic feedback compensation by algebraic geometric methods, SIAM Journal of

Control and Optimization 26(4): 919–963.

Gibson, C. G. (1979). Singular Points of Smooth Mappings, Vol. 25 of Research Notes in

Mathematics, Pitman, London, United Kingdom.

Godbout, L. F. & Jordan, D. (1989). Gradient matrices for output feedback systems, Interna-

tional Journal of Control 32(5): 411–433.

Goldfarb, D. (1970). A family of variable metric methods derived by variational means,

Mathematics of Computation 24: 23–26.

Golub, G. H. & Van Loan, C. F. (1989). Matrix Computations, The Johns Hopkins University

Press, Baltimore, Maryland U.S.A.

Bibliography 195

Greenspan, D. (1974). Discrete numerical methods in physics and engineering, Academic

Press, New York, U.S.A.

Greenspan, D. (1984). Discrete numerical methods in physics and engineering, Journal of

Computational Physics 56: 21.

Hazewinkel, M. (1979). On identification and the geometry of the space of linear systems,

Lecture Notes in Control and Information Science, Vol. 16, Springer-Verlag, Berlin. see

also Reprint Series, No. 245, Erasmus University, Rotterdam.

Helgason, S. (1978). Differential Geometry, Lie Groups and Symmetric Spaces, Academic

Press, New York.

Helmke, U. (1984). Topology of the moduli space for reachable linear dynamical systems:

The complex case, Technical Report 122, Forschungsschwerpunkt Dynamische Systeme,

University of Bremen.

Helmke, U. (1991). Isospectral flows on symmetric matrices and the Riccati equation, Systems


Helmke, U. (1992). Isospectral flows and linear programming, Journal of Australian Mathe-

matical Society 34(3).

Helmke, U. (1993). Balanced realisations for linear systems: A variational approach, SIAM

Journal of Control and Optimization 31: 1–15.

Helmke, U. & Moore, J. B. (1990). Singular value decomposition via gradient flows, Systems


Helmke, U. & Moore, J. B. (1994a). L2-sensitivity minimization of linear system representa-

tions via gradient flows, to appear in Journal of Mathematical Systems, Estimation and

Control.

Helmke, U. & Moore, J. B. (1994b). Optimization and Dynamical Systems, Communications

and Control Engineering, Springer-Verlag, London.

Helmke, U. & Shayman, M. A. (1992). Critical points of matrix least square distance functions,

2nd IFAC workshop on System Structure and Control, Prague, Czechoslovakia, pp. 116–

118. to appear in Linear Algebra and its Applications.

196 Bibliography

Helmke, U., Moore, J. B. & Perkins, J. E. (1994). Dynamical systems that compute bal-

anced realizations and the singular value decomposition, to appear SIAM Journal Matrix

Analysis.

Henon, M. (1974). Integrals of the Toda lattice, Physical Review B 9(4): 1921–1923.

Hermann, R. & Martin, C. F. (1977). Applications of algebraic geometry to systems theory -

part I, IEEE Transactions on Automatic Control 22: 19–25.

Hermann, R. & Martin, C. F. (1982). Lie and Morse theory of periodic orbits of vector fields

and matrix Riccati equations, I: General Lie theoretic methods, Math. Systems Theory

15: 277–284.

Hestenes, M. R. & Karush, W. (1951). A method of gradient for the calculation of the

characteristic roots and vectors of a real symmetric matrix, Journal of the Resources

National Burea of Standards 47: 45–61.

Hirsch, M. W. (1976). Differential Topology, number 33 in Graduate Texts in Mathematics,

Springer-Verlag, New York.

Horn, R. A. & Johnson, C. R. (1985). Matrix Analysis, Cambridge University Press, Cambridge,

U.K.

Imae, J., Perkins, J. E. & Moore, J. B. (1992). Towards time-varying balanced realisation via

Riccati equations, Mathematics of Control, Signals and Systems 5: 313–326.

J. E. Dennis, J. & Schnabel, R. B. (eds) (1983). Numerical methods for unconstrained opt-

imization and nonlinear equations, Computationa Mathematics, Prentice-Hall Inc., New

Jersey, U.S.A.

Kailath, T. (1980). Linear Systems, Prentice-Hall, Inglewood Cliffs, N.J., U.S.A.

Kalman, R. E. (1963). Mathematical description of linear systems,J.S.I.A.M. Control 1(2): 152–

192.

Karmarkar, N. (1984). A new polynomial time algorithm for linear programming, Combina-

torica 4 pp. 373–395.

Karmarkar, N. (1990). Riemannian geometry underlying interior-point methods for linear

programming, Contemp. Math. 114: 51–75.

Bibliography 197

Khachian, L. G. (1979). A polynomial algorithm in linear programming, Soviet Mathematics

Doklady 20: 191–194.

Kimura, H. (1975). Pole assignment by gain output feedback, IEEE Transactions on Automatic

Control pp. 509–516.

Kincaid, D. & Cheney, W. (1991). Numerical Analysis: Mathematics of Scientific Computing,

Brooks/Cole Publishing Company, Pacific Grove, California, u.S.A.

Kostant, B. (1979). The solution to a generalized Toda lattice and representation theory,

Advances in Mathmatics 34: 195–338.

Krishnaprasad, P. S. (1979). Symplectic mechanics and rational functions, Richerch Automatic

10: 107–135.

Kumar, S. (ed.) (1991). Recent developments in Mathematical Programming, Gordon and

Breach Science Publishers, Philadelphia, U.S.A.

Lagarias, J. C. (1991). Monotonicity properties of the Toda flow, the QR-flow, and subspace

iteration, SIAM Journal of Matrix Analysis and Applications 12(3): 449–462.

Lagarias, J. & Todd, M. J. (eds) (1990). Mathematical Development Arising from linear

Programming, Vol. 114 of Contemporary Mathematics, American Mathematical Society,

Providence, R.I., U.S.A.

Lasagni, F. (1988). Canonical Runge-Kutta methods, ZAMP 39: 952–953.

Laub, A. J., Heath, M. T., Paige, C. C. & Ward, R. C. (1987). Computation of system balancing

transformations and other applications of simultaneous diagonalization algorithms, IEEE

Transactions on Automatic Control 32(2): 115–121.

Li, G., Anderson, B. D. O., Gevers, M. & Perkins, J. E. (1992). Optimal FWL design of state-

space digital sytems with weighted sensitivityminimization and sparseness consideration,

IEEE Transactions on Circuits and Systems - I: Fundamental theory and applications

39: 365–377.

Luenburger, D. G. (1973). Introduction to Linear and Nonlinear Programming, Addison-

Wesley, Reading, U.K.

198 Bibliography

Madievski, A. G., Anderson, B. D. O. & Gevers, M. R. (1994). Optimum realizations of

sampled data controlers for FWL sensitivity minimization, submitted to Automatica.

Mahony, R. E. & Helmke, U. (1993). System assignment and pole placement for symmetric

realisations, Submitted to Journal of Mathematical Systems, Estimation and Control.

Mahony, R. E. & Moore, J. B. (1992). Recursive interior-point linear programming algorithm

based on Lie-Brockett flows, Proceedings of the International Conference on Optimisa-

tion: Techniques and Applications, Singapore.

Mahony, R. E., Helmke, U. & Moore, J. B. (1993). Pole placement algorithms for symmetric

realisations, Proceedings of IEEE Conference on Decision and Control, San Antonio,

U.S.A.

Mahony, R. E., Helmke, U. & Moore, J. B. (1994). Gradient algorithms for principal component

analysis, Submitted to Journal of the Australian Mathematical Society.

Martin, C. F. & Hermann, R. (1977a). Applications of algebraic geometry to systems theory -

part II: Feedback and pole placement for linear Hamiltonian systems, Proceedings of the

IEEE 65: 841–848.

Martin, C. F. & Hermann, R. (eds) (1977b). The 1976 Ames Research Centre (NASA) Conference

on Geometric Control Theory, Vol. VII of Lie Groups: History Frontiers and Applications,

Math Sci Press, Brookline, Massachusetts, U.S.A.

Martin, C. & Herman, R. (1979). Applications of algebraic geometry to systems theory:

The McMillan degree and Kroneker indices of transfer functions as topological and

holomorphic system invariants, SIAM Journal of Control and Optimization 16(5): 743–

755.

Menyuk (1984). Some properties of the discrete Hamiltonian method, Physica, series D

11: 109–129.

Minoux, M. (1986). Mathematical Programming: Theory and Algorithms, John Wiley and

sons, Chichester, Britain.

Mishchenko, A. & Fomenko, A. (1980). A Course in Differential Geometry and Topology, Mir

publishers, Moscow, Russia.

Bibliography 199

Moore, J. B., Mahony, R. E. & Helmke, U. (1992). Recursive gradient algorithms for eigenvalue

and singular value decompositions, Proceedings of the American Control Conference,

Chicago, U.S.A.

Moore, J. B., Mahony, R. E. & Helmke, U. (1994). Numerical gradient algorithms for eigenvalue

and singular value calculations, SIAM Journal of Matrix Analysis 15(3).

Moser, J. (1975). Finitely many mass points on the line under the influence of an exponen-

tial potential - an integrable system, in J. Moser (ed.), Dynamic Systems Theory and

Applications, Springer-Verlag, New York, pp. 467–497.

Moser, J. & Veselov, A. P. (1991). Discrete versions of some classical integrable systems and

factorization of matrix polynomials, Communications in Mathematical Physics 139: 217–

243.

Munkres, J. R. (1975). Topology, a first course, Prentice-Hall, Englewood Cliffs, N.J., U.S.A.

Nakamura, Y. (1989). Moduli spaces of controllable linear dynamical systems and nonlinear

integrable systems of Toda type, in J. M. J. Harnad (ed.), Proceedings CRM workshop on

Hamiltonian Systems, Transformation Groups and Spectral Transform Methods, pp. 103–

112.

Nanda, T. (1985). Differential equations and the QR algorithm, SIAM Journal Numerical

Analaysis 22(2): 310–321.

Oja, E. (1982). A simplified neuron model as a principal component analyzer, Journal of

Mathematical Biology 15: 267–273.

Oja, E. (1989). Neural networks, principal components, and subspaces, International Journal

on Neural Systems 1: 61–68.

Parlett, B. N. (1974). The Rayleigh quotient iteration and some generalisations for nonnormal

matrices, Mathematics of Computation 28(127): 679–693.

Parlett, B. N. & Poole, W. G. (1973). A geometric theory for the QR, LU, and power iterations,

SIAM Journal of Numerical Analysis 10(2): 389–412.

Paul, S., Hueper, K. & Nossek, J. A. (1992). A class of non-linear lossless dynamical systems,

Achiv fur Elektronik und Ubertragungstechnik 46: 219–227.

200 Bibliography

Perkins, J. E., Helmke, U. & Moore, J. B. (1990). Balanced realizations via gradient flow

techniques, Systems and control Letters 14: 369–380.

Polyak, B. T. (1966). A general method for solving extremum problems, Soviet Mathematics

Doklady 8: 593–597.

Riddell, R. C. (1984). Minimax problems on Grassmann manifolds: Sums of eigenvalues,

Advances in Mathematics 54: 107–199.

Rosenthal, J. (1989). Tuning natural frequencies by output feedback, in K. Bowers & J. Lund

(eds), Computation and Control, Proceedings of the Bozeman Conference, Bozeman,

Montana, Vol. 1 of Progress in Systems and Control Theory, Birkhauser, pp. 277–282.

Rosenthal, J. (1992). New results in pole assignment by real output feedback, SIAM Journal

of Control and Optimization 30(1): 203–211.

Ruth (1983). A canonical integration technique, IEEE Transactions on Nuclear Science

30: 2669–2671.

Rutishauser, H. (1954). Ein infinitesimales analogon zum quotienten-differenzen-algorithmus,

Archiv der Mathematik 5: 132–137.

Rutishauser, H. (1958). Solution of eigenvalue problems with the LR-transformation, National

Bureau of Standards: Applied Mathematics Series 49: 47–81.

Safanov, M. G. & Chiang, R. Y. (1989). A Schur method for banced-truncations, IEEE

Transactions on Automatic Control 34: 729–733.

Sanz-Serna, J. M. (1988). Runge-Kutta schemes for Hamiltonian systems, B.I.T. 28: 877–883.

Sanz-Serna, J. M. (1991). Symplectic integrators for Hamiltonian problems: An overview,

ACTA Numerica 1: 243–286.

Shanno, D. (1970). Conditioning of quasi-newton methods for function minimization, Mathe-

matics of Computation 24: 641–656.

Shayman, M. A. (1986). Phase portrait of the matrix Riccati equation, SIAM Journal of Control

and Optimization 24(1): 1–65.

Bibliography 201

Shub, M. & Vasquez, A. T. (1987). Some linearly induced Morse-Smale systems, the QR

algorithm and the Toda lattice, in L. Keen (ed.), The Legacy of Sonya Kovaleskaya,

Vol. 64 of Contemporary Mathematics, American Mathematical Society, Providence,

U.S.A., pp. 181–194.

Smale, S. (1961). On gradient dynamical systems, Annals of Mathematics 74(1): 199–206.

Smith, S. T. (1991). Dynamical systems the perform the singular value decomposition, Systems

and Control Letters 16(5): 319–327.

Smith, S. T. (1993). Geometric Optimization methods for adaptive filtering, PhD thesis,

Division of Applied Science.

Sontag, E. D. (1990). Mathematical Control Theory, Springer-Verlag, New York, U.S.A.

Sreeram, V., Teo, K. L., Yan, W.-Y. & Li, C. (1994). A gradient flow approach to simultaneous

stailization problem, submitted to 1994 Conference on Decision and Control.

Stuart, A. M. & Humphries, A. R. (1994). Model problems in numerical stability for initial

value problems, to appear SIAM Review.

Suris, Y. B. (1989). Hamiltonian Runge-Kutta type methods and their variational formulation,

Math. Sim. 2: 78–87. in Russian.

Symes, W. W. (1980a). Hamiltonian group actions and integrable systems, Physica D 1: 339–

374.

Symes, W. W. (1980b). Systems of the Toda type, inverse spectral problems, and representation

theory, Inventiones Mathematicae 59: 13–51.

Symes, W. W. (1982). The QR algorithm and scattering for the finite nonperiodic Toda lattice,

Physica 4D pp. 275–280.

Toda, M. (1970). Waves in nonlinear lattice, Supplement of the Progress in Theoretical Physics

45: 174–200.

Tombs, M. S. & Postlethwaite, I. (1987). Truncated balanced realization of stable non-minimal

state-space system, International Journal of Control 46: 1319–1330.

202 Bibliography

Varadarajan, V. S. (1984). Lie Groups, Lie Algebras, and their Representations, Vol. 102 of

Graduate texts in Mathematics, Springer-Verlag, New York, U.S.A.

Wang, X. (1989). Geometric inverse eigenvalue problem, in K. Bowers & J. Lund (eds),

Computation and Control, Proceedings of the Bozeman Conference, Bozeman, Montana,

Vol. 1 of Progress in Systems and Control THeory, Birkhauser, pp. 375–383.

Wang, X. (1992). Pole placement by static ouptput feedback, Journal of Mathematical Systems,

Estimation and Control 2(2): 205–218.

Warner, F. W. (1983). Foundations of Diferentiable manifolds and Lie Groups, Graduate texts

in Mathematics, Springer-Verlag, New York, U.S.A.

Watkins, D. S. (1982). Understanding the QR algorithm, SIAM Review 24(4): 427–440.

Watkins, D. S. (1984). Isospectral flows, SIAM Review 26(3): 379–391.

Watkins, D. S. & Elsner, L. (1988). Self similar flows, Linear Algebra and its Applications

110: 213–242.

Watkins, D. S. & Elsner, L. (1989a). Self-equivalent flows associated with the generalized

eigenvalue problem, Linear Algebra and its Applications 118: 107–127.

Watkins, D. S. & Elsner, L. (1989b). Self-equivalent flows associated with the singular value

decomposition, SIAM Journal Matrix Analysis Applications 10: 244–258.

Widlund, O. (1967). A note on unconditionally stable linear multistep methods, B.I.T. 7: 65–70.

Wilkinson, J. H. (1968). Global convergence of the QR algorithm, Linear Algebra and its

Applications 1.

Willems, J. C. & Hesselink, W. H. (1978). Generic properties of the pole placement map,

Proceedings of the 7th IFAC Congress, pp. 1725–1729.

Wonham, W. M. (1967). On pole assignment in multi-input controllable linear systems, IEEE

Transactions on Automatic Control 12: 660–665.

Wonham, W. M. (1985). Linear Multivariable Control, third edition edn, Springer-Verlag,

New York, U.S.A.

Bibliography 203

Yan, W.-Y. & B., M. J. (1991). On L2-sensitivity minimization of linear state-space systems,

preprint.

Yan, W.-Y., Helmke, U. & Moore, J. B. (1994). Global analysis of Oja’s flow for neural

networks, To appear IEEE Transactions on Neural Networks.

Yan, W.-Y., Moore, J. B. & Helmke, U. (1994). Recursive algorithms for solving a class of

nonlinear matrix equations with applications to certain sensitivity optimization problems,

to appear SIAM Journal of Control and Optimization.

Yan, W.-Y., Teo, K. L. & Moore, J. B. (n.d.). A gradient flow approach to computing LQ optimal

output feedback gains, Submitted to Journal of Optimal Control and Applications.

Zhong, G. & Marsden, J. E. (1988). Lie-Poisson Hamilton-Jacobi theory and Lie-Poisson

integrators, Physics Letters A 133(3): 134–139.

Optimization Algorithms on Homogeneous Spacesusers.cecs.anu.edu.au/~john/studenttheses/robertmahony.pdf5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces 127 5.4 Gradient

Documents