Conjugate Gradient Methods for Large-scale Unconstrained Optimization

Dr. Neculai Andrei
Research Institute for Informatics, Bucharest
and
Academy of Romanian Scientists

Ovidius University, Constantza - Romania, March 27, 2008

Contents

Problem definition
Unconstrained optimization methods
Conjugate gradient methods
- Classical conjugate gradient algorithms
- Hybrid conjugate gradient algorithms
- Scaled conjugate gradient algorithms
- Modified conjugate gradient algorithms
- Parametric conjugate gradient algorithms
Applications

Problem definition

$\min_{x \in R^n} f(x)$

$f : R^n \to R$
- continuously differentiable
- gradient is available
- n is large
- Hessian is unavailable

Necessary optimality conditions: $\nabla f(x^*) = 0$

Sufficient optimality conditions: $\nabla^2 f(x^*) > 0$ (positive definite)

Unconstrained optimization methods

$x_{k+1} = x_k + \alpha_k d_k$

$\alpha_k$: step length
$d_k$: search direction

1) Line search

2) Trust-Region algorithms

Quadratic approximation

Influences

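Step length computation:

1) Armijo rule:

$f(x_k + \beta^m \tau_k d_k) \le f(x_k) + \rho \beta^m \tau_k \nabla f(x_k)^T d_k$

$\beta \in (0,1)$, $\rho \in (0,1/2)$, $\tau_k = -(\nabla f(x_k)^T d_k) / \|d_k\|^2$

2) Goldstein rule:

$\rho_1 \alpha_k g_k^T d_k \le f(x_k + \alpha_k d_k) - f(x_k) \le \rho_2 \alpha_k g_k^T d_k$

$0 < \rho_2 < 1/2 < \rho_1 < 1$

3) Wolfe conditions:

$f(x_k + \alpha_k d_k) - f(x_k) \le \rho \alpha_k g_k^T d_k$

$\nabla f(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k$

$0 < \rho < \sigma < 1$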

Implementations:

Shanno (1978)

Moré - Thuente (1992-1994)
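The three step length rules above can be checked numerically for a trial step. The following is a minimal sketch, not one of the cited implementations; the helper names (`armijo_ok`, `goldstein_ok`, `wolfe_ok`) and the default parameter values are assumptions chosen for illustration, and `alpha` stands for the trial step (for the Armijo rule, the product $\beta^m \tau_k$).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def step(x, alpha, d):
    """Return the trial point x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]

def armijo_ok(f, grad, x, d, alpha, rho=1e-4):
    # Sufficient decrease: f(x + alpha d) <= f(x) + rho * alpha * g^T d
    return f(step(x, alpha, d)) <= f(x) + rho * alpha * dot(grad(x), d)

def goldstein_ok(f, grad, x, d, alpha, rho1=0.75, rho2=0.25):
    # rho1 * alpha * g^T d <= f(x + alpha d) - f(x) <= rho2 * alpha * g^T d
    gd = dot(grad(x), d)
    diff = f(step(x, alpha, d)) - f(x)
    return rho1 * alpha * gd <= diff <= rho2 * alpha * gd

def wolfe_ok(f, grad, x, d, alpha, rho=1e-4, sigma=0.9):
    # Sufficient decrease plus the curvature condition
    gd = dot(grad(x), d)
    x_new = step(x, alpha, d)
    decrease = f(x_new) - f(x) <= rho * alpha * gd
    curvature = dot(grad(x_new), d) >= sigma * gd
    return decrease and curvature
```

For $f(x) = \frac{1}{2}\|x\|^2$ at $x = (1,1)$ with $d = -g$, the unit step passes the Wolfe and Goldstein tests, while a very short step such as 0.01 passes only the Armijo test, which illustrates why the curvature condition and the Goldstein lower bound are there: they rule out steps that are too short.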

Proposition. Assume that $d_k$ is a descent direction and $\nabla f$ satisfies the Lipschitz condition

$\|\nabla f(x) - \nabla f(x_k)\| \le L \|x - x_k\|$

for all $x$ on the line segment connecting $x_k$ and $x_{k+1}$, where $L$ is a positive constant.

If the line search satisfies the Wolfe conditions, then

$\alpha_k \ge \frac{(1-\sigma)}{L} \frac{|g_k^T d_k|}{\|d_k\|^2}$

If the line search satisfies the Goldstein conditions, then

$\alpha_k \ge \frac{(1-\rho_1)}{L} \frac{|g_k^T d_k|}{\|d_k\|^2}$

Remarks:

1) The Newton method, the quasi-Newton and the limited memory quasi-Newton methods are able to accept unit step lengths along the iterations.

2) In conjugate gradient methods the step lengths may differ from 1 in a very unpredictable manner. They can be larger or smaller than 1 depending on how the problem is scaled*.

*N. Andrei, (2007) Acceleration of conjugate gradient algorithms for unconstrained optimization. (submitted JOTA)

Methods for Unconstrained Optimization

1) Steepest descent (Cauchy, 1847)

$d_k = -\nabla f(x_k)$

2) Newton

$d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$

3) Quasi-Newton (Broyden, 1965; and many others)

$d_k = -H_k \nabla f(x_k)$, $H_k \approx \nabla^2 f(x_k)^{-1}$

4) Conjugate Gradient Methods (1952)

$d_{k+1} = -g_{k+1} + \beta_k s_k$, $s_k = x_{k+1} - x_k$

$\beta_k$ is known as the conjugate gradient parameter

5) Truncated Newton method (Dembo, et al, 1982)

$d_k \approx -\nabla^2 f(x_k)^{-1} g_k$, $r_k = \nabla^2 f(x_k) d_k + g_k$

6) Trust Region methods

$q_k(d) = f(x_k) + g_k^T d + \frac{1}{2} d^T B_k d$

7) Conic model method (Davidon, 1980)

$c(d) = f(x_k) + \frac{g_k^T d}{1 + b^T d} + \frac{1}{2} \frac{d^T A_k d}{(1 + b^T d)^2}$

8) Tensor methods (Schnabel & Frank, 1984)

$m_T(x_c + d) = f(x_c) + \nabla f(x_c) \cdot d + \frac{1}{2} \nabla^2 f(x_c) \cdot d^2 + \frac{1}{6} T_c \cdot d^3 + \frac{1}{24} V_c \cdot d^4$

9) Methods based on systems of differential equations
Gradient flow method (Courant, 1942)

$\frac{dx}{dt} = -\nabla^2 f(x)^{-1} \nabla f(x)$, $x(0) = x_0$

10) Direct searching methods

Hooke-Jeeves (pattern search) (1961)
Powell (conjugate directions) (1964)
Rosenbrock (coordinate system rotation) (1960)
Nelder-Mead (rolling the simplex) (1965)
Powell – UOBYQA (quadratic approximation) (1994-2000)

N. Andrei, Critica Rațiunii Algoritmilor de Optimizare fără Restricții. Editura Academiei Române, 2008.

Conjugate Gradient Methods

$x_{k+1} = x_k + \alpha_k d_k$

$d_{k+1} = -g_{k+1} + \beta_k s_k$

$s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$

Magnus Hestenes (1906-1991), Eduard Stiefel (1909-1978)

The prototype of Conjugate Gradient Algorithm

Step 1. Select the initial starting point: $x_0 \in \mathrm{dom} f$.

Step 2. Test a criterion for stopping the iterations, based on $\|g_k\|$.

Step 3. Determine the step length $\alpha_k$ by the Wolfe conditions.

Step 4. Update the variables: $x_{k+1} = x_k + \alpha_k d_k$.

Step 5. Compute: $\beta_k$.

Step 6. Compute the search direction: $d_{k+1} = -g_{k+1} + \beta_k s_k$.

Step 7. Restart. If $|g_{k+1}^T g_k| \ge 0.2 \|g_{k+1}\|^2$, then set $d_{k+1} = -g_{k+1}$.

Step 8. Compute the initial guess $\alpha_{k+1} = \alpha_k \|d_k\| / \|d_{k+1}\|$, set $k = k + 1$ and continue with step 2.
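The eight steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the author's code: the Hestenes-Stiefel parameter is used for Step 5, the Wolfe step is found by a simple bisection/expansion search (the slides cite Shanno and Moré-Thuente implementations instead), the stopping test uses the infinity norm, and a steepest-descent fallback is added when the new direction is not a descent direction. The names `wolfe_search` and `cg_prototype` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(x, a, d):
    """Return x + a * d."""
    return [xi + a * di for xi, di in zip(x, d)]

def wolfe_search(f, grad, x, d, alpha=1.0, rho=1e-4, sigma=0.1, max_trials=60):
    """Bisection/expansion search for a step satisfying the Wolfe conditions."""
    fx, gd = f(x), dot(grad(x), d)
    lo, hi = 0.0, math.inf
    for _ in range(max_trials):
        x_new = axpy(x, alpha, d)
        if f(x_new) > fx + rho * alpha * gd:       # sufficient decrease fails
            hi = alpha
        elif dot(grad(x_new), d) < sigma * gd:     # curvature condition fails
            lo = alpha
        else:
            break
        alpha = 2.0 * lo if hi == math.inf else 0.5 * (lo + hi)
    return alpha

def cg_prototype(f, grad, x0, eps=1e-6, max_iter=1000):
    x, g = list(x0), grad(x0)                      # Step 1
    d = [-gi for gi in g]
    alpha = 1.0 / max(1.0, math.sqrt(dot(g, g)))   # assumed first guess
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) <= eps:        # Step 2
            break
        alpha = wolfe_search(f, grad, x, d, alpha) # Step 3
        x_new = axpy(x, alpha, d)                  # Step 4
        g_new = grad(x_new)
        s = [alpha * di for di in d]               # s_k = x_{k+1} - x_k
        y = [a - b for a, b in zip(g_new, g)]      # y_k = g_{k+1} - g_k
        ys = dot(y, s)
        beta = dot(y, g_new) / ys if ys > 0 else 0.0           # Step 5 (HS)
        d_new = [-gn + beta * si for gn, si in zip(g_new, s)]  # Step 6
        # Step 7: Powell restart, plus an assumed descent safeguard
        if (abs(dot(g_new, g)) >= 0.2 * dot(g_new, g_new)
                or dot(g_new, d_new) >= 0.0):
            d_new = [-gn for gn in g_new]
        norm_new = math.sqrt(dot(d_new, d_new))
        if norm_new > 0.0:                         # Step 8: next initial guess
            alpha = alpha * math.sqrt(dot(d, d)) / norm_new
        x, g, d = x_new, g_new, d_new
    return x, g
```

Calling `cg_prototype(f, grad, x0)` returns the final point and its gradient. Any other formula for $\beta_k$ from the lists in this talk can be substituted in the line marked Step 5.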

Convergence Analysis

Theorem. Suppose that:

1) the level set $S = \{x \in R^n : f(x) \le f(x_0)\}$ is bounded,

2) the function $f$ is continuously differentiable,

3) the gradient is Lipschitz continuous, i.e. $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$.

Consider any conjugate gradient method where:

1) $d_k$ is a descent direction,

2) $\alpha_k$ is obtained by the strong Wolfe line search.

If $\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty$, then $\liminf_{k \to \infty} \|g_k\| = 0$.

Classical conjugate gradient algorithms

1. Hestenes - Stiefel (HS): $\beta_k^{HS} = \frac{y_k^T g_{k+1}}{y_k^T s_k}$

2. Polak – Ribière - Polyak (PRP): $\beta_k^{PRP} = \frac{y_k^T g_{k+1}}{g_k^T g_k}$

3. Liu - Storey (LS): $\beta_k^{LS} = -\frac{y_k^T g_{k+1}}{g_k^T d_k}$

4. Fletcher - Reeves (FR): $\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$

5. Conjugate Descent – Fletcher (CD): $\beta_k^{CD} = -\frac{g_{k+1}^T g_{k+1}}{g_k^T d_k}$

6. Dai – Yuan (DY): $\beta_k^{DY} = \frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}$

[Figure: performance profiles of the classical conjugate gradient algorithms]
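The six classical parameters differ only in which inner products appear in the numerator and denominator, which the following sketch makes explicit. The function name `classical_betas` is hypothetical, and no safeguard against a zero denominator is included.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classical_betas(g, g_new, d, s):
    """Return the six classical conjugate gradient parameters.

    g, g_new: gradients at x_k and x_{k+1}; d: direction d_k;
    s: step s_k = x_{k+1} - x_k.
    """
    y = [a - b for a, b in zip(g_new, g)]          # y_k = g_{k+1} - g_k
    return {
        "HS": dot(y, g_new) / dot(y, s),
        "PRP": dot(y, g_new) / dot(g, g),
        "LS": -dot(y, g_new) / dot(g, d),
        "FR": dot(g_new, g_new) / dot(g, g),
        "CD": -dot(g_new, g_new) / dot(g, d),
        "DY": dot(g_new, g_new) / dot(y, s),
    }
```

With $d_k = -g_k$ and successive gradients that are orthogonal, the PRP, LS, FR and CD values coincide, and HS equals DY; the formulas separate once the line search is inexact.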


7. Dai – Liao (DL): $\beta_k^{DL} = \frac{g_{k+1}^T (y_k - t s_k)}{y_k^T s_k}$

8. Dai – Liao plus (DL+): $\beta_k^{DL+} = \max\left\{0, \frac{y_k^T g_{k+1}}{y_k^T s_k}\right\} - t \frac{s_k^T g_{k+1}}{y_k^T s_k}$

9. Andrei - Sufficient Descent Condition (CGSD)*: $\beta_k^{CGSD} = \frac{g_{k+1}^T g_{k+1}}{y_k^T s_k} - \frac{(y_k^T g_{k+1})(s_k^T g_{k+1})}{(y_k^T s_k)^2}$

* N. Andrei, A Dai-Yuan conjugate gradient algorithm with sufficient descent and conjugacy conditions for unconstrained optimization. Applied Mathematics Letters, vol.21, 2008, pp.165-171.


Hybrid conjugate gradient algorithms - projections

10. Hybrid Dai - Yuan (hDY): $\beta_k^{hDY} = \max\{-c \beta_k^{DY}, \min\{\beta_k^{HS}, \beta_k^{DY}\}\}$, $c = (1-\sigma)/(1+\sigma)$

11. Hybrid Dai – Yuan zero (hDYz): $\beta_k^{hDYz} = \max\{0, \min\{\beta_k^{HS}, \beta_k^{DY}\}\}$

12. Gilbert – Nocedal (GN): $\beta_k^{GN} = \max\{-\beta_k^{FR}, \min\{\beta_k^{PRP}, \beta_k^{FR}\}\}$

13. Hu – Storey (HuS): $\beta_k^{HuS} = \max\{0, \min\{\beta_k^{PRP}, \beta_k^{FR}\}\}$

14. Touati-Ahmed and Storey (TaS): $\beta_k^{TaS} = \beta_k^{PRP}$ if $0 \le \beta_k^{PRP} \le \beta_k^{FR}$, and $\beta_k^{TaS} = \beta_k^{FR}$ otherwise

15. Hybrid LS – CD (LS-CD): $\beta_k^{LS-CD} = \max\{0, \min\{\beta_k^{LS}, \beta_k^{CD}\}\}$
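Each projection hybrid clips one classical parameter against another. A minimal sketch, assuming the classical values are already available in a dictionary keyed by the abbreviations used in the list (the function name `hybrid_betas` and the default `sigma` are hypothetical choices):

```python
def hybrid_betas(b, sigma=0.9):
    """Projection hybrids built from classical parameters.

    b: dict with keys "HS", "PRP", "LS", "FR", "CD", "DY";
    sigma: the Wolfe curvature parameter used in c = (1 - sigma)/(1 + sigma).
    """
    c = (1.0 - sigma) / (1.0 + sigma)
    return {
        "hDY": max(-c * b["DY"], min(b["HS"], b["DY"])),
        "hDYz": max(0.0, min(b["HS"], b["DY"])),
        "GN": max(-b["FR"], min(b["PRP"], b["FR"])),
        "HuS": max(0.0, min(b["PRP"], b["FR"])),
        "TaS": b["PRP"] if 0.0 <= b["PRP"] <= b["FR"] else b["FR"],
        "LS-CD": max(0.0, min(b["LS"], b["CD"])),
    }
```

The pattern is the same throughout: the inner `min` caps a parameter with good practical behaviour (HS, PRP, LS) by one with good convergence theory (DY, FR, CD), and the outer `max` bounds the result from below.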

Hybrid conjugate gradient algorithms - convex combination

16. Convex combination of PRP and DY from conjugacy condition (CCOMB - Andrei)

$\beta_k^{CCOMB} = (1 - \theta_k) \beta_k^{PRP} + \theta_k \beta_k^{DY}$

$\theta_k^{CCOMB} = \frac{(y_k^T g_{k+1})(y_k^T s_k) - (y_k^T g_{k+1})(g_k^T g_k)}{(y_k^T g_{k+1})(y_k^T s_k) - \|g_{k+1}\|^2 \|g_k\|^2}$

If $\theta_k^{CCOMB} \le 0$, then $\beta_k^{CCOMB} = \beta_k^{PRP}$.

If $\theta_k^{CCOMB} \ge 1$, then $\beta_k^{CCOMB} = \beta_k^{DY}$.

N. Andrei, New hybrid conjugate gradient algorithms for unconstrained optimization. Encyclopedia of Optimization, 2nd Edition, Springer, August 2008, Entry 761.


17. Convex combination of PRP and DY from Newton direction (NDOMB - Andrei)

$\beta_k^{NDOMB} = (1 - \theta_k) \beta_k^{PRP} + \theta_k \beta_k^{DY}$

$\theta_k^{NDOMB} = \frac{(y_k^T g_{k+1} - s_k^T g_{k+1}) \|g_k\|^2 - (g_{k+1}^T y_k)(y_k^T s_k)}{\|g_{k+1}\|^2 \|g_k\|^2 - (g_{k+1}^T y_k)(y_k^T s_k)}$

If $\theta_k^{NDOMB} \le 0$, then $\beta_k^{NDOMB} = \beta_k^{PRP}$.

If $\theta_k^{NDOMB} \ge 1$, then $\beta_k^{NDOMB} = \beta_k^{DY}$.

N. Andrei, New hybrid conjugate gradient algorithms as a convex combination of PRP and DY for unconstrained optimization. ICI Technical Report, October 1, 2007. (submitted AML)




18. Convex combination of HS and DY from Newton direction (HYBRID - Andrei)

$\beta_k^{HYBRID} = (1 - \theta_k) \beta_k^{HS} + \theta_k \beta_k^{DY}$

$\theta_k = -\frac{s_k^T g_{k+1}}{g_k^T g_{k+1}}$ (Secant condition)

If $\theta_k \le 0$, then $\beta_k^{HYBRID} = \beta_k^{HS}$.

If $\theta_k \ge 1$, then $\beta_k^{HYBRID} = \beta_k^{DY}$.

N. Andrei, A hybrid conjugate gradient algorithm for unconstrained optimization as a convex combination of Hestenes-Stiefel and Dai-Yuan. Studies in Informatics and Control, vol.17, No.1, March 2008, pp.55-70.
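The HS/DY convex combination can be sketched as follows. The function name `hybrid_hs_dy_beta` is hypothetical and the sketch omits any safeguard for $g_k^T g_{k+1} = 0$ or $y_k^T s_k = 0$. When $0 < \theta_k < 1$ the combined value equals $(y_k^T g_{k+1} - s_k^T g_{k+1}) / (y_k^T s_k)$, which is the value obtained by matching the Newton direction under the secant condition.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hybrid_hs_dy_beta(g, g_new, s):
    """Convex combination of HS and DY; returns (beta, theta)."""
    y = [a - b for a, b in zip(g_new, g)]          # y_k = g_{k+1} - g_k
    ys = dot(y, s)
    beta_hs = dot(y, g_new) / ys
    beta_dy = dot(g_new, g_new) / ys
    theta = -dot(s, g_new) / dot(g, g_new)
    if theta <= 0.0:                               # outside [0, 1]: pure HS
        return beta_hs, theta
    if theta >= 1.0:                               # outside [0, 1]: pure DY
        return beta_dy, theta
    return (1.0 - theta) * beta_hs + theta * beta_dy, theta
```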

19. Convex combination of HS and DY from Newton direction with modified secant condition (HYBRIDM - Andrei)

$\beta_k^{HYBRIDM} = (1 - \theta_k) \beta_k^{HS} + \theta_k \beta_k^{DY}$

11

11

1T

Tk k kk k kT T

k k k kk T

T k kk k kT

k k

y gs g

s s y s

g gg g

y s

If $\theta_k \le 0$, then $\beta_k^{HYBRIDM} = \beta_k^{HS}$.

If $\theta_k \ge 1$, then $\beta_k^{HYBRIDM} = \beta_k^{DY}$.

N. Andrei, A hybrid conjugate gradient algorithm with modified secant condition for unconstrained optimization. ICI Technical Report, February 6, 2008 (submitted to Numerical Algorithms)


Scaled conjugate gradient algorithms

$x_{k+1} = x_k + \alpha_k d_k$

$d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k$

$\beta_k = \frac{s_k^T \nabla^2 f(x_{k+1}) \theta_{k+1} g_{k+1} - s_k^T g_{k+1}}{s_k^T \nabla^2 f(x_{k+1}) s_k}$

$\nabla^2 f(x_{k+1})^{-1} g_{k+1} = \theta_{k+1} g_{k+1} - \beta_k s_k$

N. Andrei, Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Optimization Methods and Software, 22 (2007), pp.561-571.

$\theta_{k+1}$: $\theta_{k+1} = 1$ or $\theta_{k+1} = \nabla^2 f(x_{k+1})^{-1}$

A) Secant Condition

$B_{k+1} s_k = y_k$

$s_k^T \nabla^2 f(x_{k+1}) s_k = s_k^T y_k + O(\|s_k\|^3)$

B) Modified secant Condition

$B_{k+1} s_k = \hat{y}_k$

$\hat{y}_k = y_k + \frac{\eta_k}{s_k^T u_k} u_k$

$\eta_k = 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k$

$s_k^T \nabla^2 f(x_{k+1}) s_k = s_k^T \hat{y}_k + O(\|s_k\|^4)$

N. Andrei, Accelerated conjugate gradient algorithm with modified secant condition for unconstrained optimization. ICI Technical Report, March 3, 2008. (Submitted, JOTA, 2007)

C) Hessian / vector approximation by finite difference

$\nabla^2 f(x_{k+1}) s_k \approx \frac{\nabla f(x_{k+1} + \delta s_k) - \nabla f(x_{k+1})}{\delta}$

$\delta = \frac{2 \sqrt{\varepsilon_m} (1 + \|x_{k+1}\|)}{\|s_k\|}$

N. Andrei, Accelerated conjugate gradient algorithm with finite difference Hessian / vector product approximation for unconstrained optimization. ICI Technical Report, March 4, 2008 (submitted Math. Programm.)

max ,100max 10 , ks

12 (1 )m kx n
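The finite difference approximation of the Hessian / vector product needs only one extra gradient evaluation. A minimal sketch, assuming the Euclidean norm in the formula for $\delta$ and taking $\varepsilon_m$ as the double precision machine epsilon; the function name `hessian_vector_fd` is hypothetical.

```python
import math
import sys

def hessian_vector_fd(grad, x, s):
    """Forward-difference approximation of (Hessian of f at x) times s."""
    eps_m = sys.float_info.epsilon                 # machine epsilon
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_s = math.sqrt(sum(si * si for si in s))
    delta = 2.0 * math.sqrt(eps_m) * (1.0 + norm_x) / norm_s
    g0 = grad(x)
    g1 = grad([xi + delta * si for xi, si in zip(x, s)])
    return [(a - b) / delta for a, b in zip(g1, g0)]
```

For a quadratic function the gradient is linear, so the approximation reproduces the exact product up to rounding error; for a general function the error is of order $\delta$.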


20. Birgin – Martínez (BM)

$\beta_k^{BM} = \frac{(\theta_k y_k - s_k)^T g_{k+1}}{y_k^T s_k}$, $\theta_k = \frac{s_k^T s_k}{y_k^T s_k}$

21. Birgin – Martínez plus (BM+)

$\beta_k^{BM+} = \max\left\{0, \frac{\theta_k y_k^T g_{k+1}}{y_k^T s_k}\right\} - \frac{s_k^T g_{k+1}}{y_k^T s_k}$

22. Scaled Polak-Ribière-Polyak (sPRP)

$\beta_k^{sPRP} = \frac{\theta_{k+1} y_k^T g_{k+1}}{\alpha_k \theta_k g_k^T g_k}$

23. Scaled Fletcher – Reeves (sFR)

$\beta_k^{sFR} = \frac{\theta_{k+1} g_{k+1}^T g_{k+1}}{\alpha_k \theta_k g_k^T g_k}$

24. Scaled Hestenes – Stiefel (sHS)

$\beta_k^{sHS} = \theta_{k+1} \frac{g_{k+1}^T y_k}{y_k^T s_k}$

N. Andrei, Scaled conjugate gradient algorithms for unconstrained optimization. Computational Optimization and Applications, vol. 38, no. 3, (2007), pp.401-416.

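25. SCALCG (secant condition)

$d_{k+1} = -\theta_{k+1} g_{k+1} + \frac{(\theta_{k+1} y_k - s_k)^T g_{k+1}}{y_k^T s_k} s_k$

$d_{k+1} = -\left[\theta_{k+1} I - \theta_{k+1} \frac{s_k y_k^T}{y_k^T s_k} + \frac{s_k s_k^T}{y_k^T s_k}\right] g_{k+1} = -Q_{k+1} g_{k+1}$

Symmetrizing $Q_{k+1}$ and imposing the secant condition (memoryless BFGS update) gives the SCALCG direction:

$d_{k+1} = -\theta_{k+1} g_{k+1} + \theta_{k+1} \frac{g_{k+1}^T s_k}{y_k^T s_k} y_k - \left[\left(1 + \theta_{k+1} \frac{y_k^T y_k}{y_k^T s_k}\right) \frac{g_{k+1}^T s_k}{y_k^T s_k} - \theta_{k+1} \frac{g_{k+1}^T y_k}{y_k^T s_k}\right] s_k$

N. Andrei, A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Applied Mathematics Letters, 20 (2007), pp.645-650.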

Theorem. Suppose that $\alpha_k$ satisfies the Wolfe conditions; then the direction $d_{k+1}$ is a descent direction.

$g_{k+1}^T d_{k+1} = \frac{1}{(y_k^T s_k)^2} \left[ -\theta_{k+1} \|g_{k+1}\|^2 (y_k^T s_k)^2 + 2 \theta_{k+1} (g_{k+1}^T y_k)(g_{k+1}^T s_k)(y_k^T s_k) - (g_{k+1}^T s_k)^2 (y_k^T s_k) - \theta_{k+1} (y_k^T y_k)(g_{k+1}^T s_k)^2 \right]$

$g_{k+1}^T d_{k+1} \le -\frac{(g_{k+1}^T s_k)^2}{y_k^T s_k}$
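The SCALCG direction and the descent bound of the theorem above can be checked numerically. The sketch below is an illustration only: the function name `scalcg_direction` is hypothetical, and the spectral choice $\theta_{k+1} = s_k^T s_k / y_k^T s_k$ used in the example is one possible scaling, assumed here for concreteness. It requires $y_k^T s_k > 0$, which the Wolfe conditions guarantee.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scalcg_direction(g_new, s, y, theta):
    """Memoryless BFGS (SCALCG-type) direction built from g_{k+1}, s_k, y_k."""
    ys, gs = dot(y, s), dot(g_new, s)
    gy, yy = dot(g_new, y), dot(y, y)
    coef_y = theta * gs / ys
    coef_s = (1.0 + theta * yy / ys) * gs / ys - theta * gy / ys
    return [-theta * gi + coef_y * yi - coef_s * si
            for gi, yi, si in zip(g_new, y, s)]

# Example data (arbitrary vectors with y^T s > 0)
g_new = [1.0, -2.0, 0.5]
s = [0.3, 0.1, 0.2]
y = [0.5, 0.4, -0.1]
theta = dot(s, s) / dot(y, s)                      # assumed spectral scaling
d = scalcg_direction(g_new, s, y, theta)
bound = -dot(g_new, s) ** 2 / dot(y, s)            # right-hand side of the theorem
```

Two properties are easy to verify with this function: `dot(g_new, d)` does not exceed `bound`, and applying the same formula to `y` in place of `g_new` returns `-s`, which is the secant condition $H_{k+1} y_k = s_k$ of the underlying BFGS update.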


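The Powell restarting procedure: $|g_{r+1}^T g_r| \ge 0.2 \|g_{r+1}\|^2$

The direction $d_{k+1}$ is computed using a double update scheme, for $k \ge r + 1$:

$d_{k+1} = -H_{k+1} g_{k+1}$

$H_{k+1} = H_{r+1} - \frac{H_{r+1} y_k s_k^T + s_k y_k^T H_{r+1}}{y_k^T s_k} + \left(1 + \frac{y_k^T H_{r+1} y_k}{y_k^T s_k}\right) \frac{s_k s_k^T}{y_k^T s_k}$

$H_{r+1} = \theta_{r+1} I - \theta_{r+1} \frac{y_r s_r^T + s_r y_r^T}{y_r^T s_r} + \left(1 + \theta_{r+1} \frac{y_r^T y_r}{y_r^T s_r}\right) \frac{s_r s_r^T}{y_r^T s_r}$

ANDREI, N., A scaled nonlinear conjugate gradient algorithm for unconstrained optimization. Optimization. A journal of mathematical programming and operations research, DOI, accepted.

$v = H_{r+1} g_{k+1} = \theta_{r+1} g_{k+1} - \theta_{r+1} \frac{g_{k+1}^T s_r}{y_r^T s_r} y_r + \left[\left(1 + \theta_{r+1} \frac{y_r^T y_r}{y_r^T s_r}\right) \frac{g_{k+1}^T s_r}{y_r^T s_r} - \theta_{r+1} \frac{g_{k+1}^T y_r}{y_r^T s_r}\right] s_r$

$w = H_{r+1} y_k = \theta_{r+1} y_k - \theta_{r+1} \frac{y_k^T s_r}{y_r^T s_r} y_r + \left[\left(1 + \theta_{r+1} \frac{y_r^T y_r}{y_r^T s_r}\right) \frac{y_k^T s_r}{y_r^T s_r} - \theta_{r+1} \frac{y_k^T y_r}{y_r^T s_r}\right] s_r$

N. Andrei, Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Optimization Methods and Software, 22 (2007), pp.561-571.

$d_{k+1} = -v + \frac{(g_{k+1}^T s_k) w + (g_{k+1}^T w) s_k}{y_k^T s_k} - \left(1 + \frac{y_k^T w}{y_k^T s_k}\right) \frac{g_{k+1}^T s_k}{y_k^T s_k} s_k$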

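Lemma: If $\nabla f$ is Lipschitz continuous, then

$\|d_{k+1}\| \le \left(\frac{2}{\mu} + \frac{2L}{\mu^2} + \frac{L^2}{\mu^3}\right) \|g_{k+1}\|$

where $L$ is the Lipschitz constant and $\mu$ is the strong convexity constant of $f$.

Theorem: For strongly convex functions, with Wolfe line search,

$\lim_{k \to \infty} g_k = 0$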



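26. ASCALCG (secant condition)

General Theory of Acceleration

In conjugate gradient methods the step lengths may differ from 1 in a very unpredictable manner. They can be larger or smaller than 1 depending on how the problem is scaled.

$x_{k+1} = x_k + \xi_k \alpha_k d_k$

$f(x_k + \alpha_k d_k) = f(x_k) + \alpha_k g_k^T d_k + \frac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k + o(\|\alpha_k d_k\|^2)$

$f(x_k + \xi_k \alpha_k d_k) = f(x_k) + \xi_k \alpha_k g_k^T d_k + \frac{1}{2} \xi_k^2 \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k + o(\|\xi_k \alpha_k d_k\|^2)$

$f(x_k + \xi_k \alpha_k d_k) = f(x_k + \alpha_k d_k) + \Psi_k(\xi_k)$

N. Andrei, (2007) Acceleration of conjugate gradient algorithms for unconstrained optimization. ICI Technical Report, October 24, 2007. (submitted JOTA, 2007)

$\Psi_k(\xi_k) = \frac{1}{2} (\xi_k^2 - 1) \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k + (\xi_k - 1) \alpha_k g_k^T d_k + \xi_k^2 \alpha_k o(\alpha_k \|d_k\|^2) - \alpha_k o(\alpha_k \|d_k\|^2)$

$a_k = \alpha_k g_k^T d_k \le 0$

$b_k = \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k$

$\varepsilon_k = o(\alpha_k \|d_k\|^2)$

$\Psi_k(\xi_k) = \frac{1}{2} (\xi_k^2 - 1) b_k + (\xi_k - 1) a_k + \xi_k^2 \alpha_k \varepsilon_k - \alpha_k \varepsilon_k$

$\Psi_k'(\xi_k) = (b_k + 2 \alpha_k \varepsilon_k) \xi_k + a_k$

$\Psi_k'(0) = a_k \le 0$

$\xi_m = -\frac{a_k}{b_k + 2 \alpha_k \varepsilon_k}$, $\Psi_k'(\xi_m) = 0$

$\Psi_k(\xi_m) = -\frac{(a_k + (b_k + 2 \alpha_k \varepsilon_k))^2}{2 (b_k + 2 \alpha_k \varepsilon_k)} \le 0$

$f(x_k + \xi_m \alpha_k d_k) = f(x_k + \alpha_k d_k) - \frac{(a_k + (b_k + 2 \alpha_k \varepsilon_k))^2}{2 (b_k + 2 \alpha_k \varepsilon_k)} \le f(x_k + \alpha_k d_k)$

$f(x_{k+1}) = f(x_k + \xi_m \alpha_k d_k) \le f(x_k) + \rho \alpha_k g_k^T d_k - \frac{(a_k + (b_k + 2 \alpha_k \varepsilon_k))^2}{2 (b_k + 2 \alpha_k \varepsilon_k)}$

$= f(x_k) - \left[\frac{(a_k + (b_k + 2 \alpha_k \varepsilon_k))^2}{2 (b_k + 2 \alpha_k \varepsilon_k)} - \rho a_k\right] \le f(x_k)$

If $d_k$ is a descent direction, then

$\frac{(a_k + (b_k + 2 \alpha_k \varepsilon_k))^2}{2 (b_k + 2 \alpha_k \varepsilon_k)} \approx \frac{(a_k + b_k)^2}{2 b_k}$

$f(x_{k+1}) = f(x_k) - \left[\frac{(a_k + (b_k + 2 \alpha_k \varepsilon_k))^2}{2 (b_k + 2 \alpha_k \varepsilon_k)} - \rho a_k\right] \approx f(x_k) - \left[\frac{(a_k + b_k)^2}{2 b_k} - \rho a_k\right] \le f(x_k)$

$\xi_k = \xi_m = -\frac{a_k}{b_k}$

$b_k$ computation:

$z = x_k + \alpha_k d_k$

$f(z) = f(x_k + \alpha_k d_k) = f(x_k) + \alpha_k g_k^T d_k + \frac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(\tilde{x}_k) d_k$

$x_k = z - \alpha_k d_k$

$f(x_k) = f(z - \alpha_k d_k) = f(z) - \alpha_k g_z^T d_k + \frac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(\bar{x}_k) d_k$

$b_k = -\alpha_k y_k^T d_k$, $y_k = g_k - g_z$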
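The acceleration step costs one extra gradient evaluation at $z = x_k + \alpha_k d_k$. A minimal sketch of it follows; the function name `accelerated_step` is hypothetical, and the test `b > 0` used to decide whether the acceleration is applied is an assumption of this sketch (otherwise the plain step to $z$ is taken).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def accelerated_step(grad, x, d, alpha):
    """Return (x_next, xi) where x_next = x + xi * alpha * d."""
    g = grad(x)
    z = [xi + alpha * di for xi, di in zip(x, d)]  # z = x_k + alpha_k d_k
    g_z = grad(z)
    y = [a - b for a, b in zip(g, g_z)]            # y_k = g_k - g_z
    a_k = alpha * dot(g, d)                        # a_k = alpha_k g_k^T d_k
    b_k = -alpha * dot(y, d)                       # b_k = -alpha_k y_k^T d_k
    if b_k > 0.0:
        xi_k = -a_k / b_k
        x_next = [xi + xi_k * alpha * di for xi, di in zip(x, d)]
        return x_next, xi_k
    return z, 1.0

# Example: a badly scaled quadratic and a deliberately short step
f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)
grad = lambda x: [x[0], 10.0 * x[1]]
x0, d0 = [1.0, 1.0], [-1.0, -10.0]
x1, xi = accelerated_step(grad, x0, d0, 0.01)
```

On a quadratic function the neglected term $\varepsilon_k$ vanishes, so the accelerated point is the exact minimizer along $d_k$ whatever step $\alpha_k$ the line search returned; in the example the short step 0.01 is stretched by a factor of about 10.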


Proposition. Suppose that $f$ is a uniformly convex function on the level set $S = \{x : f(x) \le f(x_0)\}$, and $d_k$ satisfies the sufficient descent condition

$g_k^T d_k \le -c_1 \|g_k\|^2$, where $c_1 > 0$,

and

$\|d_k\|^2 \le c_2 \|g_k\|^2$, where $c_2 > 0$.

Then the sequence generated by ACG converges linearly to $x^*$, solution to the optimization problem.

N. Andrei, Accelerated scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. ICI Technical Report, March 10, 2008 (submitted – Numerische Mathematik, 2008)



Modified conjugate gradient algorithms

27. Andrei – Sufficient Descent Condition from PRP (A-prp)

$\beta_k^{A-prp} = \frac{1}{g_k^T g_k} \left( y_k^T g_{k+1} - \frac{(y_k^T y_k)(s_k^T g_{k+1})}{y_k^T s_k} \right)$

28. Andrei – Sufficient Descent Condition from DY (ACGA)

$\beta_k^{ACGA} = \frac{y_k^T g_{k+1}}{y_k^T s_k} - \frac{(y_k^T g_{k+1})(s_k^T g_{k+1})}{(y_k^T s_k)^2}$

29. Andrei – Sufficient Descent Condition from DY zero (ACGA+)

$\beta_k^{ACGA+} = \max\left\{0, \frac{y_k^T g_{k+1}}{y_k^T s_k}\right\} \left(1 - \frac{s_k^T g_{k+1}}{y_k^T s_k}\right)$

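30) CG_DESCENT (Hager-Zhang, 2005, 2006)

$d_{k+1} = -g_{k+1} + \bar{\beta}_k^{HZ} d_k$, $d_0 = -g_0$

$\bar{\beta}_k^{HZ} = \max\{\beta_k^{HZ}, \eta_k\}$

$\eta_k = \frac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}$, $\eta = 0.01$

$\beta_k^{HZ} = \frac{1}{y_k^T d_k} \left( y_k - 2 d_k \frac{\|y_k\|^2}{y_k^T d_k} \right)^T g_{k+1}$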
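The CG_DESCENT update can be sketched directly from the formulas above. This is an illustration, not the CG_DESCENT package itself: the function name `hz_direction` is hypothetical and no safeguard for $y_k^T d_k = 0$ is included. Hager and Zhang show that the resulting direction satisfies $g_{k+1}^T d_{k+1} \le -\frac{7}{8} \|g_{k+1}\|^2$ independently of the line search, which the example values can be used to confirm.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hz_direction(g, g_new, d, eta=0.01):
    """Return (d_next, beta) for the truncated Hager-Zhang update."""
    y = [a - b for a, b in zip(g_new, g)]          # y_k = g_{k+1} - g_k
    yd = dot(y, d)
    beta_hz = (dot(y, g_new) - 2.0 * dot(y, y) * dot(d, g_new) / yd) / yd
    # Lower bound eta_k used to truncate beta from below
    eta_k = -1.0 / (math.sqrt(dot(d, d)) * min(eta, math.sqrt(dot(g, g))))
    beta = max(beta_hz, eta_k)
    d_next = [-gn + beta * di for gn, di in zip(g_new, d)]
    return d_next, beta

# Example data: first iteration, d_0 = -g_0
g0 = [1.0, 2.0]
d0 = [-1.0, -2.0]
g1 = [0.5, -1.0]
d1, beta = hz_direction(g0, g1, d0)
```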







31) ACGMSEC

N. Andrei, Accelerated conjugate gradient algorithm with modified secant condition for unconstrained optimization. ICI Technical Report, March 3, 2008.(Submitted: Applied Mathematics and Optimization, 2008)

(Slide 30)

1 12max ,0 1

T Tk k k k k

k T Tk k k k k kk

y g s g

y s y ss

For 0 we get exactly the Perry method.

10

3(1 2 )

Theorem If then liminf 0kk

g




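32) ACGHES

$\beta_k = \frac{y_k^T g_{k+1} - s_k^T g_{k+1}}{s_k^T y_k}$

$y_k = \frac{\nabla f(x_{k+1} + \delta s_k) - \nabla f(x_{k+1})}{\delta}$

$\delta = \frac{2 \sqrt{\varepsilon_m} (1 + \|x_{k+1}\|)}{\|s_k\|}$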

N. Andrei, Accelerated conjugate gradient algorithm with finite difference Hessian / vector product approximation for unconstrained optimization. ICI Technical Report, March 4, 2008 (submitted Math. Programm.)

max ,100max 10 , ks

12 (1 )m kx n



Comparisons with other UO methods

Parametric conjugate gradient algorithms

33) Yabe-Takano

$\beta_k^{YT} = \frac{g_{k+1}^T (z_k - t s_k)}{d_k^T z_k}$

$z_k = y_k + \frac{\eta_k}{s_k^T u_k} u_k$

$\eta_k = 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k$

34) Yabe-Takano +

$\beta_k^{YT+} = \max\left\{0, \frac{g_{k+1}^T z_k}{d_k^T z_k}\right\} - t \frac{g_{k+1}^T s_k}{d_k^T z_k}$

35) Parametric CG suggested by Dai-Yuan

$\beta_k = \frac{\|g_{k+1}\|^2}{\lambda_k \|g_k\|^2 + (1 - \lambda_k) d_k^T y_k}$, $\lambda_k \in [0,1]$

36) Parametric CG suggested by Nazareth

$\beta_k = \frac{\lambda_k \|g_{k+1}\|^2 + (1 - \lambda_k) g_{k+1}^T y_k}{\mu_k \|g_k\|^2 + (1 - \mu_k) d_k^T y_k}$, $\lambda_k, \mu_k \in [0,1]$

37) Parametric CG suggested by Dai-Yuan

$\beta_k = \frac{\mu_k \|g_{k+1}\|^2 + (1 - \mu_k) g_{k+1}^T y_k}{(1 - \lambda_k - \omega_k) \|g_k\|^2 + \lambda_k d_k^T y_k - \omega_k d_k^T g_k}$, $\lambda_k, \mu_k \in [0,1]$, $\omega_k \in [0, 1 - \lambda_k]$

Applications

MINPACK2 Collection

A1) Elastic-Plastic Torsion (c=5) (nx=200, ny=200)

$\min \{q(v) : v \in K\}$

$q(v) = \frac{1}{2} \int_D \|\nabla v(x)\|^2 dx - c \int_D v(x) dx$

$K = \{v \in H_0^1(D) : |v(x)| \le \mathrm{dist}(x, \partial D), x \in D\}$

n=40000 variables

SCALCG: #iter=445, #fg=584, cpu=8.49(s)
ASCALCG: #iter=240, #fg=269, cpu=6.93(s)

A2) Pressure Distribution in a Journal Bearing (ecc=0.1, b=10) (nx=200, ny=200)

$\min \{q(v) : v \in K\}$

$q(v) = \frac{1}{2} \int_D w_q(x) \|\nabla v(x)\|^2 dx - \int_D w_l(x) v(x) dx$

$w_q(z_1, z_2) = (1 + \varepsilon \cos z_1)^3$

$w_l(z_1, z_2) = \varepsilon \sin z_1$

$D = (0, 2\pi) \times (0, 2b)$

A3) Optimal Design with Composite Materials ( 0.008)

10( , ) : ( ),min F v v H D w

21( , ) ( ) ( ) ( ) d

2D

F v x v x v x x

1( ) ,x x

2( ) ,x x

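A4) Steady State Combustion - Bratu Problem ($\lambda = 5$)

$\min \{f_\lambda(v) : v \in H_0^1(D)\}$

$f_\lambda : H_0^1(D) \to R$

$f_\lambda(v) = \int_D \left\{ \frac{1}{2} \|\nabla v(x)\|^2 - \lambda \exp[v(x)] \right\} dx$

$-\Delta v(x) = \lambda \exp[v(x)]$, $x \in D$

$v(x) = 0$, $x \in \partial D$

A5) Ginzburg-Landau (1-dimensional)

$\min \{f(v) : v(-d) = v(d), v \in C^1[-d, d]\}$

Free Gibbs energy:

$f(v) = \frac{1}{2d} \int_{-d}^{d} \left\{ \alpha(\xi) |v(\xi)|^2 + \frac{1}{2} \beta(\xi) |v(\xi)|^4 + \frac{\hbar^2}{4m} |v'(\xi)|^2 \right\} d\xi$

n=1000 variables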

Thank you !
