Neural Networks: Performance Optimization

Dec 03, 2014

Sarai González

Basic Optimization Algorithm

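$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$

or

$\Delta\mathbf{x}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k) = \alpha_k \mathbf{p}_k$

$\mathbf{p}_k$ - Search Direction

$\alpha_k$ - Learning Rate

[Figure: vector diagram showing $\mathbf{x}_k$, the step $\alpha_k \mathbf{p}_k$, and $\mathbf{x}_{k+1}$.]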

Steepest Descent

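Choose the next step so that the function decreases:

$F(\mathbf{x}_{k+1}) < F(\mathbf{x}_k)$

For small changes in $\mathbf{x}$ we can approximate $F(\mathbf{x})$:

$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T \Delta\mathbf{x}_k$

where

$\mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x} = \mathbf{x}_k}$

If we want the function to decrease:

$\mathbf{g}_k^T \Delta\mathbf{x}_k = \alpha_k \mathbf{g}_k^T \mathbf{p}_k < 0$

We can maximize the decrease by choosing:

$\mathbf{p}_k = -\mathbf{g}_k$

$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \mathbf{g}_k$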
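
As a short worked step, substituting the steepest descent direction into the first-order approximation shows why the function decreases. This assumes a positive learning rate and a nonzero gradient; the norm notation $\|\cdot\|$ is introduced here and does not appear on the slides.

```latex
\mathbf{g}_k^T \Delta\mathbf{x}_k
  = \alpha_k \mathbf{g}_k^T \mathbf{p}_k
  = -\alpha_k \mathbf{g}_k^T \mathbf{g}_k
  = -\alpha_k \|\mathbf{g}_k\|^2 < 0
\quad\Longrightarrow\quad
F(\mathbf{x}_{k+1}) \approx F(\mathbf{x}_k) - \alpha_k \|\mathbf{g}_k\|^2
```

The approximation holds only for small steps, which is why the learning rate cannot be made arbitrarily large.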

Example

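$F(\mathbf{x}) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1$

$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

$\nabla F(\mathbf{x}) = \begin{bmatrix} \frac{\partial}{\partial x_1} F(\mathbf{x}) \\ \frac{\partial}{\partial x_2} F(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}$

$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x} = \mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$

$\alpha = 0.1$

$\mathbf{x}_1 = \mathbf{x}_0 - \alpha \mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}$

$\mathbf{x}_2 = \mathbf{x}_1 - \alpha \mathbf{g}_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1 \begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}$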
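
The two iterations above can be reproduced with a minimal Python sketch. The function names `gradient` and `steepest_descent` are illustrative choices, not part of the slides, and the gradient is hard-coded for this one example function.

```python
def gradient(x):
    """Gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1 (the example function)."""
    x1, x2 = x
    return [2 * x1 + 2 * x2 + 1, 2 * x1 + 4 * x2]


def steepest_descent(x0, alpha, steps):
    """Return the iterates x_0, x_1, ..., x_steps for a fixed learning rate."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        g = gradient(x)
        # x_{k+1} = x_k - alpha * g_k
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        trajectory.append(x)
    return trajectory


trajectory = steepest_descent([0.5, 0.5], alpha=0.1, steps=2)
for k, x in enumerate(trajectory):
    print(k, [round(v, 4) for v in x])
```

Up to floating-point rounding, the second and third iterates match $\mathbf{x}_1$ and $\mathbf{x}_2$ computed by hand.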

Plot

[Figure: plot for the steepest descent example; both axes run from -2 to 2.]

Stable Learning Rates (Quadratic)

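$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{A} \mathbf{x} + \mathbf{d}^T \mathbf{x} + c$

$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$

$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \mathbf{g}_k = \mathbf{x}_k - \alpha(\mathbf{A}\mathbf{x}_k + \mathbf{d}) \quad\Rightarrow\quad \mathbf{x}_{k+1} = [\mathbf{I} - \alpha\mathbf{A}]\mathbf{x}_k - \alpha\mathbf{d}$

Stability is determined by the eigenvalues of this matrix.

Eigenvalues of $[\mathbf{I} - \alpha\mathbf{A}]$:

$[\mathbf{I} - \alpha\mathbf{A}]\mathbf{z}_i = \mathbf{z}_i - \alpha\mathbf{A}\mathbf{z}_i = \mathbf{z}_i - \alpha\lambda_i\mathbf{z}_i = (1 - \alpha\lambda_i)\mathbf{z}_i$

($\lambda_i$ - eigenvalue of $\mathbf{A}$)

Stability Requirement:

$|1 - \alpha\lambda_i| < 1 \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_i} \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_{max}}$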

Example

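$\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$

$\lambda_1 = 0.764, \quad \mathbf{z}_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}$

$\lambda_2 = 5.24, \quad \mathbf{z}_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}$

$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38$

[Figure: two plots, one for $\alpha = 0.37$ and one for $\alpha = 0.39$; both axes run from -2 to 2.]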
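
The stability bound can be checked numerically with a small sketch. It assumes the quadratic from the earlier example ($\mathbf{d} = [1, 0]^T$, minimum at $[-1, 0.5]^T$) and the starting point $[0.5, 0.5]^T$; the slide does not state which starting point its plots use. The helper names are illustrative.

```python
import math


def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def distance_after(alpha, steps, x0=(0.5, 0.5)):
    """Iterate x_{k+1} = x_k - alpha*(A x_k + d); return the distance to the minimum."""
    x1, x2 = x0
    for _ in range(steps):
        g1 = 2 * x1 + 2 * x2 + 1   # first component of A x + d
        g2 = 2 * x1 + 4 * x2       # second component of A x + d
        x1, x2 = x1 - alpha * g1, x2 - alpha * g2
    # The gradient is zero at x* = (-1, 0.5), the minimum of this quadratic.
    return math.hypot(x1 + 1, x2 - 0.5)


lam_min, lam_max = eigenvalues_sym2(2, 2, 4)
alpha_limit = 2 / lam_max
print("eigenvalues:", lam_min, lam_max)
print("stability limit:", alpha_limit)
print("alpha = 0.37:", distance_after(0.37, 500))
print("alpha = 0.39:", distance_after(0.39, 500))
```

With $\alpha = 0.37$ the distance to the minimum shrinks toward zero, while with $\alpha = 0.39$ it grows without bound, consistent with the limit of roughly 0.38.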

Minimizing Along a Line

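Choose $\alpha_k$ to minimize

$F(\mathbf{x}_k + \alpha_k\mathbf{p}_k)$

$\frac{d}{d\alpha_k} F(\mathbf{x}_k + \alpha_k\mathbf{p}_k) = \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\mathbf{p}_k + \alpha_k \mathbf{p}_k^T \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\mathbf{p}_k$

$\alpha_k = -\frac{\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\mathbf{p}_k}{\mathbf{p}_k^T \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\mathbf{p}_k} = -\frac{\mathbf{g}_k^T\mathbf{p}_k}{\mathbf{p}_k^T\mathbf{A}_k\mathbf{p}_k}$

where

$\mathbf{A}_k \equiv \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$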

Example

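$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}$

$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

$\nabla F(\mathbf{x}) = \begin{bmatrix} \frac{\partial}{\partial x_1} F(\mathbf{x}) \\ \frac{\partial}{\partial x_2} F(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}$

$\mathbf{p}_0 = -\mathbf{g}_0 = -\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -3 \\ -3 \end{bmatrix}$

$\alpha_0 = -\frac{\begin{bmatrix} 3 & 3 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}}{\begin{bmatrix} -3 & -3 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}} = 0.2$

$\mathbf{x}_1 = \mathbf{x}_0 - \alpha_0\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$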
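
The line-minimization step above can be sketched in a few lines of Python. The helpers `dot`, `mat_vec`, and `line_min_step` are illustrative names, and the formula is exact only because $F$ is quadratic here.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [dot(row, v) for row in A]


def line_min_step(A, g, p):
    """alpha_k = -(g_k^T p_k) / (p_k^T A p_k), exact for a quadratic F."""
    return -dot(g, p) / dot(p, mat_vec(A, p))


A = [[2, 2], [2, 4]]
d = [1, 0]
x0 = [0.5, 0.5]

g0 = [a + b for a, b in zip(mat_vec(A, x0), d)]   # gradient A x0 + d
p0 = [-gi for gi in g0]                           # steepest descent direction
alpha0 = line_min_step(A, g0, p0)
x1 = [xi + alpha0 * pi for xi, pi in zip(x0, p0)]
print(alpha0, x1)
```

The computed step size and next point agree with $\alpha_0 = 0.2$ and $\mathbf{x}_1 = [-0.1, -0.1]^T$ up to floating-point rounding.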

Plot

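Successive steps are orthogonal.

$\frac{d}{d\alpha_k} F(\mathbf{x}_k + \alpha_k\mathbf{p}_k) = \frac{d}{d\alpha_k} F(\mathbf{x}_{k+1}) = \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_{k+1}} \frac{d}{d\alpha_k}\left[\mathbf{x}_k + \alpha_k\mathbf{p}_k\right]$

$= \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_{k+1}}\mathbf{p}_k = \mathbf{g}_{k+1}^T\mathbf{p}_k$

At the minimum along the line this derivative is zero, so $\mathbf{g}_{k+1}^T\mathbf{p}_k = 0$: the new gradient is orthogonal to the previous search direction.

[Figure: Contour Plot, axes $x_1$ and $x_2$, each from -2 to 2.]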
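
As a quick numerical check of the orthogonality, the line-minimization example ($\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$, $\mathbf{d} = [1, 0]^T$, $\mathbf{x}_1 = [-0.1, -0.1]^T$, $\mathbf{p}_0 = [-3, -3]^T$) can be worked through by hand:

```latex
\mathbf{g}_1 = \mathbf{A}\mathbf{x}_1 + \mathbf{d}
  = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}
    \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}
  + \begin{bmatrix} 1 \\ 0 \end{bmatrix}
  = \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix},
\qquad
\mathbf{g}_1^T \mathbf{p}_0
  = (0.6)(-3) + (-0.6)(-3) = 0
```

Since the next steepest descent direction is $-\mathbf{g}_1$, it is at a right angle to $\mathbf{p}_0$.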

Newton’s Method

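$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T\Delta\mathbf{x}_k + \frac{1}{2}\Delta\mathbf{x}_k^T\mathbf{A}_k\Delta\mathbf{x}_k$

Take the gradient of this second-order approximation and set it equal to zero to find the stationary point:

$\mathbf{g}_k + \mathbf{A}_k\Delta\mathbf{x}_k = \mathbf{0}$

$\Delta\mathbf{x}_k = -\mathbf{A}_k^{-1}\mathbf{g}_k$

$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\mathbf{g}_k$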

Example

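$F(\mathbf{x}) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1$

$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

$\nabla F(\mathbf{x}) = \begin{bmatrix} \frac{\partial}{\partial x_1} F(\mathbf{x}) \\ \frac{\partial}{\partial x_2} F(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}$

$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$

$\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$

$\mathbf{x}_1 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}^{-1}\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1 & -0.5 \\ -0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1.5 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$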
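
The Newton step for this example can be sketched as follows. Rather than forming $\mathbf{A}^{-1}$ explicitly, the sketch solves the 2x2 system $\mathbf{A}\,\boldsymbol{\delta} = \mathbf{g}_0$ by Cramer's rule; `solve_2x2` and `newton_step` are illustrative names, not from the slides.

```python
def solve_2x2(A, b):
    """Solve A y = b for a 2x2 matrix A using Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular")
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def newton_step(x, g, A):
    """x_{k+1} = x_k - A_k^{-1} g_k, computed by solving A_k * delta = g_k."""
    delta = solve_2x2(A, g)
    return [xi - di for xi, di in zip(x, delta)]


A = [[2, 2], [2, 4]]
x0 = [0.5, 0.5]
g0 = [2 * x0[0] + 2 * x0[1] + 1, 2 * x0[0] + 4 * x0[1]]   # gradient at x0
x1 = newton_step(x0, g0, A)
print(x1)   # [-1.0, 0.5]
```

Because $F$ is quadratic, the second-order approximation is exact and a single step lands on the stationary point.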

Plot

[Figure: plot for the Newton's method example; both axes run from -2 to 2.]

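Non-Quadratic Example

$F(\mathbf{x}) = (x_2 - x_1)^4 + 8 x_1 x_2 - x_1 + x_2 + 3$

Stationary Points:

$\mathbf{x}^1 = \begin{bmatrix} -0.42 \\ 0.42 \end{bmatrix}, \quad \mathbf{x}^2 = \begin{bmatrix} -0.13 \\ 0.13 \end{bmatrix}, \quad \mathbf{x}^3 = \begin{bmatrix} 0.55 \\ -0.55 \end{bmatrix}$

[Figure: two plots, titled $F(\mathbf{x})$ and $F_2(\mathbf{x})$; all axes run from -2 to 2.]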

Different Initial Conditions

[Figure: six plots for different initial conditions, labeled $F(\mathbf{x})$ and $F_2(\mathbf{x})$; all axes run from -2 to 2.]
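
To illustrate how the result depends on the starting point, here is a hedged sketch that applies Newton's method to the non-quadratic function $F(\mathbf{x}) = (x_2 - x_1)^4 + 8x_1x_2 - x_1 + x_2 + 3$. It assumes these plots continue the Newton's method discussion. The three starting points are chosen only for illustration, since the initial conditions used on the slide cannot be recovered from the transcript, and the gradient and Hessian were derived by hand from $F$.

```python
def gradient(x):
    """Gradient of F(x) = (x2 - x1)^4 + 8*x1*x2 - x1 + x2 + 3."""
    x1, x2 = x
    cube = 4 * (x2 - x1) ** 3
    return [-cube + 8 * x2 - 1, cube + 8 * x1 + 1]


def hessian(x):
    """Hessian of the same function."""
    x1, x2 = x
    a = 12 * (x2 - x1) ** 2
    return [[a, 8 - a], [8 - a, a]]


def newton(x0, steps=20):
    """Run a fixed number of Newton steps x_{k+1} = x_k - A_k^{-1} g_k."""
    x = list(x0)
    for _ in range(steps):
        g = gradient(x)
        (a11, a12), (a21, a22) = hessian(x)
        det = a11 * a22 - a12 * a21
        if det == 0:
            break  # singular Hessian: the Newton step is undefined
        # Solve A_k * delta = g_k by Cramer's rule.
        dx1 = (g[0] * a22 - a12 * g[1]) / det
        dx2 = (a11 * g[1] - a21 * g[0]) / det
        x = [x[0] - dx1, x[1] - dx2]
    return x


# Hypothetical starting points, not taken from the slides.
results = {}
for start in ((1.0, -1.0), (-1.0, 1.0), (0.0, 0.0)):
    results[start] = newton(start)
    print(start, "->", [round(v, 3) for v in results[start]])
```

Each of these starts ends near a different one of the three stationary points listed for this function, which shows that Newton's method finds a stationary point of the local quadratic approximation rather than necessarily the global minimum.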

Conjugate Vectors

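$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$

A set of vectors is mutually conjugate with respect to a positive definite Hessian matrix $\mathbf{A}$ if

$\mathbf{p}_k^T\mathbf{A}\mathbf{p}_j = 0, \quad k \neq j$

One set of conjugate vectors consists of the eigenvectors of $\mathbf{A}$.

$\mathbf{z}_k^T\mathbf{A}\mathbf{z}_j = \lambda_j\mathbf{z}_k^T\mathbf{z}_j = 0, \quad k \neq j$

(The eigenvectors of symmetric matrices are orthogonal.)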
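
The claim that eigenvectors are conjugate can be checked on the matrix $\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$ used in the examples. This is a minimal sketch limited to symmetric 2x2 matrices with a nonzero off-diagonal entry; `unit_eigenvectors_sym2` is an illustrative helper, not a library function.

```python
import math


def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [dot(row, v) for row in A]


def unit_eigenvectors_sym2(A):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, c]] with b != 0."""
    (a, b), (_, c) = A
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        v = [b, lam - a]                 # satisfies (A - lam*I) v = 0 when b != 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, [vi / norm for vi in v]))
    return pairs


A = [[2, 2], [2, 4]]
(lam1, z1), (lam2, z2) = unit_eigenvectors_sym2(A)
conjugacy = dot(z1, mat_vec(A, z2))      # z1^T A z2
print(lam1, z1)
print(lam2, z2)
print(conjugacy)
```

The product $\mathbf{z}_1^T\mathbf{A}\mathbf{z}_2$ comes out as zero up to floating-point rounding, so the two eigenvectors are both orthogonal and conjugate with respect to $\mathbf{A}$.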

For Quadratic Functions

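$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$

$\nabla^2 F(\mathbf{x}) = \mathbf{A}$

The change in the gradient at iteration k is

$\Delta\mathbf{g}_k = \mathbf{g}_{k+1} - \mathbf{g}_k = (\mathbf{A}\mathbf{x}_{k+1} + \mathbf{d}) - (\mathbf{A}\mathbf{x}_k + \mathbf{d}) = \mathbf{A}\Delta\mathbf{x}_k$

where

$\Delta\mathbf{x}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k) = \alpha_k\mathbf{p}_k$

The conjugacy conditions can be rewritten

$\alpha_k\mathbf{p}_k^T\mathbf{A}\mathbf{p}_j = \Delta\mathbf{x}_k^T\mathbf{A}\mathbf{p}_j = \Delta\mathbf{g}_k^T\mathbf{p}_j = 0, \quad k \neq j$

This does not require knowledge of the Hessian matrix.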

Forming Conjugate Directions

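Choose the initial search direction as the negative of the gradient.

$\mathbf{p}_0 = -\mathbf{g}_0$

Choose subsequent search directions to be conjugate.

$\mathbf{p}_k = -\mathbf{g}_k + \beta_k\mathbf{p}_{k-1}$

where

$\beta_k = \frac{\Delta\mathbf{g}_{k-1}^T\mathbf{g}_k}{\Delta\mathbf{g}_{k-1}^T\mathbf{p}_{k-1}}$ or $\beta_k = \frac{\mathbf{g}_k^T\mathbf{g}_k}{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}$ or $\beta_k = \frac{\Delta\mathbf{g}_{k-1}^T\mathbf{g}_k}{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}$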
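
On a quadratic function with exact line minimization, the three expressions for $\beta_k$ give the same value. The sketch below checks this for $\beta_1$ on the example quadratic ($\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$, $\mathbf{d} = [1, 0]^T$, $\mathbf{x}_0 = [0.5, 0.5]^T$). The variable names `beta_first`, `beta_second`, and `beta_third` simply follow the order of the formulas above and are not from the slides.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [dot(row, v) for row in A]


def grad(A, d, x):
    """Gradient A x + d of the quadratic F(x) = 0.5 x^T A x + d^T x."""
    return [a + b for a, b in zip(mat_vec(A, x), d)]


A = [[2, 2], [2, 4]]
d = [1, 0]
x0 = [0.5, 0.5]

g0 = grad(A, d, x0)
p0 = [-gi for gi in g0]
alpha0 = -dot(g0, p0) / dot(p0, mat_vec(A, p0))    # exact line minimization
x1 = [xi + alpha0 * pi for xi, pi in zip(x0, p0)]
g1 = grad(A, d, x1)
dg0 = [new - old for old, new in zip(g0, g1)]      # change in gradient g1 - g0

beta_first = dot(dg0, g1) / dot(dg0, p0)
beta_second = dot(g1, g1) / dot(g0, g0)
beta_third = dot(dg0, g1) / dot(g0, g0)
print(beta_first, beta_second, beta_third)
```

All three values agree with each other up to rounding. For non-quadratic functions or inexact line searches the formulas generally differ, which is why several variants exist.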

Conjugate Gradient algorithm

• The first search direction is the negative of the gradient.

$\mathbf{p}_0 = -\mathbf{g}_0$

• Select the learning rate to minimize along the line.

$\alpha_k = -\frac{\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\mathbf{p}_k}{\mathbf{p}_k^T \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\mathbf{p}_k} = -\frac{\mathbf{g}_k^T\mathbf{p}_k}{\mathbf{p}_k^T\mathbf{A}_k\mathbf{p}_k}$ (For quadratic functions.)

• Select the next search direction using

$\mathbf{p}_k = -\mathbf{g}_k + \beta_k\mathbf{p}_{k-1}$

• If the algorithm has not converged, return to second step.

• A quadratic function will be minimized in n steps.
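
The steps above can be sketched for a quadratic $F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x}$ as follows. This is one possible arrangement, not a definitive implementation: it assumes $\mathbf{A}$ is symmetric positive definite, uses $\beta_k = \mathbf{g}_k^T\mathbf{g}_k / \mathbf{g}_{k-1}^T\mathbf{g}_{k-1}$ (commonly attributed to Fletcher and Reeves), and the function name and `tol` parameter are illustrative.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, d, x0, tol=1e-12):
    """Minimize F(x) = 0.5 x^T A x + d^T x in at most n steps (n = len(x0))."""
    x = list(x0)
    g = [a + b for a, b in zip(mat_vec(A, x), d)]
    p = [-gi for gi in g]                      # first direction: negative gradient
    for _ in range(len(x0)):
        if dot(g, g) <= tol:
            break                              # gradient is (numerically) zero
        # Learning rate that minimizes F along the line x + alpha * p.
        alpha = -dot(g, p) / dot(p, mat_vec(A, p))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = [a + b for a, b in zip(mat_vec(A, x), d)]
        # Next conjugate direction p_k = -g_k + beta_k * p_{k-1}.
        beta = dot(g_new, g_new) / dot(g, g)
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
        g = g_new
    return x


x_min = conjugate_gradient([[2, 2], [2, 4]], [1, 0], [0.5, 0.5])
print([round(v, 6) for v in x_min])
```

For the two-dimensional example quadratic, two iterations reach the minimum at $[-1, 0.5]^T$ up to rounding, in line with the n-step property stated in the last bullet.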

Example

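$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}$

$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

$\nabla F(\mathbf{x}) = \begin{bmatrix} \frac{\partial}{\partial x_1} F(\mathbf{x}) \\ \frac{\partial}{\partial x_2} F(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}$

$\mathbf{p}_0 = -\mathbf{g}_0 = -\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -3 \\ -3 \end{bmatrix}$

$\alpha_0 = -\frac{\begin{bmatrix} 3 & 3 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}}{\begin{bmatrix} -3 & -3 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}} = 0.2$

$\mathbf{x}_1 = \mathbf{x}_0 - \alpha_0\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$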

Example

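$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}$

$\beta_1 = \frac{\mathbf{g}_1^T\mathbf{g}_1}{\mathbf{g}_0^T\mathbf{g}_0} = \frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix}\begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{0.72}{18} = 0.04$

$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\mathbf{p}_0 = \begin{bmatrix} -0.6 \\ 0.6 \end{bmatrix} + 0.04\begin{bmatrix} -3 \\ -3 \end{bmatrix} = \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}$

$\alpha_1 = -\frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix}\begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}}{\begin{bmatrix} -0.72 & 0.48 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}} = -\frac{-0.72}{0.576} = 1.25$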

Plots

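[Figure: two contour plots, titled Conjugate Gradient and Steepest Descent, axes $x_1$ and $x_2$, each from -2 to 2.]

$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{p}_1 = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + 1.25\begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$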
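
As a closing check, the gradient at the conjugate gradient result can be evaluated by hand, using $\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$ with the example's $\mathbf{A}$ and $\mathbf{d}$:

```latex
\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_2}
  = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}
    \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}
  + \begin{bmatrix} 1 \\ 0 \end{bmatrix}
  = \begin{bmatrix} -1 \\ 0 \end{bmatrix}
  + \begin{bmatrix} 1 \\ 0 \end{bmatrix}
  = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
```

The gradient vanishes, so $\mathbf{x}_2$ is the stationary point of this quadratic. It is the same point that Newton's method reached in one step in the earlier example, and it was found here in n = 2 conjugate gradient steps.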