
    Numerical Derivatives in Scilab

    Michael Baudin

    May 2009

    Abstract

This document presents the use of numerical derivatives in Scilab. In the first part, we present a result which is surprising when we are not familiar with floating point numbers. In the second part, we analyse the method to use the optimal step to compute derivatives with finite differences on floating point systems. We present several formulas and their associated optimal steps. In the third part, we present the derivative function, its features and its performances.

    Contents

1 Introduction 4
1.1 Introduction 4
1.2 Overview 4

2 A surprising result 4
2.1 Theory 4
2.1.1 Taylor's formula for univariate functions 5
2.1.2 Finite differences 5
2.2 Experiments 6

3 Analysis 8
3.1 Errors in function evaluations 8
3.2 Various results for sin(2^64) 9
3.3 Floating point implementation of the forward formula 10
3.4 Numerical experiments with the robust forward formula 15
3.5 Backward formula 16
3.6 Centered formula with 2 points 16
3.7 Centered formula with 4 points 19
3.8 Some finite difference formulas for the first derivative 21
3.9 A three points formula for the second derivative 22
3.10 Accuracy of finite difference formulas 24
3.11 A collection of finite difference formulas 26


4 Finite differences of multivariate functions 28
4.1 Multivariate functions 28
4.2 Numerical derivatives of multivariate functions 30
4.3 Derivatives of a multivariate function in Scilab 31
4.4 Derivatives of a vectorial function with Scilab 33
4.5 Computing higher degree derivatives 35
4.6 Nested derivatives with Scilab 37
4.7 Computing derivatives with more accuracy 39
4.8 Taking into account bounds on parameters 41

5 The derivative function 41
5.1 Overview 41
5.2 Varying order to check accuracy 42
5.3 Orthogonal matrix 42
5.4 Performance of finite differences 44

6 One more step 47

7 Automatically computing the coefficients 49
7.1 The coefficients of finite difference formulas 49
7.2 Automatically computing the coefficients 51
7.3 Computing the coefficients in Scilab 52

8 Notes and references 54

9 Exercises 55

10 Acknowledgments 57

Bibliography 57

Index 58


Copyright (c) 2008-2009 - Michael Baudin

This file must be used under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License:

http://creativecommons.org/licenses/by-sa/3.0


    1 Introduction

    1.1 Introduction

This document is an open-source project. The LaTeX sources are available on the Scilab Forge:

    http://forge.scilab.org/index.php/p/docnumder/

The LaTeX sources are provided under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License:

    http://creativecommons.org/licenses/by-sa/3.0

The Scilab scripts are provided on the Forge, inside the project, under the scripts sub-directory. The scripts are available under the CeCiLL licence:

    http://www.cecill.info/licences/Licence_CeCILL_V2-en.txt

    1.2 Overview

In this document, we analyse the computation of the numerical derivative of a given function. Before getting into the details, we briefly motivate the need for approximate numerical derivatives.

Consider the situation where we want to solve an optimization problem with a method which requires the gradient of the cost function. In simple cases, we can provide the exact gradient. The practical computation may be performed by hand with paper and pencil. If the function is more complicated, we can perform the computation with a symbolic computing system (such as Maple or Mathematica). In some situations, this is not possible. In most practical situations, indeed, the formula involved in the computation is extremely complicated. In this case, numerical derivatives can provide an accurate evaluation of the gradient. Other methods to compute the gradient are based on adjoint equations and on automatic differentiation. In this document, we focus on numerical derivative methods because Scilab provides commands for this purpose.

2 A surprising result

In this section, we present surprising results which occur when we consider a function of one variable only. We derive the forward numerical derivative based on the Taylor expansion of a function with one variable. Then we present a numerical experiment based on this formula, with decreasing step sizes.

This section was first published in [3].

    2.1 Theory

Finite difference methods approximate the derivative of a given function f based on function values only. In this section, we present the forward derivative, which allows to compute an approximation of f'(x), based on the value of f at well chosen points. The computations are based on a local Taylor's expansion of f in the neighbourhood of the point x. This assumes that f is continuously derivable, an assumption which is used throughout this document.

2.1.1 Taylor's formula for univariate functions

Taylor's theorem is of fundamental importance because it shows that the local behaviour of the function f can be known from the function and its derivatives at a single point.

Theorem 2.1. Assume that f : R -> R is a continuously derivable function of one variable. Assume that f is continuously differentiable d times, i.e. f in C^d, where d is a positive integer. There exists a scalar θ in [0, 1], such that

f(x + h) = f(x) + h f'(x) + (1/2) h^2 f''(x) + ...   (1)
         + (1/(d-1)!) h^(d-1) f^(d-1)(x) + (1/d!) h^d f^(d)(x + θh),   (2)

where x, h in R and f^(d)(x) denotes the d-th derivative of f evaluated at x.

This theorem will not be proved here [10]. We can expand Taylor's formula up to order 4 derivatives of f and get

f(x + h) = f(x) + h f'(x) + (h^2/2) f''(x) + (h^3/6) f'''(x) + (h^4/24) f''''(x) + O(h^5)   (3)

This formula can be used to derive finite difference formulas, which approximate the derivatives of f using function values only.

    2.1.2 Finite differences

In this section, we derive the forward 2 points finite difference formula and prove that it is an order 1 formula for the first derivative of the function f.

Proposition 2.2. Let f : R -> R be a continuously derivable function of one variable. Therefore,

f'(x) = (f(x + h) - f(x)) / h - (h/2) f''(x) + O(h^2).   (4)

Proof. Assume that f : R -> R is a function with continuous derivatives. If we neglect higher order terms, we have

f(x + h) = f(x) + h f'(x) + (h^2/2) f''(x) + O(h^3).   (5)

Therefore,

(f(x + h) - f(x)) / h = f'(x) + (h/2) f''(x) + O(h^2),   (6)

which concludes the proof.


Definition 2.3. (Forward finite difference for f') The finite difference formula

Df(x) = (f(x + h) - f(x)) / h   (7)

is the forward 2 points finite difference for f'.

The following definition defines the order of a finite difference formula, which measures the accuracy of the formula.

Definition 2.4. (Order) A finite difference formula Df is of order p > 0 for f^(d) if

Df(x) = f^(d)(x) + O(h^p).   (8)

The equation 4 indicates that the forward 2 points finite difference is an order 1 formula for f'.

Definition 2.5. (Truncation error) The truncation error of a finite difference formula for f^(d)(x) is

E_t(h) = |Df(x) - f^(d)(x)|.   (9)

The equation 4 indicates that the truncation error of the 2 points forward formula is:

E_t(h) = (h/2) |f''(x)|.   (10)

The truncation error of the equation 10 depends on the step h, so that decreasing the step reduces the truncation error. The previous discussion implies that a (naive) algorithm to compute the numerical derivative of a function of one variable is

f'(x) ≈ (f(x + h) - f(x)) / h.

As we are going to see, the previous algorithm is much more naive than it appears, as it may lead to very inaccurate numerical results.

    2.2 Experiments

In this section, we present numerical experiments based on a naive implementation of the forward finite difference formula. We show that a wrong step size h may lead to very inaccurate results.

The following Scilab function is a straightforward implementation of the forward finite difference formula.

function fp = myfprime(f, x, h)
    fp = (f(x+h) - f(x)) / h;
endfunction

In the following numerical experiments, we consider the square function f(x) = x^2, whose derivative is f'(x) = 2x. The following Scilab script implements the square function.


function y = myfunction(x)
    y = x * x;
endfunction

The naive idea is that the computed relative error is small when the step h is small. Because "small" is not a priori clear, we take ε_M ≈ 10^-16, the machine precision in double precision, as a good candidate for "small".

In order to compare our results, we use the derivative function provided by Scilab. The simplest calling sequence of this function is

J = derivative(F, x)

where F is a given function, x is the point where to compute the derivative, and J is the Jacobian, i.e. the first derivative when the variable x is a simple scalar. The derivative function provides several methods to compute the derivative. In order to compare our method with the method used by derivative, we must specify the order of the method. The calling sequence is then

J = derivative(F, x, order=o)

where o can be equal to 1, 2 or 4. Our forward formula corresponds to order 1. In the following script, we compare the computed relative error produced by our naive method with the step h = ε_M and the derivative function with the default step and the order 1 method.

x = 1.0;
fpref = derivative(myfunction, x);
fpexact = 2.;
e = abs(fpref - fpexact) / fpexact;
mprintf("Scilab f = %e, error = %e\n", fpref, e);
h = 1.e-16;
fp = myfprime(myfunction, x, h);
e = abs(fp - fpexact) / fpexact;
mprintf("Naive f = %e, error = %e\n", fp, e);

When executed, the previous script prints out:

Scilab f=2.000000e+000, error=7.450581e-009
Naive f=0.000000e+000, error=1.000000e+000

Our naive method seems to be quite inaccurate and does not have even one significant digit! The Scilab primitive, instead, has approximately 9 significant digits.

Since our faith is based on the truth of the mathematical theory, which leads to accurate results in many situations, we choose to perform additional experiments...

Consider the following experiment. In the following Scilab script, we take an initial step h = 1.0 and then divide h by 10 at each step of a loop made of 20 iterations.

x = 1.0;
fpexact = 2.;
fpref = derivative(myfunction, x, order=1);
e = abs(fpref - fpexact) / fpexact;
mprintf("Scilab f = %e, error = %e\n", fpref, e);
h = 1.0;
for i = 1:20
    h = h / 10.0;
    fp = myfprime(myfunction, x, h);
    e = abs(fp - fpexact) / fpexact;
    mprintf("Naive f = %e, h=%e, error = %e\n", fp, h, e);
end

Scilab then produces the following output.

Scilab f=2.000000e+000, error=7.450581e-009
Naive f=2.100000e+000, h=1.000000e-001, error=5.000000e-002
Naive f=2.010000e+000, h=1.000000e-002, error=5.000000e-003
Naive f=2.001000e+000, h=1.000000e-003, error=5.000000e-004
Naive f=2.000100e+000, h=1.000000e-004, error=5.000000e-005
Naive f=2.000010e+000, h=1.000000e-005, error=5.000007e-006
Naive f=2.000001e+000, h=1.000000e-006, error=4.999622e-007
Naive f=2.000000e+000, h=1.000000e-007, error=5.054390e-008
Naive f=2.000000e+000, h=1.000000e-008, error=6.077471e-009
Naive f=2.000000e+000, h=1.000000e-009, error=8.274037e-008
Naive f=2.000000e+000, h=1.000000e-010, error=8.274037e-008
Naive f=2.000000e+000, h=1.000000e-011, error=8.274037e-008
Naive f=2.000178e+000, h=1.000000e-012, error=8.890058e-005
Naive f=1.998401e+000, h=1.000000e-013, error=7.992778e-004
Naive f=1.998401e+000, h=1.000000e-014, error=7.992778e-004
Naive f=2.220446e+000, h=1.000000e-015, error=1.102230e-001
Naive f=0.000000e+000, h=1.000000e-016, error=1.000000e+000
Naive f=0.000000e+000, h=1.000000e-017, error=1.000000e+000
Naive f=0.000000e+000, h=1.000000e-018, error=1.000000e+000
Naive f=0.000000e+000, h=1.000000e-019, error=1.000000e+000
Naive f=0.000000e+000, h=1.000000e-020, error=1.000000e+000

We see that the relative error decreases, then increases. Obviously, the optimum step is approximately h = 10^-8, where the relative error is approximately e_r = 6·10^-9. We should not be surprised to see that Scilab has computed a derivative which is near the optimum.

    3 Analysis

In this section, we analyse the floating point implementation of a numerical derivative. In the first part, we take into account rounding errors in the computation of the total error of the numerical derivative. Then we derive several numerical derivative formulas and compute their optimal step and optimal error. We finally present the method which is used in the derivative function.

    3.1 Errors in function evaluations

In this section, we analyze the error that we get when we evaluate a function on a floating point system such as Scilab.

Assume that f is a continuously differentiable real function of one real variable x. When Scilab evaluates the function f at the point x, it makes an error and computes f̃(x) instead of f(x). Let us define the relative error as

e(x) = |f̃(x) - f(x)| / |f(x)|,   (11)


if f(x) is different from zero. The previous definition implies:

f̃(x) = (1 + δ(x)) f(x),   (12)

where δ(x) in R is such that |δ(x)| = e(x). We assume that the relative error satisfies the inequality

e(x) <= c(x) ε_M,   (13)

where ε_M is the machine precision and c is a function depending on f and the point x.

In Scilab, the machine precision is ε_M ≈ 10^-16 since Scilab uses double precision floating point numbers. See [4] for more details on floating point numbers in Scilab.

The base ten logarithm of c approximately measures the number of significant digits which are lost in the computation. For example, assume that, for some x in R, we have ε_M ≈ 10^-16 and c(x) = 10^5. Then the relative error in the function value is lower than c(x) ε_M = 10^(-16+5) = 10^-11. Hence, five digits have been lost in the computation.

The function c depends on the accuracy of the function, and can be zero, small or large.

At best, the computed function value is exactly equal to the mathematical value. For example, the function f(x) = (x - 1)^2 + 1 is exactly evaluated as f̃(x) = 1 when x = 1. In other words, we may have c(x) = 0.

In general, the mathematical function value is between two consecutive floating point numbers. In this case, the relative error is bounded by the unit roundoff u = ε_M / 2. For example, the operators +, -, *, / and sqrt are guaranteed to have a relative error no greater than u by the IEEE 754 standard [21]. In other words, we may have c(x) = 1/2.

At worst, there is no significant digit in f̃(x). This may happen for example when some intermediate algorithm used within the function evaluation (e.g. the range reduction algorithm) cannot get a small relative error. An example of such a situation is given in the next section. In other words, we may have c(x) ≈ 10^16.
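As a small illustration of the best case above, the following Scilab sketch (the function name bestf is ours, not part of this document) evaluates f(x) = (x - 1)^2 + 1 at x = 1 and checks that the computed value equals the mathematical value, so that c(x) = 0 at this point.

// Sketch: the function f(x) = (x-1)^2 + 1 is evaluated exactly at x = 1,
// so no significant digit is lost there (c(x) = 0).
// The name "bestf" is illustrative, not taken from the document.
function y = bestf(x)
    y = (x - 1)^2 + 1
endfunction
computed = bestf(1.0);
mprintf("computed = %e, error = %e\n", computed, abs(computed - 1.0))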

3.2 Various results for sin(2^64)

In this section, we compute the result of the computation sin(2^64) with various computation software packages on several operating systems. This particular computation is inspired by the work of Soni and Edelman [18] where the authors performed various comparisons of numerical computations across different software packages. Here, the particular input x = 2^64 has been chosen because this number can be exactly represented as a floating point number.

In order to get all the available precision, we often have to configure the display, so that all digits are printed. For example, in Scilab, we must use the format function, as in the following session.


Software               Operating System                 Result
Wolfram Alpha          Web Service                      0.0235985...
Octave 3.2.4           Win. XP 32 bits                  0.247260646...
Matlab 7.7.0 R 2008    Win. XP 32 bits                  0.0235985...
Scilab 5.2.2           Win. XP 32 bits                  0.2472606463...
Scilab 5.2.2           Linux 32 bits glibc 2.10.1      -0.35464734997...
Scilab 5.2.2           Linux 64 bits eglibc 2.11.2-1    0.0235985...

Figure 1: A family of results for sin(2^64).

-->format("e",25)
-->sin(2^64)
 ans  =
   2.472606463094176865D-01

In Matlab, for example, we use the format long statement. We used Wolfram Alpha [15] in order to compute the exact result for this computation. The results are presented in figure 1. This table can be compared with the Table 20, p. 28 in [18].

One of the reasons behind these discrepancies may be the cumulated errors in the range reduction algorithm. Anyway, this value of x is so large that a small change in x induces a large number of cycles in the trigonometric circle: this is not a safe zone of computation for sine.

It can be proved that the condition number of the sine function is

|x cos(x) / sin(x)|.

Therefore, the sine function has a large condition number

- if x is large,
- if x is an integer multiple of π (where sin(x) = 0).

The example presented in this section is rather extreme. For most elementary functions and for most inputs x, the number of significant binary digits is in the range [50, 52]. But there are many situations where this accuracy is not achieved.
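To give an idea of the magnitudes involved, the following Scilab sketch (ours, not part of the original experiments) evaluates the condition number |x cos(x)/sin(x)| of the sine function at a moderate and at a large argument. The value printed for x = 2^64 is only indicative, since sin(2^64) itself is affected by the inaccuracy discussed in this section, but its huge order of magnitude explains the discrepancies of figure 1.

// Sketch: condition number of the sine function, |x*cos(x)/sin(x)|.
// A large value means that a tiny relative perturbation of x is strongly
// amplified in sin(x). The value at x = 2^64 is only an order of magnitude.
function c = sincond(x)
    c = abs(x .* cos(x) ./ sin(x))
endfunction
mprintf("x = 1    : condition = %e\n", sincond(1))
mprintf("x = 2^64 : condition = %e\n", sincond(2^64))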

    3.3 Floating point implementation of the forward formula

In this section, we derive the floating point implementation of the forward formula given by

Df(x) = (f(x + h) - f(x)) / h.   (14)

In other words, given x and f, we search the step h > 0 so that the error in the numerical derivative is minimum.

In the IEEE 754 standard [21, 9], double precision floating point numbers are stored as 64 bits floating point numbers. More precisely, these numbers are stored with 52 bits in the mantissa, 1 sign bit and 11 bits in the exponent. In Scilab, which uses double precision numbers, the machine precision is stored in the global variable %eps, which is equal to ε_M = 1/2^52 ≈ 2.220·10^-16. This means that any value x has 52 significant binary digits, which corresponds to approximately 16 decimal digits. If IEEE 754 single precision floating point numbers were used (i.e. 32 bits floating point numbers with 23 bits in the mantissa), the precision to use would be ε_M = 1/2^23 ≈ 10^-7.

We can, as Dumontet and Vignes [6], consider the forward difference formula 7 very closely. Indeed, there are many sources of errors which can be considered:

- the point x is represented in the machine by x̃,
- the step h is represented in the machine by h̃,
- the point x + h is computed in the machine as x̃ ⊕ h̃, where the operation ⊕ is the floating point addition,
- the function value of f at point x is computed by the machine as f̃(x̃),
- the function value of f at point x + h is computed by the machine as f̃(x̃ ⊕ h̃),
- the difference f(x + h) - f(x) is computed by the machine as f̃(x̃ ⊕ h̃) ⊖ f̃(x̃), where the operation ⊖ is the floating point subtraction,
- the factor (f(x + h) - f(x)) / h is computed by the machine as (f̃(x̃ ⊕ h̃) ⊖ f̃(x̃)) ⊘ h̃, where the operation ⊘ is the floating point division.

All in all, the forward difference formula

Df(x) = (f(x + h) - f(x)) / h   (15)

is computed by the machine as

D̃f(x) = (f̃(x̃ ⊕ h̃) ⊖ f̃(x̃)) ⊘ h̃.   (16)

For example, consider the error which is associated with the sum x ⊕ h. If the step h is too small, the sum x ⊕ h is equal to x. On the other hand, if the step h is too large, then the sum x ⊕ h is equal to h. We may require that the step h is in the interval [2^-52 x, 2^52 x] so that x and h are not too far away from each other in magnitude.

We will discuss this assumption later in this chapter.

Dumontet and Vignes show that the most important source of error in the computation is the function evaluation. That is, the addition ⊕, subtraction ⊖ and division ⊘ operations and the finite accuracy of x̃ and h̃ produce, most of the time, a much lower relative error than the error generated by the function evaluation.

With a floating point computer, the total error that we get from the forward difference approximation 14 is (skipping the multiplication constants) the sum of two terms:

- the truncation error caused by the term (h/2) |f''(x)|,


- and the rounding error ε_M |f(x)| on the function values f(x) and f(x + h).

Therefore, the error associated with the forward finite difference is

E(h) = ε_M |f(x)| / h + (h/2) |f''(x)|.   (17)

The total error is then the balance between the positive functions ε_M |f(x)| / h and (h/2) |f''(x)|.

- When h -> infinity, the error is dominated by the truncation error (h/2) |f''(x)|.
- When h -> 0, the error is dominated by the rounding error ε_M |f(x)| / h.

The following Scilab script allows to create the plot of the function E(h) which is presented in figure 2. The graph is plotted in logarithmic scale for the function f(x) = x^2. When this function is considered at the point x = 1, we have f(x) = 1 and f''(x) = 2.

function e = totalerror(h)
    f = 1
    fpp = 2
    e = %eps * f / h + h * fpp / 2.0
endfunction

n = 1000;
x = linspace(-16, 0, n);
y = zeros(n, 1);
for i = 1:n
    h = 10^(x(i));
    y(i) = log10(totalerror(h));
end
plot(x, y)

Proposition 3.1. Let f : R -> R be a continuously derivable function of one variable. Consider the forward finite difference of f defined by 7. Assume that the associated error implied by truncation and rounding is defined by

E(h) = ε_M |f(x)| / h + (h/2) |f''(x)|.   (18)

Then the unique step which minimizes the error is

h = sqrt(2 ε_M |f(x)| / |f''(x)|).   (19)

Furthermore, assume that f satisfies

|f(x)| ≈ 1 and (1/2) |f''(x)| ≈ 1.   (20)

Therefore, the approximate optimal step is

h ≈ ε_M^(1/2),   (21)

where the approximate error is

E(h) ≈ 2 ε_M^(1/2).   (22)


Figure 2: Total error of the numerical derivative as a function of the step in logarithmic scale - Theory.


Proof. The total error is minimized when the derivative of the function E is zero. The first derivative of the function E is

E'(h) = -ε_M |f(x)| / h^2 + (1/2) |f''(x)|.   (23)

The second derivative of E is

E''(h) = 2 ε_M |f(x)| / h^3.   (24)

If we assume that f(x) is nonzero, then the second derivative E''(h) is strictly positive, since h > 0 (i.e. we consider only non-zero steps). The first derivative E'(h) is zero if and only if

-ε_M |f(x)| / h^2 + (1/2) |f''(x)| = 0.   (25)

Therefore, the optimal step is 19. If we make the additional assumptions 20, then the optimal step is given by 21. If we plug the equality 21 into the definition of the total error 17 and use the assumptions 20, we get the error as in 22, which concludes the proof.

The previous analysis shows that a more robust algorithm to compute the numerical first derivative of a function of one variable is:

h = sqrt(%eps)
fp = (f(x+h) - f(x)) / h

In order to evaluate f'(x), two evaluations of the function f are required by formula 14, at the points x and x + h. In practice, the computational time is mainly consumed by the evaluation of f. The practical computation of 21 involves only the use of the elementary function sqrt, which is negligible.

In Scilab, we use double precision floating point numbers so that the rounding error is

ε_M ≈ 10^-16.   (26)

We are not concerned here with the exact value of ε_M, since only the order of magnitude matters. Therefore, based on the simplified formula 21, the optimal step associated with the forward numerical difference is

h ≈ 10^-8.   (27)

This is associated with the approximate error

E(h) ≈ 2·10^-8.   (28)


    3.4 Numerical experiments with the robust forward formula

We can introduce the accuracy of the function evaluation by modifying the equation 19. Indeed, if we take 12 and 13 into account, we get:

h = sqrt(2 c(x) ε_M |f(x)| / |f''(x)|)   (29)
  = sqrt(2 c(x) |f(x)| / |f''(x)|) · ε_M^(1/2).   (30)

In practice, it is, unfortunately, not possible to compute the optimum step. It is still possible to analyse what happens in simplified situations where the exact derivative is known.

We now consider the function f(x) = sqrt(x), for x >= 0, and evaluate its numerical derivative at the point x = 1. In the following Scilab functions, we define the functions f(x) = sqrt(x), f'(x) = (1/2) x^(-1/2) and f''(x) = -(1/4) x^(-3/2).

function y = mysqrt(x)
    y = sqrt(x)
endfunction
function y = mydsqrt(x)
    y = 0.5 * x^(-0.5)
endfunction
function y = myddsqrt(x)
    y = -0.25 * x^(-1.5)
endfunction

The following Scilab functions define the approximate step h defined by h = ε_M^(1/2) and the optimum step h defined by 29.

function y = step_approximate()
    y = sqrt(%eps)
endfunction
function y = step_exact(f, fpp, x)
    y = sqrt(2 * %eps * abs(f(x)) / abs(fpp(x)))
endfunction

The following functions define the forward numerical derivative and the relative error. The relative error is not defined for points x such that f'(x) = 0, but we will not consider this situation in this experiment.

function y = forward(f, x, h)
    y = (f(x+h) - f(x)) / h
endfunction
function y = relativeerror(f, fprime, x, h)
    expected = fprime(x)
    computed = forward(f, x, h)
    y = abs(computed - expected) / abs(expected)
endfunction

The following Scilab function plots the relative error for several steps h from h = 10^-16 to h = 1. The resulting data is plotted in logarithmic scale.

function drawrelativeerror(f, fprime, x, mytitle)
    n = 1000
    logharray = linspace(-16, 0, n)
    for i = 1:n
        h = 10^(logharray(i))
        logearray(i) = log10(relativeerror(f, fprime, x, h))
    end
    plot(logharray, logearray)
    xtitle(mytitle, "log(h)", "log(E)")
endfunction

We now use the previous functions and execute the following Scilab statements.

x = 1.0;
drawrelativeerror(mysqrt, mydsqrt, x, ..
    "Relative error of numerical derivative in x = 1.0");
h1 = step_approximate();
mprintf("Step Approximate = %e\n", h1)
h2 = step_exact(mysqrt, myddsqrt, x);
mprintf("Step Exact = %e\n", h2)

The previous script produces the following output:

Step Approximate = 1.490116e-008
Step Exact = 4.214685e-008

and plots the relative error presented in the figure 3.

We can compare the figures 2 and 3 and see that, indeed, the theory produces a maximal bound for the relative error. We also see that the difference between the approximate and the exact step is small in this particular case.

    3.5 Backward formula

Let us consider Taylor's expansion from equation 5, and use -h instead of h. We get

f(x - h) = f(x) - h f'(x) + (h^2/2) f''(x) + O(h^3).   (31)

This leads to the backward formula

f'(x) = (f(x) - f(x - h)) / h + O(h).   (32)

Like the forward formula, the backward formula is of order 1. The analysis presented for the forward formula leads to the same results and will not be repeated, since the backward formula does not increase the accuracy.

    3.6 Centered formula with 2 points

In this section, we derive the centered formula based on the two points x ± h. We give the optimal step in double precision and the associated error.

Proposition 3.2. Let f : R -> R be a continuously derivable function of one variable. Therefore,

f'(x) = (f(x + h) - f(x - h)) / (2h) - (h^2/6) f'''(x) + O(h^3).   (33)


Figure 3: Total error of the numerical derivative as a function of the step in logarithmic scale - Numerical experiment.


Proof. The Taylor expansion of the function f at point x is

f(x + h) = f(x) + h f'(x) + (h^2/2) f''(x) + (h^3/6) f'''(x) + O(h^4).   (34)

If we replace h by -h in the previous equation we get

f(x - h) = f(x) - h f'(x) + (h^2/2) f''(x) - (h^3/6) f'''(x) + O(h^4).   (35)

We subtract the two equations 34 and 35 and get

f(x + h) - f(x - h) = 2h f'(x) + (h^3/3) f'''(x) + O(h^4).   (36)

We immediately get 33, which concludes the proof, or, more simply, the centered 2 points finite difference

f'(x) = (f(x + h) - f(x - h)) / (2h) + O(h^2),   (37)

which approximates f' at order 2.

Definition 3.3. (Centered two points finite difference for f') The finite difference formula

Df(x) = (f(x + h) - f(x - h)) / (2h)   (38)

is the centered 2 points finite difference for f' and is an order 2 approximation for f'.

Proposition 3.4. Let f : R -> R be a continuously derivable function of one variable. Consider the centered 2 points finite difference of f defined by 38. Assume that the total error implied by truncation and rounding is

E(h) = ε_M |f(x)| / h + (h^2/6) |f'''(x)|.   (39)

Therefore, the unique step which minimizes the error is

h = (3 ε_M |f(x)| / |f'''(x)|)^(1/3).   (40)

Assume that f satisfies

|f(x)| ≈ 1 and (1/3) |f'''(x)| ≈ 1.   (41)

Therefore, the approximate step which minimizes the error is

h ≈ ε_M^(1/3),   (42)

which is associated with the approximate error

E(h) ≈ (3/2) ε_M^(2/3).   (43)


Proof. The first derivative of the error is

E'(h) = -ε_M |f(x)| / h^2 + (h/3) |f'''(x)|.   (44)

The error is minimum when the first derivative of the error is zero

-ε_M |f(x)| / h^2 + (h/3) |f'''(x)| = 0.   (45)

The solution of this equation is 40. By the hypothesis 41, the optimal step is given by 42, which concludes the first part of the proof. If we plug the previous equality into the definition of the total error 39 and use the assumptions 41, we get the error given by 43, which concludes the proof.

With double precision floating point numbers, the optimal step associated with the centered numerical difference is

h ≈ 6·10^-6.   (46)

This is associated with the error

E(h) ≈ 5·10^-11.   (47)
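As a sketch (the function expf and the variable names are ours), the following Scilab script compares the forward and the centered 2 points formulas on f(x) = exp(x) at x = 1, each with its approximate optimal step; the relative errors should be roughly of the orders of magnitude given above, near 10^-8 and 10^-11.

// Sketch: forward vs. centered 2 points formulas on f(x) = exp(x) at x = 1,
// each used with its approximate optimal step.
function y = expf(x)
    y = exp(x)
endfunction
x = 1.0;
exact = exp(1.0);
hf = %eps^(1/2);                                  // forward, order 1
hc = %eps^(1/3);                                  // centered, order 2
dff = (expf(x + hf) - expf(x)) / hf;
dfc = (expf(x + hc) - expf(x - hc)) / (2 * hc);
mprintf("forward : h = %e, relative error = %e\n", hf, abs(dff - exact) / exact)
mprintf("centered: h = %e, relative error = %e\n", hc, abs(dfc - exact) / exact)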

    3.7 Centered formula with 4 points

In this section, we derive the centered formula based on the four points x ± h and x ± 2h. We give the optimal step in double precision and the associated error.

Proposition 3.5. Let f : R -> R be a continuously derivable function of one variable. Therefore,

f'(x) = (8f(x + h) - 8f(x - h) - f(x + 2h) + f(x - 2h)) / (12h) + (h^4/30) f^(5)(x) + O(h^5).   (48)

Proof. The Taylor expansion of the function f at point x is

f(x + h) = f(x) + h f'(x) + (h^2/2) f^(2)(x) + (h^3/6) f^(3)(x) + (h^4/24) f^(4)(x) + (h^5/120) f^(5)(x) + O(h^6).   (49)

If we replace h by -h in the previous equation we get

f(x - h) = f(x) - h f'(x) + (h^2/2) f^(2)(x) - (h^3/6) f^(3)(x) + (h^4/24) f^(4)(x) - (h^5/120) f^(5)(x) + O(h^6).   (50)


We subtract the two equations 49 and 50 and get

f(x + h) - f(x - h) = 2h f'(x) + (h^3/3) f^(3)(x) + (h^5/60) f^(5)(x) + O(h^6).   (51)

We replace h by 2h in the previous equation and get

f(x + 2h) - f(x - 2h) = 4h f'(x) + (8h^3/3) f^(3)(x) + (8h^5/15) f^(5)(x) + O(h^6).   (52)

In order to eliminate the term f^(3)(x), we multiply the equation 51 by 8 and get

8 (f(x + h) - f(x - h)) = 16h f'(x) + (8h^3/3) f^(3)(x) + (2h^5/15) f^(5)(x) + O(h^6).   (53)

We subtract equations 52 and 53 and we have

8 (f(x + h) - f(x - h)) - (f(x + 2h) - f(x - 2h)) = 12h f'(x) - (6h^5/15) f^(5)(x) + O(h^6).   (54)

We divide the previous equation by 12h and get

(8 (f(x + h) - f(x - h)) - (f(x + 2h) - f(x - 2h))) / (12h) = f'(x) - (h^4/30) f^(5)(x) + O(h^5),   (55)

which implies the equation 48 or, more simply,

f'(x) = (8f(x + h) - 8f(x - h) - f(x + 2h) + f(x - 2h)) / (12h) + O(h^4),   (56)

which is the centered 4 points formula of order 4.

Definition 3.6. (Centered 4 points finite difference for f') The finite difference formula

Df(x) = (8f(x + h) - 8f(x - h) - f(x + 2h) + f(x - 2h)) / (12h)   (57)

is the centered 4 points finite difference for f'.

Proposition 3.7. Let f : R -> R be a continuously derivable function of one variable. Consider the centered 4 points finite difference of f defined by 57. Assume that the total error implied by truncation and rounding is

E(h) = ε_M |f(x)| / h + (h^4/30) |f^(5)(x)|.   (58)

Therefore, the optimal step is

h = (15 ε_M |f(x)| / (2 |f^(5)(x)|))^(1/5).   (59)


Assume that f satisfies

|f(x)| ≈ 1 and (2/15) |f^(5)(x)| ≈ 1.   (60)

Therefore, the approximate step is

h ≈ ε_M^(1/5),   (61)

which is associated with the error

E(h) ≈ (5/4) ε_M^(4/5).   (62)

Proof. The first derivative of the error is

E'(h) = -ε_M |f(x)| / h^2 + (2h^3/15) |f^(5)(x)|.   (63)

The error is minimum when the first derivative of the error is zero

-ε_M |f(x)| / h^2 + (2h^3/15) |f^(5)(x)| = 0.   (64)

The solution of the previous equation is the step 59. If we make the assumptions 60, then the optimal step is 61, which concludes the first part of the proof. If we plug the equality 61 into the definition of the total error 58 and use the assumptions 60, we get the error 62, which concludes the proof.

With double precision floating point numbers, the approximate optimal step associated with the centered 4 points numerical difference is

h ≈ 4·10^-4.   (65)

This is associated with the approximate error

E(h) ≈ 3·10^-13.   (66)

3.8 Some finite difference formulas for the first derivative

In this section, we present several formulas to compute the first derivative of a function of several parameters. We present and compare the associated optimal steps and optimal errors.

The figure 4 presents various formulas for the computation of the first derivative of a continuously derivable function f. The approximate optimum step h and the approximate minimum error E(h) are computed for double precision floating point numbers. We do not take into account the scaling with respect to x (see below).

The figure 5 presents the optimal steps and the associated errors for various finite difference formulas.

We notice that with increasing accuracy (i.e. with order from 1 to 4), the size of the step increases, while the error decreases.


Name                Formula                                                     h
Forward 2 points    (f(x+h) - f(x)) / h                                         ε_M^(1/2)
Centered 2 points   (f(x+h) - f(x-h)) / (2h)                                    ε_M^(1/3)
Centered 4 points   (-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)) / (12h)            ε_M^(1/5)

Figure 4: Various formulas for the computation of the Jacobian of a given function f.

Name                h          E(h)
Forward 2 points    10^-8      2·10^-8
Centered 2 points   6·10^-6    5·10^-11
Centered 4 points   4·10^-4    3·10^-13

Figure 5: Optimal steps and errors of finite difference formulas for the computation of the Jacobian of a given function f with double precision floating point numbers. We do not take into account the scaling with respect to x.

    3.9 A three points formula for the second derivative

In this section, we present a three points formula for the second derivative of a function of one variable. We present the error analysis and compute the optimum step and minimum error.

Proposition 3.8. Let f : R -> R be a continuously derivable function of one variable. Therefore,

f''(x) = (f(x + h) - 2f(x) + f(x - h)) / h^2 - (h^2/12) f^(4)(x) + O(h^3).   (67)

Proof. The Taylor expansion of the function f at point x is

f(x + h) = f(x) + h f'(x) + (h^2/2) f^(2)(x) + (h^3/6) f^(3)(x) + (h^4/24) f^(4)(x) + (h^5/120) f^(5)(x) + O(h^6).   (68)

If we replace h by -h in the previous equation we get

f(x - h) = f(x) - h f'(x) + (h^2/2) f^(2)(x) - (h^3/6) f^(3)(x) + (h^4/24) f^(4)(x) - (h^5/120) f^(5)(x) + O(h^6).   (69)

We sum the two equations 68 and 69 and get

f(x + h) + f(x - h) = 2 f(x) + h^2 f''(x) + (h^4/12) f^(4)(x) + O(h^5).   (70)

This leads to the three points finite difference formula 67, or, more simply,

f''(x) = (f(x + h) - 2f(x) + f(x - h)) / h^2 + O(h^2).   (71)


The formula 71 shows that this three points finite difference is of order 2.

Definition 3.9. (Centered 3 points finite difference for f'') The finite difference formula

Df(x) = (f(x + h) - 2f(x) + f(x - h)) / h^2   (72)

is the centered 3 points finite difference for f''.

Proposition 3.10. Let f : R -> R be a continuously derivable function of one variable. Consider the centered 3 points finite difference for f'' defined by 72. Assume that the total error implied by truncation and rounding is

E(h) = ε_M |f(x)| / h^2 + (h^2/12) |f^(4)(x)|.   (73)

Therefore, the unique step which minimizes the error is

h = (12 ε_M |f(x)| / |f^(4)(x)|)^(1/4).   (74)

Assume that f satisfies

|f(x)| ≈ 1 and (1/12) |f^(4)(x)| ≈ 1.   (75)

Therefore, the approximate step is

h ≈ ε_M^(1/4),   (76)

which is associated with the approximate error

E(h) ≈ 2 ε_M^(1/2).   (77)

Proof. The first derivative of the error is

E'(h) = -2 ε_M |f(x)| / h^3 + (h/6) |f^(4)(x)|.   (78)

Its second derivative is

E''(h) = 6 ε_M |f(x)| / h^4 + (1/6) |f^(4)(x)|.   (79)

The second derivative is positive, since, by hypothesis, we have h > 0. Therefore, the function E is convex and has only one global minimum. The error E is minimum when the first derivative of the error is zero

-2 ε_M |f(x)| / h^3 + (h/6) |f^(4)(x)| = 0.   (80)

Therefore, the optimal step is given by the equation 74. By the hypothesis 75, the optimal step is given by 76, which concludes the first part of the proof. If we plug the equality 76 into the definition of the total error 73 and use the assumptions 75, we get the error 77, which concludes the proof.


With double precision floating point numbers, the optimal step associated with the centered 3 points numerical difference for the second derivative is

h ≈ 1·10^-4.   (81)

This is associated with the error

E(h) ≈ 3·10^-8.   (82)
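As a sketch (the function expf is ours), the following Scilab script applies the centered 3 points formula 72 to f(x) = exp(x) at x = 1 with the step h = %eps^(1/4); the relative error should be of the order of magnitude predicted above.

// Sketch: centered 3 points approximation of the second derivative of
// f(x) = exp(x) at x = 1, with the approximate optimal step %eps^(1/4).
function y = expf(x)
    y = exp(x)
endfunction
x = 1.0;
h = %eps^(1/4);
fpp = (expf(x + h) - 2 * expf(x) + expf(x - h)) / h^2;
exact = exp(1.0);
mprintf("h = %e, approximation = %e, relative error = %e\n", ..
    h, fpp, abs(fpp - exact) / exact)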

3.10 Accuracy of finite difference formulas

In this section, we give a proposition which computes the order of magnitude of many finite difference formulas.

Proposition 3.11. Let f : R -> R be a continuously derivable function of one variable. We consider the derivative f^(d), where d >= 1 is a positive integer. Assume that the derivative f^(d) is approximated by a finite difference formula. Assume that the rounding error associated with the finite difference formula is

E_r(h) = ε_M |f(x)| / h^d.   (83)

Assume that the associated truncation error is

E_t(h) = (h^p / β) |f^(d+p)(x)|,   (84)

where β > 0 is a positive constant and p >= 1 is a strictly positive integer associated with the order of the finite difference formula. Therefore, the unique step which minimizes the total error is

h = ((d/p) β ε_M |f(x)| / |f^(d+p)(x)|)^(1/(d+p)).   (85)

Assume that the function f is so that

|f(x)| ≈ 1 and (1/β) |f^(d+p)(x)| ≈ 1.   (86)

Assume that the ratio d/p has an order of magnitude which is close to 1, i.e.

d/p ≈ 1.   (87)

Then the unique approximate optimal step is

h ≈ ε_M^(1/(d+p)),   (88)

and the associated error is

E(h) ≈ 2 ε_M^(p/(d+p)).   (89)


This proposition allows to compute the optimum step much faster than with a case by case analysis. The assumptions 86 might seem to be strong at first but, as we have already seen, they are reasonable in practice.

Proof. The total error is

E(h) = ε_M |f(x)| / h^d + (h^p / β) |f^(d+p)(x)|.   (90)

The first derivative of the error E is

E'(h) = -d ε_M |f(x)| / h^(d+1) + (p/β) h^(p-1) |f^(d+p)(x)|.   (91)

The second derivative of the error E is

E''(h) = d(d+1) ε_M |f(x)| / h^(d+2),   if p = 1,
E''(h) = d(d+1) ε_M |f(x)| / h^(d+2) + (p(p-1)/β) h^(p-2) |f^(d+p)(x)|,   if p >= 2.   (92)

Therefore, whatever the value of p >= 1, the second derivative of the error E is positive. Hence, the function E is convex for h > 0. This implies that there is only one global minimum, which is the solution of the equation E'(h) = 0. The optimum step h satisfies the equation

-d ε_M |f(x)| / h^(d+1) + (p/β) h^(p-1) |f^(d+p)(x)| = 0.   (93)

This leads to the equation 85. Under the assumptions on the function f given by 86 and on the ratio d/p given by 87, the previous equality simplifies into

h = ε_M^(1/(d+p)),   (94)

which proves the first result. The same assumptions simplify the approximate error into

E(h) ≈ ε_M / h^d + h^p.   (95)

If we introduce the optimal step 94 into the previous equation, we get

E(h) ≈ ε_M^(1 - d/(d+p)) + ε_M^(p/(d+p))   (96)
     ≈ ε_M^(p/(d+p)) + ε_M^(p/(d+p))   (97)
     ≈ 2 ε_M^(p/(d+p)),   (98)

which concludes the proof.


Example 3.1 Consider the following centered 3 points finite difference for f''

f''(x) = (f(x + h) - 2f(x) + f(x - h)) / h^2 - (h^2/12) f^(4)(x) + O(h^3).   (99)

The error implied by truncation and rounding is

E(h) = ε_M |f(x)| / h^2 + (h^2/12) |f^(4)(x)|,   (100)

which can be interpreted in the terms of the proposition 3.11 with d = 2, p = 2 and β = 12. Then the unique approximate optimal step is

h ≈ ε_M^(1/4),   (101)

and the associated approximate error is

E(h) ≈ 2 ε_M^(1/2).   (102)

This result corresponds to the proposition 3.10, as expected.
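Under the assumptions of the proposition 3.11, the approximate optimal step and error depend only on d and p. The following Scilab helper is a sketch written for this discussion (it is not part of the document nor of Scilab) which returns both quantities.

// Sketch: approximate optimal step and error of a finite difference formula
// of order p for the derivative of order d (proposition 3.11).
function [h, e] = approxoptimal(d, p)
    h = %eps^(1 / (d + p))
    e = 2 * %eps^(p / (d + p))
endfunction
// Centered 3 points formula for the second derivative: d = 2, p = 2.
[h, e] = approxoptimal(2, 2);
mprintf("h = %e, E = %e\n", h, e)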

3.11 A collection of finite difference formulas

In this section, we present some finite difference formulas which compute various derivatives with various orders of precision. For each formula, the optimum step and the minimum error are presented, under the assumptions of the proposition 3.11.

First derivative: forward 2 points

f'(x) = (f(x + h) - f(x)) / h + O(h)   (103)

Optimal step: h ≈ ε_M^(1/2) and error E ≈ ε_M^(1/2).
Double precision: h ≈ 10^-8 and E ≈ 10^-8.

First derivative: backward 2 points

f'(x) = (f(x) - f(x - h)) / h + O(h)   (104)

Optimal step: h ≈ ε_M^(1/2) and error E ≈ ε_M^(1/2).
Double precision: h ≈ 10^-8 and E ≈ 10^-8.

First derivative: centered 2 points

f'(x) = (f(x + h) - f(x - h)) / (2h) + O(h^2)   (105)

Optimal step: h ≈ ε_M^(1/3) and error E ≈ ε_M^(2/3).
Double precision: h ≈ 10^-5 and E ≈ 10^-10.


First derivative: double forward 3 points

f'(x) = (-f(x + 2h) + 4f(x + h) - 3f(x)) / (2h) + O(h^2)   (106)

Optimal step: h ≈ ε_M^(1/3) and error E ≈ ε_M^(2/3).
Double precision: h ≈ 10^-5 and E ≈ 10^-10.

First derivative: double backward 3 points

f'(x) = (f(x - 2h) - 4f(x - h) + 3f(x)) / (2h) + O(h^2)   (107)

Optimal step: h ≈ ε_M^(1/3) and error E ≈ ε_M^(2/3).
Double precision: h ≈ 10^-5 and E ≈ 10^-10.

First derivative: centered 4 points

f'(x) = (-f(x + 2h) + 8f(x + h) - 8f(x - h) + f(x - 2h)) / (12h) + O(h^4)   (108)

Optimal step: h ≈ ε_M^(1/5) and error E ≈ ε_M^(4/5).
Double precision: h ≈ 10^-3 and E ≈ 10^-12.

Second derivative: forward 3 points

f''(x) = (f(x + 2h) - 2f(x + h) + f(x)) / h^2 + O(h)   (109)

Optimal step: h ≈ ε_M^(1/3) and error E ≈ ε_M^(1/3).
Double precision: h ≈ 10^-6 and E ≈ 10^-6.

Second derivative: centered 3 points

f''(x) = (f(x + h) - 2f(x) + f(x - h)) / h^2 + O(h^2)   (110)

Optimal step: h ≈ ε_M^(1/4) and error E ≈ ε_M^(1/2).
Double precision: h ≈ 10^-4 and E ≈ 10^-8.

Second derivative: centered 5 points

f''(x) = (-f(x + 2h) + 16f(x + h) - 30f(x) + 16f(x - h) - f(x - 2h)) / (12h^2) + O(h^4)   (111)

Optimal step: h ≈ ε_M^(1/6) and error E ≈ ε_M^(2/3).
Double precision: h ≈ 10^-2 and E ≈ 10^-10.


Third derivative: centered 4 points

f^(3)(x) = (f(x + 2h) - 2f(x + h) + 2f(x - h) - f(x - 2h)) / (2h^3) + O(h^2)   (112)

Optimal step: h ≈ ε_M^(1/5) and error E ≈ ε_M^(2/5).
Double precision: h ≈ 10^-3 and E ≈ 10^-6.

Fourth derivative: centered 5 points

f^(4)(x) = (f(x + 2h) - 4f(x + h) + 6f(x) - 4f(x - h) + f(x - 2h)) / h^4 + O(h^2)   (113)

Optimal step: h ≈ ε_M^(1/6) and error E ≈ ε_M^(1/3).
Double precision: h ≈ 10^-2 and E ≈ 10^-5.

Some of the previous formulas will be presented in the context of Scilab in the section 4.3.

    4 Finite differences of multivariate functions

In this section, we analyse methods to approximate the derivatives of multivariate functions with Scilab. In the first part, we present the gradient and Hessian of a multivariate function. Then we analyze methods to compute the derivatives of multivariate functions with finite differences. We present Scilab functions to compute these derivatives. By composing the finite difference operators, it is possible to approximate higher degree derivatives and we present how to use this method with Scilab. Finally, we present Richardson's method to approximate derivatives with more accuracy and discuss methods to take bounds into account.

    4.1 Multivariate functions

In this section, we present formulas which allow to compute the numerical derivatives of a multivariate function.

Assume that n is a positive integer representing the dimension of the space. Assume that f is a multivariate continuously differentiable function f : R^n -> R. We denote by x in R^n the current vector with n dimensions. The n-vector of partial derivatives of f is the gradient of f and will be denoted by ∇f(x) or g(x):

∇f(x) = g(x) = (∂f/∂x_1, ..., ∂f/∂x_n)^T.   (114)

Consider the function f : R^n -> R^m, where m is a positive integer. Then the partial derivatives form an n x m matrix, which is called the Jacobian matrix. In this document, we will consider only the case m = 1, but the results which are presented can be applied directly to each component of f(x). Hence, the case m > 1 does not introduce any new problem and we will not consider it in the remainder of this document.

Higher derivatives of a multivariate function are defined as in the univariate case. Assume that f has continuous partial derivatives ∂f/∂x_i for i = 1, ..., n and continuous partial derivatives ∂^2 f/∂x_i ∂x_j for i, j = 1, ..., n. Then the Hessian matrix of f is denoted by ∇^2 f(x) or H(x):

∇^2 f(x) = H(x) = [ ∂^2 f/∂x_1^2      ...  ∂^2 f/∂x_1 ∂x_n
                    ...                    ...
                    ∂^2 f/∂x_1 ∂x_n   ...  ∂^2 f/∂x_n^2    ].   (115)

The Taylor-series expansion of a general function f in the neighbourhood of a point x can be derived as in the univariate case presented in the section 2.1.1. Let x in R^n be a given point, p in R^n a vector of unit length and h in R a scalar. The function f(x + hp) can be regarded as a univariate function of h and the univariate expansion can be applied directly:

f(x + hp) = f(x) + h g(x)^T p + (1/2) h^2 p^T H(x) p + ...
          + (1/(n-1)!) h^(n-1) D^(n-1) f(x) + (1/n!) h^n D^n f(x + θhp),   (116)

for some θ in [0, 1] and where

D^s f(x) = Σ_{i1=1,n} Σ_{i2=1,n} ... Σ_{is=1,n} p_{i1} p_{i2} ... p_{is} ∂^s f(x) / (∂x_{i1} ∂x_{i2} ... ∂x_{is}).   (117)

We can expand Taylor's formula, keep only the first three terms of this expansion and get:

f(x + hp) = f(x) + h g(x)^T p + (1/2) h^2 p^T H(x) p + O(h^3).   (118)

The term h g(x)^T p is the directional derivative of f and is an order 1 term which drives the rate of change of f at the point x. The order 2 term p^T H(x) p is the curvature of f along p. A direction p such that p^T H(x) p > 0 is termed a direction of positive curvature.

In the particular case of a function of two variables, the previous general formula can be written in integral form:

f(x1 + h1, x2 + h2) = f(x1, x2) + h1 ∂f/∂x1 + h2 ∂f/∂x2
  + (h1^2/2) ∂^2 f/∂x1^2 + h1 h2 ∂^2 f/∂x1∂x2 + (h2^2/2) ∂^2 f/∂x2^2
  + (h1^3/6) ∂^3 f/∂x1^3 + (h1^2 h2/2) ∂^3 f/∂x1^2∂x2 + (h1 h2^2/2) ∂^3 f/∂x1∂x2^2 + (h2^3/6) ∂^3 f/∂x2^3
  + (h1^4/24) ∂^4 f/∂x1^4 + (h1^3 h2/6) ∂^4 f/∂x1^3∂x2 + (h1^2 h2^2/4) ∂^4 f/∂x1^2∂x2^2 + (h1 h2^3/6) ∂^4 f/∂x1∂x2^3 + (h2^4/24) ∂^4 f/∂x2^4
  + ...
  + Σ_{m+n=p} (h1^m/m!) (h2^n/n!) ∫_0^1 ∂^p f/∂x1^m ∂x2^n (x1 + t h1, x2 + t h2) p (1 - t)^(p-1) dt,   (119)

where the terms associated with the partial derivatives of degree p have the form

Σ_{m+n=p} (h1^m/m!) (h2^n/n!) ∂^p f/∂x1^m ∂x2^n.   (120)

    4.2 Numerical derivatives of multivariate functions

The Taylor-series expansion of a general function f allows to derive an approximation of the function in a neighbourhood of x. Indeed, if we keep the first term in the expansion, we get

f(x + hp) = f(x) + h g(x)^T p + O(h^2).   (121)

This formula leads to an order 1 finite difference formula for the multivariate function f. We emphasize that the equation 121 is a univariate expansion in the direction p. This is why the univariate finite difference formulas can be directly applied for multivariate functions. Let h_i be the step associated with the i-th component of x, and let e_i in R^n be the vector e_i = ((e_i)_1, (e_i)_2, ..., (e_i)_n)^T with

(e_i)_j = 1 if i = j, 0 if i is not equal to j,   (122)

for j = 1, ..., n. Then,

f(x + h_i e_i) = f(x) + h_i g(x)^T e_i + O(h^2).   (123)

The term g(x)^T e_i is the i-th component of the gradient g(x), so that g(x)^T e_i = g_i(x). Therefore, we can approximate the gradient of the function f by the finite difference formula

g_i(x) = (f(x + h_i e_i) - f(x)) / h_i + O(h).   (124)

The previous formula is a multivariate finite difference formula of order 1 for the gradient of the function f. It is the direct analog of the univariate finite difference formulas that we have previously analyzed.


Similarly to the univariate case, the centered 2 points multivariate finite difference for the gradient of f is

g_i(x) = (f(x + h_i e_i) - f(x - h_i e_i)) / (2 h_i) + O(h^2)   (125)

and the centered 4 points multivariate finite difference for the gradient of f is

g_i(x) = (8f(x + h_i e_i) - 8f(x - h_i e_i) - f(x + 2h_i e_i) + f(x - 2h_i e_i)) / (12 h_i) + O(h^4).   (126)

We have already noticed that the previous formulas are simply the univariate formulas in the direction h_i e_i. The consequence is that the evaluation of the gradient vector g requires n univariate finite differences.

4.3 Derivatives of a multivariate function in Scilab

In this section, we present a function which computes the Jacobian of a multivariate function f.

The following derivativeJacobianStep function computes the approximate optimal step for some of the formulas for the first derivative. The function takes the formula name form as input argument and returns the approximate (scalar) optimal step h.

function h = derivativeJacobianStep(form)
    select form
    case "forward2points" then        // Order 1
        h = %eps^(1/2)
    case "backward2points" then       // Order 1
        h = %eps^(1/2)
    case "centered2points" then       // Order 2
        h = %eps^(1/3)
    case "doubleforward3points" then  // Order 2
        h = %eps^(1/3)
    case "doublebackward3points" then // Order 2
        h = %eps^(1/3)
    case "centered4points" then       // Order 4
        h = %eps^(1/5)
    else
        error(msprintf("Unknown formula %s", form))
    end
endfunction

The following derivativeJacobian function computes an approximate Jacobian. It takes as input arguments the function f, the vector point x, the vector step h and the formula form, and returns the approximate Jacobian J.

function J = derivativeJacobian(f, x, h, form)
    n = size(x, "*")
    D = diag(h)
    for i = 1 : n
        d = D(:, i)
        select form
        case "forward2points" then        // Order 1
            J(i) = (f(x+d) - f(x)) / h(i)
        case "backward2points" then       // Order 1
            J(i) = (f(x) - f(x-d)) / h(i)
        case "centered2points" then       // Order 2
            J(i) = (f(x+d) - f(x-d)) / (2*h(i))
        case "doubleforward3points" then  // Order 2
            J(i) = (-f(x+2*d) + 4*f(x+d) - 3*f(x)) / (2*h(i))
        case "doublebackward3points" then // Order 2
            J(i) = (f(x-2*d) - 4*f(x-d) + 3*f(x)) / (2*h(i))
        case "centered4points" then       // Order 4
            J(i) = (-f(x+2*d) + 8*f(x+d) ..
                    - 8*f(x-d) + f(x-2*d)) / (12*h(i))
        else
            error(msprintf("Unknown formula %s", form))
        end
    end
endfunction

In the previous function, the statement D=diag(h) creates a diagonal matrix D where the diagonal entries are equal to the vector h. Therefore, the i-th column of D is equal to h_i e_i, as defined in the previous section.

We now experiment with our approximate Jacobian function. The following function quadf computes a quadratic function.

function f = quadf(x)
    f = x(1)^2 + x(2)^2
endfunction

The quadJ function computes the exact Jacobian of quadf.

function J = quadJ(x)
    J(1) = 2 * x(1)
    J(2) = 2 * x(2)
endfunction

In the following session, we compute the exact Jacobian matrix at the point x = (1, 2)^T.

-->x = [1; 2];
-->J = quadJ(x)
 J  =
    2.
    4.

In the following session, we compute the approximate Jacobian matrix at the point x = (1, 2)^T.

-->form = "forward2points";
-->h = derivativeJacobianStep(form)
 h  =
    0.0007401
-->h = h * ones(2,1)
 h  =
    0.0007401
    0.0007401
-->Japprox = derivativeJacobian(quadf, x, h, form)
 Japprox  =
    2.
    4.


Although the derivativeJacobian function has interesting features, there are some limitations.

- We cannot compute the Jacobian matrix of a function which returns an m-by-1 vector: only scalar functions can be differentiated.
- We cannot differentiate a function f which requires extra arguments.

Both these limitations are addressed in the next section.

    4.4 Derivatives of a vectorial function with Scilab

In this section, we present a Scilab script which computes the Jacobian matrix of a vectorial function. This script will be used in the section 4.6, where we compose derivatives.

In order to manage extra arguments, we will make it so that the function to be differentiated can be either

• a function, with calling sequence y=f(x),
• a list (f,a1,a2,...). In this case, the first element in the list is the function to be differentiated, with calling sequence y=f(x,a1,a2,...), and the remaining arguments a1,a2,... are automatically appended to the calling sequence.

Both cases are managed by the following derivativeEvalf function, which evaluates the function __derEvalf__ at the given point x.

function y = derivativeEvalf(__derEvalf__, x)
    if (typeof(__derEvalf__) == "function") then
        y = __derEvalf__(x)
    elseif (typeof(__derEvalf__) == "list") then
        __f_fun__ = __derEvalf__(1)
        y = __f_fun__(x, __derEvalf__(2:$))
    else
        error(msprintf("Unknown function type %s", typeof(__derEvalf__)))
    end
endfunction
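As a quick, hedged check of the two calling conventions, we may evaluate a function directly and through a list. The functions myg and myf and the value a=3 below are chosen for this example only.

// Direct case: a function with calling sequence y=f(x).
function y = myg(x)
    y = x^2
endfunction
// Extra-argument case: a function with calling sequence y=f(x,a).
function y = myf(x, a)
    y = a * x^2
endfunction
y1 = derivativeEvalf(myg, 2)          // expected: 4
y2 = derivativeEvalf(list(myf, 3), 2) // expected: 12, since a=3 is appended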

The complicated name __derEvalf__ has been chosen in order to avoid conflicts between the name of the argument and the name of the user-defined function. Indeed, such a conflict may produce an infinite recursion. This topic is presented in more depth in [5].

    The following derivativeJacobian function computes the Jacobian matrix of a given function __derJacf__ .

function J = derivativeJacobian(__derJacf__, x, h, form)
    n = size(x, "*")
    D = diag(h)
    for i = 1 : n
        d = D(:, i)
        select form
        case "forward2points" then // Order 1
            y(:,1) = -derivativeEvalf(__derJacf__, x)
            y(:,2) = derivativeEvalf(__derJacf__, x+d)
        case "backward2points" then // Order 1
            y(:,1) = derivativeEvalf(__derJacf__, x)
            y(:,2) = -derivativeEvalf(__derJacf__, x-d)
        case "centered2points" then // Order 2
            y(:,1) = 1/2 * derivativeEvalf(__derJacf__, x+d)
            y(:,2) = -1/2 * derivativeEvalf(__derJacf__, x-d)
        case "doubleforward3points" then // Order 2
            y(:,1) = -3/2 * derivativeEvalf(__derJacf__, x)
            y(:,2) = 2 * derivativeEvalf(__derJacf__, x+d)
            y(:,3) = -1/2 * derivativeEvalf(__derJacf__, x+2*d)
        case "doublebackward3points" then // Order 2
            y(:,1) = 3/2 * derivativeEvalf(__derJacf__, x)
            y(:,2) = -2 * derivativeEvalf(__derJacf__, x-d)
            y(:,3) = 1/2 * derivativeEvalf(__derJacf__, x-2*d)
        case "centered4points" then // Order 4
            y(:,1) = -1/12 * derivativeEvalf(__derJacf__, x+2*d)
            y(:,2) = 2/3 * derivativeEvalf(__derJacf__, x+d)
            y(:,3) = -2/3 * derivativeEvalf(__derJacf__, x-d)
            y(:,4) = 1/12 * derivativeEvalf(__derJacf__, x-2*d)
        else
            error(msprintf("Unknown formula %s", form))
        end
        J(:,i) = sum(y, "c") / h(i)
    end
endfunction

The following quadf function takes as input argument a 3-by-1 vector and returns a 2-by-1 vector.

function y = quadf(x)
    f1 = x(1)^2 + x(2)^3 + x(3)^4
    f2 = exp(x(1)) + 2*sin(x(2)) + 3*cos(x(3))
    y = [f1; f2]
endfunction

The quadJ function returns the Jacobian matrix of quadf.

function J = quadJ(x)
    J1(1) = 2 * x(1)
    J1(2) = 3 * x(2)^2
    J1(3) = 4 * x(3)^3
    //
    J2(1) = exp(x(1))
    J2(2) = 2 * cos(x(2))
    J2(3) = -3 * sin(x(3))
    //
    J = [J1'; J2']
endfunction

In the following session, we compute the exact Jacobian matrix of quadf at the point x = (1, 2, 3)^T.

-->x=[1;2;3];
-->J = quadJ(x)
 J  =
    2.          12.          108.
    2.7182818  - 0.8322937  - 0.4233600


In the following session, we compute the approximate Jacobian matrix of the function quadf.

-->x=[1;2;3];
-->form = "forward2points";
-->h = derivativeJacobianStep(form);
-->h = h * ones(3,1);
-->Japprox = derivativeJacobian(quadf, x, h, form)
 Japprox  =
    2.          12.          108.
    2.7182819  - 0.8322937  - 0.4233600
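As a simple sanity check (a sketch only, since the exact digits depend on rounding), we may measure the relative error between the exact and the approximate Jacobian matrices:

// Relative error between the exact and approximate Jacobian matrices.
err = norm(Japprox - J) / norm(J)
// A small value is expected, reflecting the accuracy of the order 1 formula.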

    4.5 Computing higher degree derivatives

In this section, we present a result which allows us to get a finite difference operator for f'', based on a finite difference operator for f'.

Consider the 2 points forward finite difference operator Df defined by

    Df(x) = (f(x + h) - f(x)) / h,    (127)

which produces an order 1 approximation for f'. Similarly, let us consider the finite difference operator DDf defined by

    DDf(x) = (Df(x + h) - Df(x)) / h,    (128)

that is, the composed operator DDf = (D ∘ D)f. It would be nice if DD was an approximation for f''. The previous formula simplifies into

    DDf(x) = [ (f(x+2h) - f(x+h))/h - (f(x+h) - f(x))/h ] / h    (129)
            = (f(x + 2h) - f(x + h) - f(x + h) + f(x)) / h^2    (130)
            = (f(x + 2h) - 2 f(x + h) + f(x)) / h^2.    (131)

It is straightforward to prove that the previous formula is, indeed, an order 1 formula for f'', that is, DDf defined by 128 is an order 1 approximation for f''. The following proposition presents this result in a more general framework.

Proposition 4.1. Let f : R → R be a continuously derivable function of one variable. Let Df be a finite difference operator of order p > 0 for f'. Then the finite difference operator DDf is of order p for f''.

Proof. By hypothesis, Df is of order p, which implies that

    Df(x) = f'(x) + O(h^p).    (132)

Let us define g by

    g(x) = Df(x).    (133)


Since f is a continuously derivable function, so is g. Therefore, Dg is of order p for g', which implies

    Dg(x) = g'(x) + O(h^p).    (134)

We now plug the definition of g given by 133 into the previous equation and get

    DDf(x) = (Df)'(x) + O(h^p)    (135)
           = f''(x) + O(h^p),    (136)

which concludes the proof.

Example 4.1 Consider the centered 2 points finite difference for f' defined by

    Df(x) = (f(x + h) - f(x - h)) / (2h).    (137)

We have proved in proposition 3.2 that Df is an order 2 approximation for f'. We can therefore apply the proposition 4.1 with p = 2 and get an approximation for f'' based on the finite difference

    DDf(x) = (Df(x + h) - Df(x - h)) / (2h).    (138)

We can expand this formula and get

    DDf(x) = [ (f(x+2h) - f(x))/(2h) - (f(x) - f(x-2h))/(2h) ] / (2h)    (139)
            = (f(x + 2h) - 2 f(x) + f(x - 2h)) / (4h^2),    (140)

which is, by proposition 4.1, a finite difference formula of order 2 for f''.

In practice, it may not be required to expand the finite difference in the way of 139. Indeed, Scilab can manage callbacks (i.e. function pointers), so that it is easy to use the proposition 4.1 so that the computation of the second derivative is performed with the same source code as for the first derivative. This method is used in the derivative function of Scilab, as we will see in the corresponding section.
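As a minimal numerical check of formula 140 (a sketch: the test function f(x) = exp(x), the point and the step are chosen for this illustration only), we may compare the composed finite difference with the exact second derivative:

function y = expf(x)
    y = exp(x)
endfunction
x = 1;
h = %eps^(1/4);   // approximate optimal step for an order 2 formula for f''
DDf = (expf(x+2*h) - 2*expf(x) + expf(x-2*h)) / (4*h^2);
err = abs(DDf - exp(x)) / abs(exp(x))
// A relative error of roughly 1.e-8 is expected.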

We may ask if, by chance, a better result is possible for the finite difference DD. More precisely, we may ask if the order of the operator DD may be greater than the order of the operator D. In fact, there is no better result, as we are going to see. In order to analyse if a higher order formula would be produced, we must explicitly write higher order terms in the finite difference approximation. Let us write the finite difference operator Df as

    Df(x) = f'(x) + (h^p/β) f^(d+p)(x) + O(h^(p+1)),    (141)

where β > 0 is a positive real and d ≥ 1 is an integer. We have

    DDf(x) = (Df)'(x) + (h^p/β) (Df)^(d+p)(x) + O(h^(p+1))    (142)
           = f''(x) + (h^p/β) f^(d+p+1)(x) + O(h^(p+1))
             + (h^p/β) [ f^(d+p+1)(x) + (h^p/β) f^(2d+2p)(x) + O(h^(p+1)) ] + O(h^(p+1)).    (143)


Hence

    DDf(x) = f''(x) + (2h^p/β) f^(d+p+1)(x) + (h^(2p)/β^2) f^(2d+2p)(x) + O(h^(p+1)).    (144)

We can see that the second term in the expansion is (2h^p/β) f^(d+p+1)(x), which is of order p. There is no assumption which may set this term to zero, which implies that DD is of order p, at best.

Of course, the process can be extended in order to compute more derivatives. Suppose that we want to compute an approximation of f^(d)(x), where d ≥ 1 is an integer. Let us define the finite difference operator D^(d)f by recurrence on d as

    D^(d+1) f(x) = D(D^(d) f)(x).    (145)

By proposition 4.1, if Df is a finite difference operator of order p for f', then D^(d)f is a finite difference operator of order p for f^(d)(x).

We present how to implement the previous composition formulas in the section 4.6. But, for reasons which will be made clear later in this document, we must first consider derivatives of multivariate functions and derivatives of vectorial functions.

    4.6 Nested derivatives with Scilab

In this section, we present how to compute higher derivatives with Scilab, based on recursive calls.

We consider the same functions which were defined in the section 4.4.

The following derivativeHessianStep function returns the approximate optimal step for the second derivative, depending on the finite difference formula form.

function h = derivativeHessianStep(form)
    select form
    case "forward2points" then // Order 1
        h = %eps^(1/3)
    case "backward2points" then // Order 1
        h = %eps^(1/3)
    case "centered2points" then // Order 2
        h = %eps^(1/4)
    case "doubleforward3points" then // Order 2
        h = %eps^(1/4)
    case "doublebackward3points" then // Order 2
        h = %eps^(1/4)
    case "centered4points" then // Order 4
        h = %eps^(1/6)
    else
        error(msprintf("Unknown formula %s", form))
    end
endfunction

We define a function which returns the Jacobian matrix as a column vector. Moreover, we have to create a function with a calling sequence which is compatible with the one required by derivativeJacobian. The following derivativeFunctionJ function returns the Jacobian matrix at the point x.


function J = derivativeFunctionJ(x, f, h, form)
    J = derivativeJacobian(f, x, h, form)
    J = J'
    J = J(:)
endfunction

Notice that the arguments x and f are switched in the calling sequence. The following session shows how the derivativeFunctionJ function changes the shape of J.

-->x=[1;2;3];
-->H = quadH(x);
-->h = derivativeHessianStep("forward2points");
-->h = h * ones(3,1);
-->Japprox = derivativeFunctionJ(x, quadf, h, form)
 Japprox  =
    2.0000061
    12.000036
    108.00033
    2.7182901
  - 0.8322992
  - 0.4233510

The following quadH function returns the Hessian matrix of the quadf function, which was defined in the section 4.4.

function H = quadH(x)
    H1 = [2 0 0
          0 6*x(2) 0
          0 0 12*x(3)^2]
    //
    H2 = [exp(x(1)) 0 0
          0 -2*sin(x(2)) 0
          0 0 -3*cos(x(3))]
    //
    H = [H1; H2]
endfunction

In the following session, we compute the Hessian matrix at the point x = (1, 2, 3)^T.

-->x=[1;2;3];
-->H = quadH(x)
 H  =
    2.          0.          0.
    0.          12.         0.
    0.          0.          108.
    2.7182818   0.          0.
    0.        - 1.8185949   0.
    0.          0.          2.9699775

Notice that the rows #1 to #3 contain the Hessian matrix of the first component of quadf, while the rows #4 to #6 contain the Hessian matrix of the second component of quadf.
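To make this block structure explicit, the two Hessian matrices can be extracted as follows (a small sketch, assuming the 3-input, 2-output quadf defined above):

H1 = H(1:3, :)   // Hessian matrix of the first component of quadf
H2 = H(4:6, :)   // Hessian matrix of the second component of quadf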

In the following session, we compute the approximate Hessian matrix of quadf. We use the approximate optimal step and the derivativeJacobian function, which was defined in the section 4.4. The trick is that we differentiate derivativeFunctionJ, instead of quadf.

-->h = derivativeHessianStep("forward2points");
-->h = h * ones(3,1);
-->funlist = list(derivativeFunctionJ, quadf, h, form);
-->Happrox = derivativeJacobian(funlist, x, h, form)
 Happrox  =
    1.9997533   0.          0.
    0.          12.00007    0.
    0.          0.          108.00063
    2.7182693   0.          0.
    0.        - 1.8185741   0.
    0.          0.          2.9699582

Although the previous method seems interesting, it has a major drawback: it does not exploit the symmetry of the Hessian matrix, so that the number of function evaluations is larger than required. Indeed, the Hessian matrix of a smooth function f is symmetric, i.e.

    H_ij = H_ji,    (146)

for i, j = 1, 2, ..., n. This relation comes as a consequence of the equality

    ∂²f/(∂x_i ∂x_j) = ∂²f/(∂x_j ∂x_i),    (147)

for i, j = 1, 2, ..., n.

The symmetry implies that only the coefficients for which i ≥ j, for example, need to be computed: the coefficients i < j can be deduced by symmetry of the Hessian matrix. But the method that we have presented ignores this property. This leads to a number of function evaluations which could be divided roughly by a factor 2.
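A simple, hedged workaround, which does not reduce the number of function evaluations but restores the exact symmetry of the result, is to symmetrize each 3-by-3 block of the approximate Hessian computed above:

// Symmetrize the two 3-by-3 blocks of Happrox.
H1 = Happrox(1:3, :);
H2 = Happrox(4:6, :);
Hsym = [(H1 + H1')/2; (H2 + H2')/2]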

    4.7 Computing derivatives with more accuracy

In this section, we present a method to compute derivatives with more accuracy. This method, known as Richardson's extrapolation, improves the accuracy by using a sequence of steps with decreasing sizes.

We may ask if there is a general method to get an increased accuracy for a given derivative, from an existing finite difference formula. Of course, such a finite difference will require more function evaluations, which is the price to pay for an increased accuracy. The following proposition gives such a method.

Proposition 4.2. Assume that the finite difference operator Df approximates the derivative f^(d) at order p > 0, where d, p ≥ 1 are integers. Assume that

    Df_h(x) = f^(d)(x) + (h^p/β) f^(d+p)(x) + O(h^q),    (148)

where β > 0 is a real constant and q is an integer greater than p. Therefore, the finite difference operator

    Df(x) = (2^p Df_h(x) - Df_{2h}(x)) / (2^p - 1)    (149)


is an order q approximation for f^(d).

Proof. The proof is based on a direct use of the equation 148, with different steps h. With 2h instead of h in 148, we have

    Df_{2h}(x) = f^(d)(x) + (2^p h^p/β) f^(d+p)(x) + O(h^q).    (150)

We multiply the equation 148 by 2^p and get

    2^p Df_h(x) = 2^p f^(d)(x) + (2^p h^p/β) f^(d+p)(x) + O(h^q).    (151)

We subtract the equation 150 from the equation 151, and get

    2^p Df_h(x) - Df_{2h}(x) = (2^p - 1) f^(d)(x) + O(h^q).    (152)

We divide both sides of the previous equation by 2^p - 1 and get 149, which concludes the proof.

Example 4.2 Consider the following centered 2 points finite difference operator for f'

    Df_h(x) = (f(x + h) - f(x - h)) / (2h).    (153)

We have proved in proposition 3.2 that this is an approximation for f'(x) and

    Df_h(x) = f'(x) + (h^2/6) f^(3)(x) + O(h^4).    (154)

Therefore, we can apply the proposition 4.2 with d = 1, p = 2, β = 6 and q = 4. Hence, the finite difference operator

    Df(x) = (4 Df_h(x) - Df_{2h}(x)) / 3    (155)

is an order q = 4 approximation for f'(x). We can expand this new finite difference formula and find that we have already analysed it. Indeed, if we plug the definition of the finite difference operator 153 into 155, we get

    Df(x) = [ 4 (f(x + h) - f(x - h))/(2h) - (f(x + 2h) - f(x - 2h))/(4h) ] / 3    (156)
          = ( 8 (f(x + h) - f(x - h)) - (f(x + 2h) - f(x - 2h)) ) / (12h).    (157)

The previous finite difference operator is the one which has been presented in proposition 3.5, which states that it is an order 4 operator for f'.

The problem with the proposition 4.2 is that the optimal step is changed. Indeed, since the order of the modified finite difference method is changed, the optimal step is changed too. In this case, the proposition 3.11 can be applied to compute an approximate optimal step.
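The following sketch applies proposition 4.2 to the centered 2 points formula; the test function f(x) = sin(x), the point and the step are chosen for this illustration only.

function y = sinf(x)
    y = sin(x)
endfunction
x = 1;
h = %eps^(1/5);   // approximate optimal step for the resulting order 4 formula
Dh  = (sinf(x+h)   - sinf(x-h))   / (2*h);
D2h = (sinf(x+2*h) - sinf(x-2*h)) / (4*h);
// Richardson extrapolation with p = 2: an order 4 approximation of f'(x).
Dr = (4*Dh - D2h) / 3;
err = abs(Dr - cos(x)) / abs(cos(x))
// A relative error of roughly 1.e-12 is expected.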


    4.8 Taking into account bounds on parameters

The backward formula might be useful in some practical situations where the parameters are bounded. This might happen when the parameter represents a physical quantity which is physically bounded. For example, the real parameter x might represent a fraction which is naturally in the interval [0, 1].

Assume that some parameter x is bounded in a given interval [a, b], with a, b ∈ R and a < b. Assume that the step h is given, maybe by the formula 21. If b > a + h, there is no problem at computing the numerical derivative with the forward formula

    f'(a) ≈ (f(a + h) - f(a)) / h.    (158)

If we want to compute the numerical derivative at b with the forward formula

    f'(b) ≈ (f(b + h) - f(b)) / h,    (159)

this leads to a problem, since b + h ∉ [a, b]. In fact, any point x in the interval [b - h, b] leads to the problem. For such points, the backward formula may be used instead.
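A hedged sketch of this strategy for a scalar parameter follows; the function boundedDerivative, the test function and the bounds are chosen for the example only. We switch to the backward formula whenever the forward point would leave the interval.

function fp = boundedDerivative(f, x, h, a, b)
    // Use the forward formula if x+h stays within [a,b],
    // and the backward formula otherwise.
    if (x + h <= b) then
        fp = (f(x + h) - f(x)) / h
    else
        fp = (f(x) - f(x - h)) / h
    end
endfunction
function y = sqrtf(x)
    y = sqrt(x)
endfunction
h = sqrt(%eps);
fp = boundedDerivative(sqrtf, 1, h, 0, 1)   // expected: approximately 0.5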

    5 The derivative function

In this section, we present the derivative function. We present the main features of this function and show how to change the order of the finite difference method. We analyze how an orthogonal matrix may be used to change the directions of differentiation. Finally, we analyze the performances of derivative, in terms of function evaluations.

    5.1 Overview

The derivative function computes the Jacobian and the Hessian matrix of a given function. We can use formulas of order 1, 2 or 4. Finally, the user can set the step used in the finite difference formula. In this section, we will analyse all these points.

    The following is the complete calling sequence for the derivative function.

[J, H] = derivative(F, x, h, order, H_form, Q)

where the variables are

• J, the Jacobian vector,
• H, the Hessian matrix,
• F, the multivariate function,
• x, the current point,
• order, the order of the formula (1, 2 or 4),
• H_form, the Hessian matrix storage (default, blockmat or hypermat),
• Q, a matrix used to scale the step.

Since we are concerned here by numerical issues, we will use the blockmat Hessian matrix storage.

The order 1, 2 and 4 formulas for the Jacobian matrix are implemented with formulas similar to the ones presented in figure 4, that is, the computations are based on forward 2 points (order 1), centered 2 points (order 2) and centered 4 points (order 4) formulas. The approximate optimal step h is computed depending on the formula in order to minimize the total error.

The derivative function takes multivariate functions into account, so that all the points which have been detailed in section 4.2 apply here. In particular, the function uses modified versions of the formulas 124, 125 and 126. Indeed, instead of using one step h_i for each direction i = 1, ..., n, the same step h is used for all components.
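The following short session is a sketch of the most common use of derivative; the function myquad and the point are chosen for this illustration only.

function y = myquad(x)
    y = x(1)^2 + x(2)^3
endfunction
x = [1; 2];
[J, H] = derivative(myquad, x, order=4, H_form="blockmat")
// Expected: J approximately [2 12] and H approximately [2 0; 0 12].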

5.2 Varying order to check accuracy

Since several orders of accuracy are provided by the derivative function, it is easy and useful to check the accuracy of a specific numerical derivative. If the derivative varies only slightly with various formula orders, that implies that the user can be confident in the derivatives. Instead, if the numerical derivatives vary greatly with different formulas, that implies that the numerical derivative must be used with caution.

In the following Scilab script, we use various formulas to check the numerical derivative of the univariate quadratic function f(x) = x^2.

function y = myfunction3(x)
    y = x^2;
endfunction
x = 1.0;
expected = 2.0;
for o = [1 2 4]
    fp = derivative(myfunction3, x, order=o);
    err = abs(fp - expected)/abs(expected);
    mprintf("Order = %d, Relative error : %e\n", o, err)
end

The previous script produces the following output, where the relative error is printed.

Order = 1, Relative error : 7.450581e-009
Order = 2, Relative error : 8.531620e-012
Order = 4, Relative error : 0.000000e+000

Increasing the order produces increasing accuracy, as expected in such a simple case.

An advanced feature is provided by the derivative function, namely the transformation of the directions by an orthogonal matrix Q. This is the topic of the following section.

    5.3 Orthogonal matrix

In this section, we describe the mathematics behind the orthogonal n-by-n matrix Q, which is an optional input argument of the derivative function. An orthogonal matrix is a square matrix satisfying Q^T = Q^(-1).

In order to simplify the discussion, let us assume that the function is a multivariate scalar function, i.e. f : R^n → R. Second, we want to produce a result which does not explicitly depend on the canonical vectors e_i. The goal is to be able to compute directional derivatives in directions which are combinations of the axis vectors. Then, Taylor's expansion in the direction Qe_i yields

    f(x + hQe_i) = f(x) + h g(x)^T Qe_i + O(h^2).    (160)

This leads to

    g(x)^T Qe_i = (f(x + hQe_i) - f(x)) / h.    (161)

Recall that in the classical formula, the term g(x)^T e_i can be simplified into g_i(x). But now, the matrix Q has been inserted in between, so that the direction is indeed different. Let us denote by q_i ∈ R^n the i-th column of the matrix Q. Let us denote by d^T the row vector of function differences defined by

    d_i = (f(x + hQe_i) - f(x)) / h    (162)

for i = 1, ..., n. The equation 161 is transformed into g(x)^T q_i = d_i, or, in matrix form,

    g(x)^T Q = d^T.    (163)

We right multiply the previous equation by Q^T and get

    g(x)^T Q Q^T = d^T Q^T.    (164)

By the orthogonality property of Q, this implies

    g(x)^T = d^T Q^T.    (165)

Finally, we transpose the previous equation and get

    g(x) = Q d.    (166)
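A minimal numerical check of equation 166 (a sketch: the quadratic function, the point and the random orthogonal matrix are chosen for this example only):

function y = quadg(x)
    y = x(1)^2 + 3*x(2)^2
endfunction
x = [1; 2];
n = 2;
h = sqrt(%eps);
[Q, R] = qr(rand(n, n));   // a random orthogonal matrix
d = zeros(n, 1);
for i = 1 : n
    d(i) = (quadg(x + h*Q(:,i)) - quadg(x)) / h;
end
g = Q * d                  // expected: approximately [2; 12]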

The Hessian matrix can be computed based on the method which has been presented in the section 4.5. Hence, the computation of the Hessian matrix can also be modified to take into account the orthogonal matrix Q.

Let us consider the case of a function f : R^n → R^m, where m is a positive integer. We want to compute the Jacobian m-by-n matrix J defined by

    J = [ ∂f_1/∂x_1  ...  ∂f_1/∂x_n
             ...     ...     ...
          ∂f_m/∂x_1  ...  ∂f_m/∂x_n ].    (167)

In this case, the finite differences are defining a column vector, so that we must consider the m-by-n matrix D with entries

    D_ij = (f_i(x + hQe_j) - f_i(x)) / h,    (168)

for i = 1, ..., m and j = 1, ..., n. The Jacobian matrix J is therefore computed from

    J = D Q^T.    (169)


Degree     Order   Evaluations
Jacobian   1       n + 1
Jacobian   2       2n
Jacobian   4       4n
Hessian    1       (n + 1)^2
Hessian    2       4n^2
Hessian    4       16n^2

Figure 6: The number of function evaluations for the Jacobian and Hessian matrices.

5.4 Performance of finite differences

In this section, we analyse the number of function evaluations required by the computation of the Jacobian and Hessian matrices with the derivative function.

The number of function evaluations required to perform the computation depends on the dimension n and the number of points in the formula. The figure 6 summarizes the results.

The following list analyzes the number of function evaluations required to compute the gradient of the function depending on the dimension and the order of the formula.

• The order = 1 formula requires n + 1 function evaluations. Indeed, the function must be evaluated at f(x) and f(x + he_i), for i = 1, ..., n.
• The order = 2 formula requires 2n function evaluations. Indeed, the function must be evaluated at f(x - he_i) and f(x + he_i), for i = 1, ..., n.
• The order = 4 formula requires 4n function evaluations. Indeed, the function must be evaluated at f(x - he_i), f(x + he_i), f(x - 2he_i) and f(x + 2he_i), for i = 1, ..., n.

Consider the quadratic function in n = 10 dimensions

    f(x) = Σ_{i=1,...,10} x_i^2.    (170)

In the following Scilab script, we define the function and use a global variable to store the number of function evaluations required by the derivative function.

function y = myfunction3(x)
    global nbfeval
    nbfeval = nbfeval + 1
    y = x.*x;
endfunction
x = (1:10).';
for o = [1 2 4]
    global nbfeval
    nbfeval = 0;
    J = derivative(myfunction3, x, order=o);
    mprintf("Order = %d, Feval : %d\n", o, nbfeval)
end


Figure 7: Points used for the computation of the Jacobian with finite differences and order 1, order 2 and order 4 formulas.

The previous script produces the following output.

Order = 1, Feval : 11
Order = 2, Feval : 20
Order = 4, Feval : 40

In the following example, we consider a quadratic function in two dimensions. We define the quadf function, which computes the value of the function and plots the input points.

function f = quadf(x)
    f = x(1)^2 + x(2)^2
    plot(x(1)-1, x(2)-1, "bo")
endfunction

The following updateBounds function updates the bounds of the given graphics handle h. This removes the labels of the graphics: if we keep them, very small numbers are printed, which is useless. We symmetrize the plot. We slightly increase the bounds, in order to make visible points which would otherwise be at the limit of the plot. Finally, we set the background of the points to blue, so that the points are clearly visible.

function updateBounds(h)
    hc = h.children
    hc.axes_visible = ["off" "off" "off"];
    hc.data_bounds(1,:) = -hc.data_bounds(2,:);
    hc.data_bounds = 1.1 * hc.data_bounds;
    for i = 1 : size(hc.children, "*")
        hc.children(i).children.mark_background = 2
    end
endfunction


Figure 8: Points used in the computation of the Hessian with finite differences and order 1, order 2 and order 4 formulas. The points clustered in the middle are from the numerical Jacobian.

Then, we make several calls to the derivative function, which creates the plots which are presented in figures 7 and 8.

// See pattern for Jacobian
h = scf();
J1 = derivative(quadf, x, order=1);
updateBounds(h);
h = scf();
J1 = derivative(quadf, x, order=2);
updateBounds(h);
h = scf();
J1 = derivative(quadf, x, order=4);
updateBounds(h);
// See pattern for Hessian
h = scf();
[J1, H1] = derivative(quadf, x, order=1);
updateBounds(h);
h = scf();
[J1, H1] = derivative(quadf, x, order=2);
updateBounds(h);
h = scf();
[J1, H1] = derivative(quadf, x, order=4);
updateBounds(h);

In the following example, we compute both the gradient and the Hessian matrix in the same case as previously.

function y = myfunction3(x)
    global nbfeval
    nbfeval = nbfeval + 1
    y = x.*x;
endfunction


x = (1:10).';
for o = [1 2 4]
    global nbfeval
    nbfeval = 0;
    [J, H] = derivative(myfunction3, x, order=o);
    mprintf("Order = %d, Feval : %d\n", o, nbfeval)
end

The previous script produces the following output. Notice that, since we compute both the gradient and the Hessian matrix, the number of function evaluations is the sum of the two, although, in practice, the cost of the Hessian matrix is the most important.

Order = 1, Feval : 132
Order = 2, Feval : 420
Order = 4, Feval : 1640

    6 One more step

In this section, we analyse the behaviour of derivative when the point x is either large (x → ∞), when x is small (x → 0) and when x = 0. We compare these results with the numdiff function, which does not use the same step strategy. As we are going to see, both commands perform the same when x is near 1, but perf