Methods for Solving Nonlinear Equations
Local Methods for Unconstrained Optimization
The General Sparsity of the Third Derivative
How to Utilize Sparsity in the Problem
Numerical Results
Newton and Halley are one step apart
Trond Steihaug
Department of Informatics, University of Bergen, Norway
4th European Workshop on Automatic Differentiation
December 7 - 8, 2006
Institute for Scientific Computing
RWTH Aachen University
Aachen, Germany
(This is joint work with Geir Gundersen)
Trond Steihaug Newton and Halley are one step apart
Overview
- Methods for Solving Nonlinear Equations: A Method in the Halley Class is Two Steps of Newton in Disguise.
- Local Methods for Unconstrained Optimization.
- How to Utilize Structure in the Problem.
- Numerical Results.
The Halley Class / Motivation
Newton and Halley
A central problem in scientific computation is the solution of a system of n equations in n unknowns
F (x) = 0
where F : Rn → Rn is sufficiently smooth.
Sir Isaac Newton (1643 - 1727). Sir Edmond Halley (1656 - 1742).
A Nonlinear Newton method
Taylor expansion
T(s) = F(x) + F'(x)s + (1/2) F''(x)ss
Nonlinear Newton: Given x. Determine s: T(s) = 0. Update x+ = x + s.
Two Newton steps on the nonlinear problem T(s) = 0 with s(0) ≡ 0:
T'(0) s(1) = −T(0).
T'(s(1)) s(2) = −T(s(1)).
x+ = x + s(1) + s(2).
Equivalently, in terms of F:
F'(x) s(1) = −F(x).
[F'(x) + F''(x) s(1)] s(2) = −(1/2) F''(x) s(1) s(1).
x+ = x + s(1) + s(2).
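The two Newton steps on T(s) = 0 are easy to see in one dimension. The sketch below uses a made-up scalar example F(x) = x^2 − 2 (root √2), not from the talk:

```python
def two_newton_steps_on_model(F, dF, d2F, x):
    """One outer iteration: two Newton steps on the quadratic model
    T(s) = F(x) + F'(x)s + (1/2)F''(x)s^2, starting from s = 0."""
    # First Newton step on T: T'(0) s1 = -T(0), i.e. F'(x) s1 = -F(x).
    s1 = -F(x) / dF(x)
    # Second Newton step on T: T'(s1) s2 = -T(s1).
    # Since F(x) + F'(x) s1 = 0, we have T(s1) = (1/2) F''(x) s1^2.
    s2 = -0.5 * d2F(x) * s1 * s1 / (dF(x) + d2F(x) * s1)
    return x + s1 + s2

# Made-up example: F(x) = x^2 - 2, starting at x = 1.5.
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
d2F = lambda x: 2.0
x_new = two_newton_steps_on_model(F, dF, d2F, 1.5)
```

Since this F is quadratic, T coincides with F shifted to x, so the outer iteration really is two Newton steps and lands much closer to √2 than a single Newton step from the same point.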
The Halley Class
The Halley class of iterations (Gutierrez and Hernandez 2001):
Given starting value x0 compute
xk+1 = xk − {I + (1/2) L(xk)[I − αL(xk)]^(-1)} (F'(xk))^(-1) F(xk), k = 0, 1, . . . ,
where
L(x) = (F'(x))^(-1) F''(x) (F'(x))^(-1) F(x), x ∈ Rn
Classical methods:
- Chebyshev's method (α = 0),
- Halley's method (α = 1/2), and
- Super Halley's method (α = 1).
One Step Halley
This formulation is not suitable for implementation. By rewriting the equation we get the following iterative method for k = 0, 1, . . .
Solve for s(1)k: F'(xk) s(1)k = −F(xk)
Solve for s(2)k: [F'(xk) + α F''(xk) s(1)k] s(2)k = −(1/2) F''(xk) s(1)k s(1)k
Update the iterate: xk+1 = xk + s(1)k + s(2)k
A Key Point
One step super Halley (α = 1) is two steps of Newton on the quadratic approximation.
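The rewritten iteration is easy to exercise in one dimension. The sketch below (with a made-up test equation F(x) = x^3 − 2, not from the talk) implements one step of the class for a general α:

```python
def halley_class_step(F, dF, d2F, x, alpha):
    """One step of the Halley class for a scalar equation F(x) = 0.
    alpha = 0: Chebyshev, alpha = 1/2: Halley, alpha = 1: super Halley."""
    # Solve F'(x) s1 = -F(x).
    s1 = -F(x) / dF(x)
    # Solve [F'(x) + alpha F''(x) s1] s2 = -(1/2) F''(x) s1 s1.
    s2 = -0.5 * d2F(x) * s1 * s1 / (dF(x) + alpha * d2F(x) * s1)
    return x + s1 + s2

# Made-up test equation: F(x) = x^3 - 2, root 2^(1/3).
F = lambda x: x ** 3 - 2.0
dF = lambda x: 3.0 * x ** 2
d2F = lambda x: 6.0 * x
root = 2.0 ** (1.0 / 3.0)
steps = {a: halley_class_step(F, dF, d2F, 1.3, a) for a in (0.0, 0.5, 1.0)}
```

From x = 1.3, each member of the class gets closer to the root in one step than a plain Newton step does, reflecting the cubic versus quadratic local convergence.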
Super Halley as Two Steps of Newton
Two steps of Newton:
Solve for s(1)k: F'(xk) s(1)k = −F(xk)
Solve for s(2)k: F'(xk + s(1)k) s(2)k = −F(xk + s(1)k)
Update the iterate: xk+1 = xk + s(1)k + s(2)k
One step Super Halley:
Solve for s(1)k: F'(xk) s(1)k = −F(xk)
Solve for s(2)k: [F'(xk) + F''(xk) s(1)k] s(2)k = −(1/2) F''(xk) s(1)k s(1)k
Update the iterate: xk+1 = xk + s(1)k + s(2)k
1. In addition to F(x), F'(x) and the solution of two linear systems they require:
   - Halley requires F''(x)s (+ the matrix-vector product [F''(x)s] s).
   - Two steps of Newton requires F'(x + s) and F(x + s).
2. All members in the Halley class are cubically convergent.
3. Super Halley and two steps of Newton are equivalent on quadratic functions.
4. The super Halley method is quartically convergent for quadratic equations (D. Chen, I. K. Argyros and Q. Qian 1994).
(Ortega and Rheinboldt 1970): Methods which require second and higher order derivatives are rather cumbersome from a computational viewpoint. Note that, while computation of F' involves only n2 partial derivatives ∂jFi, computation of F'' requires n3 second partial derivatives ∂j∂kFi, in general an exorbitant amount of work indeed.
(Rheinboldt 1974): Clearly, comparisons of this type turn out to be even worse for methods with derivatives of order larger than two. Except in the case n = 1, where all derivatives require only one function evaluation, the practical value of methods involving more than the first derivative of F is therefore very questionable.
(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence.
When structure and sparsity are utilized the picture is very different. Sparsity is more predominant in higher derivatives.
Basics / Computations with the Tensor
Local Methods for Unconstrained Optimization
The members of the Halley class also apply to algorithms for the unconstrained optimization problem in the general case
min_{x ∈ Rn} f(x)

f(x), ∇f(x), ∇2f(x) and ∇3f(x)
Terminology
Let f : Rn → R be a three times continuously differentiable function. For a given x ∈ Rn let

gi = ∂f(x)/∂xi, Hij = ∂2f(x)/∂xi∂xj, Tijk = ∂3f(x)/∂xi∂xj∂xk.

g ∈ Rn, H ∈ Rn×n, and T ∈ Rn×n×n.
H is a symmetric matrix: Hij = Hji, i ≠ j.
We say that an n × n × n tensor is super-symmetric when
Tijk = Tikj = Tjik = Tjki = Tkij = Tkji, i ≠ j, j ≠ k, i ≠ k
Tiik = Tiki = Tkii, i ≠ k.
We will use the notation (pT ) for the matrix ∇3f (x)p.
Super-Symmetric Tensor
For a super-symmetric tensor we only store n(n + 1)(n + 2)/6 elements Tijk for 1 ≤ k ≤ j ≤ i ≤ n (illustrated for n = 9).
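The count n(n + 1)(n + 2)/6 is simply the number of index triples with 1 ≤ k ≤ j ≤ i ≤ n; a quick sketch confirms it for the n = 9 case:

```python
def unique_tensor_entries(n):
    """Number of stored elements T_ijk with 1 <= k <= j <= i <= n."""
    return sum(1 for i in range(1, n + 1)
                 for j in range(1, i + 1)
                 for k in range(1, j + 1))

n = 9
count = unique_tensor_entries(n)
assert count == n * (n + 1) * (n + 2) // 6  # 165 for n = 9
```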
Computations with the Tensor
The cubic value term pT(pT)p ∈ R:

pT(pT)p = Σ_{i=1}^n pi Σ_{j=1}^n pj Σ_{k=1}^n pk Tijk
        = Σ_{i=1}^n pi { [ Σ_{j=1}^{i−1} pj (6 Σ_{k=1}^{j−1} pk Tijk + 3 pj Tijj) + 3 pi Σ_{k=1}^{i−1} pk Tiik ] + pi^2 Tiii }

The cubic gradient term (pT)p ∈ Rn:

((pT)p)i = Σ_{j=1}^n Σ_{k=1}^n pj pk Tijk, 1 ≤ i ≤ n

The cubic Hessian term (pT) ∈ Rn×n:

(pT)ij = Σ_{k=1}^n pk Tijk, 1 ≤ i, j ≤ n
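As a sanity check, the folded formula for the value term (multiplicities 6, 3, 3, 1 over the unique elements) can be compared with the naive triple sum over a fully symmetrized tensor. The 0-indexed Python sketch below uses made-up random data:

```python
import random

def cubic_value(T, p):
    """p^T (pT) p using only elements T[i][j][k] with k <= j <= i (0-indexed),
    following the folded formula with multiplicities 6 / 3 / 3 / 1."""
    n = len(p)
    c = 0.0
    for i in range(n):
        t = 0.0
        for j in range(i):
            s = sum(p[k] * T[i][j][k] for k in range(j))
            t += p[j] * (6.0 * s + 3.0 * p[j] * T[i][j][j])
        s = sum(p[k] * T[i][i][k] for k in range(i))
        c += p[i] * (t + p[i] * (3.0 * s + p[i] * T[i][i][i]))
    return c

random.seed(7)
n = 6
# Made-up tensor; super-symmetry means the value depends only on sorted indices.
T = [[[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)] for _ in range(n)]
def Tsym(i, j, k):
    a, b, c = sorted((i, j, k), reverse=True)
    return T[a][b][c]
p = [random.uniform(-1, 1) for _ in range(n)]

# Naive O(n^3) triple sum over the symmetrized tensor.
naive = sum(p[i] * p[j] * p[k] * Tsym(i, j, k)
            for i in range(n) for j in range(n) for k in range(n))
```

The folded version touches each unique element once, which is exactly what makes the sparse and skyline variants later in the talk pay off.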
Computing utilizing super-symmetry: H + (pT)
T ∈ Rn×n×n is a super-symmetric tensor. H ∈ Rn×n is a symmetric matrix. Let p ∈ Rn.
for i = 1 to n do
  for j = 1 to i − 1 do
    for k = 1 to j − 1 do
      Hij += pk Tijk
      Hik += pj Tijk
      Hjk += pi Tijk
    end for
    Hij += pj Tijj
    Hjj += pi Tijj
  end for
  for k = 1 to i − 1 do
    Hii += pk Tiik
    Hik += pi Tiik
  end for
  Hii += pi Tiii
end for
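The loop above transcribes directly into code. The 0-indexed Python sketch below (made-up random data) stores only the lower triangle of H and verifies the result against the definition (pT)ij = Σk pk Tijk:

```python
import random

def add_pT(H, T, p):
    """In place: H += (pT), touching only T[i][j][k] with k <= j <= i and
    only the lower triangle H[i][j], j <= i (0-indexed)."""
    n = len(p)
    for i in range(n):
        for j in range(i):
            for k in range(j):
                H[i][j] += p[k] * T[i][j][k]
                H[i][k] += p[j] * T[i][j][k]
                H[j][k] += p[i] * T[i][j][k]
            H[i][j] += p[j] * T[i][j][j]
            H[j][j] += p[i] * T[i][j][j]
        for k in range(i):
            H[i][i] += p[k] * T[i][i][k]
            H[i][k] += p[i] * T[i][i][k]
        H[i][i] += p[i] * T[i][i][i]

random.seed(1)
n = 5
T = [[[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)] for _ in range(n)]
def Tsym(i, j, k):
    a, b, c = sorted((i, j, k), reverse=True)
    return T[a][b][c]
p = [random.uniform(-1, 1) for _ in range(n)]
H = [[0.0] * n for _ in range(n)]  # start from H = 0 so the result equals (pT)
add_pT(H, T, p)
```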
Sparsity / General Sparsity
Induced Sparsity
Definition
The sparsity of the Hessian matrix (Griewank and Toint 1982):

∂2f(x)/∂xi∂xj = 0, ∀x ∈ Rn, and (i, j) ∈ Z

Then

Tijk = ∂3f(x)/∂xi∂xj∂xk = 0 for (i, j) ∈ Z or (j, k) ∈ Z or (i, k) ∈ Z.

We say that the sparsity structure of the tensor is induced by the sparsity structure of the Hessian matrix.
Stored Elements: Arrowhead
[Figure: stored elements of a 9 × 9 arrowhead matrix and the induced tensor.]
Stored Elements: Tridiagonal
[Figure: stored elements of a 9 × 9 tridiagonal matrix and the induced tensor.]
General Sparsity
If Z is the set of indices of the Hessian matrix that are 0, define

N = {(i, j) | 1 ≤ i, j ≤ n} \ Z.

N is the set of indices for which the elements in the Hessian matrix at x will be nonzero.
Since

Tijk = 0 if (i, j) ∈ Z, or (j, k) ∈ Z, or (i, k) ∈ Z,

we only need to consider the elements (i, j, k) in the tensor where

(i, j) ∈ N and (j, k) ∈ N and (i, k) ∈ N, 1 ≤ k ≤ j ≤ i ≤ n.
General Sparsity cont.
In the following we will assume that (i, i) ∈ N. Define

T = {(i, j, k) | 1 ≤ k ≤ j ≤ i ≤ n, (i, j) ∈ N, (j, k) ∈ N, (i, k) ∈ N}.

Let Ci be the column indices of the nonzero elements on or below the diagonal in row i of the sparse Hessian matrix:

Ci = {j | j ≤ i, (i, j) ∈ N}, i = 1, . . . , n.

Then
T = {(i, j, k) | i = 1, . . . , n, j ∈ Ci, k ∈ Ci ∩ Cj}.

For a given (i, j), the set of indices k such that (i, j, k) ∈ T is called tube (i, j) (Bader and Kolda 2004).
A Key Point
The intersection of Ci and Cj defines the tube (i , j).
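As an illustration, the sketch below builds the sets Ci and the tubes from the definitions above for a made-up 4 × 4 tridiagonal pattern (1-indexed); the induced tensor then has 3n − 2 entries, matching the tridiagonal figure earlier:

```python
n = 4
# Lower-triangular nonzero pattern (1-indexed) of a tridiagonal matrix.
N = {(i, i) for i in range(1, n + 1)} | {(i, i - 1) for i in range(2, n + 1)}
# C_i: column indices j <= i with (i, j) nonzero.
C = {i: {j for j in range(1, i + 1) if (i, j) in N} for i in range(1, n + 1)}
# Induced tensor index set: tube (i, j) is the intersection of C_i and C_j.
T = [(i, j, k) for i in range(1, n + 1)
               for j in sorted(C[i])
               for k in sorted(C[i] & C[j])]
```

For this pattern the tubes are tiny, e.g. tube (3, 2) = C3 ∩ C2 = {2}, so only the entry (3, 2, 2) survives in that fiber of the tensor.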
A general sparse implementation of: H + (pT )
T ∈ Rn×n×n is a super-symmetric tensor. H ∈ Rn×n is a symmetric matrix. Let p ∈ Rn.
Let Ci be the nonzero index pattern of row i of the Hessian matrix.
for i = 1 to n do
  for j ∈ Ci ∧ j < i do
    for k ∈ Ci ∩ Cj ∧ k < j do
      Hij += pk Tijk
      Hik += pj Tijk
      Hjk += pi Tijk
    end for
    Hij += pj Tijj
    Hjj += pi Tijj
  end for
  for k ∈ Ci ∧ k < i do
    Hii += pk Tiik
    Hik += pi Tiik
  end for
  Hii += pi Tiii
end for
Less Memory More Flops / Numerical Results / Skyline Structure / The Cost of Newton's and Halley's Methods
Practical Issues
Four implementations:
1. Store k ∈ Ci ∩ Cj.
2. Let k ∈ Cj and test if k ∈ Ci.
3. Let k ∈ Cj and Index_ik = 0 when k ∉ Ci and 1 otherwise.
4. Expand storage of tube (i, j) to |Cj|.

With these implementations of k ∈ Ci ∩ Cj, k < j there is a tradeoff between memory and operations (arithmetic or logical).
Intersection with and without if: (pT )
With if (boolean mask t):

Set the elements of t to false.
for i = 1 to n do
  Compute t(i)k = true if k ∈ Ci
  for j ∈ Ci ∧ j < i do
    for k ∈ Cj ∧ k < j do
      if t(i)k then
        Hij += pk Tijk
        Hik += pj Tijk
        Hjk += pi Tijk
      end if
    end for
    Hij += pj Tijj
    Hjj += pi Tijj
  end for
  for k ∈ Ci ∧ k < i do
    Hii += pk Tiik
    Hik += pi Tiik
  end for
  Hii += pi Tiii
  Reset t(i)k = false if k ∈ Ci
end for

Without if (0/1 mask Index):

Set the elements of Index to zero.
for i = 1 to n do
  Compute Index(i)k = 1 if k ∈ Ci
  for j ∈ Ci ∧ j < i do
    for k ∈ Cj ∧ k < j do
      Tijk = Tijk Index(i)k
      Hij += pk Tijk
      Hik += pj Tijk
      Hjk += pi Tijk
    end for
    Hij += pj Tijj
    Hjj += pi Tijj
  end for
  for k ∈ Ci ∧ k < i do
    Hii += pk Tiik
    Hik += pi Tiik
  end for
  Hii += pi Tiii
  Reset Index(i)k = 0 if k ∈ Ci
end for
Numerical Results: Computing (pT )p and (pT )
[Figure: CPU measurements for the gradient term (milliseconds).]
Analysis
Operations = c1 S + c2 nnz(T) + c3 nnz(H) + c4 n,

where

S = Σ_{i=1}^n Σ_{j∈Ci} |Cj|, nnz(T) = Σ_{i=1}^n Σ_{j∈Ci} |Ci ∩ Cj|, and nnz(H) = Σ_{i=1}^n |Ci|.

- For full storage, c1 = 0, c2 = 6. Memory is 2 nnz(T).
- Implementations with if and full storage have the same number of arithmetic operations.
- For intersection with if, c1 = 1, c2 = 6, and memory is nnz(T).
- For Index and expanded (x-)storage, c1 = 6, but memory is nnz(T) (Index) or S ≥ nnz(T) (expanded).
How to Utilize Sparsity in the Problem: A Skyline Matrix
A matrix has a symmetric skyline structure (envelope structure) if all nonzero elements in a row are located from the first nonzero element to the element on the diagonal. Define βi to be the (lower) bandwidth of row i,

βi = max{i − j | Hij ≠ 0, j < i},

and define fi to be the start index for row i in the Hessian matrix,

fi = i − βi.

Then
Cj = {k | fj ≤ k ≤ j}
Ci ∩ Cj = {k | max{fi, fj} ≤ k ≤ j}, j ≤ i.
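The closed-form intersection can be sanity-checked against explicit set intersections. The sketch below uses a made-up random skyline pattern (1-indexed):

```python
import random

random.seed(3)
n = 8
# Made-up skyline: f[i] is the first nonzero column of row i, 1 <= f[i] <= i.
f = {i: random.randint(1, i) for i in range(1, n + 1)}
C = {i: set(range(f[i], i + 1)) for i in range(1, n + 1)}

# C_i intersect C_j is the contiguous range {max(f_i, f_j), ..., j} for j <= i
# (empty when max(f_i, f_j) > j), so no explicit intersection is needed.
for i in range(1, n + 1):
    for j in range(1, i + 1):
        assert C[i] & C[j] == set(range(max(f[i], f[j]), j + 1))
```

Because every Ci is a contiguous range ending at the diagonal, the tube loops in the skyline code below can simply run from max{fi, fj} with no masks or stored intersections.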
Skyline implementation of: pT (pT )p
Let T ∈ Rn×n×n be a super-symmetric tensor and let p ∈ Rn be a vector.
Let {f1, . . . , fn} be the indices of the first nonzero elements for each row in the Hessian matrix.
Let c, s, t ∈ R be scalars, c = 0.
for i = 1 to n do
  t = 0
  for j = fi to i − 1 do
    s = 0
    for k = max{fi, fj} to j − 1 do
      s += pk Tijk
    end for
    t += pj (6s + 3 pj Tijj)
  end for
  s = 0
  for k = fi to i − 1 do
    s += pk Tiik
  end for
  c += pi (t + pi (3s + pi Tiii))
end for
Halley and Two Steps of Newton in Review
Two steps of Newton:
Solve for s(1)k: F'(xk) s(1)k = −F(xk)
Solve for s(2)k: F'(xk + s(1)k) s(2)k = −F(xk + s(1)k)
Update the iterate: xk+1 = xk + s(1)k + s(2)k

One step Halley:
Solve for s(1)k: F'(xk) s(1)k = −F(xk)
Solve for s(2)k: [F'(xk) + α F''(xk) s(1)k] s(2)k = −(1/2) F''(xk) s(1)k s(1)k
Update the iterate: xk+1 = xk + s(1)k + s(2)k
Computational Requirements
The tensor computations and the LDL^T decomposition for dense, banded and skyline structures. The LDL^T decomposition has the same complexity as the tensor operations. The total computational requirements for Newton's and super Halley's methods:
Method       | Dense                         | Banded                                                  | Skyline
Newton       | (1/3)n^3 + (5/2)n^2 − (5/6)n  | nβ^2 + 6nβ + 3n − (2/3)β^3 − (7/2)β^2 − (17/6)β         | 2 nnz(T) + 3 nnz(H) − 3n
Super Halley | (5/3)n^3 + (19/2)n^2 − (7/6)n | 5nβ^2 + 22nβ + 11n − (10/3)β^3 − 14β^2 − (70/6)β        | 10 nnz(T) + 7 nnz(H) − 9n

The Halley class and Newton's method have the same asymptotic upper bound for the dense, banded and skyline structures.
Upper Bound for the Skyline Structure
Theorem
The ratio of the number of arithmetic operations of a method in the Halley class and Newton's method is constant per iteration:

flops(One Step Halley) / flops(One Step Newton) ≤ 5

when the tensor is induced by a skyline structure of the Hessian matrix and we use a direct method to solve the systems of linear equations.
(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence.
Test Cases
Chained and Generalized Rosenbrock
Chained Rosenbrock (Toint 1982):

f(x) = Σ_{i=2}^n [6.4 (x_{i−1} − xi^2)^2 + (1 − xi)^2]

Generalized Rosenbrock (Schwefel 1977):

f(x) = Σ_{i=2}^n [(xn − xi^2)^2 + (xi − 1)^2]
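Both test functions are straightforward to code: in the chained version consecutive variables are coupled (a tridiagonal Hessian), while in the generalized version every variable is coupled to xn (an arrowhead Hessian). A small sketch, checking that x = (1, . . . , 1) gives f = 0 for both:

```python
def chained_rosenbrock(x):
    """f(x) = sum_{i=2}^n [6.4 (x_{i-1} - x_i^2)^2 + (1 - x_i)^2], 0-indexed."""
    return sum(6.4 * (x[i - 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(1, len(x)))

def generalized_rosenbrock(x):
    """f(x) = sum_{i=2}^n [(x_n - x_i^2)^2 + (x_i - 1)^2], 0-indexed."""
    xn = x[-1]
    return sum((xn - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(1, len(x)))

ones = [1.0] * 10
```

Both functions vanish at the all-ones point, which is therefore a global minimizer for each.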
Newton: 5 iterations
Chebyshev: 3 iterations
Halley: 3 iterations
Super Halley: 3 iterations
The termination criterion for all methods is ‖∇f(xk)‖ ≤ 10^(-8) ‖∇f(x0)‖. The total CPU time includes function, gradient, Hessian and tensor evaluations.
References
B. W. Bader and T. G. Kolda. MATLAB Tensor Classes for Fast Algorithm Prototyping. Technical Report SAND 2004-5187, October 2004.
D. Chen, I. K. Argyros and Q. Qian. A Local Convergence Theorem for the Super-Halley Method in a Banach Space. Appl. Math. Lett., Vol. 7, No. 5, pp. 49-52, 1994.
A. Griewank and Ph. L. Toint. On the unconstrained optimization of partially separable functions. In Michael J. D. Powell, editor, Nonlinear Optimization 1981, pages 301-312. Academic Press, New York, NY, 1982.
G. Gundersen and T. Steihaug. Sparsity in Higher Order Methods in Optimization. Reports in Informatics 327, Dept. of Informatics, Univ. of Bergen, 2006.
J. M. Gutierrez and M. A. Hernandez. An acceleration of Newton's method: Super-Halley method. Applied Mathematics and Computation, Vol. 117, No. 2, pp. 223-239, 25 January 2001.
H. P. Schwefel. Numerical Optimization of Computer Models. John Wiley and Sons, Chichester, 1981.
Ph. L. Toint. Some numerical results using a sparse matrix updating formula in unconstrained optimization. Mathematics of Computation, Vol. 32, No. 143, pp. 839-851, July 1978.