
The Newton-Raphson Algorithm

David Allen
University of Kentucky

January 31, 2013


1 The Newton-Raphson Algorithm

The Newton-Raphson algorithm, also called Newton’s method, is a method for finding the minimum or maximum of a function of one or more variables. It is named after Isaac Newton and Joseph Raphson.


Its use in statistics
Statisticians often want to find parameter values that minimize an objective function such as a residual sum of squares or a negative log likelihood function. As θ is a popular symbol for a generic parameter, θ is used here to represent the argument of an objective function. Newton’s algorithm finds the value of θ that minimizes the objective function.


Synopsis
The basic Newton’s algorithm starts with a provisional value of θ. Then it

1. constructs a quadratic function with the same value, slope, and curvature as the objective function at the provisional value;

2. finds the value of θ that minimizes the quadratic function; and

3. resets the provisional value to this minimizing value.

If all goes well, these steps are repeated until the provisional value converges to the minimizing value, as in the sketch below.
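
To make the three steps concrete, here is a minimal R sketch of the scalar case. Minimizing the quadratic model built from the first and second derivatives reduces to the familiar update θ ← θ − o′(θ)/o″(θ). The function name, arguments, and tolerance are choices made for this sketch, not part of the original slides.

```r
# Minimal one-variable Newton's method. d1 and d2 are functions returning
# the first and second derivatives of the objective; theta is the
# provisional value. The quadratic model at theta is minimized by
# theta - d1(theta)/d2(theta), which becomes the new provisional value.
newton1d <- function(d1, d2, theta, tol = 1e-8, maxit = 50) {
  for (i in seq_len(maxit)) {
    step <- d1(theta) / d2(theta)  # minimizer shift of the quadratic model
    theta <- theta - step          # reset the provisional value
    if (abs(step) < tol) break     # stop once the update is negligible
  }
  theta
}
```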


An example with one variable
The next few slides demonstrate repeated applications of the steps above for a scalar θ.


The First Approximation
The first approximation is with θ = 0.5.


The Second Approximation
The second approximation is with θ = 2.25.


The Third Approximation
The third approximation is with θ = 1.5694.


The Final Approximation
The estimate of θ is 1.4142.
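
The slides do not state the objective function, but the iterates 0.5000, 2.2500, 1.5694, ..., 1.4142 match Newton’s method applied to an objective whose derivative is θ² − 2, for example o(θ) = θ³/3 − 2θ, which is minimized at √2 ≈ 1.4142. Under that assumption the sequence can be reproduced with the sketch above:

```r
# Assumed objective o(theta) = theta^3/3 - 2*theta (not given in the
# slides); its derivative theta^2 - 2 vanishes at sqrt(2) = 1.4142...
d1 <- function(theta) theta^2 - 2
d2 <- function(theta) 2 * theta
newton1d(d1, d2, theta = 0.5)
# successive provisional values: 0.5, 2.25, 1.5694, 1.4219, 1.4142, ...
```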


In Matrix Notation
Let o(θ) be the objective function to be minimized. Its vector of first derivatives, called the gradient vector, is

\[ g(\theta) = \frac{d}{d\theta}\, o(\theta) \]

Its matrix of second derivatives, called the Hessian matrix, is

\[ H(\theta) = \frac{d^2}{d\theta\, d\theta^t}\, o(\theta) \]


The quadratic approximation
The quadratic approximation of o(θ) at θ = θ0 in terms of the gradient vector and Hessian matrix is

\[ o(\theta) \approx o(\theta_0) + g^t(\theta_0)(\theta - \theta_0) + \tfrac{1}{2}(\theta - \theta_0)^t H(\theta_0)(\theta - \theta_0) \]

Provided H(θ0) is positive definite, the approximating quadratic function is minimized by

\[ \theta = \theta_0 - H^{-1}(\theta_0)\, g(\theta_0) \]
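
In code, the update is better computed by solving the linear system H(θ0)δ = g(θ0) than by forming the inverse explicitly. A minimal R sketch, assuming user-supplied functions grad and hess that return g(θ) and H(θ) (the names are illustrative):

```r
# One Newton update per pass: solve H(theta) delta = g(theta), then step
# to theta - delta. Solving the system avoids forming H^{-1} explicitly.
newton <- function(grad, hess, theta, tol = 1e-8, maxit = 100) {
  for (i in seq_len(maxit)) {
    delta <- solve(hess(theta), grad(theta))
    theta <- theta - delta
    if (sqrt(sum(delta^2)) < tol) break  # stop when the step is tiny
  }
  theta
}
```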


Implementation
There may be problems with convergence in practice, so Newton’s algorithm must be implemented with controls. Excellent discussions of Newton’s algorithm are given in Dennis and Schnabel [1], Fletcher [2], Nocedal and Wright [4], and Gill, Murray, and Wright [3].


Minimum or Maximum?
By checking second derivatives, Newton’s algorithm provides a definitive check of whether a minimum, maximum, or saddle point of the objective function has been found.
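
Concretely, the eigenvalues of the Hessian at the converged point give the classification. This small check is an illustration added here, not part of the original slides:

```r
# Classify a stationary point by the signs of the Hessian's eigenvalues.
classify <- function(H) {
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  if (all(ev > 0)) "minimum"                           # positive definite
  else if (all(ev < 0)) "maximum"                      # negative definite
  else if (any(ev > 0) && any(ev < 0)) "saddle point"  # indefinite
  else "indeterminate (singular Hessian)"
}
```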


Rosenbrock’s function
The Rosenbrock function is

\[ 100\,(\theta_2 - \theta_1^2)^2 + (1 - \theta_1)^2. \]

Rosenbrock’s function is a frequently used test function for numerical optimization procedures. Even though it is a simple-looking function of two variables, it has some gotchas.


An exercise

Exercise 1.1. Write an R program to apply Newton’s method to the Rosenbrock function. Do not use built-in R functions except for solve. Run your program using different starting values and observe the results.
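
For reference, one possible solution sketch follows; deriving the gradient and Hessian by hand is the heart of the exercise, so treat this as a check on your own work rather than the prescribed answer. It reuses the newton function sketched earlier (an illustration, not code from the slides) and uses no optimization routines, only solve for the linear algebra.

```r
# Gradient of 100*(t2 - t1^2)^2 + (1 - t1)^2, derived by the chain rule.
rosen_grad <- function(t) {
  c(-400 * t[1] * (t[2] - t[1]^2) - 2 * (1 - t[1]),
     200 * (t[2] - t[1]^2))
}

# Hessian of the same function.
rosen_hess <- function(t) {
  matrix(c(1200 * t[1]^2 - 400 * t[2] + 2, -400 * t[1],
           -400 * t[1],                     200),
         nrow = 2, byrow = TRUE)
}

newton(rosen_grad, rosen_hess, theta = c(-1.2, 1))
# typically converges to the minimizer c(1, 1)
```

Starting far from (1, 1) exposes the gotchas: the Hessian can be indefinite along the way, which is why practical implementations add the controls mentioned above.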


2 Least Squares

In situations where the response observations are uncorrelated with equal variances, least squares is the preferred method of estimation. Let Y_i represent the i-th response observation and η_i(θ) its expected value. Here θ is a vector of parameters that is functionally independent of the variance. The residual sum of squares is

\[ s(\theta) = \sum_{i=1}^{n} \left( Y_i - \eta_i(\theta) \right)^2 \tag{1} \]

where n is the number of observations. The least squares estimate of θ is the value of θ that minimizes s(θ) (assuming the minimum exists).


Derivatives of the residual sum of squares
The vector of first derivatives of s(θ), called the gradient vector, is

\[ g(\theta) = -2 \sum_{i=1}^{n} \left( Y_i - \eta_i(\theta) \right) \frac{d}{d\theta}\, \eta_i(\theta) \tag{2} \]

The matrix of second derivatives, called the Hessian matrix, is

\[ H(\theta) = 2 \sum_{i=1}^{n} \frac{d}{d\theta}\, \eta_i(\theta)\, \frac{d}{d\theta^t}\, \eta_i(\theta) - 2 \sum_{i=1}^{n} \left( Y_i - \eta_i(\theta) \right) \frac{d^2}{d\theta\, d\theta^t}\, \eta_i(\theta) \tag{3} \]


The quadratic approximation
The quadratic approximation of s(θ) at θ = θ0 in terms of the gradient vector and Hessian matrix is

\[ s(\theta) \approx s(\theta_0) + g^t(\theta_0)(\theta - \theta_0) + \tfrac{1}{2}(\theta - \theta_0)^t H(\theta_0)(\theta - \theta_0) \]

Newton’s algorithm, with the terms in H(θ) involving second derivatives of η_i(θ) omitted, is called the Gauss-Newton algorithm.
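
A minimal Gauss-Newton sketch under stated assumptions: eta(theta) returns the vector of n fitted values η_i(θ) and jac(theta) returns the n × p Jacobian with rows d/dθ^t η_i(θ); both names are illustrative, not from the slides. Dropping the second-derivative term of (3) leaves H ≈ 2 JᵗJ, and (2) gives g = −2 Jᵗr with r = Y − η(θ), so the Newton step reduces to solving (JᵗJ)δ = Jᵗr:

```r
# Gauss-Newton for least squares: the Newton step with the
# second-derivative term of the Hessian dropped.
gauss_newton <- function(Y, eta, jac, theta, tol = 1e-8, maxit = 100) {
  for (i in seq_len(maxit)) {
    r <- Y - eta(theta)                     # residuals
    J <- jac(theta)                         # n x p Jacobian of eta
    delta <- solve(t(J) %*% J, t(J) %*% r)  # (J'J) delta = J'r
    theta <- theta + as.vector(delta)
    if (sqrt(sum(delta^2)) < tol) break
  }
  theta
}
```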


The minimizing value
Provided H(θ0) is positive definite, the approximating quadratic function is minimized by

\[ \theta = \theta_0 - H^{-1}(\theta_0)\, g(\theta_0) \]


Summary
In the preceding, the objective function is the residual sum of squares. The chain rule of differentiation provides the formulas needed to calculate the quadratic approximation of the objective function in terms of the derivatives d/dθ η_i(θ) and d²/(dθ dθ^t) η_i(θ). When the η_i(θ) are components of a solution of linear differential equations, the partial derivatives can be calculated by a computer.

In the case of other objective functions, a similar process must be followed, i.e., use the chain rule to find expressions for g(θ) and H(θ) in terms of d/dθ η_i(θ) and d²/(dθ dθ^t) η_i(θ). Unfortunately, this is sometimes difficult.


References

[1] J. E. Dennis, Jr. and Robert B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

[2] Roger Fletcher. Practical Methods of Optimization, Volume 1: Unconstrained Optimization. John Wiley & Sons, 1980.

[3] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, 1981.

[4] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999.
