Scientific Computing: An Introductory Survey
Chapter 6 – Optimization
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted for noncommercial, educational use only.
Uniqueness of Minimum
Set S ⊆ Rn is convex if it contains line segment between any two of its points
Function f : S ⊆ Rn → R is convex on convex set S if its graph along any line segment in S lies on or below chord connecting function values at endpoints of segment
Any local minimum of convex function f on convex set S ⊆ Rn is global minimum of f on S
Any local minimum of strictly convex function f on convex set S ⊆ Rn is unique global minimum of f on S
Sensitivity and Conditioning
Function minimization and equation solving are closely related problems, but their sensitivities differ
In one dimension, absolute condition number of root x∗ of equation f(x) = 0 is 1/|f′(x∗)|, so if |f(x̂)| ≤ ε, then |x̂ − x∗| may be as large as ε/|f′(x∗)|
For minimizing f, Taylor series expansion

f(x̂) = f(x∗ + h) = f(x∗) + f′(x∗)h + (1/2) f′′(x∗)h² + O(h³)

shows that, since f′(x∗) = 0, if |f(x̂) − f(x∗)| ≤ ε, then |x̂ − x∗| may be as large as √(2ε/|f′′(x∗)|)
Thus, based on function values alone, minima can be computed to only about half precision
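A quick numerical check of this limit (my own illustration, not from the slides): near the minimum of cos at x∗ = π, perturbations of size about √ε_mach ≈ 10⁻⁸ leave the computed function value unchanged in IEEE double precision, so no method using only values of f can locate x∗ more accurately than that.

```python
import numpy as np

# f(x) = cos(x) has its minimum at x* = pi; check which perturbations h
# change the computed function value at all (typical IEEE double behavior)
f, xstar = np.cos, np.pi
for h in [1e-6, 1e-8, 1e-10]:
    print(f"h = {h:.0e}: f(x*+h) != f(x*) is {f(xstar + h) != f(xstar)}")
# h = 1e-06 changes f, but h = 1e-08 and 1e-10 do not: about half precision
```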
Unimodality
For minimizing function of one variable, we need “bracket” for solution analogous to sign change for nonlinear equation
Real-valued function f is unimodal on interval [a, b] if there is unique x∗ ∈ [a, b] such that f(x∗) is minimum of f on [a, b], and f is strictly decreasing for x ≤ x∗, strictly increasing for x∗ ≤ x
Unimodality enables discarding portions of interval based on sample function values, analogous to interval bisection
Golden Section Search
Suppose f is unimodal on [a, b], and let x1 and x2 be two points within [a, b], with x1 < x2
Evaluating and comparing f(x1) and f(x2), we can discard either (x2, b] or [a, x1), with minimum known to lie in remaining subinterval
To repeat process, we need to compute only one new function value
To reduce length of interval by fixed fraction at each iteration, each new pair of points must have same relationship with respect to new interval that previous pair had with respect to previous interval
Golden Section Search, continued
To accomplish this, we choose relative positions of two points as τ and 1 − τ, where τ² = 1 − τ, so τ = (√5 − 1)/2 ≈ 0.618 and 1 − τ ≈ 0.382
Whichever subinterval is retained, its length will be τ relative to previous interval, and interior point retained will be at position either τ or 1 − τ relative to new interval
To continue iteration, we need to compute only one new function value, at complementary point
This choice of sample points is called golden section search
Golden section search is safe but convergence rate is only linear, with constant C ≈ 0.618
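A minimal Python sketch of the iteration (illustrative code, not from the text; the function name and signature are my own):

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Minimize unimodal f on [a, b] by golden section search (a sketch)."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0          # ~0.618
    x1 = a + (1.0 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                   # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)                # only one new evaluation per step
        else:                         # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1.0 - tau) * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)
```

Because τ² = 1 − τ, the retained interior point always sits at relative position τ or 1 − τ of the new interval, which is exactly what makes the single new evaluation possible.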
Successive Parabolic Interpolation

Fit quadratic polynomial to three function values
Take minimum of quadratic to be new approximation to minimum of function
New point replaces oldest of three previous points and process is repeated until convergence
Convergence rate of successive parabolic interpolation is superlinear, with r ≈ 1.324
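One step of the interpolation can be written directly from the three points (a sketch using the standard vertex formula; no safeguards against a degenerate or upward-opening parabola are included):

```python
def parabola_vertex(x1, x2, x3, f1, f2, f3):
    """Abscissa of the vertex of the parabola through (x1,f1), (x2,f2),
    (x3,f3); this is the new approximate minimizer in successive
    parabolic interpolation."""
    num = (x2 - x1)**2 * (f2 - f3) - (x2 - x3)**2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return x2 - 0.5 * num / den
```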
Newton’s Method

Another local quadratic approximation is truncated Taylor series

f(x + h) ≈ f(x) + f′(x)h + (f′′(x)/2) h²

By differentiation, minimum of this quadratic function of h is given by h = −f′(x)/f′′(x)
Suggests iteration scheme

xk+1 = xk − f′(xk)/f′′(xk)

which is Newton’s method for solving nonlinear equation f′(x) = 0
Newton’s method for finding minimum normally has quadratic convergence rate, but must be started close enough to solution to converge
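A sketch of the resulting iteration (illustrative names; assumes f′′ stays nonzero along the way and the starting guess is near the minimum):

```python
def newton_min_1d(fprime, fsecond, x0, tol=1e-10, maxit=50):
    """Newton's method for minimizing f in 1-D, i.e. solving f'(x) = 0."""
    x = x0
    for _ in range(maxit):
        step = fprime(x) / fsecond(x)   # Newton step for f'(x) = 0
        x -= step
        if abs(step) < tol:
            break
    return x
```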
Safeguarded Methods
As with nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency
Most library routines for one-dimensional optimization are based on this hybrid approach
Popular combination is golden section search and successive parabolic interpolation, for which no derivatives are required
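This golden-section/parabolic hybrid is essentially Brent's method, which is what SciPy's scalar minimizer uses; a usage sketch, assuming SciPy is available:

```python
from scipy.optimize import minimize_scalar

# Brent's method combines golden section search with successive
# parabolic interpolation; no derivatives are needed.
res = minimize_scalar(lambda x: (x - 2.0)**2 + 1.0,
                      bracket=(0.0, 1.0, 4.0), method="brent")
print(res.x)   # ~2.0
```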
Newton’s Method, continued
In principle, line search parameter is unnecessary with Newton’s method, since quadratic model determines length, as well as direction, of step to next approximate solution
When started far from solution, however, it may still be advisable to perform line search along direction of Newton step sk to make method more robust (damped Newton)
Once iterates are near solution, then αk = 1 should suffice for subsequent iterations
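A sketch of damped Newton with simple backtracking (illustrative names; a production code would use an Armijo/Wolfe sufficient-decrease test rather than plain decrease):

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-8, maxit=100):
    """Newton's method for minimization with a backtracking line search.
    Assumes the Hessian is positive definite so the Newton step descends."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(hess(x), -g)   # Newton step
        alpha = 1.0                        # try the full step first
        while f(x + alpha * s) > f(x) and alpha > 1e-8:
            alpha *= 0.5                   # backtrack until f decreases
        x = x + alpha * s
    return x
```

Near the solution the full step alpha = 1 is accepted immediately, so quadratic convergence is retained.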
Trust Region Methods
Alternative to line search is trust region method, in which approximate solution is constrained to lie within region where quadratic model is sufficiently accurate
If current trust radius is binding, minimizing quadratic model function subject to this constraint may modify direction as well as length of Newton step
Accuracy of quadratic model is assessed by comparing actual decrease in objective function with that predicted by quadratic model, and trust radius is increased or decreased accordingly
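A schematic radius update based on this ratio test (the thresholds 0.25 and 0.75 are conventional textbook choices, not from these slides):

```python
def update_trust_radius(delta, rho, step_norm, delta_max=10.0):
    """Schematic trust-radius update; rho is the ratio of actual to
    predicted decrease in the objective for the step just tried."""
    if rho < 0.25:
        delta = 0.25 * step_norm            # model poor: shrink region
    elif rho > 0.75 and step_norm >= delta:
        delta = min(2.0 * delta, delta_max) # model good and step hit boundary: expand
    return delta
```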
Quasi-Newton Methods
Newton’s method costs O(n³) arithmetic and O(n²) scalar function evaluations per iteration for dense problem
Many variants of Newton’s method improve reliability and reduce overhead
Quasi-Newton methods have form

xk+1 = xk − αk Bk⁻¹ ∇f(xk)

where αk is line search parameter and Bk is approximation to Hessian matrix
Many quasi-Newton methods are more robust than Newton’s method, are superlinearly convergent, and have lower overhead per iteration, which often more than offsets their slower convergence rate
Secant Updating Methods
Could use Broyden’s method to seek zero of gradient, but this would not preserve symmetry of Hessian matrix
Several secant updating formulas have been developed for minimization that not only preserve symmetry in approximate Hessian matrix, but also preserve positive definiteness
Symmetry reduces amount of work required by about half, while positive definiteness guarantees that quasi-Newton step will be descent direction
BFGS Method, continued
In practice, factorization of Bk is updated rather than Bk itself, so linear system for sk can be solved at cost of O(n²) rather than O(n³) work
Unlike Newton’s method for minimization, no second derivatives are required
Can start with B0 = I, so initial step is along negative gradient, and then second derivative information is gradually built up in approximate Hessian matrix over successive iterations
BFGS normally has superlinear convergence rate, even though approximate Hessian does not necessarily converge to true Hessian
Increase in function value can be avoided by using line search, which generally enhances convergence
For quadratic objective function, BFGS with exact line search finds exact solution in at most n iterations, where n is dimension of problem
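For reference, the standard BFGS update of the approximate Hessian, given step s = xk+1 − xk and gradient change y = ∇f(xk+1) − ∇f(xk), can be written as a one-line sketch (as noted above, practical codes update a factorization of Bk instead, and skip the update when yᵀs ≤ 0 to preserve positive definiteness):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update: B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```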
Conjugate Gradient Method
Another method that does not require explicit second derivatives, and does not even store approximation to Hessian matrix, is conjugate gradient (CG) method
CG generates sequence of conjugate search directions, implicitly accumulating information about Hessian matrix
For quadratic objective function, CG is theoretically exact after at most n iterations, where n is dimension of problem
CG is effective for general unconstrained minimization as well
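A sketch of nonlinear CG in the Fletcher-Reeves form (illustrative names; only vectors are stored, and SciPy's scalar minimizer stands in for a proper Wolfe line search):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, tol=1e-8, maxit=200):
    """Fletcher-Reeves nonlinear conjugate gradient (a sketch)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                    # first direction: steepest descent
    for _ in range(maxit):
        if np.linalg.norm(g) < tol:
            break
        alpha = minimize_scalar(lambda a: f(x + a * s)).x   # line search
        x = x + alpha * s
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves formula
        s = -g_new + beta * s                 # next conjugate direction
        g = g_new
    return x
```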
Truncated Newton Methods
Another way to reduce work in Newton-like methods is to solve linear system for Newton step by iterative method
Small number of iterations may suffice to produce step as useful as true Newton step, especially far from overall solution, where true Newton step may be unreliable anyway
Good choice for linear iterative solver is CG method, which gives step intermediate between steepest descent and Newton-like step
Since only matrix-vector products are required, explicit formation of Hessian matrix can be avoided by using finite difference of gradient along given vector
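A sketch of one inexact Newton step along these lines (illustrative names; the Hessian is applied only through a finite difference of the gradient, and CG is capped at a few iterations):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def truncated_newton_step(grad, x, eps=1e-6, inner_iters=20):
    """Inexact Newton step: solve H(x) s = -grad(x) approximately by CG,
    with Hessian-vector products from a finite difference of the gradient."""
    x = np.asarray(x, dtype=float)
    g = grad(x)

    def hessvec(v):
        # H(x) v ~ (grad(x + eps*v) - grad(x)) / eps, no Hessian formed
        return (grad(x + eps * v) - g) / eps

    H = LinearOperator((x.size, x.size), matvec=hessvec)
    s, info = cg(H, -g, maxiter=inner_iters)  # a few iterations often suffice
    return s
```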
Nonlinear Least Squares
Given data (ti, yi), find vector x of parameters that gives “best fit” in least squares sense to model function f(t, x), where f is nonlinear function of x
Define components of residual function

ri(x) = yi − f(ti, x), i = 1, . . . , m

so we want to minimize φ(x) = (1/2) rT(x) r(x)
Gradient vector is ∇φ(x) = JT(x) r(x) and Hessian matrix is

Hφ(x) = JT(x) J(x) + ∑ᵢ₌₁ᵐ ri(x) Hi(x)

where J(x) is Jacobian of r(x), and Hi(x) is Hessian of ri(x)
Gauss-Newton Method, continued
Gauss-Newton method replaces nonlinear least squares problem by sequence of linear least squares problems whose solutions converge to solution of original nonlinear problem
If residual at solution is large, then second-order term omitted from Hessian is not negligible, and Gauss-Newton method may converge slowly or fail to converge
In such “large-residual” cases, it may be best to use general nonlinear minimization method that takes into account true full Hessian matrix
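A sketch of the iteration (illustrative names; each step solves the linear least squares problem J(xk) sk ≅ −r(xk)):

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-8, maxit=50):
    """Gauss-Newton for nonlinear least squares (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        r = residual(x)
        J = jacobian(x)
        s, *_ = np.linalg.lstsq(J, -r, rcond=None)  # linear LSQ subproblem
        x = x + s
        if np.linalg.norm(s) < tol:
            break
    return x
```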
Levenberg-Marquardt Method
Levenberg-Marquardt method is another useful alternative when Gauss-Newton approximation is inadequate or yields rank deficient linear least squares subproblem
In this method, linear system at each iteration is of form

(JT(xk) J(xk) + µk I) sk = −JT(xk) r(xk)

where µk is scalar parameter chosen by some strategy
Corresponding linear least squares problem is

[ J(xk) ]        [ −r(xk) ]
[ √µk I ] sk  ≅  [    0   ]

With suitable strategy for choosing µk, this method can be very robust in practice, and it forms basis for several effective software packages
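One step in the augmented least squares form above can be coded directly (a sketch; choosing and adapting µk is the part deliberately left out here):

```python
import numpy as np

def levenberg_marquardt_step(J, r, mu):
    """Solve the augmented least squares problem [J; sqrt(mu) I] s ~= [-r; 0]
    for one Levenberg-Marquardt step, given damping parameter mu."""
    n = J.shape[1]
    A = np.vstack([J, np.sqrt(mu) * np.eye(n)])
    b = np.concatenate([-r, np.zeros(n)])
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s
```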
Constrained Optimality

If problem is constrained, only feasible directions are relevant
For equality-constrained problem

min f(x) subject to g(x) = 0

where f : Rn → R and g : Rn → Rm, with m ≤ n, necessary condition for feasible point x∗ to be solution is that negative gradient of f lie in space spanned by constraint normals,

−∇f(x∗) = JgT(x∗) λ

where Jg is Jacobian matrix of g, and λ is vector of Lagrange multipliers
This condition says we cannot reduce objective function without violating constraints
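A small worked instance (my own illustration, not from the slides): minimize f(x) = x1 + x2 subject to g(x) = x1² + x2² − 2 = 0. Here ∇f = (1, 1)T and Jg(x) = [2x1  2x2], so the condition −∇f(x∗) = JgT(x∗)λ gives 2x1λ = −1 and 2x2λ = −1, hence x1 = x2; feasibility then forces x∗ = ±(1, 1). The constrained minimizer is x∗ = (−1, −1) with λ = 1/2, while (1, 1), with λ = −1/2, is the constrained maximizer.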
Constrained Optimality, continued
If inequalities are present, then KKT optimality conditions also require nonnegativity of Lagrange multipliers corresponding to inequalities, and complementarity condition
Sequential Quadratic Programming
Foregoing block 2 × 2 linear system is equivalent to quadratic programming problem, so this approach is known as sequential quadratic programming
Types of solution methods include

Direct solution methods, in which entire block 2 × 2 system is solved directly
Range space methods, based on block elimination in block 2 × 2 linear system
Null space methods, based on orthogonal factorization of matrix of constraint normals, JgT
Inequality-Constrained Optimization
Methods just outlined for equality constraints can be extended to handle inequality constraints by using active set strategy
Inequality constraints are provisionally divided into those that are satisfied already (and can therefore be temporarily disregarded) and those that are violated (and are therefore temporarily treated as equality constraints)
This division of constraints is revised as iterations proceed until eventually correct constraints are identified that are binding at solution
Penalty Methods
Merit function can also be used to convert equality-constrained problem into sequence of unconstrained problems
If x∗ρ is solution to

minₓ φρ(x) = f(x) + (1/2) ρ g(x)T g(x)

then, under appropriate conditions,

lim ρ→∞ x∗ρ = x∗

This enables use of unconstrained optimization methods, but problem becomes ill-conditioned for large ρ, so we solve sequence of problems with gradually increasing values of ρ, with minimum for each problem used as starting point for next problem
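A sketch of the resulting outer loop (illustrative names; SciPy's general-purpose minimizer stands in for the unconstrained solver):

```python
import numpy as np
from scipy.optimize import minimize

def penalty_method(f, g, x0, rho0=1.0, factor=10.0, outer=8):
    """Quadratic penalty method for min f(x) s.t. g(x) = 0 (a sketch).
    Solves a sequence of unconstrained problems with increasing rho,
    warm-starting each from the previous minimizer."""
    x = np.asarray(x0, dtype=float)
    rho = rho0
    for _ in range(outer):
        phi = lambda z, rho=rho: f(z) + 0.5 * rho * np.dot(g(z), g(z))
        x = minimize(phi, x).x    # unconstrained subproblem
        rho *= factor             # increase penalty gradually
    return x
```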
Barrier Methods

For inequality-constrained problems, another alternative is barrier function, such as

φµ(x) = f(x) − µ ∑ᵢ₌₁ᵖ 1/hi(x)

or

φµ(x) = f(x) − µ ∑ᵢ₌₁ᵖ log(−hi(x))

which increasingly penalize feasible points as they approach boundary of feasible region
Again, solutions of unconstrained problem approach x∗ as µ → 0, but problems are increasingly ill-conditioned, so solve sequence of problems with decreasing values of µ
Barrier functions are basis for interior point methods for linear programming
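A sketch of the analogous loop for the logarithmic barrier (illustrative names; x0 must be strictly feasible, and a derivative-free inner solver is used here because the barrier is +∞ outside the feasible region):

```python
import numpy as np
from scipy.optimize import minimize

def barrier_method(f, h, x0, mu0=1.0, factor=0.1, outer=8):
    """Log-barrier method for min f(x) s.t. h_i(x) <= 0 (a sketch)."""
    x = np.asarray(x0, dtype=float)
    mu = mu0
    for _ in range(outer):
        def phi(z, mu=mu):
            hz = np.asarray(h(z))
            if np.any(hz >= 0):       # outside feasible region: barrier blocks
                return np.inf
            return f(z) - mu * np.sum(np.log(-hz))
        x = minimize(phi, x, method="Nelder-Mead").x
        mu *= factor                  # decrease barrier parameter
    return x
```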
Linear Programming, continued
Simplex method is reliable and normally efficient, able to solve problems with thousands of variables, but can require time exponential in size of problem in worst case
Interior point methods for linear programming developed in recent years have polynomial worst case solution time
These methods move through interior of feasible region, not restricting themselves to investigating only its vertices
Although interior point methods have significant practical impact, simplex method is still predominant method in standard packages for linear programming, and its effectiveness in practice is excellent
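A usage sketch with SciPy's linprog (the HiGHS backend provides both simplex and interior point codes; the data here are my own toy example):

```python
import numpy as np
from scipy.optimize import linprog

# minimize c^T x subject to A_ub x <= b_ub and x >= 0 (default bounds);
# here: maximize x1 + 2*x2, so minimize its negative
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0], [1.0, 3.0]])
b_ub = np.array([4.0, 6.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
print(res.x, res.fun)
```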