SU326 P30-15

THE DIFFERENTIATION OF PSEUDOINVERSES AND NONLINEAR LEAST
SQUARES PROBLEMS WHOSE VARIABLES SEPARATE

BY

G. H. GOLUB AND V. PEREYRA

STAN-CS-72-261

FEBRUARY 1972

COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY
The Differentiation of Pseudoinverses
and Nonlinear Least Squares Problems
Whose Variables Separate

by

G. H. Golub†
and
V. Pereyra*

†Computer Science Department, Stanford University, Stanford, California 94305. This work was in part supported by the Atomic Energy Commission.

*Departamento de Computación, Universidad Central de Venezuela, Caracas, Venezuela.
ABSTRACT

For given data (t_i, y_i), i = 1, ..., m, we consider the least squares fit of nonlinear models of the form

    η(a, α; t) = Σ_{j=1}^{n} a_j φ_j(α; t) .

For this purpose we study the minimization of the nonlinear functional

    r(a, α) = Σ_{i=1}^{m} [y_i − Σ_{j=1}^{n} a_j φ_j(α; t_i)]² .

It is shown that by defining the matrix [Φ(α)]_{ij} = φ_j(α; t_i), and the modified functional r₂(α) = ||y − Φ(α)Φ⁺(α)y||₂², it is possible to optimize first with respect to the parameters α, and then to obtain, a posteriori, the optimal parameters â. The matrix Φ⁺(α) is the Moore-Penrose generalized inverse of Φ(α), and we develop formulas for its Fréchet derivative under the hypothesis that Φ(α) is of constant (though not necessarily full) rank. From these formulas we readily obtain the derivatives of the orthogonal projectors associated with Φ(α), and also that of the functional r₂(α). Detailed algorithms are presented which make extensive use of well-known reliable linear least squares techniques, and numerical results and comparisons are given. These results are generalizations of those of H. D. Scolnik [20].
1. Introduction

The least squares fit of experimental data is a common tool in many applied sciences and in engineering problems. Linear problems have been well studied, and stable and efficient methods are available (see for instance: Björck and Golub [3], Golub [8]).

Methods for the nonlinear problems fall mainly in two categories: (a) general minimization techniques; (b) methods of Gauss-Newton type. The latter type of method takes into consideration the fact that the functional to be minimized is a sum of squares of functions (cf. Daniel [5], Osborne [14], Pereyra [15]). The well-known reliable linear techniques have been used mainly in connection with the successive linearization of the nonlinear models. Very recently it has been noticed that by restricting the class of models to be treated, a much more significant use of linear techniques can be made (cf. [2, 9, 12, 13, 17, 20]).
In this paper we consider the following problem. Given data (t_i, y_i),

first, in order to obtain the optimal parameters α̂, and then complete the optimization according to our explanation in Section 2. The algorithms differ in the procedure used for the minimization of r₂(α).
A1. Minimization without derivatives. We use PRAXIS, a FORTRAN version of a program developed by R. Brent [4], who very kindly made it available to us. All that PRAXIS essentially requires from the user is the value of the functional for any α. This is computed using the results of Section 3. In fact, the user has only to give code for filling the matrix Φ for any α, and our program will effect the triangular reduction and so on. It turns out that many times (see the examples) the models have some terms which are exclusively linear, i.e., functions φ_j which are independent of α. Those functions produce columns in Φ(α) which are constant throughout the process. If they are considered first, then it is possible to reduce them once and for all, saving the repetition of computation. This is done in our program.
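The once-and-for-all reduction of the constant columns can be illustrated in modern NumPy notation (the paper's program is in FORTRAN; the function name and interface below are hypothetical). The α-independent columns are QR-factored a single time; at each evaluation of r₂ only the α-dependent columns are rotated, and the least squares problem collapses to the trailing rows:

```python
import numpy as np

def make_reduced_functional(phi_const, y):
    # QR-factor the alpha-independent columns once, outside the
    # minimization loop.
    m, n0 = phi_const.shape
    Q, R0 = np.linalg.qr(phi_const, mode="complete")
    z = Q.T @ y  # data rotated by the fixed orthogonal factor

    def r2(phi_alpha):
        # Only the alpha-dependent columns must be rotated at each step.
        B = Q.T @ phi_alpha
        # The rotated constant block is [R0; 0], so the first n0 rows of
        # the residual can always be zeroed by the constant-column
        # coefficients; the minimum is attained on the trailing rows.
        c, *_ = np.linalg.lstsq(B[n0:], z[n0:], rcond=None)
        return np.linalg.norm(z[n0:] - B[n0:] @ c)

    return r2
```

The fixed factor Q is computed once, so each evaluation of r₂ costs only the rotation of the new columns plus a smaller least squares solve.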
A2. Minimization by Gauss-Newton with control of step (see (5.2)). The user is required to provide the incidence matrix E and the array of functions φ_j and non-vanishing partial derivatives ∂φ_j/∂α_l. See Section 5 for a more detailed description.

A3. Minimization by Marquardt's modification, as explained in Section 5, with ν = 1. User-supplied information is the same as in A2.
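A generic damped Gauss-Newton step of the kind used in A3 can be sketched as follows (a minimal illustration of the Marquardt idea with a doubling/halving control of ν, not the paper's Osborne-Marquardt code; all names are hypothetical):

```python
import numpy as np

def marquardt_step(J, r, nu):
    """One damped Gauss-Newton correction:
    solve (J^T J + nu I) delta = -J^T r.  nu = 0 recovers Gauss-Newton."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + nu * np.eye(n), -J.T @ r)

def lm_minimize(residual, jacobian, x0, nu=1.0, tol=1e-10, itmax=50):
    # Increase nu when a step fails to reduce the residual, decrease it
    # when a step succeeds -- the "control of step" idea.
    x = np.asarray(x0, dtype=float)
    for _ in range(itmax):
        r = residual(x)
        J = jacobian(x)
        while True:
            step = marquardt_step(J, r, nu)
            x_new = x + step
            if np.linalg.norm(residual(x_new)) < np.linalg.norm(r):
                nu = max(nu / 2.0, 1e-12)
                break
            nu *= 2.0
            if nu > 1e12:
                return x  # no acceptable step found
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x
```

In practice the normal-equations solve above would be replaced by an orthogonal factorization of the augmented Jacobian, in keeping with the stable linear techniques advocated throughout the paper.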
Test problems. Problems 1 and 2 are taken from Osborne [14], where the necessary data can be found.

P1. Exponential fitting. The model is of the form:

    h₁(a, α; t) = a₁ + a₂ e^{−α₁ t} + a₃ e^{−α₂ t} .

The functions φ_j are obviously φ₁(α; t) ≡ 1, φ_{j+1}(α; t) = e^{−α_j t}, j = 1, 2. So the different constants, in the notation of Section 2, are: n = 3, s = 3, k = 2. For the problem considered, m = 33. The number of constant functions: NCF = 1. The number of non-vanishing partial derivatives: p = 2.

In Table I we compare our results for methods A1, A2, A3, and those obtained by minimizing the full functional r(a, α).
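For this model, one evaluation of the variable projection functional amounts to filling Φ(α) and solving one linear least squares problem; a minimal NumPy sketch (illustrative, not the paper's FORTRAN program):

```python
import numpy as np

def phi_matrix(alpha, t):
    # Columns: phi_1 = 1, phi_{j+1} = exp(-alpha_j t), j = 1, 2  (n = 3, k = 2)
    return np.column_stack([np.ones_like(t),
                            np.exp(-alpha[0] * t),
                            np.exp(-alpha[1] * t)])

def r2(alpha, t, y):
    """Variable projection functional: project y onto the range of
    Phi(alpha); the optimal linear parameters a are a by-product."""
    Phi = phi_matrix(alpha, t)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # a = Phi^+(alpha) y
    return np.linalg.norm(y - Phi @ a), a
```

Once a minimizer α̂ of r₂ is found, the corresponding â = Φ⁺(α̂)y is obtained from the same least squares solve, with no further optimization over a.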
P2. Fitting Gaussians with an exponential background.

    h₂(a, α; t) = a₁ e^{−α₁ t} + a₂ e^{−α₂(t−α₅)²} + a₃ e^{−α₃(t−α₆)²} + a₄ e^{−α₄(t−α₇)²} .

The functions φ_j are:

    φ₁(α; t) = e^{−α₁ t} ;   φ_j(α; t) = e^{−α_j (t−α_{j+3})²} , j = 2, 3, 4 .

Thus: n = 4, s = 4, k = 7, m = 65, p = 7.

Results for this problem appear in Table II.
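The incidence matrix E required by methods A2 and A3 records which α_l enters which φ_j; for P2 it can be written down directly (an illustrative sketch in NumPy; the paper's program expects E in FORTRAN array form):

```python
import numpy as np

# Rows: functions phi_1..phi_4 of problem P2; columns: alpha_1..alpha_7.
# E[i, j] = 1 iff alpha_{j+1} appears in phi_{i+1}.
E = np.zeros((4, 7), dtype=int)
E[0, 0] = 1                      # phi_1 = exp(-alpha_1 t)
for i, j in ((1, 1), (2, 2), (3, 3)):
    E[i, j] = 1                  # width alpha_{j+1} of the Gaussian
    E[i, j + 3] = 1              # center alpha_{j+4}
p = int(E.sum())                 # number of non-vanishing partials; p = 7 here
```

The row sums show that each Gaussian contributes two nonzero partial derivatives and the background one, which is how the constant p = 7 quoted above arises.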
P3. Iron Mössbauer spectrum with two sites of different electric field gradient and one single line [21].
The model here is the following:

    h₃(a, α; t) = a₁ + a₂ t + a₃ t²
        + a₄ { [1 + ((α₅ + 0.5α₂ − t)/α₃)²]⁻¹ + [1 + ((α₅ − 0.5α₂ − t)/α₃)²]⁻¹ }
        + a₅ { [1 + ((α₆ + 0.5α₄ − t)/α₁)²]⁻¹ + [1 + ((α₆ − 0.5α₄ − t)/α₁)²]⁻¹ }
        + a₆ [1 + ((α₈ − t)/α₇)²]⁻¹ .

Clearly, φ_j(α; t) = t^{j−1}, j = 1, 2, 3; and φ₄, φ₅, φ₆ are the functions inside the square brackets.

Here: n = 6, k = 8, NCF = 3, p = 8, m = 188, s = 6.
For this example we wish to thank Dr. J. C. Travis of NBS, who kindly supplied the problem and results from his own computer program.

Comparisons are offered in Table III.
The qualitative behavior of the three different minimization procedures used in our computation follows the pattern that has been expounded in recent comparisons (Bard [1]). Gauss-Newton is fastest whenever it converges from a good initial estimate. As is shown in the fitting of Gaussians (Table II), if the problem is troublesome, then a more elaborate strategy is called for. Brent's program has the advantage of not needing derivatives, which in this case leads to a big simplification. On the other hand, it is a very conservative program which really tries to obtain rigorous results. This, of course, can lead to a long search in cases where it is not entirely justified.

As a consequence of our Theorem 2.1, and of our numerical experience, we strongly recommend, even in the case when our procedure is not used, obtaining initial values for the linear parameters (when g_j(a) = a_j) by setting a⁰ = Φ⁺(α⁰)y. This is done in our program for the full functional and in the program of Travis, with excellent results.
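The recommended initialization is a single stable least squares solve; a one-function NumPy sketch (the function name is ours, not the paper's):

```python
import numpy as np

def init_linear(Phi0, y):
    # a0 = Phi^+(alpha0) y.  Solving by an orthogonal factorization
    # (lstsq) is the stable way to apply the pseudoinverse; forming
    # pinv(Phi0) explicitly would only mirror the formula.
    a0, *_ = np.linalg.lstsq(Phi0, y, rcond=None)
    return a0
```

This costs one linear solve at the initial guess α⁰ and typically places the iteration much closer to the minimum than an arbitrary choice of a⁰.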
The computer times shown in Table I and Table II correspond to the CPU times (execution of the object code) on an IBM 360/50. All calculations were performed in long precision; viz. 14 hexadecimal digits in the mantissa of each number. We compare the results of minimizing the reduced functional when the Variable Projection (VP) technique is used with those of minimizing the full functional (FF) for various minimization algorithms. In order to eliminate the coding aspect, we have used essentially the same code for minimizing the two functionals. The only difference was in the subroutine DPA, which computes in both cases the Jacobian of the residual vector.

In the FF approach, the subroutine DPA computed the m × (n+k) matrix B as follows: the first n columns consisted of the vectors φ_j(α), while the remaining columns were the partial derivatives

    ∂/∂α_l (y − Φ(α)a) = −Σ_{j=1}^{n} a_j ∂φ_j/∂α_l ,   l = 1, 2, ..., k .

These derivatives were constructed using the same information provided by the user subroutine ADA. We also obtained from DPA in the FF case the automatic initialization of the linear parameters, viz. a⁰ = Φ⁺(α⁰)y.
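The assembly of B from the user-supplied information can be sketched in NumPy (an illustration of the formula above, not the DPA subroutine itself; the dense dPhi layout, with zeros where E(j, l) = 0, is an assumption for compactness):

```python
import numpy as np

def full_jacobian(Phi, dPhi, a):
    """B = [Phi, C], where column l of C is
    d/d(alpha_l) (y - Phi(alpha) a) = -sum_j a_j dphi_j/dalpha_l.
    Phi  : (m, n) sampled functions phi_j
    dPhi : (m, n, k) partials dphi_j/dalpha_l (zero where E(j, l) = 0)
    a    : (n,) current linear parameters."""
    C = -np.einsum('j,mjl->ml', a, dPhi)
    return np.hstack([Phi, C])
```

Note that only the nonzero partials ever contribute to C, which is why the program stores them compactly through the incidence matrix E rather than as a full m × n × k array.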
For the numerical examples given here, the cost per iteration was somewhat higher for the VP functional. However, we see that in some cases there has been a dramatic decrease in the number of iterations; this has been observed previously (cf. [12]). Thus, in these cases the total computing time is much more favorable for the VP approach. This was especially true for all three
methods of minimization when the exponential fit was made, and when Marquardt's method was used in the Mössbauer spectrum problem.

For the Mössbauer spectrum problem, we used two sets of initial values. We used those given by Travis [21], say β⁰, and also a second set β̃⁰ obtained by perturbing β⁰ by 0.05 β⁰. For β⁰, the value of the functional is 3.04467 × 10⁸, while for β̃⁰ the value of the functional is 6.405 × 10⁸; the final estimates of the parameters yielded a residual sum of squares less than 3.0444 × 10⁸. When Brent's method was used on the full functional, the method did not seem to converge, but for the reduced functional Brent's method converged reasonably well. In fact, after twenty minutes Brent's algorithm applied to the full functional with β⁰ did not achieve the desired reduction in the functional.
The results we have obtained in minimizing the full functional for the first two problems using the Marquardt method, and those of problem 3 with Newton's method and β⁰, are consistent with the results reported by Osborne and Travis.
From a rough count of the number of arithmetic operations (function and derivative evaluations per step are the same for both procedures, so that the work they do can be disregarded), it seems that for almost no combination of the parameters (m, n, k, p) will the VP procedure require fewer operations per iteration than the FF procedure. It is an open problem then to determine a priori under what conditions the VP procedure will converge more quickly than the FF procedure when minimization algorithms using derivatives are used.

Another important problem is that of stability. The numerical stability of the process and of the attained solution must be studied. By insisting on the use of stable linear techniques, we have tried to achieve an overall numerically stable procedure for this nonlinear situation. Since the standards
of stability for non-linear problems are ill-defined at this time, it is hard to say whether we have succeeded in obtaining our goal.
Table I
Exponential fit.

    Method   Functional   Number of Function   Number of Derivative   Time
                          Evaluations          Evaluations            (seconds)
    A1       FF           1832                 --                     191.00
             VP                                --                     9.00
    A2       FF           11                   11                     5.05
             VP           4                    4                      3.20
    A3       FF           32                   26                     12.55
             VP           4                    4                      3.12

    r(â, α̂) ≈ r₂(α̂) ≤ 0.5465 × 10⁻⁴
Table II
Gaussian fit.

    Method   Functional   Number of Function   Number of Derivative   Time
                          Evaluations          Evaluations            (seconds)
    A3       FF           11                   9                      23.35
             VP           10                   8                      26.82

Methods A1 and A2 were either slowly convergent or non-convergent.
Table III
Mössbauer Iron Spectrum.

    Method   Functional   Initial   Number of Function   Number of Derivative   Time
                          Values    Evaluations          Evaluations            (seconds)
    A1       FF           β⁰        *
             VP           β⁰        65                   0                      70.00
    A2       FF           β⁰        4                    4                      34.34
             VP           β⁰        4                    4                      41.64
             FF           β̃⁰        7                    7                      52.27
             VP           β̃⁰        6                    6                      59.60
    A3       FF           β⁰        16                   16                     118.22
             VP           β⁰        3                    3                      35.35
             FF           β̃⁰        18                   18                     130.50
             VP           β̃⁰        6                    6                      61.92

    r(â, α̂) ≈ r₂(α̂) ≤ 3.0444 × 10⁸
    β⁰ = (50, 49, 5, 81, 24, 9.5, 100, 4)ᵀ
    * Did not converge in a finite amount of time.
REFERENCES

1. Bard, Yonathan, "Comparison of gradient methods for the solution of nonlinear parameter estimation problems", SIAM J. Numer. Anal. 7, pp. 157-186 (1970).

2. Barrodale, I., F. D. K. Roberts, and C. R. Hunt, "Computing best l_p approximations by functions nonlinear in one parameter", Comp. J. 13, pp. 382-386 (1970).

3. Björck, Å., and G. H. Golub, "Iterative refinement of linear least squares solutions by Householder transformations", BIT 7, pp. 322-337 (1967).

4. Brent, Richard P., "Algorithms for finding zeros and extrema of functions without calculating derivatives", Stanford University, Computer Science Report STAN-CS-71-198 (1971).

5. Daniel, J. W., The Approximate Minimization of Functionals, Prentice-Hall, New York (1971).

6. Dieudonné, J., Foundations of Modern Analysis, Academic Press, New York (1960).

7. Fletcher, R. and Shirley A. Lill, "A class of methods for non-linear programming. II: computational experience", in Nonlinear Programming (ed. by J. B. Rosen, O. L. Mangasarian, and K. Ritter), pp. 67-92, Academic Press, New York (1970).

8. Golub, Gene H., "Matrix decompositions and statistical calculations", in Statistical Computation (edited by Roy C. Milton and John A. Nelder), pp. 365-397, Academic Press, New York (1969).

9. Guttman, I., V. Pereyra, and H. D. Scolnik, "Least squares estimation for a class of nonlinear models", Centre de Rech. Math., U. de Montréal (Jan. 1971). To appear in Technometrics.

10. Hanson, Richard J. and Charles L. Lawson, "Extensions and applications of the Householder algorithm for solving linear least squares problems", Math. Comp. 23, pp. 787-812 (1969).

11. Jennings, L. S., and M. R. Osborne, "Applications of orthogonal matrix transformations to the solution of systems of linear and non-linear equations", Techn. Rep. No. 37, Computer Centre, Australian Nat. Univ. (1970).

12. Lawton, William H. and E. A. Sylvestre, "Elimination of linear parameters in nonlinear regression", Technometrics 13, pp. 461-467 (1971).

13. Osborne, M. R., "A class of nonlinear regression problems", in Data Representation (R. S. Anderssen and M. R. Osborne, editors), pp. 94-101 (1970).

14. Osborne, M. R., "Some aspects of nonlinear least squares calculations", unpublished manuscript (Nov. 1971).

15. Pereyra, V., "Iterative methods for solving nonlinear least squares problems", SIAM J. Numer. Anal. 4, pp. 27-36 (1967).

16. Pereyra, V., "Stability of general systems of linear equations", Aequationes Math. 2, pp. 194-206 (1969).

17. Pérez, A., and H. D. Scolnik, "Derivatives of pseudoinverses and constrained non-linear regression problems", to appear in Numerische Mathematik.

18. Peters, G. and J. H. Wilkinson, "The least squares problem and pseudo-inverses", The Comp. J. 13, pp. 309-316 (1970).

19. Rao, C. Radhakrishna, and Sujit Kumar Mitra, Generalized Inverse of Matrices and its Applications, Wiley, New York (1971).

20. Scolnik, H. D., "On the solution of nonlinear least squares problems", Proc. IFIP-71, Numerical Math., pp. 18-23 (1971), North-Holland Pub. Co., Amsterdam. Also Ph.D. thesis, U. of Zürich (1970).

21. Travis, J. C., Radiochemical Analysis Section, Tech. Note No. 501, Nat. Bureau of Standards, pp. 19-33, Washington, D.C. (1970).
Key Words

Pseudoinverse
Nonlinear Least Squares
Fréchet Derivative
Projectors
Orthogonalization
VARPRO IN PROBLEM 1: FITTING OF TWO EXPONENTIALS AND A CONSTANT TERM.

C     VARPRO: A NONLINEAR LEAST SQUARES PROGRAM FOR LINEAR COMBINATIONS
C     OF NONLINEAR FUNCTIONS. WRITTEN IN FORTRAN 4-LEVEL G. IN THIS
C     SUBROUTINE THERE ARE WRITE STATEMENTS USING UNIT 3 AS OUTPUT;
C     THAT UNIT NUMBER IS INSTALLATION DEPENDENT.
C     MINIMIZATION BY OSBORNE-MARQUARDT ALGORITHM (OR GAUSS-NEWTON WITH
C     STEP CONTROL, BY MAKING THE SMALL CHANGES INDICATED IN THE SECOND
C     LINE AFTER INSTRUCTION LABELED 5, AND AFTER LABEL 61).
C     SEE 'THE DIFFERENTIATION OF PSEUDOINVERSES AND NONLINEAR LEAST
C     SQUARES PROBLEMS WHOSE VARIABLES SEPARATE' BY GENE H. GOLUB AND
C     V. PEREYRA, STANFORD U. TECHN. REP. 261, MARCH 1972.
C     M      = NUMBER OF OBSERVATIONS.
C     N      = NUMBER OF FUNCTIONS.
C     KG     = NUMBER OF NONLINEAR VARIABLES.
C     NCFUN  = NUMBER OF CONSTANT FUNCTIONS, I.E. FUNCTIONS PHI WHICH DO
C              NOT DEPEND UPON ANY PARAMETERS ALPHA. THEY SHOULD APPEAR
C              FIRST.
C     Y      = M-VECTOR OF OBSERVATIONS.
C     T      = M-VECTOR OF INDEPENDENT VARIABLE.
C     E      = (N X KG) INCIDENCE MATRIX. E(I,J) = 1 IFF VARIABLE J
C              APPEARS IN FUNCTION I. P = SUM OF E(I,J).
C     ALF    = KG-VECTOR OF INITIAL VALUES. ON OUTPUT IT WILL CONTAIN
C              THE OPTIMAL VALUES OF THE NONLINEAR PARAMETERS.
C     AC     = N-VECTOR OF LINEAR PARAMETERS (OUTPUT).
C     ****************************************************************
C     THE USER MUST PROVIDE A SUBROUTINE THAT FOR GIVEN ALF WILL
C     EVALUATE THE FUNCTIONS PHI AND THEIR PARTIAL DERIVATIVES
C     D PHI(I)/D ALF(J) AT THE POINTS T. THE SAMPLED FUNCTION PHI(I)
C     SHOULD BE STORED IN THE I-TH COLUMN OF THE (M X (KG+N+1)) MATRIX
C     A. THE NONZERO DERIVATIVE COLUMN VECTORS SHOULD BE STORED
C     SEQUENTIALLY IN THE MATRIX A STARTING IN THE COLUMN N+2. IF
C     ITER=0 (THE FIRST TIME THIS SUBROUTINE IS CALLED) THE MATRIX E
C     SHOULD BE FILLED. WITH THIS MATRIX THE STORAGE OF THE DERIVATIVES
C     IS EXPLAINED IN THE FOLLOWING CODE:
C           L = N+1
C           DO 10 J=1,KG
C           DO 10 I=1,N
C           IF (E(I,J)) 10,10,11
C        11 L = L+1
C           DO 10 K=1,M
C           A(K,L) = D PHI(I)/D ALF(J) (T(K))
C        10 CONTINUE
C     THE N+1-TH COLUMN OF A IS RESERVED FOR THE VECTOR OF OBSERVATIONS
C     Y. THE SUBROUTINE HEADING SHOULD BE:
C           SUBROUTINE ADA(N,M,KG,A,E,ITER,P,T,ALF,ISEL)
C     (ISEL =  0 : FUNCT. AND DER. MUST BE COMPUTED;
C      ISEL =  1 : ONLY FUNCTIONS MUST BE COMPUTED;
C      ISEL = -1 : ONLY DERIVATIVES NECESSARY.)
C     (ITER IS AN ITERATION COUNTER PROVIDED BY VARPRO.)
C     IT IS ASSUMED THAT THE MATRIX PHI(ALPHA) HAS ALWAYS FULL COLUMN
C     RANK.
C     ****THE THREE FOLLOWING PARAMETERS ARE USED IN THE CONVERGENCE
C     TEST (BETWEEN INSTRUCTIONS NUMBER 200 AND 400): EPS1 IS A RELATIVE
C     TOLERANCE FOR THE DIFFERENCE BETWEEN TWO CONSECUTIVE RESIDUALS;
C     EPS2 IS A RELATIVE TOLERANCE FOR THE SIZE OF THE CORRECTION;
C     ITMAX IS THE MAXIMUM NUMBER OF FUNCTION AND DERIVATIVE EVALUATIONS
C     ALLOWED.
      ITMAX=50
      EPS1=1.D-4
      ...
C****COMPUTATION OF THE DERIVATIVE OF THE VARIABLE PROJECTION.
      IMPLICIT REAL*8(A-H,O-Z)
      COMMON A(200,26),AA(200,10),U(20,20),B(20,20),WRK(200),
     *  BETA(20),Q
      INTEGER P,Q
      DIMENSION ALF(KG),Z(120),X(20),Y(M),T(M)
      EXTERNAL ADA
      CALL ADA(N,M,KG,A,E,ITER,P,T,ALF,ISEL)
      N1=N+1
      N2=1
      IF (ISEL .GT. 0) GO TO 111
      IF (ITER .GT. 0) N2=NCFUN+1
      DO 110 I=1,M
      DO 110 J=N2,N
  110 AA(I,J)=A(I,J)
C*****REDUCTION OF A TO TRIANGULAR FORM, COMPUTATION OF V=QY, AND
C     SELECTIVE COMPUTATION OF QB ACCORDING TO VALUE OF ISEL.
  111 DO 103 I=1,N
      ...

      SUBROUTINE ADA(N,M,KG,A,E,ITER,P,T,ALF,ISEL)
C     OSBORNE'S EXPONENTIAL FITTING. TWO EXPONENTIALS AND CONSTANT TERM.
      IMPLICIT REAL*8(A-H,O-Z)
      ...
      DO 4 I=1,M
    4 A(I,1)=1.0D0
    5 IF (ISEL .LT. 0) GO TO 16
      DO 10 I=1,M
      A(I,2)=DEXP(-ALF(1)*T(I))