STANFORD ARTIFICIAL INTELLIGENCE PROJECT MEMO AIM-137
COMPUTER SCIENCE DEPARTMENT REPORT NO. CS-186
AN EMPIRICAL STUDY OF FORTRAN PROGRAMS
BY DONALD E. KNUTH
COMPUTER SCIENCE DEPARTMENT
STANFORD UNIVERSITY
An Empirical Study of FORTRAN Programs
Donald E. Knuth
Abstract: A sample of programs, written in FORTRAN by a wide variety of people for a wide variety of applications, was chosen "at random" in an attempt to discover quantitatively "what programmers really do." Statistical results of this survey are presented here, together with some of their apparent implications for future work in compiler design. The principal conclusion which may be drawn is the importance of a program "profile," namely a table of frequency counts which record how often each statement is performed in a typical run; there are strong indications that profile-keeping should become a standard practice in all computer systems, for casual users as well as system programmers. This paper is the report of a three-month study undertaken by the author and about a dozen students and representatives of the software industry during the summer of 1970. It is hoped that a reader who studies this report will obtain a fairly clear conception of how FORTRAN is being used, and what compilers can do about it.
This research was supported, in part, by IBM Corporation, by Xerox Corporation, and by the Advanced Research Projects Agency of the Office of the Department of Defense (SD-185).
Reproduced in the USA. Available from the Clearinghouse for Federal Scientific and Technical Information, Springfield, Virginia 22151. Price: Full size copy b5.OO; microfiche copy $4. 96
Designers of compilers and instructors of computer science usually
have comparatively little information about the way in which programming
languages are actually used by typical programmers. We think we know what
programmers generally do, but our notions are rarely based on a representative
sample of the programs which are actually being run on computers. Since
compiler writers must prepare a system capable of translating a language
in all its generality, it is easy to fall into the trap of assuming that
complicated constructions are the norm when in fact they are infrequently
used. There has been a long history of optimizing the wrong things, using
elaborate mechanisms to produce beautiful code in cases that hardly ever
arise in practice, while doing nothing about certain frequently occurring
situations. For example, the present author once found great significance
in the fact that a certain complicated method was able to translate the
statement
      C[I×J] := ((A+X)×Y) + 2.768 + ((L-M)×(-K))/Z
into only 19 machine instructions compared to the 21 instructions obtained
by a previously published method due to Saller et al. (See Knuth [11].)
The fact that arithmetic expressions usually have an average length of only
two operands, in practice, would have been a great shock to the author at
that time!
There has been widespread realization that more data about language
use is needed; we can't really compare two different compiler algorithms
until we understand the input data they deal with. Of course, the great
difficulty is that there is no such thing as a "typical programmer"; there
is enormous variation among programs written by different people
with different backgrounds and sympathies, and indeed there is considerable
variation even in different programs written by the same person. Therefore
we cannot expect any measurements to be very accurate, although we can measure
the degree of variation in an attempt to determine how significant it is.
Not all properties of programs can be reduced to simple statistics; it is
necessary to study selected programs in detail in order to appreciate their
characteristics more clearly. For a survey of early work on performance
measurement and evaluation, see Calingaert [2] and Cerf [3].
During the summer of 1970, the author worked together with several
other people, in order to explore the nature of actual programs and the
corresponding implications both for software design and for computer science
education. Members of the group included G. Autrey, D. Brown, I. Fang,
D. Ingalls, J. Low, F. Maginnis, M. Maybury, D. McNabb, E. Satterthwaite,
R. Sites, R. Sweet, and J. Walters; these people did all of the hard work
which led to the results in this report. Our results are by no means a
definitive analysis of programming behavior; our goal was to explore the
various possibilities, as a group, in order to set the stage for subsequent
individual research, rather than to go off in all directions at once. Each
week the entire group had an eight-hour meeting, in order to discuss what
had been learned during the previous week, hoping that by combining our
differing points of view we might arrive at something reasonably close to
Truth.
A first idea for obtaining "typical" programs was to go to Stanford's
Computation Center and rummage in the wastebaskets and the recycling bins.
This gave some results, but showed immediately what should have been obvious:
wastebaskets usually receive undebugged programs. Furthermore, it seems [...]
The next approach was to probe randomly among the semi-protected
files stored on disks, looking for source text; this was successful,
resulting in [...] programs, totalling about 2[...] cards. We added nine
programs from the CSD subroutine library, and three programs from the
"Scientific Subroutine Package", and some production programs from the
Stanford Linear Accelerator Center. A few classical benchmark programs
(nuclear codes, weather codes, and aerospace calculations) were also
contributed by IBM representatives, and to top things off we threw in some
programs of personal interest to members of the group.
Thus we had a varied collection of programs: some
large, some small; some important, some
trivial; some for production, some for play; some numerical, some
combinatorial.
It is well-known that different programming languages evolve different
styles of programming, so our study was necessarily language-dependent.
For example, one would expect that expressions in APL programs tend to
be longer than in FORTRAN programs. But virtually all of the programs
obtained by our sampling procedure were written in FORTRAN (this was the
first surprise of the summer), so our main efforts were directed toward the
study of FORTRAN programs.1/
Was this sample representative? Perhaps the users of Stanford's
computers are more sophisticated than the general programmers to be found
elsewhere; after all we have such a splendid Computer Science Department!
But it is doubtful whether our Department had any effect on these programs,
because for one thing we don't teach FORTRAN; it was distressing to see what
little impact our courses seem to be having, since virtually all of the
programs we saw were apparently written by people who had learned programming
elsewhere. Furthermore, the general style of programming that we found
showed very little evidence of "sophistication"; if it was better than
average, the average is too horrible to contemplate! (This remark is not
intended as an insult to Stanford's programmers; after all we were invading
their privacy, and they would probably have written the programs differently
if they had known the code was to be scrutinized by self-appointed experts
like ourselves. Our purposes were purely scientific, in an attempt to find
out how things are, without moralizing or judging people's competence.
The point is that the Stanford sample seems to be reasonably typical of
what might be found elsewhere.) Another reason for believing that our
sample was reasonably good is that the programs varied from text-editing
and discrete calculations to number-crunching; they were by no
means from a homogeneous class of applications. On the other hand we do
have some definite evidence of differences between the Stanford sample and
another sample of over 400 programs written at Lockheed (see Section 2 of
this report).

1/ By contacting known users of ALGOL, it was possible to collect a fairly
representative sample of ALGOL W programs as well. The analysis of
these programs is still incomplete; preliminary indications are that
the increased flexibility of data types in ALGOL W makes for much more
variety in the nature of inner loops than was observed in FORTRAN, and
that the improved control structures make GO TO's and labels considerably
less frequent. A comprehensive analysis of ALGOL 60 programs has
recently been completed by B. Wichmann [19].
We analyzed one PL/I program by hand. COBOL is not used at Stanford's
Computation Center, and we have no idea what typical COBOL programs are like.
The programs obtained by this sampling procedure were analyzed in
various ways. First we performed a static analysis, simply counting the
number of occurrences of easily recognizable syntactic constructions.
Statistics of this kind are relevant to the speed of compilation. The
results of this static analysis are presented in Section 2. Secondly, we
selected about 25 of the programs at random and subjected them to a dynamic
analysis, taking into account the frequency with which each construction
actually occurs during one run of the program; statistics of this kind are
presented in Section 3. We also considered the "inner loops" of 17 programs,
translating them by hand into machine language using various styles of
optimization in an attempt to weigh the utility of various local and global
optimization strategies; results of this study are presented in Section 4.
Section 5 of this paper summarizes the principal conclusions we reached,
and lists several areas which appear to be promising for future study.
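
The dynamic analysis rests on frequency counts of the kind called a "profile" in the abstract. As an illustration only, and not a description of the tools used in the study, the following Python sketch counts how often each line of a small function is executed, using the standard `sys.settrace` hook. The names `profile_lines` and `demo` are hypothetical.

```python
import sys
from collections import Counter

def profile_lines(func, *args):
    """Run func(*args) and return (result, counts).

    counts maps a line offset within func (0 is the def line) to the
    number of times that line was executed.
    """
    code = func.__code__
    counts = Counter()

    def tracer(frame, event, arg):
        # Ignore every frame except the function being profiled.
        if frame.f_code is not code:
            return None
        if event == "line":
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)   # always restore the earlier hook
    return result, counts

def demo(n):
    total = 0
    for k in range(n):
        total += k * k
    return total

result, counts = profile_lines(demo, 10)
for offset in sorted(counts):
    print(offset, counts[offset])
```

The loop body (offset 3) dominates the counts, which is the kind of information a profile is meant to expose.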
We examined a large number of FORTRAN programs to see how frequently
certain constructions are used in practice. Over 250,000 cards
(representing [...] programs) were analyzed by Mr. Maybury at the computer
center of Lockheed Missiles and Space Corporation in Sunnyvale.
Table 1 shows the distribution of statement types. A "typical
Lockheed program" consists of 120 comment cards, plus 178 assignment [...]
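
A static count of statement types can be sketched as follows. This Python fragment is a rough illustration under stated assumptions, not the program used for the Lockheed or Stanford counts: it assumes fixed-form cards (comment mark in column 1, label in columns 1-5, continuation mark in column 6, statement in columns 7-72) and uses simple pattern tests that would misclassify some unusual statements. The function name `classify` and the sample deck are hypothetical.

```python
import re
from collections import Counter

# DO loops have the shape DO <label> <var> = <start>, <limit>[, <step>].
DO_LOOP = re.compile(r"^DO\d+[A-Z][A-Z0-9]*=[^,]+,.+")
ASSIGNMENT = re.compile(r"^[A-Z][A-Z0-9]*(\(.*\))?=")
KEYWORDS = ("GOTO", "CALL", "CONTINUE", "RETURN", "SUBROUTINE",
            "DIMENSION", "FORMAT", "WRITE", "READ", "STOP", "END")

def classify(card):
    """Return a rough statement type for one fixed-form FORTRAN card."""
    if card[:1] in ("C", "c", "*"):
        return "COMMENT"
    if len(card) > 5 and card[5] not in (" ", "0"):
        return "CONTINUATION"
    # Blanks are not significant inside a fixed-form statement.
    text = card[6:72].replace(" ", "").upper()
    if not text:
        return "BLANK"
    if text.startswith("IF("):
        return "IF"
    if DO_LOOP.match(text):
        return "DO"
    if ASSIGNMENT.match(text):
        return "ASSIGNMENT"
    for keyword in KEYWORDS:
        if text.startswith(keyword):
            return keyword
    return "OTHER"

DECK = [
    "C     SAMPLE LOOP",
    "      DO 1 K = M,20",
    "      CALL RAND(R)",
    "      IF (R .GT. .81) N(K) = 1",
    "    1 CONTINUE",
    "      J = I*65539",
    "      R = R*.4656613E-9",
]

counts = Counter(classify(card) for card in DECK)
for kind, count in counts.most_common():
    print(kind, count)
```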
Of course, in this particular case the loop is executed only 16 times,
and so it could be completely unrolled into 32 instructions

      C    4,L(38)
      BER  5
      C    4,L(39)
      BER  5
      ...
      C    4,L(53)
      BER  5

reducing the "score" to 3. But in actual fact the L table was loaded
in a DATA statement, and it contained a list of special character codes;
a more appropriate program would replace the entire DO loop by a single
test

      IF (LT(K(I))) 1,3,1

for a suitable table LT, thereby saving over half the execution time of the
program. (Furthermore, the environment of the above DO loop was

      DO 2 I = 7,72

so that any assembly language programmer would have reduced the whole business
to a single "translate and test".)
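
The replacement of the search loop by a single table test can be sketched in Python. The original DO loop is not reproduced in this copy, so the sketch makes assumptions: that the loop compared one character code against each of 16 special codes, and that the table LT holds a nonzero flag for exactly those codes. The code values and function names below are hypothetical.

```python
# Sixteen hypothetical "special character" codes, standing in for L(38)..L(53).
SPECIAL_CODES = [75, 77, 78, 80, 90, 91, 92, 93,
                 94, 96, 97, 107, 108, 122, 125, 126]

def is_special_by_search(code):
    """Search loop: up to 16 compare-and-branch steps per character."""
    for special in SPECIAL_CODES:
        if code == special:
            return True
    return False

# Built once, like a table loaded in a DATA statement.
LT = [0] * 256
for special in SPECIAL_CODES:
    LT[special] = 1

def is_special_by_table(code):
    """Single indexed test, in the spirit of IF (LT(K(I))) 1,3,1."""
    return LT[code] != 0

def first_special(codes, test):
    """Return the index of the first special code, or -1 if there is none."""
    for i, code in enumerate(codes):
        if test(code):
            return i
    return -1

print(first_special([1, 2, 77, 3], is_special_by_table))
```

Both tests give the same answers; the table version does a fixed amount of work per character regardless of how many special codes there are.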
      DOUBLE A,B,D
      DO 1 K = 1,N
      A = T(I-K,1+K)
      B = T(I-K,J+K)
    1 D = D-A*B
(This is one of the few times we observed double precision being used, although
the numerical analysis professors in our department strongly recommend
against the short precision operators of the 360; it serves as another
indication that our department seems to have little impact on the users
of our computer.) The scores for this loop are
89 , 67 , 38 , 13 , 12 ;
here level 2 suffers from some clumsiness in the indexing and a lack of
knowledge that an ME instruction could be used instead of MD.
Example 4. Here the inner loop is longer and involves a subroutine
call. The following code accounted for 70% of the running time; the entire
program had 214 executable statements.
      DO 1 K = M,20
      CALL RAND(R)
      IF (R .GT. .81) N(K) = 1
    1 CONTINUE
      ...
      SUBROUTINE RAND(R)
      J = I*65539
      IF (J) 1,2,2
    1 J = J+2147483647+1
    2 R = J
      R = R*.4656613E-9
      I = J
      K = K+1
      RETURN
      END
(Here we have a notoriously bad random number generator, which the programmer
must have gotten out of an obsolete reference book; it is another example
of our failure to educate the community.) Conversion from integer to real
is assumed to be done by the sequence

      [...]

[...] By further adjusting these
constants the multiplication by .4656613E-9 ≈ 2^-31 could be avoided;
but this observation was felt to be beyond the scope of level 4 optimization,
although it would occur naturally to any programmer using assembly language.
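
The generator in RAND multiplies by 65539 and reduces modulo 2^31; it is commonly known as RANDU. The following Python sketch emulates the FORTRAN arithmetic under the assumption of 32-bit two's-complement integers that wrap on overflow, and checks a well-known defect: since 65539 = 2^16 + 3, every value is determined by the two before it. The function names and the seed value 1 are chosen for illustration only.

```python
MODULUS = 2**31

def rand_step(i):
    """One call of RAND: return (new_seed, r) for the seed i."""
    j = (i * 65539) & 0xFFFFFFFF      # keep the low 32 bits of the product
    if j >= 2**31:                    # reinterpret them as a signed integer
        j -= 2**32
    if j < 0:                         # IF (J) 1,2,2
        j = j + 2147483647 + 1
    r = j * 0.4656613e-9              # approximately j * 2**-31
    return j, r

def sequence(seed, count):
    """Return the first `count` integer values produced from `seed`."""
    values = []
    i = seed
    for _ in range(count):
        i, _r = rand_step(i)
        values.append(i)
    return values

xs = sequence(1, 1000)

# The emulated values agree with the plain congruence x -> 65539*x mod 2**31.
plain = []
x = 1
for _ in range(1000):
    x = (65539 * x) % MODULUS
    plain.append(x)

# Defect: x[k+2] = 6*x[k+1] - 9*x[k]  (mod 2**31) for every k.
defect_holds = all(
    xs[k + 2] == (6 * xs[k + 1] - 9 * xs[k]) % MODULUS
    for k in range(len(xs) - 2)
)
print(xs[:3], defect_holds)
```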
The most interesting thing here, however, is the effect of subroutine
linkage, since the long prologue and epilogue significantly increase the
time of the inner loop. The timings for levels 0-3 assume standard OS
subroutine conventions, although levels 2 and 3 are able to shorten the
prologue and epilogue somewhat because of their knowledge of program flow.
For level 4, the subroutine was "opened", placed in the loop without any
linkage; hence the sequence of scores,

      19.9 , 105.1 , 81.4 , 76.2 , 27.2

Without subscripting there is comparatively little difference between
levels 0 and [...]; this implies that optimization probably has more payoff
for FORTRAN than we would find for languages with more flexible data structures.
It would be interesting to know just how many hours each day are spent
in prologues and epilogues establishing linkage conventions.
Example 5. The next inner loop is representative of several programs
which had to be seen to be believed.
      DO 1 K = 1,N
      M = (J-1)*10+K-1
      IF (M.EQ.0) M = 1001
      C1 = C1+A1(M)*(B1**(K-1))*(B2**(J-1))
      C2 = C2+A2(M)*(B1**(K-1))*(B2**(J-1))
      IF ((K-1).EQ.0) T = 0.0
      IF ((K-1).GE.1) T = A1(M)*(K-1)*(B1**(K-2))*(B2**(J-1))
      C3 = C3+T
      IF ((K-1).EQ.0) T = 0.0
      IF ((K-1).GE.1) T = A2(M)*(K-1)*(B1**(K-2))*(B2**(J-1))
      C4 = C4+T
      IF ((J-1).EQ.0) T = 0.0
      IF ((J-1).GE.1) T = A1(M)*(B1**(K-1))*(J-1)*(B2**(J-2))
      C5 = C5+T
      IF ((J-1).EQ.0) T = 0.0
      IF ((J-1).GE.1) T = A2(M)*(B1**(K-1))*(J-1)*(B2**(J-2))
      C6 = C6+T
    1 CONTINUE
After staring at this for several minutes, our group decided it did not
deserve to be optimized. But after two weeks' rest we looked at it again
and found interesting applications of "strength reduction", both for the
exponentiations and for the conversion of K to real. (The latter applies
only in level 4, which knows that K doesn't get too large.) The scores
were
13f7 , 7 545 , 159 , 145 , 10 •
Level 1 optimization finds common subexpressions, and level 2 finds the
reductions in strength. Level 4 removes nearly all the IF tests and
rearranges the code so that C1 and C2 are updated last; thus only
B1**(K-1) is necessary, not both it and B1**(K-2).
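
The strength reduction for the exponentiations can be sketched in Python. This is a simplified illustration, not the code produced in the study: it keeps only the C1 and C3 sums, indexes the coefficient array directly by K instead of by M, and uses hypothetical function names. The reduced version carries a single running power of B1 and updates C1 last, so that the same variable serves first as B1**(K-2) and then as B1**(K-1).

```python
def sums_direct(a1, b1, b2, j, n):
    """C1 and C3 contributions computed with explicit exponentiation."""
    c1 = 0.0
    c3 = 0.0
    for k in range(1, n + 1):
        c1 += a1[k] * b1 ** (k - 1) * b2 ** (j - 1)
        if k - 1 >= 1:
            c3 += a1[k] * (k - 1) * b1 ** (k - 2) * b2 ** (j - 1)
    return c1, c3

def sums_reduced(a1, b1, b2, j, n):
    """Same sums with one running power of b1 and no exponentiation in the loop."""
    b2_power = b2 ** (j - 1)      # loop invariant, computed once
    power = 1.0                   # b1**(k-1) for k = 1
    c1 = 0.0
    c3 = 0.0
    for k in range(1, n + 1):
        if k > 1:
            # Here power still holds b1**(k-2) from the previous iteration.
            c3 += a1[k] * (k - 1) * power * b2_power
            power *= b1           # multiplication replaces exponentiation
        # Now power holds b1**(k-1); c1 is updated last.
        c1 += a1[k] * power * b2_power
    return c1, c3

coefficients = [0.0, 1.0, 1.0, 1.0]   # index 0 is unused (1-based indexing)
print(sums_direct(coefficients, 2.0, 1.0, 1, 3))
print(sums_reduced(coefficients, 2.0, 1.0, 1, 3))
```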
Example 6. In this case the "inner loop" involves subroutine calls
instead of a DO loop:
      SUBROUTINE S(A,B,X)                        9
      DIMENSION A(2),B(2)                        9
      X = 0                                      9
      Y = (B(2)-A(2))*12+B(1)-A(1)               9
      IF (Y.LT.0) GO TO 1                        9
      X = Y                                      5
    1 RETURN                                     9
      END                                        9
      SUBROUTINE W(A,B,C,D,X)                    4
      DIMENSION A(2),B(2),C(2),D(2),U(2),V(2)    4
      X = 0                                      4
      CALL S(A,D,X)                              4
      IF (X.EQ.0) GO TO 3                        4
      CALL S(C,B,X)                              2
      IF (X.EQ.0) GO TO 3                        2
      CALL S(C,A,X)                              1
      U(1) = A(1)                                1
      U(2) = A(2)                                1
      IF (X.NE.0) GO TO 1                        1
      U(1) = C(1)                                0
      U(2) = C(2)                                0
    1 CONTINUE                                   1
      CALL S(B,D,X)                              1
      V(1) = B(1)                                1
      V(2) = B(2)                                1
      IF (X.NE.0) GO TO 2                        1
      V(1) = D(1)                                0
      V(2) = D(2)                                0
    2 CALL S(U,V,X)                              1
    3 CONTINUE                                   4
      RETURN                                     4
The numbers at the right of this code show the approximate relative
frequency of occurrence of each statement; calls on this subroutine
accounted for [...]0% of the execution time of the program. The scores for
various optimization styles are

      1545.5 , 1037.5 , 753.3 , 736.3 , 289

Here 270 of the 1545.5 units for level 0 are due to repeated conversions
of the constant 0 from integer to real. Levels 2 and 3 move the first
statement "X = 0" out of the main loop, performing it only if "Y.LT.0".
The big improvement in level 4 comes from inserting the code for subroutine
S in line and making the corresponding simplifications. Statements like
U(1) = A(1) , U(2) = A(2) become simply a change in base register.
Perhaps further reductions would be possible if the context of subroutine W
were examined, since if we denote 12*A(1)+A(2) by a , 12*B(1)+B(2) by b ,
etc., the subroutine computes max(0, min(b,d)-max(a,c)) .
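
The closing remark can be checked mechanically. The Python sketch below transliterates S and W as read from the listing and compares W against the closed form max(0, min(b,d)-max(a,c)). Two assumptions are made: where the listing is hard to read, the V assignments are taken to mirror the U assignments; and the scalar value of a pair follows the listing of S, which weights the second component by 12. All function names are hypothetical.

```python
def s(a, b):
    """Subroutine S: pairs are (first, second), i.e. a[0] stands for A(1)."""
    y = (b[1] - a[1]) * 12 + b[0] - a[0]
    return y if y >= 0 else 0

def w(a, b, c, d):
    """Subroutine W, following the control flow of the listing."""
    x = s(a, d)
    if x == 0:
        return 0
    x = s(c, b)
    if x == 0:
        return 0
    u = a if s(c, a) != 0 else c      # the later of the two starting points
    v = b if s(b, d) != 0 else d      # the earlier of the two end points
    return s(u, v)

def scalar(p):
    """Collapse a pair to one number, weighted as in subroutine S."""
    return 12 * p[1] + p[0]

def w_closed_form(a, b, c, d):
    return max(0, min(scalar(b), scalar(d)) - max(scalar(a), scalar(c)))

print(w((0, 0), (5, 0), (2, 0), (9, 0)),
      w_closed_form((0, 0), (5, 0), (2, 0), (9, 0)))
```

Read this way, W returns the length of the overlap of the intervals [a,b] and [c,d], or zero if they do not overlap.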
Example 7. In this program virtually all of the time exclusive of
input/output editing was spent in the two loops
      DO 1 I = 1,N
      A = X**2+Y**2-2.*X*Y*C(I)
      B = SQRT(A)
      K = 100.*B+1.5
    1 D(I) = S(I)*T(K)
      Q = D(1)-D(N)
      DO 2 I = 2,M,2
    2 Q = Q+4.*D(I)+2.*D(I+1)
where array D was not used subsequently. The scores are

      [...]

Here level 3 computes X**2 by "MER 0,0" instead of a subroutine call,
and it computes 2.*D(I+1) by "ALR 0,0" instead of multiplying. Level 4
combines the two DO loops into one and eliminates array D entirely.
(Such savings in storage space were present in quite a few programs we
looked at; some matrices could be reduced to vectors, and some vectors
could be reduced to scalars, due to the nature of the calculations.
A quantitative estimate of how much space could be saved by such optimization
would be interesting.)
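
Combining the two loops so that array D disappears can be sketched in Python. The sketch is an illustration under assumptions not stated in the text: N is taken to be odd with M = N-1 (the usual arrangement for Simpson's rule), arrays are 1-based with index 0 unused, and the function names are hypothetical.

```python
import math

def d_value(i, x, y, c, s, t):
    """Body of the first loop: the value the listing stores in D(I)."""
    a = x**2 + y**2 - 2.0 * x * y * c[i]
    b = math.sqrt(a)
    k = int(100.0 * b + 1.5)      # real-to-integer conversion truncates
    return s[i] * t[k]

def q_two_loops(x, y, c, s, t, n):
    """Two loops with the intermediate array D."""
    m = n - 1
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        d[i] = d_value(i, x, y, c, s, t)
    q = d[1] - d[n]
    for i in range(2, m + 1, 2):
        q = q + 4.0 * d[i] + 2.0 * d[i + 1]
    return q

def q_fused(x, y, c, s, t, n):
    """One loop and no array: each value is used as soon as it is computed."""
    last = d_value(1, x, y, c, s, t)
    q = last
    for i in range(2, n, 2):
        even = d_value(i, x, y, c, s, t)
        last = d_value(i + 1, x, y, c, s, t)
        q += 4.0 * even + 2.0 * last
    return q - last               # remove the extra copy of D(N)

N = 5
C = [0.0, 0.8, 0.5, 0.0, -0.5, -0.8]
S = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
T = [0.0] + [1.0 / k for k in range(1, 101)]
print(q_two_loops(0.3, 0.4, C, S, T, N), q_fused(0.3, 0.4, C, S, T, N))
```

The fused version adds the terms in a different order, so the two results may differ in the last few bits.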
Example 8. Ninety percent of the running time of this program was spent
wIDre Henna". "Plannin. an-a codinv c1l rr s r .et, .t'0lS. ; Institute
"< ' *~ ~ -- '-4C- reprinted in
a-c> , -- nao-
Schol ol!u:'tec an; d atiApplied Cc lance, i'*uiCllarL,
Lo 05 1.els C 'v~A 0 i eot -- ~ :arcii pp.
[ f '> , ert !wait e ',acc eatu C O'l : o l . lP
ic'oa j 1 cr~ Ca Irearao:
atU Cz 1) pp).
1 'v. a' "n:: .I ~ :o a ~ rt~ni I2 1al .,atoiao' -, cn, ~ .i .~ .D}t
[16] Russell, E. C., Jr. "Automatic Program Analysis." Ph.D. Thesis, School of Engineering and Applied Science, Univ. of California, Los Angeles, California, Report 69-1, March 1969, 168 pp.
[17] Satterthwaite, E. "Source Language Debugging Tools." Ph.D. Thesis, Stanford University, in preparation.
[18] Wichmann, B. A. "A comparison of ALGOL 60 execution speeds." National Physical Laboratory, Central Computer Unit Report 3, January [...], [...] pp.
[19] Wichmann, B. A. "Some statistics from ALGOL programs." National Physical Laboratory, Central Computer Unit Report 11, August 1970, [...] pp.