LITERATE PROGRAMMING SIMPLIFIED

NORMAN RAMSEY Bellcore

Literate programming tools let you arrange the parts of a program in
any order and extract documentation and code from the same source
file. The author argues that language dependence and feature
complexity have hampered acceptance of these tools, then offers a
simpler alternative.

In 1983, Donald Knuth introduced literate programming in the
form of Web, his tool for writing literate Pascal programs. Web
lets authors interleave source code and descriptive text in a
single document. It also frees authors to arrange the parts of a
program in an order that helps explain how the program functions,
not necessarily the order required by the compiler.
In the mid-80s, word spread about this new programming method as
several literate programs were published. In 1987,
Communications of the ACM created a special forum to discuss
literate programming.2 Web was adapted to programming languages
other than Pascal, including C, Modula-2, Fortran, Ada, and
others.3-6 With experience, however, many Web users
became dissatisfied. Continued interest in literate
programming led to a frenzy of tool building. In the resulting
confusion, the literate-programming forum was dropped, on the
grounds that literate programming had become the province of those
who could build their own tools.
The proliferation of literate-programming tools made it hard
for literate programming to enter the mainstream, but it led to
a better understanding of what such tools should do. Today the
field is more mature, and there is an emerging demand for tools
that are simple, easy to learn, and not tied to a particular
programming language.
My own literate-programming tool, noweb, fills this niche.
Freely available
IEEE SOFTWARE 07407459/94/m 00 0 1994 IEEE 97

AN EXAMPLE OF NOWEB: COUNTING WORDS

This example, based on a program by Klaus Guntermann and Joachim
Schrod and a program by Silvio Levy and D. E. Knuth, presents the
word count program from Unix, rewritten in noweb to demonstrate
literate programming using noweb. The level of detail in this
document is intentionally high, for didactic purposes; many of the
things spelled out here don't need to be explained in other programs.

The purpose of wc is to count lines, characters, and/or words in a
list of files. The number of lines in a file is the number of newline
characters it contains. The number of characters is the file length
in bytes. A word is a maximal sequence of consecutive characters
other than newline, space, or tab, containing at least one visible
ASCII code. (We assume that the standard ASCII code is in use.)

Most literate C programs share a common structure. It's probably a
good idea to state the overall structure explicitly at the outset,
even though the various parts could all be introduced in chunks named
<<*>> if we wanted to add them piecemeal.

Here, then, is an overview of the file wc.c that is defined by the
noweb program wc.nw:

<* 98a>=
    <Header files to include 98b>
    <Definitions 98c>
    <Global variables 98d>
    <Functions 102d>
    <The main program 99a>
Root chunk (not used in this document).

We must include the standard I/O definitions because we want to send
formatted output to stdout and stderr.

<Header files to include 98b>=
    #include <stdio.h>
This code is used in chunk 98a.

The status variable will tell the operating system if the run was
successful or not, and prog_name is used in case there's an error
message to be printed.

<Definitions 98c>=
    #define OK 0                /* status code for successful run */
    #define usage_error 1       /* status code for improper syntax */
    #define cannot_open_file 2  /* status code for file access error */
Defines: cannot_open_file, used in chunk 100b. OK, used in chunk 98d.
usage_error, used in chunk 102d.
Uses status 98d. This definition is continued in chunks 100c, 100e,
and 102c. This code is used in chunk 98a.

<Global variables 98d>=
    int status = OK;    /* exit status of command, initially OK */
    char *prog_name;    /* who we are */

on the Internet since 1989, noweb strips literate programming
to its essentials. Programs are composed of named chunks of code,
written in any order, with documentation interleaved.
To facilitate comparison of Web and noweb, a sample noweb program
appears in the shaded box that runs throughout this article. I took
the text, code, and presentation for this sample from Knuth's
Literate Programming.
Noweb was developed on Unix and can be ported to non-Unix
platforms provided they can simulate pipelines and support both
ANSI C and either awk or Icon. For example, Kean College's Lee
Wittenberg ported noweb to MS-DOS. Noweb is unique among
literate-programming tools in its pipelined,
extensible implementation, which makes it easy for experimenters
to create new features without writing their own tools.

WEB'S COMPLEXITIES
Web's complexities make it difficult to explore the idea of
literate programming because too much effort is required to master
the tool. To compound the difficulty, different programming
languages are served by different versions of Web, each with its own
idiosyncrasies.
The classic Web expands three kinds of macros, prettyprints code
for typeset output, evaluates some constant expressions, adds
string support to Pascal, and implements a simple form of version
control. The manual documents 27 control sequences. Versions for
languages other than Pascal offer slightly different functions and
different sets of control sequences.
Web uses its Tangle tool to produce source code and its Weave
tool to produce documentation. Web's original Tangle removed white
space and folded lines to fill each line with
Figure 1. A noweb source fragment from the example program.
98 SEPTEMBER 1994

AN EXAMPLE OF NOWEB: COUNTING WORDS (continued)

This code is used in chunk 98a.

Now we come to the general layout of the main function.

<The main program 99a>=
    main(argc, argv)
        int argc;       /* # arguments on Unix command line */
        char **argv;

tokens, making its output unreadable. Later adaptations preserved
line breaks but removed other white space. Web's Weave divides a
program into numbered sections, and its index and cross-reference
information refer to section numbers, not page numbers. Web works
poorly with LaTeX; LaTeX constructs cannot be used in Web source,
and getting Weave output to work in LaTeX documents requires tedious
adjustments by hand. Weave's source (written in Web) is several
thousand lines long, and the formatting code is not isolated.
NOWEB'S FEATURES
Noweb's simplicity derives from a simple model of files, which are
marked up using a simple syntax. Figure 1 shows a fragment of the
noweb source used to generate the boxed sample program. It shows
examples of chunk definitions and uses, quoted code, and lists of
defined identifiers - all of noweb's syntax except escaped angle
brackets.
File structure. A noweb file is a sequence of chunks. A chunk may
contain code, in which case it is named, or documentation, in which
case it is unnamed. Chunks may appear in any order. Each code chunk
begins with <<chunk name>>= on a line by itself. The double-left
angle bracket must be in the first column. Each documentation chunk
begins with a line that starts with an @ symbol followed by a space
or newline. Chunks are terminated implicitly by the beginning of
another chunk or by the end of the file. If the first line in the
file does not mark the beginning of a chunk, noweb assumes it is the
first line of a documentation chunk.
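
The chunk-boundary rules just described are simple enough to state as a line classifier. The C sketch below is an illustration only, not noweb's own markup stage: the names line_kind and classify_line are hypothetical, and the sketch ignores escaped angle brackets and trailing white space.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of one line of a noweb file. */
enum line_kind { CODE_START, DOC_START, BODY };

/*
 * Classify a line (given without its trailing newline):
 *  - a code chunk begins with <<name>>= on a line by itself,
 *    with the << in the first column;
 *  - a documentation chunk begins with a line that starts with @
 *    followed by a space or by the end of the line;
 *  - anything else belongs to the chunk already in progress.
 */
enum line_kind classify_line(const char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);

    if (len >= 5 && line[0] == '<' && line[1] == '<' &&
        strcmp(line + len - 3, ">>=") == 0)
        return CODE_START;
    if (line[0] == '@' && (line[1] == ' ' || line[1] == '\0'))
        return DOC_START;
    return BODY;
}
```

A reader of a whole file would call classify_line on each line in turn and treat the file as starting in a documentation chunk, which matches the rule for a first line that marks no chunk.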
As Figure 2 shows, noweb uses its notangle and noweave tools to
extract code and documentation, respectively. When notangle is given
a noweb file, it writes the program on standard output. When noweave
is given a noweb file, it reads it and produces, on standard output,
TeX source for typeset documentation.
Figure 2. Using noweb to build code and documentation.

AN EXAMPLE OF NOWEB: COUNTING WORDS (continued)

                        /* the arguments, an array of strings */
    {
        <Variables local to main 99b>
        prog_name = argv[0];
        <Set up option selection 99c>
        <Process all the files 99d>
        <Print the grand totals if there were multiple files 102b>
        exit(status);
    }
Defines: argc, used in chunks 99c and 99d. argv, used in chunks 99c,
100b, and 101e. main, never used.
Uses prog_name 98d and status 98d. This code is used in chunk 98a.

<Variables local to main 99b>=
    int file_count;     /* how many files there are */
    char *which;        /* which counts to print */
Defines: file_count, used in chunks 99c, 100b, 101e, and 102b. which,
used in chunks 99c, 101e, 102b, and 102d.
This definition is continued in chunks 100a and 100f. This code is
used in chunk 99a.

If the first argument begins with a -, the user is choosing the
desired counts and specifying the order in which they should be
displayed. Each selection is given by the initial character (lines,
words, or characters). For example, -cl would cause just the number
of characters and the number of lines to be printed, in that order.
We do not process this string now; we simply remember where it is.
It will be used to control the formatting at output time.

<Set up option selection 99c>=
    which = "lwc";      /* if no option is given, print 3 values */
    if (argc > 1 && *argv[1] == '-') {
        which = argv[1] + 1;
        argc--;
        argv++;
    }
    file_count = argc - 1;
Uses argc 99a, argv 99a, file_count 99b, and which 99b. This code is
used in chunk 99a.

Now we scan the remaining arguments and try to open a file if
possible. The file is processed and its statistics are given. We use
a do ... while loop because we should read from the standard input
if no file name is given.

<Process all the files 99d>=
    argc--;
    do {
        <If a file is given, try to open *(++argv); continue if unsuccessful 100b>
        <Initialize pointers and counters 101a>
        <Scan file 101c>
        <Write statistics for file 101e>
        <Close file 100d>
        <Update grand totals 102a>
            /* even if there is only one file */
    } while (--argc > 0);
Uses argc 99a. This code is used in chunk 99a.

Here's the code to open the file. A special trick allows us to
handle input from stdin when no name is given. Recall that the file
descriptor to stdin is 0; that's what we use as the default initial
value.

<Variables local to main 99b>+= 100a
    int fd = 0;         /* file descriptor, initialized to stdin */
Defines: fd, used in chunks 100b, 100d, and 101d.

<If a file is given, try to open *(++argv); continue if unsuccessful 100b>=
    if (file_count > 0 && (fd = open(*(++argv), READ_ONLY)) < 0) {
        fprintf(stderr, "%s: cannot open file %s\n", prog_name, *argv);
        status |= cannot_open_file;
        file_count--;
        continue;
    }
Uses argv 99a, cannot_open_file 98c, fd 100a, file_count 99b,
prog_name 98d, READ_ONLY 100c, and status 98d.
This code is used in chunk 99d.

<Close file 100d>=
    close(fd);
Uses fd 100a. This code is used in chunk 99d.

We will do some homemade buffering in order to speed things up:
Characters will be read into the buffer array before we process
them. To do this we set up appropriate pointers and counters.

<Definitions 98c>+= 100e
    #define buf_size BUFSIZ   /* stdio.h's BUFSIZ is chosen for efficiency */
Defines: buf_size, used in chunks 100f and 101d.

Code chunks. Code chunks contain program source code and references
to other code chunks. Several code chunks may have the same name;
notangle concatenates their definitions to produce a single chunk.
Code-chunk definitions are like macro definitions: Notangle extracts
a program by expanding one chunk (by default the chunk named <<*>>).
The definition of that chunk contains references to other chunks,
which are themselves expanded, and so on. Figure 3 shows part of the
boxed sample program as extracted by notangle. Notangle's output is
readable; it preserves white space and maintains the indentation of
expanded chunks with respect to the chunks in which they appear. This
behavior allows noweb to be used with languages like Miranda and
Haskell, in which indentation is significant.
When double-left and -right angle brackets are not paired, they are
treated as literals. Users can force any such brackets, even paired
brackets, to be treated as literal by using a preceding @ sign. Any
line beginning with @ and a space terminates a code chunk. If such a
line has the form @ %def identifiers it also means that the
preceding chunk defines the identifiers listed in identifiers. This
notation provides a way of marking definitions manually when no
automatic marking is available.
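
The macro-like expansion that notangle performs on code chunks can be sketched in a few dozen lines of C. This is a simplified illustration under stated assumptions, not notangle's implementation: the struct chunk table and the function names are hypothetical, a chunk reference is recognized only when <<name>> is alone on its line, and escaped brackets are not handled. It does show the two behaviors the text describes: same-named chunks are concatenated, and an expanded chunk inherits the indentation of the reference to it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of one code-chunk definition. */
struct chunk {
    const char *name;   /* chunk name, without the << >> brackets */
    const char *body;   /* definition text; each line ends in '\n' */
};

/* Append len bytes of s to the NUL-terminated string in out. */
static int append(char *out, size_t cap, const char *s, size_t len)
{
    size_t used = strlen(out);
    if (used + len + 1 > cap)
        return -1;
    memcpy(out + used, s, len);
    out[used + len] = '\0';
    return 0;
}

/* Expand every definition of the named chunk, indented by `indent`. */
static int expand_n(const struct chunk *tab, size_t n, const char *name,
                    size_t name_len, size_t indent, int depth,
                    char *out, size_t cap)
{
    size_t i, k;
    int found = 0;

    if (depth > 32)                 /* crude guard against cyclic references */
        return -1;
    for (i = 0; i < n; i++) {
        const char *line = tab[i].body;
        if (strlen(tab[i].name) != name_len ||
            strncmp(tab[i].name, name, name_len) != 0)
            continue;
        found = 1;                  /* same-named chunks are concatenated */
        while (*line != '\0') {
            const char *eol = strchr(line, '\n');
            size_t len = eol ? (size_t)(eol - line) : strlen(line);
            size_t lead = strspn(line, " ");

            if (len >= lead + 4 && strncmp(line + lead, "<<", 2) == 0 &&
                strncmp(line + len - 2, ">>", 2) == 0) {
                /* a reference: expand it at the reference's indentation */
                if (expand_n(tab, n, line + lead + 2, len - lead - 4,
                             indent + lead, depth + 1, out, cap) != 0)
                    return -1;
            } else {
                for (k = 0; k < indent; k++)
                    if (append(out, cap, " ", 1) != 0)
                        return -1;
                if (append(out, cap, line, len) != 0 ||
                    append(out, cap, "\n", 1) != 0)
                    return -1;
            }
            line += len + (eol ? 1 : 0);
        }
    }
    return found ? 0 : -1;          /* -1 if the chunk is undefined */
}

/* Expand the chunk named root into out; returns 0 on success. */
int expand_chunk(const struct chunk *tab, size_t n, const char *root,
                 char *out, size_t cap)
{
    assert(tab != NULL && root != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    return expand_n(tab, n, root, strlen(root), 0, 0, out, cap);
}
```

Calling expand_chunk with the root name "*" corresponds to notangle's default; passing another name plays the role of selecting a different root chunk.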
Documentation chunks. Documentation chunks contain text that
is ignored by notangle and copied verbatim to standard output by
noweave (except for quoted code). Code may be quoted within
documentation chunks by placing double square brackets around
it. These brackets are ignored by notangle but are used by noweave
to give the quoted code special typographic treatment. For example,
in the sample program, quoted code is set in the Courier font.
Noweave can work with LaTeX, or it can use a plain TeX macro
package, supplied with noweb, that defines commands like \chapter
and \section. Noweave can also work with HTML, the hypertext markup
language for Mosaic and the World-Wide Web. The example simulates
the results after processing by noweave and LaTeX.
Noweave adds no newline characters to its output, making it easy
to find the sources of TeX or LaTeX errors. For example, an error
on line 634 of a generated TeX file is caused by a problem on
line 634 of the corresponding noweb file.
Index and cross-reference features. Cross-referencing of chunks
and identifiers makes large programs easier to understand. The
sample program accompanying this article shows full
cross-reference information.
Unlike Web, noweb does not introduce numbered sections for
cross-referencing. Noweb uses page numbers. If two or more chunks
appear on a page, say page 24, they are distinguished by
appending a letter to the page number: 24a or 24b, for example.
Readers of large literate programs will appreciate the use of a
single numbering system.
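
The page-based labeling scheme amounts to a small formatting rule. The following C sketch shows one way to express it; the function chunk_label is hypothetical, and two details are assumptions rather than statements from the article: a chunk that is alone on its page gets the bare page number, and no page holds more than 26 chunks.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the label of a chunk into buf, given its page number, its
 * zero-based position among the chunks on that page, and the number
 * of chunks on the page. Two or more chunks on a page are told apart
 * by an appended letter (24a, 24b, ...). Returns the label length,
 * or -1 if the page holds too many chunks for this sketch.
 */
int chunk_label(char *buf, size_t cap, int page, int index,
                int chunks_on_page)
{
    assert(buf != NULL && index >= 0 && index < chunks_on_page);
    if (chunks_on_page == 1)
        return snprintf(buf, cap, "%d", page);    /* assumed: no letter */
    if (index >= 26)
        return -1;                                /* assumed limit */
    return snprintf(buf, cap, "%d%c", page, 'a' + index);
}
```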
Like Web, noweb writes chunk-cross-reference information in a
footnote font below each code chunk. Noweb also includes
cross-reference information for identifiers, for example, "Defines
file_count, used in chunks 7, 11, 19, and 21." Noweb generates this
by using the @ %def markings in its source code, or by recognizing
definitions automatically. Although noweb can automatically
recognize definitions in C programs, I used @ %def to mark the
definitions in the sample program. This choice not only illustrates
the use of @ %def
but it also ensures results compatible with the CWeb version of
this program. Automatically generated indices would differ because
CWeb and noweb use different recognition heuristics. Because noweb
uses a language-independent heuristic to find identifier uses, it
can be fooled into finding false uses in comments or string
literals, like the use of status in chunk 3.
Compiler and debugger support. On a large project, it is
essential that compilers and other tools refer to locations in the
noweb source, even though they work with notangle's output. Giving
notangle the -L option makes it emit pragmas that inform compilers
of the placement of lines in the noweb source. It also preserves
the columns in which tokens appear, so that line-and-column error
messages are accurate. If you do not give notangle the -L option,
it respects the indentation of its input, making its output easy
to read.
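
For C, the pragma in question is the standard #line directive, which tells the compiler which source file and line the following code came from. The sketch below formats such a directive; the function name is hypothetical, and the exact text that notangle emits is an assumption here, since the article does not show it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a C #line directive saying that the code which follows was
 * taken from the given line of the given source file. Returns the
 * number of characters written, as snprintf does.
 */
int format_line_directive(char *buf, size_t cap, int line,
                          const char *file)
{
    assert(buf != NULL && file != NULL && line > 0);
    return snprintf(buf, cap, "#line %d \"%s\"\n", line, file);
}
```

A compiler that reads such a directive ahead of code extracted from wc.nw reports errors in that code against wc.nw rather than against the generated wc.c.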
Formatting features. Noweave depends on text formatters in two
ways: in the source of noweave itself and in the supporting macros.
Noweave's dependence on its formatter is small and isolated, instead
of being distributed throughout a large implementation. Noweb
uses 250 lines of source for TeX and LaTeX combined, and another
250 for HTML. It uses about 200 lines of supporting macros for
plain TeX and another 300 lines to support LaTeX, primarily because
the page-based cross-reference mechanism is complex. LaTeX
support without cross-referencing requires only 34 lines of source
and no supporting macros. HTML requires no supporting macros.

    main(argc, argv)
        int argc;
            /* the number of arguments on the UNIX command line */
        char **argv;
            /* the arguments themselves, an array of strings */
    {
        int file_count;     /* how many files there are */
        char *which;        /* which counts to print */
        int fd = 0;         /* file descriptor, initialized to stdin */
        char buffer[buf_size];  /* we read the input into this array */
        register char *ptr;     /* the first unprocessed character in buffer */
        register char *buf_end; /* the first unused position in buffer */
        register int c;     /* current character, or number of
                               characters just read */
        int in_word;        /* are we within a word? */
        long word_count, line_count, char_count;
            /* number of words, lines, and characters found in file so far */
        which = "lwc";

Figure 3. Part of the example program after extraction by notangle.

AN EXAMPLE OF NOWEB: COUNTING WORDS (continued)

<Initialize pointers and counters 101a>=
    ptr = buf_end = buffer;
    line_count = word_count = char_count = 0;
    in_word = 0;
Uses buf_end 100f, buffer 100f, char_count 100f, in_word 100f,
line_count 100f, ptr 100f, and word_count 100f.
This code is used in chunk 99d.

The grand totals must be initialized to zero at the beginning of the
program. If we made these variables local to main, we would have to
do this initialization explicitly; however, C's globals are
automatically zeroed. (Or rather, statically zeroed.) (Get it?)

<Global variables 98d>+= 101b
    long tot_word_count, tot_line_count, tot_char_count;
        /* total number of words, lines, chars */

The present chunk, which does the counting that is wc's raison
d'etre, was actually one of the simplest to write. We look at each
character and change state if it begins or ends a word.

<Scan file 101c>=
    while (1) {
        <Fill buffer if it is empty; break at end of file 101d>
        c = *ptr++;
        if (c > ' ' && c < 0177) {      /* visible ASCII codes */
            if (!in_word) {
                word_count++;
                in_word = 1;
            }
            continue;
        }
        if (c == '\n') line_count++;
        else if (c != ' ' && c != '\t') continue;
        in_word = 0;        /* c is newline, space, or tab */
    }
Uses in_word 100f, line_count 100f, ptr 100f, and word_count 100f.
This code is used in chunk 99d.

Buffered I/O allows us to count the number of characters almost for
free.

<Write statistics for file 101e>=
    wc_print(which, char_count, word_count, line_count);
    if (file_count)
        printf(" %s\n", *argv);     /* not stdin */
    else
        printf("\n");               /* stdin */
Uses argv 99a, char_count 100f, file_count 99b, line_count 100f,
wc_print 102d, which 99b, and word_count 100f.
This code is used in chunk 99d.

<Update grand totals 102a>=
    tot_line_count += line_count;
    tot_word_count += word_count;
    tot_char_count += char_count;
Uses char_count 100f, line_count 100f, and word_count 100f.
This code is used in chunk 99d.

We might as well improve a bit on Unix's wc by displaying the number
of files too.

<Print the grand totals if there were multiple files 102b>=
    if (file_count > 1) {
        wc_print(which, tot_char_count, tot_word_count, tot_line_count);
        printf(" total in %d files\n", file_count);
    }
Uses file_count 99b, wc_print 102d, and which 99b.
This code is used in chunk 99a.

The function below prints the values according to the specified
options. The calling routine should supply a newline. If an invalid
option character is found we inform the user about proper use of the
command. Counts are printed in eight-digit fields so they will line
up in columns.

<Definitions 98c>+= 102c
    #define print_count(n) printf("%8ld", n)

<Functions 102d>=
    wc_print(which, char_count, word_count, line_count)
        char *which;    /* which counts to print */
        long char_count, word_count, line_count;    /* given totals */
    {
        while (*which)
            switch (*which++) {
            case 'l': print_count(line_count); break;
            case 'w': print_count(word_count); break;
            case 'c': print_count(char_count); break;
            default:
                if ((status & usage_error) == 0) {
                    fprintf(stderr, "\nUsage: %s [-lwc] [filename ...]\n",
                            prog_name);
                    status |= usage_error;
                }
            }
    }
Defines: wc_print, used in chunks 101e and 102b.
Uses char_count 100f, line_count 100f, print_count 102c,
prog_name 98d, status 98d, usage_error 98c, which 99b, and
word_count 100f.
This code is used in chunk 98a.

A test of this program against the system wc command on a
SPARCstation showed the official wc was slightly slower. Although
that wc gave an appropriate error message for the options -abc, it
made no complaints about the options -labc! Dare we suggest the
system routine might have been better had its programmer used a more
literate approach?

Uncoupling files and programs. The mapping between noweb files
and programs is many-to-many; the mapping between files and
documents is many-to-one. You combine source files by listing their
names on notangle's or noweave's command line. Notangle can extract
more than one program from a single source file by using the -R
command-line option to identify the root chunks of the different
programs.
The simplest example of one-to-many program mapping is that of
putting a C header and program in a single noweb file. The header
comes from the root chunk <<header>>, and the program from the
default root chunk, <<*>>:

    notangle wc.nw > wc.c
    notangle -Rheader wc.nw | cpif -ne wc.h

The > in the first command directs notangle's output to the
file wc.c. The | in the second command directs notangle's output to
the cpif program, which is distributed with noweb. cpif -ne wc.h
compares its input to the contents of file wc.h; if they differ,
the input replaces wc.h. This trick avoids touching the file wc.h
when its contents have not changed, which avoids triggering
unnecessary recompilations.
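
The compare-then-replace idea behind cpif can be sketched in C as follows. This is not cpif's source: the function names are hypothetical, and for brevity the new contents are passed in memory rather than read from standard input.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if the file at path holds exactly data[0..len), else 0. */
static int same_contents(const char *path, const char *data, size_t len)
{
    FILE *f = fopen(path, "rb");
    size_t i;
    int c, same = 1;

    if (f == NULL)
        return 0;               /* a missing file counts as different */
    for (i = 0; same && i < len; i++) {
        c = getc(f);
        same = (c != EOF && (unsigned char)c == (unsigned char)data[i]);
    }
    if (same && getc(f) != EOF)
        same = 0;               /* the existing file is longer */
    fclose(f);
    return same;
}

/*
 * Replace the file at path with data only if the contents differ.
 * Returns 1 if the file was written, 0 if it was left untouched
 * (so its modification time is preserved), and -1 on an I/O error.
 */
int write_if_changed(const char *path, const char *data, size_t len)
{
    FILE *f;
    int ok;

    assert(path != NULL && data != NULL);
    if (same_contents(path, data, len))
        return 0;
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    ok = (fwrite(data, 1, len, f) == len);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 1 : -1;
}
```

Because an unchanged header keeps its old modification time, a tool like make sees no reason to recompile the files that include it.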
Because it is language-independent, noweb can combine different
programming languages in a single literate program. This
ability makes it possible to explain all of a project's source in a
single document, including not just ordinary code but also things
like make files, test scripts, and test inputs. Using literate
programming to describe tests as well as source code provides a
lasting, written explanation of the thinking needed to create the
tests, and it does so with little overhead. If not documented at
the time, the rationale behind complex tests can easily be
lost.
IMPLEMENTING NOWEB
Until now we have discussed noweb from a user's point of view,
showing that it is simple and easy to use. Noweb's implementation
is also worth discussing, because noweb's extensible
implementation makes it unique among literate-programming
tools. Noweb tools are implemented as pipelines. Each pipeline
begins with the noweb source file. Successive stages of the
pipeline implement simple transformations of the source, until the
desired result emerges from the end of the pipeline.
Users change or extend noweb not by recompiling but by inserting
or removing pipeline stages; for example, noweave switches from
LaTeX to HTML by changing just the last pipeline stage. Noweb's
extensibility enables its users to create new literate-programming
features without having to write their own tools.
Noweb's syntax is easy to read, write, and edit, but it is not
easily manipulated by programs. Markup, which is the first stage in
every pipeline, converts noweb source to a representation easily
manipulated by common Unix tools like sed and awk, greatly
simplifying the construction of later pipeline stages. Middle stages
add information to the representation. Notangle's final stage
converts to code; noweave's final stages convert to TeX, LaTeX, or
HTML.
In the pipeline representation, every line begins with @ and a
keyword. The most important possibilities appear in Table 1. Markup
brackets chunks by @begin . . . @end, and it uses the noweb source
to identify text and newlines, definitions and uses of chunks, and
quoted code, which can all appear inside chunks. It also preserves
information about file names and defined identifiers. Other index
and cross-reference information is inserted automatically by later
pipeline stages. The details of noweb's pipeline representation are
described in the Noweb Hacker's Guide, which is distributed with
noweb.

Table 1

Keyword              Meaning
@begin               Start a chunk
@end                 End a chunk
@text string         string appeared in a chunk
@nl                  A newline appeared in a chunk
@defn name           The code chunk named name is being defined
@use name            A reference to code chunk named name
@quote               Start of quoted code in a documentation chunk
@endquote            End of quoted code in a documentation chunk
@file filename       Name of the file from which the chunks came
@index defn ident    The current chunk contains a definition of ident
@index . . .         Automatically generated index information
@xref . . .          Automatically generated cross-reference information

EXTENDING NOWEB
Noweb lets users insert stages into the notangle and noweave
pipelines, so that they can change a tool's existing behavior or add
new features without recompiling. Even language-dependent features
like formatted output and automatic index generation have been added
to noweb without recompiling.
Stages inserted in the middle of a pipeline both read and write
noweb's pipeline representation; they are called filters, by analogy
with Unix filters, which are used in the Unix implementation.
Filters can be used to change the way noweb works; for example,
a one-line sed script makes noweb treat two chunk names as
identical if they differ only in their representation of white
space, as in Web. A 55-line Icon program makes it possible to
abbreviate chunk names using a trailing ellipsis. To share programs
with colleagues who don't enjoy literate programming, I use a filter
that places each line of documentation in a comment and moves it to
the succeeding code chunk. With this filter, notangle transforms a
literate program into a traditional commented program, without loss
of information and with only a modest penalty in readability.
Filters can be used to add significant features. Noweave's
cross-reference and indexing features use two filters, one that
finds uses of defined identifiers and one that inserts
cross-reference information. In most cases, programmers must mark
identifier definitions by hand, using @ %def, but in some cases a
third, language-dependent filter can be used to mark identifier
definitions, making index generation completely automatic.
Kostas Oikonomou of AT&T Bell Labs, Kaelin Colclasure of Bridge
Information Systems, and Conrado Martinez-Parra of the Universidad
Politecnica de Catalunya in Barcelona have written noweb filters
that add prettyprinting for Icon, C++, and Dijkstra's language of
guarded commands, respectively.
Noweb turns a World-Wide-Web browser like Mosaic into a hypertext
browser for literate programs. For example, you can click on an
identifier or chunk name to jump to the definition of that
identifier or chunk. You can find a hypertext version of the boxed
sample program at ftp://bellcore.com/pub/norman/noweb/wc.html.

EVALUATING NOWEB
Reviewers have had many expectations of literate-programming
tools.10 We expect to be able to write code chunks in any order. We
expect to develop code and documentation in one place. Finally, we
expect automatically generated cross-reference and index
information. Like the original Web, noweb provides all these
features, but in simpler form.
Web does provide features that noweb lacks, but existing Unix tools
can substitute for most of these. Although noweb contains no
internal support for macros, Unix supplies two macro processors that
can work with noweb: the C preprocessor and the m4 macro processor.
The xstr program extracts string literals, and the patch program
provides a form of version control similar to Web's change files.
Indexing and cross-referencing make noweb less simple than it could
be. I need complex LaTeX code to compute page numbers for use in
cross-reference lists and in the index. The ability to use page
numbers justifies this complexity, especially since it can be hidden
from most users.
You do need to understand the LaTeX code if you want to customize
the appearance of your noweb documents while retaining noweb's use
of page numbers for cross-reference. Most literate-programming
tools forbid customization, but not all users will accept such a
restriction. I have compromised between simplicity and
customizability by adding LaTeX options for a dozen of the most
commonly requested customizations. Users can choose from among
these options without understanding noweb's LaTeX code.
Experimenting with noweb is easy because the tools are simple.
If the experiment is unsatisfying, it is easy to abandon, because
notangle's output is readable, and documentation can be preserved
as embedded comments.
Noweb is simpler than Web and easier to use and understand, but
it does less. I argue, however, that the benefit of Web's extra
features is outweighed by the cost of the extra complexity, making
noweb better for writing literate programs. Few of Web's remaining
features will be missed; for example, many compilers evaluate
constant expressions at compile time. Noweb users are most likely
to miss pretty-printing, but it may be more trouble than it is
worth.
In my own work, I have used noweb for code written in various
languages, including assembly language, awk, Bourne shell, C,
Icon, Modula-3, Promela, Standard ML, and TeX. These projects have
ranged in size from a few hundred to twenty thousand lines of
code. Information about other programs written using noweb is hard
to find. Noweb is provided free of charge, generating no sales
records. Programs created with noweb may be delivered in the
form of ordinary source code, leaving no clue that noweb was
used. The only way for me to find out about uses of noweb is to
appeal for information on the Internet. In this way I have learned
about significant noweb projects in C++, Modula-2, Occam, parallel
C, Perl, Prolog, and Scheme.
David Hanson and Chris Fraser are using noweb to write a book
describing the design and implementation of a retargetable C
compiler. Tipton Cole & Company use noweb in their consulting
business, which focuses on writing database applications on DOS
platforms. They find that noweb helps compensate for some of the
deficiencies in DOS database tools, and that literate programming
helps when a customer requests a change in a program that hasn't
been touched in a year. A customer-support group at Sun
Microsystems is using noweb to help teach their customers how to
work with aspects of the Solaris operating system like threads and
device drivers. The literate-programming paradigm makes it
possible to extract working code from the same source used to
create technical reports and newsletters.

LANGUAGE-INDEPENDENT TOOLS LIKE NOWEB ARE SIMPLER AND EASIER TO USE
THAN TRADITIONAL COMPLEX TOOLS.

OTHER TOOLS
A survey of literate-programming tools is beyond the scope of
this article, but we can still sketch noweb's place in the context
of other tools. Most literate-programming tools are
language-dependent and complex. You must change tools when
changing programming languages, repeating effort spent mastering a
tool.
Newer tools, like noweb, are language-independent. The three
most prominent are noweb, nuweb, and Funnelweb.
To users, noweb and nuweb look very similar. There are minor
syntactic differences, and nuweb uses markup within the source file
instead of command-line options to show things like the names of
output files, but both are simple and easy to master. Funnelweb is
a complex tool that includes its own rudimentary typesetting
language and command shell.
Many of the similarities between noweb and nuweb arise by
design. Nuweb's initial design borrowed from noweb, and later
versions of each tool have incorporated ideas from the other.
Noweb and nuweb differ substantively in implementation. Nuweb
is not pipelined; it is a single, monolithic C program. This
structure makes nuweb easy to port, since only a C compiler is
needed, and it makes nuweb faster, since no parts are interpreted
and the overhead of creating a pipeline is eliminated. It also
makes nuweb hard to extend. Noweb's pipeline makes it easy to
extend, and different stages of the pipeline can be implemented in
different programming languages, depending on which language is
best for which job. Extensibility is particularly valuable to
those interested in pushing the frontiers of literate programming,
who would otherwise have to write their own tools from scratch.
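The extensibility argument can be illustrated with a small Python
sketch in which each stage is a function from a stream of records to
a stream of records. Everything here is an assumption made for
illustration: the (kind, text) record format, the stage names
markup, expand_tabs, and code_only, and the use of in-process
generators. Noweb's real stages are separate programs connected by
Unix pipes and exchange their own intermediate format, which this
sketch does not reproduce.

```python
from functools import reduce


def markup(source):
    """Front end: label each source line as a 'doc' or 'code' record."""
    kind = "doc"
    for line in source.splitlines():
        if line.startswith("<<") and line.endswith(">>="):
            kind = "code"                  # chunk definition: code follows
            yield ("defn", line[2:-3])
        elif line == "@" or line.startswith("@ "):
            kind = "doc"                   # documentation chunk begins
            yield ("doc", line[2:])
        else:
            yield (kind, line)


def expand_tabs(records):
    """Example extension stage: expand tabs in code, pass the rest through."""
    for kind, text in records:
        yield (kind, text.expandtabs(4) if kind == "code" else text)


def code_only(records):
    """Back end for extraction: keep only the text of code records."""
    for kind, text in records:
        if kind == "code":
            yield text


def run_pipeline(source, stages):
    """Feed the front end's records through each stage in order."""
    return list(reduce(lambda stream, stage: stage(stream), stages, markup(source)))


SOURCE = "@ Say hello.\n<<main>>=\nif True:\n\tprint('hi')\n"

# The original pipeline, and the same pipeline with one stage inserted.
plain = run_pipeline(SOURCE, [code_only])
extended = run_pipeline(SOURCE, [expand_tabs, code_only])
```

Adding expand_tabs required no change to markup or code_only, which
is the property that a monolithic program gives up.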
I advocate language-independent tools for two reasons. First,
after mastering one such tool, you can write almost anything as a
literate program, including things like shell and perl scripts,
which often benefit disproportionately from a literate treatment.
Second, two of these tools, noweb and nuweb, are much simpler,
and therefore much easier to master, than any of the
language-dependent tools. Those who use one language exclusively
may, however, prefer a language-dependent tool, since it
provides pretty-printing, which when done well can make the printed
literate program easier to read.
-
Noweb probably culminates one kind of evolution in literate
programming: the trend toward greatest simplicity. No
significantly simpler tool could do much. Noweb also begins another
kind of evolution, toward greater extensibility and flexibility.
Further evolution might involve replacing Unix shell scripts and
pipelines with an embedded language having special data types to
represent pipelines, chunks, and literate programs. This step
would make it easier to port noweb to non-Unix platforms, and it
could make noweb run much faster. Other developments might include
constructing new pipeline stages to support language-dependent
operations like macro processing, pretty-printing, and automatic
identifier cross-reference.
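As a rough illustration of what an identifier cross-reference stage
might compute, the following Python sketch builds an index from
identifiers to the code chunks that mention them. It is a sketch
under stated assumptions, not a description of any existing noweb
stage: the function name cross_reference, the chunk dictionary, and
the keyword set are all hypothetical, and the single regular
expression stands in for the language-specific tokenizer a real
stage would need (it would, for instance, also match words inside
strings and comments).

```python
import re
from collections import defaultdict

# Naive notion of an identifier; a real stage would use the target
# language's own lexical rules.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cross_reference(chunks, keywords=frozenset()):
    """Map each identifier to the sorted names of the chunks that mention it."""
    index = defaultdict(set)
    for name, lines in chunks.items():
        for line in lines:
            for ident in IDENTIFIER.findall(line):
                if ident not in keywords:
                    index[ident].add(name)
    return {ident: sorted(names) for ident, names in index.items()}


# Hypothetical chunks, keyed by chunk name.
chunks = {
    "count a word": ["nw = nw + 1", "in_word = True"],
    "print totals": ["print(nl, nw)"],
}

index = cross_reference(chunks, keywords={"print", "True"})
```

The keyword set is where language dependence enters: the stage is
reusable across languages only to the extent that this set and the
identifier pattern can be swapped.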
These changes would extend noweb's capabilities, but noweb is
already quite capable of supporting complex programs and documents.
It and related tools are less capable of supporting a modern
word-processing style. The word processors noweb currently
supports, TeX, LaTeX, and HTML, all use the old batch model of word
processing. Today, many authors prefer WYSIWYG word processors
like FrameMaker, WordPerfect, or Microsoft Word. Kean College's
Wittenberg has developed a noweb-like system called WinWordWeb
based on Word. Because of Word's limitations, including its secret
proprietary data format, he could not reuse any of noweb's
implementation, but the design is the same.
The challenge for literate programming today is getting it
into use. Noweb helps by eliminating clutter and complexity.
Supporting modern word processors would eliminate
another barrier, making it possible to write literate programs
without first learning a new word-processing language like LaTeX
or HTML.
More must be learned about suitable ways of structuring
literate programs, about whether hypertext is a useful
alternative, and about what other kinds of documents literate
programs should resemble. What place does literate programming have
for the majority of programmers, who are not writing for
publication? In the near term, I suspect the best use for literate
programming will be to support rapid prototyping, providing a
simple and reliable way of documenting the design decisions made
in, and the lessons learned from, the prototype. In the long
term, I hope that simple, extensible tools like noweb will lead
everyone to appreciate the benefits of literate programming.
ACKNOWLEDGEMENTS
Mark Weiser's invaluable encouragement provided
the impetus for me to write this
paper, which I did while visiting the Computer Science
Laboratory of the Xerox Palo Alto Research Center. David Hanson
suggested and provided the cpif program. Preston Briggs developed
many of the ideas used in noweb's indexing, and he contributed
code used in one of the pipeline stages. Bill Trost wrote the
first HTML pipeline stage. Dave Love provided
much-needed LaTeX expertise. Comments from Hanson and
from the anonymous referees stimulated me to improve the paper. The
development of noweb was supported by a Fannie and John Hertz
Foundation Fellowship.
REFERENCES
1. D.E. Knuth, Literate Programming, Stanford
University, Stanford, Calif., 1992.
2. P.J. Denning, Announcing Literate Programming, Comm. ACM,
July 1987, p. 593.
3. K. Guntermann and J. Schrod, Web Adapted to C, TUGboat, Oct.
1986, pp. 134-137.
4. S. Levy, Web Adapted to C, Another Approach, TUGboat, April
1987, pp. 12-13.
5. N. Ramsey, Literate Programming: Weaving a
Language-Independent Web, Comm. ACM, Sept. 1989, pp.
1051-1055.
6. H. Thimbleby, Experiences of Literate Programming Using CWeb
(a Variant of Knuth's Web), Computer Journal, 1986, pp. 201-211.
7. N. Ramsey and C. Marceau, Literate Programming on a Team
Project, Software: Practice & Experience, July 1991, pp.
677-683.
8. C.J. Van Wyk, Literate Programming: An Assessment, Comm.
ACM, Mar. 1990, pp. 361-365.
9. D.E. Knuth, The Web System of Structured Documentation,
Tech. Report 980, Computer Science Dept., Stanford Univ., Stanford,
Calif., 1983.
Norman Ramsey is a research scientist at Bellcore. His research
interests are the construction of software that is easy to
understand and to retarget to different machines. His recent work
includes a retargetable debugger and a toolkit that helps build
debuggers and other programs that manipulate machine code.
Ramsey received a PhD in computer science from Princeton
University. He is a member of ACM.
Address questions about this article to Ramsey at Bellcore, 445
South Street, Morristown, NJ 07960; [email protected]. Noweb can
be obtained by anonymous ftp from CTAN, the Comprehensive TeX
Archive Network, in directory web/noweb. CTAN replicas appear on
hosts ftp.shsu.edu, ftp.tex.ac.uk, and ftp.uni-stuttgart.de. Noweb's
World-Wide Web page is located at
ftp://bellcore.com/pub/norman/noweb.