Symbolics Architecture

David A. Moon

Symbolics, Inc.

This architecture enables rapid development and efficient execution of large, ambitious applications. An unconventional design avoids trading off safety for speed.

What is an architecture? In computer systems, an architecture is a specification of an interface. To be dignified by the name architecture, an interface should be designed for a long lifespan and should connect system components maintained by different organizations. Often an architecture is part of a product definition and defines characteristics on which purchasers of that product rely, but this is not true of everything that is called an architecture. An architecture is more formal than an internal interface between closely-related system components, and has farther-reaching effects on system characteristics and performance.

A computer system typically contains many levels and types of architecture. This article discusses three architectures defined in Symbolics computers:

(1) System architecture-defines how the system appears to end users and application programmers, including the characteristics of languages, user interface, and operating system.

(2) Instruction architecture-defines the instruction set of the machine, the types of data that can be manipulated by those instructions, and the environment in which the instructions operate, for example subroutine calling discipline, virtual memory management, interrupts and exception traps, etc. This is an interface between the compilers and the hardware.

(3) Processor architecture-defines the overall structure of the implementation of the instruction architecture. This is an interface between the firmware and the hardware, and is also an interface between the parts of the processor hardware.

System architecture

System architecture defines how the system looks to the end user and to the programmer, including the characteristics of languages, user interface, and operating system. System architecture defines the product that people actually use; the other levels of architecture define the mechanism underneath that implements it. System architecture is implemented by software; hardware only sets bounds on what is possible. System architecture defines the motivation for most of the design choices at the other levels of architecture. This section is an overview of Symbolics system architecture.

The Symbolics system presents itself to the user through a high-resolution bitmap display. In addition to text and graphics, the display contains presentations of objects. The user operates on the objects by manipulating the presentations with a mouse. The display includes a continuously updated reminder of the mouse commands applicable to the current context. Behind the display is a powerful symbol processor with specialized hardware and software. The system is dedicated to one user at a time and shares such resources as files, printers, and electronic mail with other Symbolics and non-Symbolics computers through both local-area and long-distance networks of several types. The local-area network is integral to system operation.

The system is designed for high-productivity software development both in symbolic languages, such as Common Lisp1 and Prolog, and in nonsymbolic languages, such as Ada and Fortran. It is also designed for efficient execution of large programs, particularly in symbolic languages, and delivery of such programs to end users. The system is intended to be especially suited to complex, ambitious applications that go beyond what has been done before; thus it provides facilities for exploratory programming, complexity management, incremental construction of programs, and so forth. The operating system is written in Lisp and the architectural concept originated at the MIT Artificial Intelligence Laboratory. However, applications are not limited to Lisp and AI. Many non-AI applications that are complex enough to be difficult on an ordinary computer have been successfully implemented.

0018-9162/87/0100-0043$01.00 © 1987 IEEE. Computer, January 1987, p. 43.

Meeting these needs requires an extraordinary system architecture-just another PC or Unix clone won't do. The intended applications demand a lot of processor power, main and virtual memory size, and disk capacity. The system must provide as much performance as possible without exceeding practical limits on cost, and computing capacity must not be diluted by sharing it among multiple users. These purely hardware aspects are not sufficient, however. The system must also improve both the speed of software production and the quality of the resulting software by providing a more complete substrate on which to erect programs than has been customary. Programmers should not be handed just a language and an operating system and be forced to do everything else themselves.

At a high level, the Symbolics substrate provides many facilities that can be incorporated into user programs, such as user-interface management, embedded languages, object-oriented programming, and networking. At a low level, the substrate provides full run-time checking of data types, of array subscript bounds, of the number of arguments passed to a function, and of undefined functions and variables. Programs can be incrementally modified, even while they are running, and information needed for debugging is not lost by compilation. Thus the edit-compile-test program development cycle can be repeated very rapidly. Storage management, including reclamation of space occupied by objects that are no longer in use, is automatic so that the programmer does not have to worry about it; incremental so that it interferes minimally with response to the user; and efficient because it concentrates on ephemeral objects, which are the best candidates for reclamation. The system never compromises safety for the sake of speed. (A notorious exception, the dynamic rather than indefinite extent of &rest arguments, is recognized as a holdover from the past that is not consistent with the system architecture and will certainly be fixed in the future.)

In an ordinary architecture, such features would substantially diminish performance, requiring the introduction of switches to turn off the features and regain speed. Our system architecture deems this unacceptable, because complex, ambitious application programs are typically never finished to the point where it is safe to declare them bug-free and remove run-time error-checking. We feel it is essential for such applications to be robust when delivered to end users, so that when something unanticipated by the programmer happens, the application will fail in an obvious, comprehensible, and controlled way, rather than just supplying the wrong answer. To support such applications, a system must provide speed and safety at the same time.

Symbolics systems use a combination of approaches to break the traditional dilemma in which a programmer must choose either speed or safety and comfortable software development:

* The hardware performs low-level checking in parallel with computation and memory access, so that this checking takes no extra time.

* Machine instructions are generic. For example, the Add instruction is capable of adding any two numbers regardless of their data types. Programs need not know ahead of time what type of numbers they will be adding, and they need no declarations to achieve efficiency when using only the fastest types of numbers. Automatic conversion between data types occurs when the two operands of Add are not of the same type.
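The dispatch a generic Add performs can be sketched as follows. This is an illustrative Python model, not Symbolics microcode; the tag names and the two-way fixnum/float split are assumptions for the sketch.

```python
# Illustrative sketch (not Symbolics microcode): a generic Add that
# dispatches on the data type tags of its two operands, converting
# automatically when the tags differ.

def tag_of(x):
    """Stand-in for the hardware data type tag of an operand."""
    if isinstance(x, int):
        return "fixnum"
    if isinstance(x, float):
        return "single-float"
    raise TypeError("wrong-type operand")   # would trap to an error handler

def generic_add(a, b):
    ta, tb = tag_of(a), tag_of(b)
    if ta == tb == "fixnum":
        return a + b                        # fast path: integer adder
    # mixed or floating tags: convert the fixnum operand, then float add
    return float(a) + float(b)

assert generic_add(1, 2) == 3              # fixnum + fixnum
assert generic_add(1, 2.5) == 3.5          # automatic conversion
```

The point of the hardware version is that the tag test happens in parallel with the integer add, so the fast path pays nothing for the generality.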

* Function calling is very fast, yet does not lose information needed for debugging and does not prevent functions from being redefined.

* Built-in substrate facilities are already optimized and available for programmers to incorporate into their programs.

* Application-specific control of virtual-memory paging is possible. Prepaging, postpurging, multipage transfers, and reordering of objects to improve locality are supported.2

These benefits are not without costs:

* Both the cost and the complexity of system hardware and software are increased by these additional facilities.

* Performance optimization is not always automatic. Programmers still must sometimes resort to metering tools. Declarations are available to optimize certain difficult cases, but their use is much less frequent than in conventional architectures.

Why Lisp machines? This is really three questions:

(1) Why dedicate a computer to each user instead of time-sharing?

(2) Why use a symbolic system architecture?

(3) Why build a symbolic system architecture on unconventional lower-level architectures?

Why dedicate a computer to each user instead of time-sharing? This seemed like a big issue back in 1974 when Lisp machines were invented, but perhaps by now the battle has been won. A report from that era3 states these reasons for abandoning time-sharing:

* Time-sharing systems degrade under heavy load, so work on large, ambitious programs could only be conducted in off-peak hours. In contrast, a single-user system would perform consistently at any time of day.

* Performance was limited by the speed of the disk when running programs too large to fit in main memory. Dedicating a disk to each user would give better performance.

The underlying argument was that increasing program size and advancing technology, making capable processors much less expensive, had eliminated the economy of scale of time-sharing systems. The original purpose of time-sharing was to share expensive hardware that was only lightly used by any individual user. The serendipitous feature of time-sharing was interuser communication. Both of these purposes are now served by local-area networking. Expensive hardware units are still shared, but the processor is no longer among them.

These arguments apply to all types of dedicated single-user computers, even PCs, not only to symbolic architectures.

Why use a symbolic system architecture? Many users who need a platform for efficient execution of large symbolic programs, a high-productivity software development environment, or a system for exploratory programming and rapid prototyping have found symbolic languages such as Lisp and symbolic architectures such as this one very beneficial. Programs can be built more quickly, and fit more smoothly into an integrated environment, by incorporating such built-in substrate facilities as automatic storage management and the flexible display with its presentation-based user interface. The full error-checking saves time when developing new programs. The programmer can concentrate on the essential aspects of the program without fussing about minor


mistakes, because the machine will catch them. The ability to change the program incrementally greatly speeds up development.

Once the initial exploration phase is over, it is possible to turn prototypes into products quickly. Good performance can be achieved without a lot of programmer effort and without sacrificing those development-oriented features that are also of value later in the program's life, during maintenance and enhancement.

Why build a symbolic system architecture on unconventional lower-level architectures? Conventional instruction architectures are optimized to implement system architectures very different from Symbolics'. For example, they have no notions of parallel error-checking and generic instructions; they often obstruct the implementation of a fast function call, especially one that retains error-checking, incremental compilation, and debugging information; and they usually pay great attention to complex indexing and memory addressing modes, which have little utility for symbolic languages. Implementing Symbolics' system architecture on a conventional instruction architecture would force a choice between safety and performance: we could not have both. The type of software we are interested in either could not run at all or would require much faster hardware to achieve the same performance. Later I will discuss the special aspects of Symbolics' instruction and processor architectures that make them more suitable to support a symbolic system.

Comparing the performance of machines with equivalent cycle times and different architectures can sometimes be illuminating. The 3640, VAX 11/780, and 10-MHz 68020 all have cycle times of about 200 ns. (The 68020 takes two clock cycles to perform a basic operation, so its 100-ns nominal cycle time is equivalent to the other two machines' 200 ns.) On a Fortran benchmark (single-precision Whetstone), the VAX is 1.8 times the speed of the 3640 (750 versus 400). With floating-point accelerators on each machine, the ratio is 2.1. On the Lisp benchmark Boyer,4 the 3640 is 1.75 times the speed of the VAX running Portable Standard Lisp, 3.9 times the speed of the VAX running DEC Common Lisp, and 2.1 times the speed of the 68020 running Lucid Common Lisp. (The 68020 time at 10 MHz was estimated by multiplying its 16-MHz time by 1.6, no doubt an inaccurate procedure.) The VAX and 68020 programs were compiled with run-time error checking disabled and safety compromised, while the 3640 was doing full checking as always. Like any benchmark figures presented without a complete explanation of what was measured, how it was measured, what full range of cases was tested, and how it can be reproduced in another laboratory, these numbers should not be taken very seriously. However, they give some idea of the effect of optimizing the instruction architecture to fit the system architecture. One could say that the VAX is three times better at Fortran than at Lisp and that the 68020 and VAX are similar for Lisp. These figures also show the effect of different compiler strategies on identical hardware.

This comparison was scaled to remove the effect of cycle time and show only the effect of architecture. This is not completely fair to the conventional machines, because in general they can be expected to have faster cycle times than a symbolic machine. Running the 68020 at full speed and using a newer model of the VAX would have improved their times. Hardware technology of conventional machines will always be a couple of years ahead of symbolic hardware, in cycle time and price/cycle, because of the driving force of their larger market. It's interesting to note that this hardware advantage applies only to the processor, which usually contributes less than 25 percent of system cost. Power supplies, sheet metal, and disk drives don't care whether the architecture is symbolic; they cost and perform the same for equivalent configurations of either type of machine.

This comparison is not completely fair to the symbolic machine, either. Software exploiting the full capabilities of the symbolic machine should have been compared, but this software won't run at all on the conventional machines. Software technology on symbolic machines will always be a couple of years ahead of conventional machines, because it is built on a more powerful substrate using more productive tools.

Performance. The best published analysis of performance of Lisp systems appears in Gabriel's work.4 The various 3600 models perform quite capably on these benchmarks, as can be seen from a perusal of the book. Some of the reasons for such good performance will become apparent as we proceed.

However, one must always ask exactly what a benchmark measures. A problem with Gabriel's benchmarks is that they are written in a least common denominator dialect that represents Lisp as it was in 1970. This makes it easier to benchmark a broad spectrum of machines, but makes the benchmarks less valid predictors of the performance of real-world programs. Since 1970, there have been many advances in the understanding of symbolic processing and in the range of its applications. The basic operations measured by these benchmarks, such as function calling, small-integer arithmetic, and list processing, are still important today, but many other operations not measured are of equal importance. These benchmarks do not use the more modern features of Common Lisp (such as structures, sequences, and multiple values), do not use object-oriented programming, and are generally not affected by system-wide facilities such as paging and garbage collection. As predefined, portable programs, these benchmarks cannot benefit from the unusual aspects of Symbolics system architecture, such as large program support, full run-time safety, efficient storage management, substrate facilities, support for languages other than Lisp, and faster development of efficient programs.

Instruction architecture

Symbolics' philosophy is that different levels of architecture should be free to change independently, to satisfy different goals and constraints. Users see only the system architecture, leaving the lower levels, such as the instruction architecture, free to change to utilize available technology, maximize performance, or minimize cost. Most other computer families allow users to depend on the instruction architecture and therefore are not free to change it. It tends to be optimized for only the first member of the family. Later implementations using newer technology, as well as implementations at the high or low extremes of the price/performance curve, are penalized by the need for compatibility with an unsuitable instruction architecture.

Symbolics system architecture has been implemented on three different instruction architectures. The LM-2 machine, based on the original MIT Lisp Machine,3 was the first; it was discontinued in 1983. The 3600 family of machines uses a second instruction architecture and three different processor architectures. A third instruction architecture, appropriate for VLSI implementation, is being used in a line of future products now under development.


[Figure diagrams omitted; only the captions are recoverable.]

Figure 1. An object reference is a 34-bit quantity, consisting either of a 32-bit data word with a 2-bit data type tag, or of a 28-bit address with a 6-bit data type tag.

Figure 2. An array of three elements-FOO, 259, and BAR-consists of a header word defining the type and length of the array, followed by an object reference for each array element.

Figure 3. A string containing the seven characters "Example" stores each character in a single 8-bit byte. Bytes are packed four per word into 32-bit integers.

Figure 4. An ordinary list of two elements requires four words of storage. Unlike arrays, lists do not have headers.

Figure 5. A compact list of two elements requires two words of storage. It uses the cdr code to eliminate two object references.

(Reprinted from "Architecture of the Symbolics 3600," 12th Int'l Symp. Computer Architecture, © 1985 IEEE.)

The following sections summarize the instruction and processor architectures of the 3600 family, discuss some of the design tradeoffs involved, and show how these architectures are especially effective at supporting the desired system architecture. Further details can be found elsewhere.5,6

Data are object references. The fundamental form of data manipulated by any Lisp system is an object reference, which designates a conceptual object. The values of variables, the arguments to functions, the results of functions, and the elements of lists are all object references. There can be more than one reference to a given object. Copying an object reference makes a new reference to the same object; it does not make a copy of the object.

Variables in Lisp and variables in conventional languages are fundamentally different. In Lisp, the value of a variable is an object reference, which can refer to an object of any type. Variables do not intrinsically have types; the type of the object is encoded in the object reference. In a conventional language, assigning the value of one variable to another copies the object, possibly converts its type, and loses its identity.

A typical object reference contains the address of the object's representation in storage. There can be several object references to a particular object, but it has only a single stored representation. Side-effects to an object, such as changing the contents of one element of an array, are implemented by modifying the stored representation. All object references address the same stored representation, so they all see the side-effect.

In addition to such object references by address, it is possible to have an immediate object reference, which directly contains the entire representation of the object. The advantage is that no memory needs to be allocated when creating such an object. The disadvantage is that copying an immediate object reference effectively copies the object. Thus, immediate object references can only be used for object types that are not subject to meaningful side-effects, have a small representation, and need very efficient allocation of new objects. Small integers (traditionally called fixnums) and single-precision floating-point numbers are examples of such types.
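The aliasing distinction between the two kinds of reference can be seen in a short sketch. Python's own object model happens to behave analogously, so it serves as the illustration; the variable names are invented.

```python
# By-address references: copying the reference aliases the single
# stored representation, so a side-effect is seen through every copy.
array_ref_1 = ["FOO", 259, "BAR"]   # refers to one stored representation
array_ref_2 = array_ref_1           # copies the reference, not the object
array_ref_2[1] = 0                  # side-effect on the representation
assert array_ref_1[1] == 0          # ...visible through the other reference

# Immediate references: the reference itself carries the whole
# representation, so copying it effectively copies the object.  No
# storage is allocated, and no side-effect could be shared anyway.
fixnum_a = 259
fixnum_b = fixnum_a
```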


In the 3600 architecture, an object reference is a 34-bit quantity consisting of a 32-bit data word and a 2-bit major data type tag. The tag determines the interpretation of the data word. Often the data word is broken down into a 4-bit minor data type tag and a 28-bit address (see Figure 1). This variable-length tagging scheme accommodates industry-standard 32-bit fixed and floating-point numbers with a minimum of overhead bits for tagging. Addresses are narrower than numbers to make additional tag bits available for the many types of objects that Lisp uses.
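The field layout of Figure 1 can be modeled with a few bit operations. The field widths are the article's; the particular tag values below are invented for illustration.

```python
# Sketch of the 3600's 34-bit object reference: a 2-bit major data type
# tag over a 32-bit data word, where the data word may itself split into
# a 4-bit minor tag and a 28-bit address.  Tag values are hypothetical.

MAJOR_FIXNUM  = 0b01
MAJOR_POINTER = 0b11

def make_immediate(major_tag, data32):
    """An immediate reference: the data word is the whole object."""
    return (major_tag << 32) | (data32 & 0xFFFFFFFF)

def make_by_address(major_tag, minor_tag, address28):
    """A by-address reference: 4-bit minor tag + 28-bit word address."""
    data32 = ((minor_tag & 0xF) << 28) | (address28 & 0x0FFFFFFF)
    return make_immediate(major_tag, data32)

def fields(ref34):
    major = ref34 >> 32
    data  = ref34 & 0xFFFFFFFF
    return major, data, data >> 28, data & 0x0FFFFFFF

ref = make_by_address(MAJOR_POINTER, 0b0101, 0x1234567)
major, data, minor, addr = fields(ref)
assert (major, minor, addr) == (MAJOR_POINTER, 0b0101, 0x1234567)
assert ref.bit_length() <= 34        # the whole reference fits in 34 bits
```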

Addresses are 28 bits wide and designate 36-bit words in a virtual memory with 256-word pages. The address granularity is a word, rather than a byte as in many other machines, because the architecture is object-oriented and objects are always aligned on a word boundary. This results in one gigabyte of usable virtual memory. It is interesting to note that the 3600's 28-bit address can actually access the same number of usable words as the VAX's 32-bit address, because the VAX expends two bits on byte addressing and reserves three-fourths of the remaining address space for the operating system kernel and the stack (neither of which is large).
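The address-space arithmetic in this paragraph can be checked directly. The sketch assumes 32 data bits (4 bytes) per word when converting to a byte count.

```python
# Checking the article's arithmetic: a 28-bit word address gives one
# gigabyte of usable memory, and matches the VAX's usable words once
# the VAX's byte-addressing bits and reserved 3/4 are removed.

WORDS = 2 ** 28                      # 28-bit word addresses
BYTES_PER_WORD = 4                   # 32 data bits per word (tags excluded)
assert WORDS * BYTES_PER_WORD == 2 ** 30      # one gigabyte

vax_bytes = 2 ** 32                  # 32-bit byte address
vax_words = vax_bytes // 4           # two bits spent addressing the byte
vax_usable_words = vax_words // 4    # 3/4 reserved for kernel and stack
assert vax_usable_words == WORDS     # same number of usable words
```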

In addition to immediate and by-address object references, the 3600 also uses pointers, a special kind of object reference that does not designate an object as such. A pointer designates a particular location within an object or a particular instruction within a compiled function. Pointers are used primarily for system programming.7

Stored representations of objects. The stored representation of an object is contained in some number of consecutive words of memory. Each word may contain an object reference, a header, a special marker, or a forwarding pointer. The data type tags distinguish these types of words. For example, an array is represented as a header word, containing such information as the length of the array, followed by one memory word for each element of the array, containing an object reference to the contents of that element (see Figure 2). An object reference to the array contains the address of the first memory word in the stored representation of the array.

A header is the first word in the stored

representation of most objects. A header marks the boundary between the stored representations of two objects. It contains descriptive information about the object that it heads, which can be expressed as

either immediate data or an address, as in an object reference.

A special marker indicates that the

memory location containing it does not currently contain an object reference. Any attempt to read that location signals an error. The address field of a special marker specifies what kind of error should be signalled. For example, the value cell of an uninitialized variable contains a special marker that addresses the name of the variable. An attempt to use the value of a variable that has no value provokes an error message that includes the variable's name.

A forwarding pointer specifies that any

reference to the location containing it should be redirected to another memory location, just as in postal forwarding. These are used for a number of internal bookkeeping purposes by the storage management software, including the implementation of extensible arrays.

Some objects include packed data in

their stored representation. For example, character strings store each character in a single 8-bit byte (see Figure 3). For uniformity, the stored representation of an object containing packed data remains a sequence of object references. Each word is an immediate object reference to an integer, whose 32 bits are broken down into packed fields as required, such as four 8-bit bytes in the case of a character string.

A word in memory consists of 36 bits, of

which I have already explained 34. When a memory word contains a header or a machine instruction, the remaining two bits serve as an extension of the rest of the word. When a memory word contains an object reference, a special marker, or a forwarding pointer, the remaining two bits are called the cdr code. The representation of conses and lists (Steele, p. 26)1 saves one word by using the cdr code instead of a separate header to delimit the boundaries of these small objects. In addition, lists are represented compactly by encoding common values of the cdr in the cdr code instead of using an object reference (see Figures 4 and 5).

Tagging every word in memory produces these benefits:

* All data are self-describing and the information needed for full run-time checking of data types, array subscript bounds, and undefined functions and variables is always available.

* Hardware can process the tag in parallel with other hardware that processes the rest of a word. This makes it possible to optimize safety and speed simultaneously.

* Generic instructions alter their operation according to the tags of their operands.

* Automatic storage management is simple, efficient, and reliable. It can be assisted by hardware, since the data structures it deals with are simple and independent of context. The details appear elsewhere.8,5

* Data use less storage due to compact representations. Programs use less storage due to generic instructions and because tag checking is done in hardware, not software.

The cost of tagging is that more main

memory and disk space are required tostore numerical information. Each mainmemory word includes 7 bits for errordetection and correction, so the 4 tag bitsadd 10 percent. Each 256-word disk sectorincludes about 128 bytes of formattingoverhead, so the 4 tag bits per word add 11percent. We feel that the benefits amplyjustify these costs.
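As a check on the arithmetic above, a quick Python computation. The assumption (mine, not stated explicitly in the text) is that the percentages compare the 4 tag bits against the other bits stored per word:

```python
# Checking the tagging-overhead percentages quoted in the text.

# Main memory: 32 data bits + 7 ECC bits = 39 bits per word without tags.
main_overhead = 4 / (32 + 7)
print(f"main memory: {main_overhead:.1%}")   # about 10 percent

# Disk: a 256-word sector of 32-bit words is 1024 bytes of data plus
# about 128 bytes of formatting; the tags add 256 * 4 bits = 128 bytes.
disk_overhead = (256 * 4 / 8) / (256 * 32 / 8 + 128)
print(f"disk: {disk_overhead:.1%}")          # about 11 percent
```

Both results round to the figures the article gives.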

Instruction set. The 3600 architecture includes an instruction set produced by the compilers and executed by a combination of hardware and firmware. All instructions are 17 bits long, consisting of a 9-bit operation field and an 8-bit argument field. Instructions are packed two per word, which is important for performance in two ways:

(1) Dense code decreases paging overhead by making programs occupy fewer pages, and

(2) simplifies the memory system by decreasing the ratio of required instruction fetch bandwidth (in words/second) to processor speed (in instructions/second).
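The packing can be sketched in Python. The field positions below are illustrative; only the 9-bit/8-bit instruction split and the two-per-word packing come from the text:

```python
# Sketch of packing two 17-bit instructions into one 36-bit word
# (field positions here are illustrative, not the documented layout).

def make_instruction(opcode, argument):
    assert 0 <= opcode < 2**9 and 0 <= argument < 2**8
    return (opcode << 8) | argument       # 9-bit op + 8-bit arg = 17 bits

def pack_word(insn0, insn1):
    return (insn1 << 17) | insn0          # two instructions; 2 bits left over

def unpack_word(word):
    mask = 2**17 - 1
    return word & mask, (word >> 17) & mask

a = make_instruction(0x1F, 3)
b = make_instruction(0x2A, 200)
word = pack_word(a, b)
assert word < 2**34                       # fits in 36 bits with 2 to spare
assert unpack_word(word) == (a, b)
```

The two leftover bits correspond to the extension bits described earlier for words containing instructions.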

Every instruction is contained in a compiled function, which consists of some fixed overhead, a table of constants, and a sequence of instructions (see Figure 6). The table of constants contains object references to objects used by the instructions, including locative pointers to definition cells of functions called by this function. Indirection through the definition cell ensures that if a function is redefined its callers are automatically linked to the new definition.
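The effect of definition-cell indirection can be sketched in Python. The dictionary standing in for the cells and the helper names are illustrative, not the actual mechanism:

```python
# Sketch of linkage through a definition cell: callers hold a pointer to
# the cell, not to the code, so redefining a function retargets every
# caller at once, with no relinking pass.

definition_cells = {}   # symbol -> current definition

def defun(name, code):
    definition_cells[name] = code

def call(name, *args):
    return definition_cells[name](*args)  # one extra indirection per call

defun('double', lambda x: x * 2)
caller = lambda: call('double', 21)       # "compiled" against the cell
assert caller() == 42

defun('double', lambda x: x + x + 1)      # redefine; caller follows along
assert caller() == 43
```

The price is one extra memory indirection on each call; the payoff is incremental redefinition, which the system architecture requires.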

Instructions operate in a stack machine model: Many instructions pop their operands off the stack and push their results onto the stack. In addition to these 0-address instructions, there are 1-address instructions, which can address any location in the current stack frame. In this way the slots of the current stack frame serve the same purpose as registers. The

January 1987 47


Figure 6. A compiled function consists of four words of overhead, a table of constants and external references, and a sequence of 17-bit instructions, packed two per word.

Figure 7. A stack frame consists of the caller's copy of the arguments, five header words, the callee's copy of the arguments, local variables, and temporary storage. The frame-pointer (FP) and stack-pointer (SP) registers address the current stack frame.

(Reprinted from "Architecture of the Symbolics 3600," 12th Int'l Symp. Computer Architecture, © 1985 IEEE.)

1-address instructions include multi-operand instructions, which pop all of their operands except the last off the stack and take their last operand from a location in the current stack frame.

There are several ways an instruction can use its argument field. Table 1 lists the ways to develop the address of an operand in the stack or in memory by adding argument to a base address. Table 2 lists nonaddress uses of argument. Each individual opcode only uses argument in a single way; there are no addressing modes. The motivation for implementing this particular set of arguments is to provide for constants (including small integers as a special case), all types of Lisp variables (local and nonlocal lexical, special, structure slot, instance), branching, and byte fields. Byte fields were included because they are heavily used in system programming.

Many instructions are simply Lisp functions directly implemented by hardware and firmware, rather than built up from other Lisp functions and implemented as compiled instructions. These Lisp-function instructions are known as built-ins. They take a fixed number of arguments from the stack and from their argument field. They return a fixed number of values on the stack. Examples of built-ins are eq, symbolp, logand (with two arguments), car, cons, member, and aref (with two arguments).1 The criterion for implementing a Lisp function as a built-in instruction is that hardware is only used to optimize key performance areas. When a Lisp function is not critical to system performance, or hardware implementation of it cannot achieve a major speedup, it remains in software where it is easier to change, to debug, and to optimize.

Using an instruction set designed for Lisp rather than adapting one designed for Fortran or for a hand-crafted assembly language enhances safety and speed. 3600 instructions always check for errors and exceptions, so programs need not execute extra instructions to do that checking. Instructions operate on tagged data, so extra instructions to insert and remove tags are not needed. Instructions are generic, so declarations are not needed to tell the compiler how to select type-specific instructions and translate between data formats. In contrast, Lisp compilers for conventional machines9 must generate extra shifting or masking instructions to manipulate tags, must use multi-instruction sequences for simple arithmetic operations unless there are declarations, and are always having to compromise between safety and speed.

Unlike many machines, the 3600 does not have indexed and indirect addressing modes. Instead it has instructions that perform structured, object-oriented operations such as subscripting an array or fetching the car of a list. This fits the instruction set more closely to the needs of Lisp and at the same time simplifies the hardware by reducing the number of instruction formats to be decoded.

Function call. Storage whose lifetime is known to end when a function returns (or is exited abnormally) is allocated in three stacks, rather than in the main object storage heap, to increase efficiency. The control stack contains function-nesting information, arguments, local variables, function return values, and small stack-allocated temporary objects. The binding stack records dynamically bound variables.1 The data stack contains stack-allocated temporary objects. This article concentrates on the control stack, which is the most critical to performance.

The protocol for calling a function is to push the arguments onto the stack, then execute a Call instruction that specifies the function to be called, the number of arguments, and what to do with the values returned by the function. When the function returns, the arguments have been popped off the stack and the values (if wanted) have been pushed on. Note the similarity in interface between functions and built-in instructions.

Every time a function is called, a new stack frame is built on the control stack. A stack frame consists of the caller's copy of the arguments, five header words, the callee's copy of the arguments, local variables, and temporary storage, including

COMPUTER


Page 7: Symbolics Architecture - Gwern · SymbolicsArchitecture DavidA.Moon Symbolics,Inc. Thisarchitecture enablesrapid developmentand efficientexecutionof large, ambitious applications.

arguments being prepared for calling the next function (see Figure 7). The current stack frame is delimited by the frame-pointer (FP) and stack-pointer (SP) registers, which are available as base registers in instructions that use their argument field to address locations in the current stack frame.

A compiled function starts with a sequence of one or more instructions known as the entry vector. The first instruction in the entry vector, the entry instruction, describes how many arguments the function accepts, the layout of the entry vector, and the size of the function's constants table (see Figure 6), and tells the Call instruction where in the entry vector to transfer control. The Call instruction and the entry vector cooperate to copy the arguments to the top of the stack (creating the callee's copy), convert their arrangement in storage if required, supply default values for optional arguments that the caller does not pass, handle the &rest and Apply features of Common Lisp, and signal an error if too many or too few arguments were supplied. The details are beyond the scope of this article.

Function return. A function returns by executing a Return instruction whose operands are the values to be returned. The value disposition saved in the frame header by Call controls whether Return discards the values, returns one value on the stack, returns multiple values with a count on the stack, or returns all the values to the caller's caller.

Return removes the current frame from the stack and makes the caller's frame current, by restoring the saved FP, SP, and PC registers. If the cleanup bits in the frame header are nonzero, special action must be taken before the frame can be removed. Return takes this action, clears the bit, and tries again. Cleanup bits are used to pop corresponding frames from the binding and data stacks, for unwind-protect,1 for debugging and metering purposes, and for stack buffer housekeeping.
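The "take the action, clear the bit, try again" loop can be sketched as follows. The bit assignments and frame representation are invented for illustration; only the control structure comes from the text:

```python
# Sketch of the Return fast path: one test of the cleanup bits, falling
# into slower per-bit handling only when a bit is set.

CLEANUP_BINDINGS = 0b01   # pop the binding stack
CLEANUP_UNWIND = 0b10     # run an unwind-protect handler

def do_return(frame, actions_log):
    # Fast path: a single test when no cleanups are pending.
    while frame['cleanup_bits']:
        for bit, name in ((CLEANUP_BINDINGS, 'pop-bindings'),
                          (CLEANUP_UNWIND, 'unwind-protect')):
            if frame['cleanup_bits'] & bit:
                actions_log.append(name)          # take the special action,
                frame['cleanup_bits'] &= ~bit     # clear the bit, try again
                break
    return frame['saved_fp']                      # restore caller's frame

log = []
fp = do_return({'cleanup_bits': CLEANUP_BINDINGS | CLEANUP_UNWIND,
                'saved_fp': 'caller-frame'}, log)
assert fp == 'caller-frame'
assert log == ['pop-bindings', 'unwind-protect']
```

In the common case the while-test fails immediately, which is why Return stays cheap.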

Motivations of the function call discipline. The motivations for this particular function-calling discipline are

* to implement full Common Lisp function calling efficiently,

* to be fast, so that programmers will write clear programs,

* to retain complete information for the Debugger, and

* to be simple for the compiler.

To implement full Common Lisp function calling efficiently requires matching the arguments supplied by the caller (with normal function calling or with Apply) to the normal, &optional, and &rest parameters of the callee, and generating default values for unsupplied optional arguments. The entry vector takes care of this. Common Lisp's &key parameters are implemented by accepting an &rest parameter containing the keywords and values, then searching that list for each &key parameter. Multiple values are passed back to the caller on the stack, with a count. The caller reconciles the number of values returned with the number of values desired.
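The &key strategy (collect a &rest list of alternating keywords and values, then search it once per parameter) can be sketched in Python. The function and keyword names here are hypothetical:

```python
# Sketch of &key parameters implemented via an &rest list, as the text
# describes: one linear search of the keyword/value list per parameter.

def get_keyword(rest, keyword, default=None):
    """Search the alternating keyword/value list for one &key parameter."""
    for i in range(0, len(rest), 2):
        if rest[i] == keyword:
            return rest[i + 1]
    return default

def make_array(size, *rest):
    # Hypothetical callee with (&key element-type initial-element).
    element_type = get_keyword(rest, ':element-type', 't')
    initial = get_keyword(rest, ':initial-element', None)
    return [initial] * size, element_type

values, etype = make_array(3, ':initial-element', 0)
assert values == [0, 0, 0] and etype == 't'
```

This keeps the common calling sequence uniform: the Call instruction never needs to understand keywords, only the callee does.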

Function calling historically has been a major bottleneck in Lisp implementations, both on stock hardware and on specially-designed Lisp machines. It is important for function calling to be as fast as possible. If it is not, efficiency-minded programmers will distort their programming styles to avoid function calling, producing code that is hard to maintain, and will waste a lot of time doing optimization by hand that should have been done by the Lisp implementation itself. The 3600's function call mechanism attains good speed (fewer than 20 clock cycles for a one-argument function call and return when no exceptions occur) by using a stack buffer to minimize the number of memory references required, by optimizing the stack frame layout to maximize speed rather than to minimize space, by arranging for the checks for slower exception cases to be fast (for example, Return simply checks whether the cleanup bits are nonzero), and by using the entry vector mechanism to simplify run-time decision-making.

The information that the debugger can

extract from a stack frame includes the address of the previous frame (from the saved FP in the header), the function running in that frame (from the header), the current instruction in that function (from the PC saved in the next frame), the arguments (from the stack; the header specifies the argument count and arrangement), the local variables (from the stack), and the names of the arguments and local variables (from a table created by the compiler and attached to the function).

The compiler is simple because there is only a single calling sequence. Any call can call any function, and the argument patterns are matched up at run time. Everything is in the stack and no register-saving conventions are required, since there are no general-purpose registers.

The principal costs of this function-calling discipline are the five-word header in each frame and the copying of arguments to the top of the stack. The time to create the header is not a problem, because it is overlapped with necessary memory accesses, but the space occupied by the header and by the extra copy of the arguments is a substantial fraction of the typical frame size. This extra space is not a major problem because the stack buffer is large enough (1024 words) that it rarely overflows.


Argument copying is necessary because Common Lisp functions do not take a fixed number of arguments. In a function with &optional parameters, some of the arguments are supplied by the caller while the others are defaulted by the entry vector. The location in the stack frame of an argument must not depend on whether it was supplied or defaulted, since this varies from one call to the next, but the compiler must know the location in order to generate code to access the argument. The entry vector could not put default values in the standard location if the arguments were not at the top of the stack, because the frame header would be in the way. In a function with an &rest parameter, the caller can supply an arbitrary number of arguments. If these arguments were at the top of the stack, they would make it impossible for the compiler to know the locations of the local variables, which are pushed after the arguments.

Copying the arguments that are not part of an &rest parameter to the top of the stack solves both these problems. It gives the function complete control over the arrangement of its stack frame and makes the stack depth constant. Argument copying takes extra time, but typically only one clock cycle per argument, which is faster than the run-time decision-making that would otherwise be necessary to access an optional argument or a local variable.
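The point about fixed offsets can be made concrete with a toy Python model. The one-word header, the helper names, and the layout details are all invented; only the idea (after the entry sequence, every parameter sits at a fixed offset whether it was supplied or defaulted) comes from the text:

```python
# Sketch of why arguments are copied: after the entry sequence runs, each
# parameter sits at a fixed offset in the frame whether it was supplied by
# the caller or defaulted, so compiled code can use constant addresses.

def enter_frame(stack, n_supplied, defaults, n_required):
    """Copy supplied args above the (here 1-word) header, default the rest."""
    supplied = stack[-n_supplied:]            # caller's copy of the arguments
    stack.append('frame-header')              # stands in for the 5 header words
    stack.extend(supplied)                    # callee's copy, at fixed offsets
    defaulted = defaults[n_supplied - n_required:]  # unsupplied &optionals
    stack.extend(defaulted)
    return len(stack) - len(supplied) - len(defaulted)

stack = ['x']                                 # one required arg supplied
base = enter_frame(stack, 1, defaults=['default-y'], n_required=1)
# Parameters 0 and 1 are always at base and base+1, supplied or not.
assert stack[base:] == ['x', 'default-y']
```

If the caller also supplies the optional argument, the same offsets hold and nothing is defaulted, which is the invariant the compiler relies on.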

Processor architecture

Three processor architectures are used in three representative models of the 3600 family: 3640, 3675, and 3620. Since they all implement the same instruction architecture, there are substantial similarities among their processor architectures. They differ due to implementation in different technologies and choices of different cost/performance tradeoffs, but this overview largely glosses over the differences.

The main goal of each of these processor architectures is to implement the instruction architecture described earlier with the highest performance achievable within its particular cost budget. The costs are generally higher than most workstations but lower than most minicomputers. For high performance the number of clock cycles required to execute an instruction must be minimized; the goal is to execute a new instruction every cycle. Because the system architecture specifies that safety and convenience must not be compromised to increase performance, instructions typically make many checks for errors and exceptions. Minimizing the cycle count demands that these checks be performed in parallel, not each in a separate cycle.

Adequate bandwidth for access to

operands is also required. In the 3600 instruction architecture, a simple instruction can read two stack locations and write one stack location. One of these is a location in the current stack frame specified by an address in the instruction, while the other two are at the top of the stack. Operands are supplied by the stack buffer, a 1K-word memory that holds up to four virtual-memory pages of the stack. The stack buffer contains all of the current frame plus as many older frames as happen to fit. When the stack buffer fills up (during Call), the oldest page spills into normal memory to make room for the new frame. When the stack buffer becomes empty (during Return), pages move from normal memory back into the stack buffer until the frame being returned to is entirely in the buffer. The maximum size of a stack frame is limited to what will fit in the stack buffer. A second stack buffer contains an auxiliary stack for servicing page faults and interrupts without disturbing the primary buffer.
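The spill/refill behavior can be sketched with a toy Python model. Sizes here are tiny for illustration (the real buffer is 1K words, spilling by virtual-memory page); the function names are invented:

```python
# Sketch of stack-buffer spilling: a small fast buffer holds the top of
# the control stack; Call spills the oldest page to main memory on
# overflow, and Return brings pages back on underflow.

BUFFER_WORDS = 8
PAGE = 4

buffer, spilled = [], []   # fast buffer and its overflow in main memory

def push_frame(frame):
    while len(buffer) + len(frame) > BUFFER_WORDS:
        spilled.extend(buffer[:PAGE])      # oldest page spills on overflow
        del buffer[:PAGE]
    buffer.extend(frame)

def pop_frame(n):
    del buffer[-n:]
    while not buffer and spilled:          # refill from memory on underflow
        buffer[:0] = spilled[-PAGE:]
        del spilled[-PAGE:]

push_frame(['a'] * 4)
push_frame(['b'] * 4)
push_frame(['c'] * 4)                      # overflow: the 'a' page spills
assert spilled == ['a'] * 4 and len(buffer) == 8
pop_frame(4)                               # return from 'c'
pop_frame(4)                               # return from 'b': buffer empties,
assert buffer == ['a'] * 4 and spilled == []   # so the 'a' page comes back
```

As with register windows, the block transfers happen only at overflow and underflow, so most calls and returns never touch main memory for their frames.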

Associated with the stack buffer are the FP and SP registers, which point to the current frame and to the top of the stack, and hardware for addressing locations in the current stack frame via the argument field of an instruction, which calculates a read address and a write address every clock cycle. The third operand access is provided by a duplicate copy of the top location in the stack, in a scratchpad memory, which can be read and written every clock cycle. The SP register is incremented or decremented by instructions that push or pop the stack.

The stack buffer provides the same operand bandwidth, two reads and one write every clock cycle, as in a typical register-oriented architecture. It has the advantage that register saving and restoring across subroutine calls is not required, since all registers already reside in the stack. As in a register-window design, overhead occurs only when the stack buffer overflows or underflows and requires a block transfer between stack buffer and main memory. Another advantage is that each instruction contains only one address instead of three, making the instructions smaller (so that they can be fetched from main memory more quickly and processed with less hardware) and allowing more registers to be addressed. A disadvantage of a stack architecture is that it requires address-calculation hardware, including a 10-bit (for a 1K-word buffer) adder. Since each instruction contains only one address instead of three, extra instructions are sometimes required to move data to the top of the stack so they can be addressed.

Instructions are processed by a four-stage pipeline (see Figure 8) under the control of horizontal microcode. Microcode is used as an engineering technique, not to create a general-purpose emulator that could implement alternate instruction architectures. Knowledge of the instruction architecture is built into hardware wherever that achieves a substantial performance improvement.

To achieve full performance, instructions must be supplied to the processor at an adequate rate. Each processor model has a different design, with different tradeoffs.

The 3640 uses a four-instruction buffer.

When the buffer is exhausted, or a branch occurs, microcode reads two words from memory and refills the instruction buffer. This design uses much less hardware than the other two, but provides lower performance. Refilling the buffer takes five clock cycles, so in the worst case the performance penalty is about a factor of two. With a typical instruction mix, the observed slowdown is about 35 percent, because complex instructions such as function calls and memory references spend more than one cycle in the execute stage.

The 3675 uses a 2K-instruction cache.

Program loops that fit in the cache execute at full speed, with no instruction fetching overhead. An autonomous instruction prefetch unit fills the cache with instructions before they are needed, in parallel with execution. At the cost of a substantial increase in hardware complexity over the 3640, this design ensures that the pipeline almost never has to wait for an instruction.

The 3620 uses a six-instruction buffer.

An autonomous instruction prefetch unit fills the buffer in parallel with execution. The 3620 instruction stage is a compromise between the other two designs. Straight-line code executes at full speed, but branches execute at 3640 speed because they must refill the buffer.

The datapath contains several units that

function in parallel (see Figure 9). Simple instructions such as data movement, arithmetic, logical, and byte-field instructions execute in a single clock cycle. For example, when executing an Add instruction


Figure 8. The instruction-processing pipeline, with variations for three 3600 family models.

Figure 9. 3640 datapath, contained in the Execute and Write stages of the pipeline. Other 3600 family models have generally similar datapaths.

the following activities all take place in parallel:

* The stack buffer fetches the two operands, one from a calculated address in the stack buffer memory and the other from the duplicate top-of-stack in the scratchpad memory.

* The fixed-point arithmetic unit computes the 32-bit sum of the operands and checks for overflow. This result is only used if both operands are fixnums.

* The optional floating-point accelerator, if present, starts computing the sum of the operands and checking for floating-point exceptions. This result is only used if both operands are single-floats.

* The tag processor checks the data types of the operands.

* The stack buffer accepts the result from the fixed-point arithmetic unit, adjusts the stack pointer, and in the write stage stores the result at the new top of the stack.

* The decode stage decodes the next instruction and produces the microinstruction that will control its execution. If the type-checking unit or either arithmetic unit detects an exception, control is diverted to a microcode exception handler.

When the operands of Add are not both fixnums, executing the instruction takes more than one machine cycle and more than one microinstruction. In the case of adding two single-floats, the extra time is only required because the floating-point arithmetic unit is slower than the fixed-point arithmetic unit. In other cases, extra time is required to convert the operands to a common format, to perform double-precision floating-point operations, or to trap to a Lisp function to add numbers of less common types.
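The speculate-then-select structure of generic Add can be sketched in Python. The tag names and the trap path are illustrative; only the dispatch structure (compute both sums in parallel, let the tag check pick which is used) comes from the text:

```python
# Sketch of the parallel Add dispatch: fixnum and float results are
# computed speculatively, and the tag check decides which (if either)
# is committed; anything else takes the slow exception path.

def generic_add(tag_a, val_a, tag_b, val_b):
    fix_sum = (val_a + val_b) if tag_a == tag_b == 'fixnum' else None
    flo_sum = (val_a + val_b) if tag_a == tag_b == 'single-float' else None
    if fix_sum is not None:
        return 'fixnum', fix_sum          # fast path: one cycle
    if flo_sum is not None:
        return 'single-float', flo_sum    # slower: FPU latency
    # Exception path: convert formats or trap to a Lisp add routine.
    return 'number', float(val_a) + float(val_b)

assert generic_add('fixnum', 2, 'fixnum', 3) == ('fixnum', 5)
assert generic_add('fixnum', 2, 'single-float', 0.5) == ('number', 2.5)
```

The hardware version differs in that the speculative sums cost nothing extra; here the point is only the tag-directed selection.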

Memory-reference instructions such as the car and aref Lisp operations are limited


mainly by the speed of the memory. Car, for example, takes four clock cycles. Complex instructions such as Call, Return, and the Common Lisp member function invoke microcode subroutines. A wide microinstruction word and fast microcode branching minimize the number of microinstructions that need to be executed. Simple and memory-reference instructions can be discovered to be complex at run time because of an exceptional condition such as the data type of the operands.

I have described here an unusual system architecture and presented an overview of the underlying architectures that implement it. When considering the type of applications that this system architecture targets, note how important to their success it is that we compromise neither safety nor speed. With this in mind, some of the unconventional design choices in these architectures were made based on rationales with varied benefits and costs. For example, a close fit between processor, instruction, and system architectures improves performance, but allowing users to depend on details of the instruction architecture can interfere with this. The lack of this close fit dissipates the hardware price/performance advantage of conventional architectures when measuring system-level performance on software suited to symbolic architectures.

References

1. G. L. Steele, Common Lisp, Digital Press, Burlington, MA, 1984.

2. D. L. Andre, Paging in Lisp Programs, Master's thesis, University of Maryland, 1986.

3. R. D. Greenblatt et al., "The LISP Machine," Interactive Programming Environments, eds. D. R. Barstow, H. E. Shrobe, and E. Sandewall, McGraw-Hill, Hightstown, NJ, 1984.

4. R. P. Gabriel, Performance and Evaluation of Lisp Systems, The MIT Press, Cambridge, MA, 1985.

5. D. A. Moon, "Architecture of the Symbolics 3600," 12th Int'l Symp. Computer Architecture, 1985, pp. 76-83.

6. Symbolics Technical Summary, Symbolics Inc., Cambridge, MA, 1985.

7. Symbolics Common Lisp: Language Dictionary, Symbolics Inc., Cambridge, MA, 1986.

8. D. A. Moon, "Garbage Collection in a Large Lisp System," Proc. 1984 ACM Symp. Lisp and Functional Programming, pp. 235-246.

9. R. A. Brooks et al., "Design of an Optimizing, Dynamically Retargetable Compiler for Common Lisp," Proc. 1986 ACM Conf. Lisp and Functional Programming, pp. 67-85.

David A. Moon is a technical director at Symbolics, Inc. Previously, he was a hardware designer, microprogrammer, and writer of manuals at Symbolics. His interests include advanced software development and architectures for symbolic processing. Moon received the BS degree in mathematics from MIT in 1975.

Readers may write to the author at Symbolics, Inc., 11 Cambridge Center, Cambridge, MA 02142. His e-mail address is Moon@Stony-Brook.SCRC.Symbolics.COM on the ARPA Internet.
