LM2 - A Logic Machine Minicomputer

Theodore H. Kehl, Christine Moss, and Lawrence Dunkel

University of Washington

Introduction

In 1963 Clark and Molnar [1] developed the LINC computer (Laboratory Instrument Computer), principally for use in biomedical research. Several of the features of this machine (autoindexing, LINC tapes, 12-bit word, data break) later appeared in the PDP-5 and, still later, in the PDP-8. It is probably fair to give Clark and Molnar credit for thus starting the minicomputer revolution.

An important concept in the project was that the LINC would be used as a laboratory instrument and the scientist/user would assume all responsibility for the construction, use, and maintenance of each machine. The project was eminently successful. Today, about a decade later, more than 30 of the 40 original LINC computers are still operational in biomedical research laboratories [1].

Drawing on the knowledge that biomedical research scientists can and will take total responsibility for a computer system, we have designed and constructed a minicomputer to achieve the same objectives: low cost, flexible input/output, compact size, and simple organization. Because of the progress in circuit technology (especially MSI integrated circuits) an important goal was to build a more sophisticated, powerful machine but still have the machine as easy to maintain as the LINC. All of these objectives have been met by using the Logic Machine Design System [2].

A logic machine is an ensemble of microprogrammable control processor, several functional units, one or more bidirectional data buses, and microprogram, all arranged so that a specific digital device is implemented, in this case a minicomputer. It differs from other modular computer systems, notably macromodules [3] and register transfer modules (RTM's) [4,5,6], in two major respects: a) it has a microprogrammable control processor rather than distributed control and b) its functional units tend to be more complex.

At a recent conference on modular computer systems [7], most participants agreed that building a minicomputer of RTM's or macromodules would cost (to build, which can be quite unrelated to purchase price) twice as much and run half as fast as a conventional computer. In this paper we describe a 16-bit logic machine minicomputer system (LM2) which runs about as fast but costs less than a conventional system. (See Table 1.) Note that a system rather than a CPU is described. A logic machine CPU, without peripherals, would probably cost on the order of 20% more than a conventional CPU. It is only when some peripheral equipment is included that the overall cost decreases.

Despite the restriction of developing a minicomputer for LINC-type use by scientist/users, we feel we have made some computer architecture innovations.

Briefly stated, with more detail below, these innovations are:

a) A hierarchical use of microprogramming with one level of strictly vertical encoding in the microprogram engine (control processor) and a second level of horizontally encoded ROM-control contained within the functional units (modules). This technique allows the generality and simplicity of vertical microprogramming while providing the speed of horizontal microprogramming.

COMPUTER

Table 1. Characteristics of logic machine minicomputer (LM2)

b) A technique where several functional units "cooperatively" determine, in parallel with the control processor, the sequence of microroutine execution. Each functional unit competes for control processor service by generating a microcode address designating microcode to be executed for the requesting unit. A priority network resolves the conflict between competing functional units. A specific microinstruction, "DECODE," transfers the highest priority address, via the bus, to the microprogram counter of the control processor, thus effecting a branch to the desired microcode.

c) Coroutines, a familiar software concept, implemented in hardware to aid and simplify the execution of loosely coupled microroutines such as the quasi-parallel microprogrammed control of a magnetic tape drive, A/D converter control, and macroinstruction execution. (In the LM2 all peripheral devices are controlled by microcode resident in the control processor.)

d) The ability to reload RAM control store from main memory, permitting dynamic alteration of microcode during program execution. Microprogrammability, by which we mean the ability of the user to microprogram, increases the ease with which microcode can be developed, and also allows more microcode to be utilized than can be contained in control store at any given time.

e) A ROM-macroprogram, SUP, and typewriter used instead of a console. Although less expensive than the switches and lights, SUP is more powerful in providing bootstrap loaders, memory displays on the typewriter, and dynamic debug trace capability (like DDT).

November 1975

Hierarchically Organized Microcode: A "Judicious" Separation of Control

The logic machine control processor (microprogram engine) is strictly vertically microprogrammed, meaning that only a single command emanates from a microinstruction. Eight-bit microinstructions are executed at 100 ns per instruction, except that branch instructions require two sequential words, instruction and branch address, and use 200 ns. A 4-bit page register is concatenated to an 8-bit microprogram counter to give a control processor addressing space of 4K words made up of any combination of RAM's and ROM's, provided cycle times are under 80 ns.
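The addressing and timing figures in the paragraph above can be illustrated with a short Python sketch. It is not taken from the LM2 design; the function and constant names are hypothetical, and the only facts used are the 4-bit page register, the 8-bit microprogram counter, and the 100 ns / 200 ns execution times.

```python
# Illustrative sketch only: names are assumptions, not LM2 terminology.
PAGE_BITS = 4          # width of the page register (from the text)
PC_BITS = 8            # width of the microprogram counter (from the text)
SINGLE_WORD_NS = 100   # ordinary 8-bit microinstruction
BRANCH_NS = 200        # branch = instruction word + branch address word

# Size of the control store addressing space implied by 4 + 8 bits.
CONTROL_STORE_WORDS = 1 << (PAGE_BITS + PC_BITS)


def control_store_address(page: int, pc: int) -> int:
    """Concatenate the page register (high bits) with the counter (low bits)."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page register holds 4 bits")
    if not 0 <= pc < (1 << PC_BITS):
        raise ValueError("microprogram counter holds 8 bits")
    return (page << PC_BITS) | pc


def microroutine_time_ns(single_words: int, branches: int) -> int:
    """Execution time of a microroutine with fixed-time microinstructions."""
    return single_words * SINGLE_WORD_NS + branches * BRANCH_NS
```

For example, `control_store_address(15, 255)` addresses the last word of the 4K space, and a routine of one ordinary microinstruction plus one branch takes `microroutine_time_ns(1, 1)` nanoseconds.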

As is well known, vertically encoded microprogram engines tend to be slower than horizontally encoded ones. To review briefly the reasons for this, consider a quite common CPU design (as diagrammed in Figure 1a) with adder (ALU), adder-A input multiplexer (A-Mux), and adder-B input multiplexer (B-Mux). A vertically microprogrammed engine invariably requires additional registers (as in Figure 1b) to store the control settings of each of the three logic structures (A-Mux Cnst, B-Mux Cnst, ALU-Cnst). These registers constitute extra hardware and slow execution because they must be set with a sequential microinstruction, as in the IBM 2025 [8].

A horizontally encoded microprogram engine, on the other hand, utilizes a much wider microinstruction (24-64 bits, usually) with specific fields wired directly to each logic structure (as in Figure 1c). The constant registers are eliminated and the logic structures are controlled in parallel, so the whole is faster. This design is typical of the IBM 360/50 [8], the PDP 11/45 [9], and many other recent computers.

However, horizontally encoded microprogram engines lose generality because the only way to add new logic structures is to expand the microinstruction word to include new control fields for each added logic structure. For a modular computer design system this eventually becomes impractical. Indeed, the objective of the Logic Machine design system is to develop functional units and a microprogram engine (control processor) that maintain their generality.

Macro-Level Machine:

Word Size (bits)                                                          16
Number of Macros                                                          73
Addressing Space (maximum main memory size, 16-bit words)                 65K
Memory Speed                                                              500-750 ns full cycle, 325-375 ns access
Direct Memory Access Channels                                             6
Direct I/O (Programming I/O) Channels                                     16
Maximum DMA Speed                                                         2M bytes/sec

Micro-Level Machine:

Word Size (bits)                                                          8
Addressing Space (maximum control store size in 8-bit words)*             4096
Page Size                                                                 256 words/page
Number of pages                                                           16 pages
Number of Micros Implemented for LM2 Macros (total possible = 256)        46
Microprogram storage for Macro Execution                                  97 words
Micro Execution Time, single word                                         100 ns
Micro Execution Time, double (16 bit)                                     200 ns

*Any mixture of ROM and RAM

Figure 1. Comparison of conventional, vertically, and horizontally encoded control parts. (Panels: a. conventional; b. vertically programmed; c. horizontally programmed. Each panel shows a microprogram engine driving the A-Mux, B-Mux, and ALU.)

[Figure 2 diagram labels: control store (a); control processor; up to 256 microinstructions point-to-point wired to functional units.]

To gain the speed advantage offered by horizontal encoding but still retain the generality of vertical encoding, a partitioning was developed with the horizontal encoding done within a functional unit. One such functional unit (ALU-effective address functional unit) is diagrammed in Figure 2 and has the same role as a conventional CPU. The following describes its operation with reference to Table 2, a listing of three microroutines which perform (a) macroinstruction fetch, (b) seven memory reference macros, and (c) eight non-memory reference macros.

Figure 2. Hierarchically organized control store. One level is contained in the vertically encoded control processor (a) and a second level of horizontally encoded control words (b) in the ALU-effective address functional unit. Not all LM2 functional units are shown. (Diagram labels: 16-bit bidirectional bus; console keyboard/printer functional unit; disk control functional unit; ALU-effective address functional unit with macroinstruction register, instruction branch table, control ROM (b) whose fields are used to control the data part, and logic structures. Notes: * memory address register (16 bits); ** memory data register (16 bits).)

Table 2. Microroutines for (a) CNSQ (macro fetch), (b) 7 memory reference, and (c) 8 non-memory reference macroinstructions

Macro   Micro      Comments                                                                                     Time (ns)

(a) CNSQ (control store location 100):
        PMAR/      Macro-P to memory address register                                                           100
        READ/      If memory not busy, start read cycle                                                         100
        INCP/      Increment macro-P                                                                            100
        MDRINR/    When access completed, move memory data register to macroinstruction register               100-400
        DECODE/    Move the preselected microroutine branch address from whichever branch table is
                   generating the address to the micro-P                                                        200
        Total CNSQ                                                                                              600-900 ns

(b) ADD: SUB: ORI: ORE: AND: CMA: CMB:
        EAMAR/     Effective address adder gated to MAR                                                         100
        READ/      If memory not busy, start read cycle                                                         100
        MDRIA/     When access complete, move memory data to IA register, the operand register                  100-500
        OPALU/     Combinatorial logic is set up as per instruction in macroinstruction register,
                   gate ALU onto bus and strobe enabled destination register                                    100
        ZROJMP/    Branch to location zero of this page (Location 100)                                          200
        Total Macro                                                                                             600-1000 ns
        Total Macro Including CNSQ                                                                              1200-1900 ns
        Raytheon 704                                                                                            2000 ns

(c) LOC: GLO: CPX: CXP: SBU: SBL: MSK: UNM:
        OPFLAG/    Combinatorial logic is set up, strobe enabled registers and flags                            100
        ZROJMP/    Branch to location zero of this page (Location 100)                                          200
        Total Macro                                                                                             300 ns
        Total Macro Including CNSQ                                                                              900-1200 ns
        Raytheon 704                                                                                            1000 ns

Macroinstruction-fetch (the microroutine CNSQ, starting at location 100 of control store) obtains a macroinstruction from main memory and delivers it to the ALU-effective address functional unit. At the end of CNSQ a micro DECODE is executed which either permits I/O cycle stealing or continues with the execution of the just-fetched macro.

Now, to examine macroinstruction-fetch in more detail, CNSQ begins by transferring the contents of the macroprogram counter to the memory address register of main memory (performed by microinstruction PMAR). A memory read is begun by microinstruction READ. Overlapped with the memory functional unit access process, the macroprogram counter is incremented (INCP). MDRINR attempts to transfer the contents of the memory data register to the macroinstruction register of the ALU-effective address functional unit; however, an "access complete" signal interlocks this transfer until the data in the MDR is valid. Hence, the microinstruction MDRINR has variable timing depending on the speed of the memory.
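The totals quoted in Table 2 follow directly from the per-microinstruction times. The Python sketch below simply re-adds the listed minimum and maximum times; the data layout and function names are illustrative assumptions, while the mnemonics and times are those listed in Table 2.

```python
# Each entry: (microinstruction, minimum ns, maximum ns), as listed in Table 2.
CNSQ = [
    ("PMAR", 100, 100),
    ("READ", 100, 100),
    ("INCP", 100, 100),
    ("MDRINR", 100, 400),   # waits on the "access complete" interlock
    ("DECODE", 200, 200),
]
MEMORY_REFERENCE = [
    ("EAMAR", 100, 100),
    ("READ", 100, 100),
    ("MDRIA", 100, 500),    # also interlocked with memory access
    ("OPALU", 100, 100),
    ("ZROJMP", 200, 200),
]
NON_MEMORY_REFERENCE = [
    ("OPFLAG", 100, 100),
    ("ZROJMP", 200, 200),
]


def total_ns(routine):
    """Return (minimum, maximum) execution time of a microroutine in ns."""
    return (sum(lo for _, lo, _ in routine), sum(hi for _, _, hi in routine))


def total_with_fetch_ns(routine):
    """Add the macro fetch (CNSQ) time to a macro's own microroutine time."""
    fetch_lo, fetch_hi = total_ns(CNSQ)
    lo, hi = total_ns(routine)
    return (fetch_lo + lo, fetch_hi + hi)
```

Under these figures the memory reference macros come to 1200-1900 ns including fetch, and the non-memory reference macros to 900-1200 ns, matching the table.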

Note that the macroinstruction register (Figure 2) is connected to the address inputs of the ROM-control in the ALU-effective address functional unit. A macro in the macroinstruction register causes a 50-bit control word to be read from the ROM and distributed to the data part logic structures.* Thus, the control of the data path for a macroinstruction is distributed, in parallel, from a single horizontally encoded ROM word. The entire process from the time a macroinstruction is loaded into the macroinstruction register until the data path is set is strictly combinational. Since one usually thinks of a microprogram structure as requiring sequential steps, it is probably best to refer to this as "ROM-controlled" rather than microprogrammed.

*Actually, a two level table-lookup is performed. See details in the Appendix.

The strobes required to cause specific registers of this functional unit to accept data, set flags, etc., are all provided by the control processor. One unique strobe is generated for each microinstruction, as is the case with strictly vertically microprogrammed engines.

As is typical of horizontal encoding, the 50-bit control word is composed of many fields. For example, they control the ALU (5 bits), ALU carry-in (1 bit), accumulator load/shift control (2 for left byte and 2 for right), arithmetic compares (less than, equal, greater than) control (2 bits) and enable (1 bit), overflow enable (1 bit), macroprogram counter control (2 bits) and enable (1 bit), shift steering right or left (2 bits), and shift steering on bit 15, 7, 8, 31 (1 bit each). Other fields control the computation of effective address.
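To make the idea of a horizontally encoded control word concrete, the following Python sketch packs the fields enumerated above into a single integer. The field widths come from the text; the field names and, in particular, the bit positions are assumptions made for illustration, since the actual layout of the 50-bit ROM word is not given here.

```python
# Field widths are from the text; names and ordering are hypothetical.
FIELDS = [
    ("alu_function", 5),
    ("alu_carry_in", 1),
    ("acc_left_byte_control", 2),
    ("acc_right_byte_control", 2),
    ("compare_control", 2),
    ("compare_enable", 1),
    ("overflow_enable", 1),
    ("macro_pc_control", 2),
    ("macro_pc_enable", 1),
    ("shift_direction", 2),
    ("shift_steer_bit_15", 1),
    ("shift_steer_bit_7", 1),
    ("shift_steer_bit_8", 1),
    ("shift_steer_bit_31", 1),
]
CONTROL_WORD_BITS = 50
LISTED_BITS = sum(width for _, width in FIELDS)
# Whatever is left over is available to the effective-address fields.
REMAINING_BITS = CONTROL_WORD_BITS - LISTED_BITS


def pack(values: dict) -> int:
    """Pack named field values into one control word (unset fields are 0)."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word: int) -> dict:
    """Split a control word back into its named fields."""
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields
```

All fields of such a word reach their logic structures at once, which is the property the text contrasts with setting constant registers one microinstruction at a time.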

Figure 3. Cooperative processing. Each cooperative processing functional unit converts a condition, Si, to a microroutine address in its branch table. When the control processor executes a DECODE micro, the functional unit with the highest priority gates its branch table, via the bus, to the µ-PC. (Notes: * microroutine program counter; the 4 MSB are used as the page register, and microroutine addresses from the FU's are taken from the 12 LSB of the bus. ** 8-bit microinstruction register.)

Thus, a 16-bit macroinstruction presented to the address of the ROM results in setting the control for the data path throughout the ALU-effective address functional unit. As long as the macroinstruction register is not altered, the 50-bit ROM output will not change, regardless of the number of microinstructions performed by the control processor. In a conventional horizontally encoded microprogram engine, all 50 bits would be redundantly present in each microinstruction. Here just a single 50-bit word is required for each microroutine regardless of the number of microinstructions.

We have emulated a Raytheon 704 minicomputer with the LM2. In this case fewer than 32 control words are required. As described in the Appendix, some additional ROM's are required which, when included, bring the total to 1920 bits. The total bit count, including the control processor microinstructions used to implement all 76 Raytheon 704 macroinstructions, is 1920 bits for the ALU-effective address unit plus 97 microinstructions at 8 bits/instruction, or 1920 + 776 = 2696 bits. As a very rough (because of the dissimilarity of architectures) comparison, the PDP 11/45 requires 256 words with 64 bits/word for a total of 16K bits [9], and the 360/25 uses 16K bytes (8 bits/byte) [10]. In any event, hierarchically organized control store greatly reduces the redundancy of horizontal encoding and thus is a cost saving.

Speed is not sacrificed by this hierarchical organization. Shown in the microroutine listings of Table 2 are the LM2 execution times for comparison with those of the Raytheon 704, a hardwired machine utilizing the same IC technology. The LM2 is somewhat faster in memory reference instructions (1200-1900 vs. 2000 ns), in part due to a faster memory, and nearly the same for register-to-register macroinstructions (900-1200 ns vs. 1000 ns).
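The control store comparison above is simple arithmetic, reproduced in the Python sketch below. The variable names are illustrative, and the conversion of "16K bytes" to bits assumes K = 1024, which the text does not state explicitly.

```python
# LM2 figures quoted in the text.
ROM_CONTROL_BITS = 1920          # ALU-effective address unit ROMs
MICROINSTRUCTIONS = 97           # control processor words for macro execution
BITS_PER_MICROINSTRUCTION = 8

lm2_bits = ROM_CONTROL_BITS + MICROINSTRUCTIONS * BITS_PER_MICROINSTRUCTION

# Comparison machines, as quoted in the text.
pdp_11_45_bits = 256 * 64        # 256 words of 64 bits
ibm_360_25_bits = 16 * 1024 * 8  # 16K bytes, assuming K = 1024


def ratio_to_lm2(other_bits: int) -> float:
    """How many times larger another control store is than the LM2's."""
    return other_bits / lm2_bits
```

On these numbers the PDP 11/45 control store is roughly six times the LM2 total, with the caveat already noted in the text that the architectures are dissimilar.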

[Figure 3 diagram labels: functional units, each with a branch table driven by conditions S1, S2, ..., SN; daisy chain priority network; 16-bit bidirectional data bus.]

"Cooperative" Processing

"Cooperative" processing (see Figure 3) is a method in which several functional units cooperate with the control processor in determining the sequence of microroutine execution. Each functional unit has states, Si, each of which implies that a specific microroutine, Mi, should be executed by the control processor. Thus a mapping or translation is required to perform Si → Mi. Usually a strictly vertically encoded microprogram engine (for example, the IBM 2025 [8]) would accomplish this mapping by sequential "mask-test-branch" instructions. A sequence of these microinstructions eventually determines which Si is present, and the corresponding branch completes the mapping process to the correct Mi. In the LM2 the mapping translation is done within the functional units. Our rationale is that such processing can be done in parallel in each functional unit and, furthermore, that the control processor need only be provided with the final outcome of this processing: the Mi, or microroutine.

To explain this process in more detail, when a condition, Si, in a given functional unit requests that a specific microroutine be executed, a microroutine address is prepared by activating a branch table. Each activated branch table makes a microroutine address ready to be placed on the bus, but only the highest priority functional unit is enabled. When the control processor executes the microinstruction DECODE, the highest priority microroutine address is taken from the generating functional unit and strobed into the control processor's microprogram counter via the bidirectional bus. The net result is a branch to the microroutine which most "immediately" requires execution.

One can view this operation as the control processor asking, "What microroutine should be executed next?" The response is decided in parallel in all the functional units with none of the conventional "mask-test-branch" sequential microinstructions.

In the LM2, not only do the peripheral device functional units process cooperatively, but the branches to microroutines for each macroinstruction are also processed in this way. The cooperative processing involved in selecting the branch to a macroinstruction's microroutine is described in the following section. It should be noted that macroinstruction and peripheral device functional units all use the same mechanism (DECODE) for cooperative processing.
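The DECODE mechanism described in this section can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not a description of the LM2 hardware: the class, the condition names, and the addresses are all hypothetical, and the daisy chain priority network is reduced to the order of a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FunctionalUnit:
    """A unit whose branch table maps a condition Si to a microroutine address."""
    name: str
    branch_table: Dict[str, int]
    pending: Optional[str] = None   # condition currently requesting service

    def ready_address(self) -> Optional[int]:
        """Address this unit would gate onto the bus, or None if idle."""
        if self.pending is None:
            return None
        return self.branch_table[self.pending] & 0xFFF  # 12 LSB of the bus


def decode(units_by_priority: List[FunctionalUnit]) -> Tuple[str, int]:
    """Model of DECODE: the highest priority requesting unit supplies the next
    microprogram counter value. The list order stands in for the daisy chain."""
    for unit in units_by_priority:
        address = unit.ready_address()
        if address is not None:
            return unit.name, address
    raise RuntimeError("no functional unit is requesting service")


# Hypothetical units: a tape controller, and the ALU-effective address unit,
# which has the lowest priority and always has a macro pending.
tape = FunctionalUnit("mag_tape", {"word_ready": 0x210})
alu_ea = FunctionalUnit("alu_ea", {"ADD": 0x105, "LOC": 0x140}, pending="ADD")
units = [tape, alu_ea]
```

With no peripheral request, `decode(units)` selects the macroinstruction's microroutine; once `tape.pending` is set, the same call selects the tape service routine instead, with no mask-test-branch sequence in the control processor.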

Cooperative Processing of Macroinstructions

For each macroinstruction, m_i, a series of microinstructions (a microroutine) must be executed. Again, a mapping translation is required:

$$m_i \rightarrow M_i$$

where the m_i are macroinstructions and the M_i are microroutines. Of course this is identically the situation which led to cooperative processing and, for the sake of design economy and simplicity of understanding, macroinstructions are processed the same way in the LM2. The mapping translation from macroinstruction to microroutine address is performed within the ALU-effective address functional unit. An instruction branch table is addressed by the macroinstruction to generate a microroutine address (Figure 2).

Referring to Table 2, the microroutine listing, notice that at the end of CNSQ (the macro fetch microroutine) a DECODE microinstruction is executed. The highest priority peripheral device requiring microroutine execution gains control of the control processor as described above, but if no peripheral device requires service then, because it is the lowest priority device, the ALU-effective address functional unit gates its branch table's microroutine address into the control processor microprogram counter, thus causing a branch to the microroutine appropriate for the execution of the pending macroinstruction in the macroinstruction register.

After the completion of the macroinstruction fetch, the ALU-effective address functional unit retains all the information required to complete the macroinstruction. Therefore, it is safe to allow peripheral devices to gain the attention of the control processor to do such chores as access memory, etc., which is what occurs in response to DECODE.

Assume for a moment that a peripheral device gains control of the control processor. In doing so a service microroutine will be executed for the peripheral device. At the end of this service microroutine another DECODE is executed. Again the priority is checked, and if no higher priority functional unit requires service, the ALU-effective address functional unit is allowed to load its branch table address into the microprogram counter, and macroinstruction execution is continued. The net effect is similar to the familiar cycle-stealing of hardwired computers.

Coroutine Processing

The LM2 utilizes coroutines as loosely coupled software procedures which run nearly, but not quite, independently. (For a more detailed discussion of coroutines, see Knuth, Vol. 1 [11].) For example, the microroutines which control the magnetic tape drive and the A/D are coroutines to each other in the sense that each makes requests on main memory at independent rates, and each uses the same resources, i.e., control processor, bidirectional bus, and main memory (sequentially, of course). To facilitate the "quasi-parallel" (Knuth) nature of coroutines, a coroutine microinstruction, "coroutine 5," for example, causes the contents of location "5" of the coroutine branch table to be swapped (exchanged) with the microprogram counter. There are sixteen locations in the coroutine branch table, allowing for up to 16 coroutines. Normally each of these 16 will be assigned to a specific peripheral device. For illustration, the magnetic tape control function will be described.

The coroutines for the magnetic tape drive are segmented. For example, the write operation coroutine is made up of the following six segments: (a) coroutine initialization and start magnetic tape forward, (b) access to the jth word to be written, write word on tape, logically add this word to the CRCC generator (cyclic redundancy check character), decrement word count, and, if the word count is not zero, reset coroutine pointer to (b), but if zero, (c) write CRCC, (d) write LRCC (longitudinal redundancy check character), (e) stop magnetic tape. In each coroutine segment a counter/timer is loaded with an appropriate constant and, independent of the control processor, counted to zero at a fixed frequency. That is, the count continues, independently, after the control processor does a coroutine branch back to macroinstruction execution. At zero count (or if an error occurs), the coroutine is reactivated by the execution of the microinstruction "coroutine 5." At the completion of any segment the coroutine branch table is pointing at the next segment to be executed, and reactivation takes place without sequential "mask-test-branch" microinstructions.
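The swap performed by a "coroutine j" microinstruction can be sketched as follows. This Python model is an illustration only: the class name and the addresses are hypothetical, and it assumes the microprogram counter has already advanced past the "coroutine j" instruction when the exchange takes place, which is what leaves the table pointing at the next segment.

```python
class ControlProcessorModel:
    """Minimal model of the microprogram counter and coroutine branch table."""

    def __init__(self) -> None:
        self.pc = 0
        self.coroutine_table = [0] * 16   # sixteen locations, per the text

    def coroutine(self, j: int) -> None:
        """'coroutine j': exchange table location j with the microprogram counter."""
        self.pc, self.coroutine_table[j] = self.coroutine_table[j], self.pc


cp = ControlProcessorModel()

# Initialization microcode loads location 5 with the first segment's address.
cp.coroutine_table[5] = 0x300

# A service microroutine executes "coroutine 5"; pc already points just past it.
cp.pc = 0x121
cp.coroutine(5)
entered_at = cp.pc                      # now running the first segment
return_point = cp.coroutine_table[5]    # where the service routine resumes

# The segment runs a few microinstructions and ends with "coroutine 5" again;
# pc has advanced to the word after it, i.e. the start of the next segment.
cp.pc = 0x305
cp.coroutine(5)
```

After the second exchange the counter is back in the service microroutine and location 5 holds the address of the next segment, so reactivation needs no test-and-branch sequence.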

Each operation performed on a magnetic tape drive (READ, WRITE, WRITE EOF, READ FORWARD TO EOF, READ REVERSE TO EOF, etc.) has a separate coroutine (each of which is segmented), but inasmuch as only one of these operations can be active at a given time, only a single location of the coroutine branch table is needed for all operations. At the beginning of an operation ("select magnetic tape" macro, for example), the initialization microcode loads the coroutine branch table with the address of the first segment. Thereafter, sequencing through the segments is executed automatically as described above and summarized in Figure 4.

Figure 4. Macroinstruction and coroutine flow. DECODE causes a branch to the highest priority functional unit (macroinstruction has lowest priority). Non-segmented coroutines terminate with DECODE to cause a branch either to another coroutine or to continued macro processing. A segmented coroutine requires access to the coroutine branch table. (Diagram labels: macro fetch; ALU-EA FU setup. Notes: * microroutines which complete macroinstruction processing; ** non-segmented coroutines do not use the coroutine branch table.)

Unused locations of the coroutine branch table can be used for subroutine linkage (subroutines are a special case of coroutines; see Knuth [11]).

Coroutines represent a second-order mapping operation:

$$S_i \xrightarrow{\text{DECODE}} M_i \xrightarrow{\text{coroutine } j} M_{jk}$$

where a state, Si, maps (via microinstruction DECODE) to a microroutine Mi, which in turn maps (via microinstruction "coroutine j") via the coroutine branch table to the Mjk segment of the coroutine Mj. Coroutine initialization consists of setting Mjk to point at the coroutine appropriate for the operation desired. The return from each activation of the coroutine leaves Mjk pointing at the next segment, unless looping within a segment is necessary. Returning from a coroutine segment is done by mapping in reverse order, i.e., "coroutine j" followed by a DECODE. This implies that all coroutines end with a "coroutine j" microinstruction which causes a branch back to Mi. At Mi the instruction DECODE is executed next and, because the conditions which caused the branch to the previous service microroutine will at least temporarily be satisfied, control will not go to the previous Mi but to some new Mi. Since macroinstruction execution is always required, by definition, macro execution is the default option; i.e., if nothing else needs attention, then process macroinstructions.

In the LM2 both DECODE and "coroutine j" require 200 ns each. Assuming an average coroutine segment Mjk of 4-8 microinstructions (hence, 400-800 ns), activating an Mjk would require about 1200-1600 ns, and thus a 500K 16-bit word/sec I/O transfer rate is quite feasible. An I/O transfer rate in excess of 2M word/sec is possible if DECODE and "coroutine j" are not used. In comparison, the IBM 360/25 has a maximum 16-bit word transfer rate of 40K/sec using the integrated I/O selector channel.

Because much of the control of the magnetic tape was incorporated in the control processor as microroutines, we were able to reduce the number of IC packages from over 200 in a conventionally designed magnetic tape controller to 35 in the LM2's magnetic tape functional unit. Both of the above designs used the same TTL integrated circuits, and hence the percentage reduction in IC count should be representative of a cost reduction.
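The coroutine activation time and I/O transfer rate estimated earlier in this section can be checked with a short calculation. The Python sketch below uses one reading of the text that reproduces the 1200-1600 ns figure: a DECODE and a "coroutine j" on the way into a segment, and a "coroutine j" and a DECODE on the way back. That breakdown is an assumption, as are the function names.

```python
DECODE_NS = 200      # from the text
COROUTINE_NS = 200   # from the text
MICRO_NS = 100       # single-word microinstruction time


def activation_ns(segment_microinstructions: int) -> int:
    """Time to enter a coroutine segment, run it, and return.

    Assumes one DECODE plus one 'coroutine j' in each direction.
    """
    overhead = 2 * DECODE_NS + 2 * COROUTINE_NS
    return overhead + segment_microinstructions * MICRO_NS


def words_per_second(activation_time_ns: int) -> float:
    """Transfer rate if one 16-bit word is moved per activation."""
    return 1e9 / activation_time_ns
```

Even at the slow end (an 8-microinstruction segment) this gives more than the 500K words/sec quoted as feasible.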

The Microprogrammability of LM2

Various levels of microprogrammability exist in computers. At one extreme are those which have been microprogrammed but are designed never to have major changes in microcode. Usually these computers utilize a microprogram branching technique which does not lend itself to reprogramming, and control store is a ROM type which is not easily altered. The PDP 11's, Mod Comp, and Lockheed SUE all fall into this class. Another level of microprogrammability is represented by those machines which are designed to be remicroprogrammed, but not often (that is to say, not more than a few times a day). For example, the IBM 360/25 and some 370's can write into control store from cards or disk, but writing into this store is neither easy nor very dynamic.

One of the first computers to be dynamically microprogrammed, i.e., one in which microcode can be altered during the execution of a program under that program's control, was the Packard-Bell (later Raytheon) 440, which appeared in about 1963. Two of these computers are still in operation in our laboratory. This computer uses a special core, referred to as BIAX for its biaxial core threading, as a non-destructive readout control store. Control store is in the same addressing space as main memory and has a read time of 200 ns (although in use cycle time is 1 µsec). A program can alter microcode simply by storing into BIAX at 6 µsec per word.

Our experience with the 440 convinced us of the desirability of simple, dynamic microprogrammability. In the LM2 reloading control store is accomplished from main memory, which can, of course, obtain information from disk, magnetic tape, etc. A special 16-bit I/O register, the reload register, was developed in the following format:

Main Memory Address     Control Store Page
12 bits                 4 bits

When a word is output to this register a page of control store is reloaded in the following sequence:

1. The control processor clock is inhibited.
2. The 4 LSB of the reload register are transmitted to the control processor page register; then the 4 LSB of the reload register are cleared (set to zero).
3. The 8 LSB of the control processor's microprogram counter are cleared, thus addressing a page boundary of control store.
4. The reload register is used to move the right 8 bits of 256 main memory words to control store.
5. Macroinstruction processing resumes.

Note that (a) whole pages of control store must be reloaded, (b) no I/O requests will be satisfied during reload, and (c) reloading a page takes 256 µsec.
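The reload sequence above can be sketched as a small Python function. This is an interpretation rather than a specification: it assumes that, once its 4 LSB are cleared, the reload register serves as the 16-bit main memory address of the first of 256 consecutive source words, and it models memories as plain lists. The function and parameter names are hypothetical.

```python
def reload_control_store_page(reload_register, main_memory, control_store):
    """Copy one 256-word page of microcode from main memory to control store.

    reload_register: 16-bit value, 12 MSB = main memory address bits,
                     4 LSB = control store page (format given in the text).
    main_memory:     list of 65536 16-bit words.
    control_store:   list of 4096 8-bit words.
    Returns the page number that was reloaded.
    """
    # Step 2: the 4 LSB select the page, then are cleared in the register.
    page = reload_register & 0x000F
    source_address = reload_register & 0xFFF0   # assumed start of source block

    # Step 3: clearing the 8 LSB of the microprogram counter addresses the
    # first word of the selected page.
    destination = page << 8

    # Step 4: move the right (low-order) 8 bits of 256 main memory words.
    for offset in range(256):
        word = main_memory[(source_address + offset) & 0xFFFF]
        control_store[destination + offset] = word & 0x00FF

    return page
```

Since a page reload is stated to take 256 µsec for 256 words, this corresponds to about 1 µsec per word moved.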

Console

Nearly all minicomputers now have ROM bootstrap loaders, which greatly increase ease of use and tend to make the switches and lights of the console superfluous. Rather than expend funds on switches and lights we have incorporated, in ROM, an entire 275-word supervisor utility package (SUP) which provides all of the readout and entry functions of lights and switches (via the console keyboard/printer) as well as bootstrap loaders and dynamic debug tracing. A push-button causes a transfer to SUP without loss of current values of the counters and registers in any program that may have been interrupted. Then the following functions can be performed by typing on the console keyboard/printer:

S           Status of the macroinstruction program counter, program status word, accumulator, and index register is printed.
XXXX P      Macro program counter is set to XXXX, a number in hexadecimal notation.
XXXX I      Index register is set.
XXXX W      Accumulator is set.
N           Causes a single macroinstruction to be executed.
XXXX H      Halt (return to SUP) occurs when the macroinstruction program counter equals XXXX.
R           Places system in run mode.
T           A record is read from magnetic tape into main memory starting at location 0 and, upon completion, a macroinstruction jump to location 0 is performed.
G           Same as T except the first two disk sectors are loaded.

Memory can be displayed by:

XXXX L YYYY 0   Types out contents of main memory from XXXX to YYYY-1. Each additional 0 lists the next 8 sequential memory locations.

Memory can be altered by setting the starting location XXXXL, followed by a series of hexadecimal digits. The last four digits are written into memory when a "return" is struck and L is then incremented by 1.

Many computers have similar octal or hex utility packages and also dynamic debug capabilities. In this system, however, the program is in ROM and located in the last 275 words of the 64K main memory address space. It is interesting to note that SUP is simultaneously more powerful than lights and switches and less expensive.

One may be concerned about the difficulty in performing maintenance on a machine that does not have lights and switches. The following has been our experience: 1) The new thermal printers (for example, Texas Instruments Silent 700) are at least as reliable as lights and switches. 2) In our applications we always have a console/printer and, in fact, one spare for each six LM2's. 3) Unlike lights and switches, this console/printer is modular and can be replaced if suspected of failure. Hence, our experience easily leads us to prefer a ROM SUP as described above.

However, even more importantly, a major feature of the logic machine is the capability of performing microdiagnostics, as will be described in the following section.

Maintenance

Maintenance of a computer system is always an important design concern. In this system, in which we expect the scientist/user to perform his own maintenance, it is especially critical to success. Our major methods of simplifying maintenance are (a) extensive use of ROM control, (b) minimum layers of logic from bus-back-to-bus operations, (c) centralized control part, and (d) microdiagnostics.

ROM-controlled architecture is fast becoming themethod of choice in computer design. With this techniquean instruction is applied to the address inputs of a ROMand the word which is read out is used to set the ALU,multiplexers, etc., of the data part. In other words, atranslation from instruction space to control space isperformed where the ROM is the translation device. Thealternative to ROM-controlled architecture is, of course,random logic. Random logic control, however, requires anunderstanding of how the individual gates of the randomlogic produce the desired effect-far too cumbersome (and,it turns out, too expensive) an approach for casualmaintenance. As a measure of our reliance on ROM-control,an IC package count for the ALU-effective addressfunctional unit into ROM-control, data part, and randomlogic (some random logic is unavoidable) shows that 14% isROM control, 70% is data part, and 16% is random. Itshould be pointed out that the ROM control packages are256-bit ROMs, and the random logic is almost entirelysmall scale integration (SSI)-i.e., very simple gatestructures.

Minimizing the number of layers of logic from bus-back-to-bus operations not only increases speed but decreases complexity. Our initial goal was to have at most five "package" layers of logic, and, except for a minor case, this was attained. To do so required another adder, but the modest additional cost of a second adder for effective address calculation is easily justified by the simplicity of maintenance thus provided the scientist/user.

Centralizing nearly all of the timing signals in the control processor achieves two objectives: nearly all timing and control are independent of the data part, and, since the control processor exclusive of its control store represents only about 50 integrated circuits, the control processor is easily diagnosed.

Finally, small microdiagnostic programs have been written which exercise parts of the total system. For example, the start timing of the magnetic tape drive may be repetitively executed to verify its performance without requiring the transport even to be attached. Also, effective address calculation, the coroutine branch table, and the ROM controls of various functional units may all be exercised independently. With these microdiagnostics a failing circuit is quickly identified.

Discussion

Hierarchically organized control store, cooperative processing, and coroutine branch tables embody the idea of optimally separating control or, as we prefer, judiciously separating control. This is an important objective for modular design, but other benefits occur as well.

Hierarchically organized control store implies the segregation of control so that the control fields associated with a particular functional unit are contained in that functional unit. For example, there seems to be no advantage to making the ALU control bits a part of the microinstruction; indeed, there is the disadvantage that this field must be present in all microinstructions, even those not using the ALU. Hence, a "judicious" separation is desirable.

Furthermore, with a hierarchically organized control store it is quite simple to replace the ALU-effective address functional unit (now mostly MSI) with newer LSI bit-sliced microprocessors, entailing only minor changes to the rest of the system.

Both cooperative processing and coroutines separate control in the sense that each technique helps determine microroutine sequencing at the functional units rather than in the control processor. In a modular computer system this offers three advantages: (1) functional units may be added or deleted easily with no effect on other functional units; (2) whatever circuitry is required to generate a branch table address is a part of the functional unit and not the control processor; and (3) perhaps most important, functional unit generation of a microroutine address as the "token" identifying microroutine sequencing is, it seems to us, the most direct and therefore fastest way to select microroutines.

This last point requires more elaboration. In designing a microprogram engine a central issue is the speed of interpreting a macroinstruction. Since every macroinstruction must undergo interpretation, any speed increase here will have significant impact on overall system speed. In the LM2 we have been able to generalize this concept to microroutines for peripheral devices as well, so that both macroinstruction and peripheral device service microroutines are sequenced by the microinstruction DECODE. This concept is summarized in Figure 4, a flowchart showing the relationship of macroprocessing, non-segmented coroutines, and segmented coroutines.

Conclusions

Previous work on modular computer systems has tended to emphasize the development of a minimum number of "universal" module types. The basic idea was to draw from an inventory of modules, during the design process, and add a distributed control part to sequence the data stream in a specific algorithmic way. In the current technological context the earlier modules were no more complex than commonly available MSI. With MSI offering this higher level of complexity it is now possible to reexamine the concept of an "ideal" module.

It is our feeling that an "ideal" functional unit should possess a significant portion of an algorithm and yet remain totally independent from other functional units. Sequencing data from functional unit to functional unit should be entirely under the control of a centralized control processor with some cooperative processing. Changing the sequencing of the data stream through the functional units should be maximally flexible, which in our opinion implies microprogramming. However, it should be easy to physically replace a functional unit and not disturb the operation of the other functional units.

Our search is not for a minimum "universal" set. Rather we attempt to find an "ideal" segmentation of an algorithm which simultaneously provides speed and low complexity.

COMPUTER,

Inventory of functional units is not a problem: with CAD techniques and wire-wrap facilities [12] we inventory "virtual" functional units which only exist physically in a specific device. Thus we remain flexible so that, as new MSI and LSI are introduced, we can easily and quickly redesign without concern for an inventory.

Appendix

The hierarchically organized microcode of the LM2 has the first level of microprogram in the control processor (vertically encoded) and a second level in the ALU-Effective Address Functional Unit (horizontally encoded). This second level is internally organized as a two-level table lookup so as to reduce bit requirements. An understanding of LM2 macroinstruction formats is necessary before describing the technique.

Each 16-bit macroinstruction is composed of 4 syllables, 4 bits per syllable. Macroinstruction interpretation consists of a left-to-right syllable scan. Some examples of OP codes for the left-most syllable are as follows:

    Syllable   Instruction
    0          Generic
    1          JMP (JUMP)
    2          JSX (JUMP-SET-RETURN)
    3          STB (STORE BYTE)

When the first syllable is not 0 (generic), the format is:

    OP        Index    Address
    4 bits    1 bit    11 bits
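The one-syllable format can be illustrated by splitting a 16-bit word into its three fields. This is only a sketch: it assumes the OP syllable occupies the most significant four bits (consistent with the left-to-right scan), followed by the index bit and then the 11-bit address. The function name and the returned dictionary are hypothetical, and only the OP codes listed above are named.

```python
# OP codes taken from the examples in the text; others are left unnamed.
OP_NAMES = {0x0: "generic", 0x1: "JMP", 0x2: "JSX", 0x3: "STB"}


def decode_one_syllable_format(word: int) -> dict:
    """Split a 16-bit word into OP (4 bits), Index (1 bit), Address (11 bits).

    Assumed layout: OP in bits 15-12, Index in bit 11, Address in bits 10-0.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("macroinstruction must fit in 16 bits")
    op = (word >> 12) & 0xF
    if op == 0:
        raise ValueError("first syllable is 0 (generic); examine syllable 2")
    return {
        "op": op,
        "mnemonic": OP_NAMES.get(op, "unknown"),
        "index": (word >> 11) & 0x1,
        "address": word & 0x7FF,
    }


# JMP, indexed, address 0x123 under the assumed layout.
fields = decode_one_syllable_format(0x1923)
```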

If the first syllable is 0 (generic), then the second syllable must be examined. The following table gives some examples:

    Syllable 1   Syllable 2   Instruction
    0            2            DIN (Direct Input)
    0            3            DOT (Direct Output)
    0            4            IXS (Increment Index and Skip)
    0            5            DXS (Decrement Index and Skip)

The format for 2-syllable OP codes is

    0 (hex)   Syllable 2   Modifier/Literal/Address
    4 bits    4 bits       8 bits

Some instructions require three syllables to be interpreted. For example,

    SY1   SY2   SY3   Instruction
    0     9     1     SLA (Shift Left Arithmetic)
    0     9     2     SLAD (Shift Left Arithmetic Double)

and the fourth syllable is a literal field.

Two-Level Lookup for ROM Control Words

Figure A-1 is a block diagram of the two-level table lookup which produces a 50-bit control word for each macroinstruction. In the first-level ROM each syllable is "typed" in parallel and a small amount of random logic enables one of the three. For example, 0 (generic) enables the syllable 2 and 3 ROMs.

[Figure A-1 (block diagram). Labels: macroinstruction register; first-level "type" ROMs for SYL 1, SYL 2, and SYL 3; second-level ROMs; 50-bit control; effective address part; to bidirectional bus.]

Figure A-1. Two-level table lookup organization, used to generate the horizontally encoded control of the ALU-effective address functional unit.

November 1975

Actually the random logic need only determine if syllable 1 is 0 (generic) so as to enable the other two syllable ROMs. Likewise, syllable 3 is enabled only if syllable 2 is a 9 (Arithmetic Shift) or A (Logical Shift). With these three gates all "type" information is accessed in parallel.
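The enabling rule just described can be written out as a small function. This is a sketch of the stated rule only, assuming syllable 1 is the most significant four bits of the word; the function name is hypothetical, and in the hardware the decision is made by a few gates in parallel with the ROM reads rather than by sequential tests.

```python
def selected_syllable(word: int) -> int:
    """Return which syllable ROM (1, 2, or 3) supplies the "type" word.

    Assumes syllable 1 is bits 15-12 and syllable 2 is bits 11-8.
    """
    syllable_1 = (word >> 12) & 0xF
    syllable_2 = (word >> 8) & 0xF
    if syllable_1 != 0x0:
        return 1  # not generic: the first syllable alone types the instruction
    if syllable_2 in (0x9, 0xA):
        return 3  # arithmetic or logical shift: the third syllable types it
    return 2      # generic, non-shift: the second syllable types it


choices = [selected_syllable(w) for w in (0x1923, 0x0200, 0x0915, 0x0A10)]
```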

Each syllable ROM is 16 words x 16 bits, and each word is divided into four fields: 1) ALU type (5 bits), 2) Effective Address type (5 bits), 3) Instruction type (5 bits), and 4) Shift type (1 bit). The three 5-bit fields are used as addresses in the second-level ROM lookup.
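As an illustration of how a 16-bit "type" word divides into 5 + 5 + 5 + 1 bits, the sketch below packs and unpacks the four fields. The order of the fields within the word is an assumption (ALU type in the most significant bits, Shift type in the least significant bit); the text gives only the field widths. The names TypeFields, pack_type_word, and unpack_type_word are hypothetical.

```python
from typing import NamedTuple


class TypeFields(NamedTuple):
    alu_type: int          # 5 bits
    ea_type: int           # 5 bits (Effective Address type)
    instruction_type: int  # 5 bits
    shift_type: int        # 1 bit


def pack_type_word(fields: TypeFields) -> int:
    """Pack the four fields into 16 bits (assumed order, high to low)."""
    if not (0 <= fields.alu_type < 32 and 0 <= fields.ea_type < 32
            and 0 <= fields.instruction_type < 32
            and fields.shift_type in (0, 1)):
        raise ValueError("field out of range")
    return (fields.alu_type << 11 | fields.ea_type << 6
            | fields.instruction_type << 1 | fields.shift_type)


def unpack_type_word(word: int) -> TypeFields:
    """Recover the four fields from a 16-bit type word."""
    return TypeFields(
        alu_type=(word >> 11) & 0x1F,
        ea_type=(word >> 6) & 0x1F,
        instruction_type=(word >> 1) & 0x1F,
        shift_type=word & 0x1,
    )


example = TypeFields(alu_type=3, ea_type=7, instruction_type=18, shift_type=1)
word = pack_type_word(example)
```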

The second-level ROM lookup produces the 50-bit control word and is composed of three fields: 1) ALU, 2) Effective Address, and 3) Shift. Each control field is addressed independently from the first-level "type" ROM.


The reason for two-level lookup now becomes obvious: the macroinstruction-to-control word translation is a many-to-one operation. For example, all effective addresses are computed in the identical manner regardless of whether the instruction is ADD, SUB, AND, OR, XOR, etc., and thus only one control field word from the second-level effective address control ROM is needed. Similarly, only one ALU control field word is required for all compares and another for stores. Had we used all 12 bits of the three syllables, we would have required 2^12 words x 50 bits = 204,800 bits, although many would be "don't cares."

Two-level table lookup for 50-bit control words proceeds as follows: 1) a macroinstruction is placed in the macroinstruction register (by a microroutine); 2) this results in a combinational level-1 lookup in all three syllable ROMs; 3) in parallel, random logic determines which of the three syllable ROMs is to be gated to level 2; 4) level-2 ROMs are accessed, on a field basis, to form a 50-bit control word; 5) this finally sets the data path as indicated for that macroinstruction. Note that the entire process, from the time the macroinstruction is in its register to the time the data path is set, is combinational. Often this scheme is called "ROM-controlled" rather than "microprogrammed" because the latter has "micro-steps" while this technique has no sequential actions.

Emulation of the Raytheon 704 instruction set required, in the second-level ROMs, 21 ALU ROM field words (20 bits/field), 14 Effective Address ROM field words (16 bits/field), and 20 Shift Control ROM words (14 bits/field). Including the instruction branch table words (19 words, 12 bits/word), the total bit requirement becomes 1920 control bits used in the ALU-Effective Address functional unit. Microroutines in the control processor consumed another 97 8-bit words, for a total of 776 bits. Thus, total bit requirements were 2696 bits for all aspects of the emulation.
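The bit counts above can be tallied in a few lines. Note one assumption in this sketch: the second-level fields and the branch table by themselves come to 1152 bits, and the stated figure of 1920 is reached only when the three first-level syllable ROMs (3 x 16 words x 16 bits = 768 bits) are also counted. That reading is our inference from the arithmetic and is not stated explicitly in the text. The variable names are hypothetical.

```python
def rom_bits(words: int, bits_per_word: int) -> int:
    """Storage needed for a ROM of the given shape."""
    return words * bits_per_word


# Second-level control fields, as given in the text.
second_level_bits = (rom_bits(21, 20)    # ALU field words
                     + rom_bits(14, 16)  # Effective Address field words
                     + rom_bits(20, 14))  # Shift Control words
branch_table_bits = rom_bits(19, 12)

# ASSUMPTION: the three first-level "type" ROMs are part of the 1920 total.
first_level_bits = 3 * rom_bits(16, 16)

functional_unit_bits = second_level_bits + branch_table_bits + first_level_bits
control_processor_bits = rom_bits(97, 8)
total_bits = functional_unit_bits + control_processor_bits

# Single-level alternative: one 50-bit word for every 12-bit syllable pattern.
flat_table_bits = rom_bits(2 ** 12, 50)
```

Under this reading the two levels of lookup ROM (first and second level together, 1692 bits) take roughly two orders of magnitude less storage than the 204,800-bit single-level table.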

Acknowledgements

We wish to thank Drs. Bertil Hille and Allen M. Scher for encouragement and funds; John Dunkel, Ron Harding, and Michele Kehl for CAD runs and wire-wrapping; and Anita Olson and Joanne Beaurain for manuscript preparation.

References

1. Charles Molnar and Wesley Clark, personal communication. Very little was published about the LINC project, probably because Clark and Molnar did not realize how important it was.

2. J. Q. Torode and T. H. Kehl, "The Logic Machine: A Modular Computer Design System," IEEE Transactions on Computers, Vol. C-23, No. 11, November 1974.

3. W. A. Clark, "Macromodular Computer Systems," in Proc. 1967 Spring Joint Computer Conf., 1967, pp. 337-401.

4. C. G. Bell, J. Grason and A. Newell, Designing Computers and Digital Systems, Maynard, Mass.: Digital Press, 1972.

5. C. G. Bell, J. Grason and D. P. Siewiorek, "Register Transfer Modules (RTM's) for Understanding Digital System Design," COMPCON '72, pp. 305-308.

6. C. G. Bell and J. Grason, "The Register Transfer Module Design Concept," Computer Design, May 1971.

7. S. H. Fuller and D. P. Siewiorek, "Some Observations on Semiconductor Technology and the Architecture of Large Digital Modules," Computer, pp. 15-21, October 1973.

8. S. S. Husson, Microprogramming: Principles and Practice, Englewood Cliffs, N. J.: Prentice Hall, 1970.

9. KB11-A Central Processor Unit Maintenance Manual, DEC-11-HKBB-D, Digital Equipment Corporation, Maynard, Mass.

10. IBM System/360 Model 25 Functional Characteristics, Publication A24-3510.

11. D. E. Knuth, The Art of Computer Programming, Vol. 1, Fundamental Algorithms, Reading, Mass.: Addison-Wesley, 1968.

12. T. H. Kehl, C. Moss, and L. Dunkel, "Design Automation for Custom Hardware in Computer Research," IEEE Transactions on Computers, Vol. E-17, No. 3, August 1974, pp. 168-170.

Theodore H. Kehl is an associate professor of computer science and of physiology and biophysics at the University of Washington. His hardware interests include microprogrammable modular computer systems. Software research includes information retrieval and firmware compiler techniques. Both hardware and software research is directed at more efficient systems for use in biomedical research. He received the BS, MS, and PhD degrees from the University of Wisconsin, Madison, in 1956, 1958, and 1961. Although originally trained in physiology, all of his research and teaching at the University of Washington has been in computer science.


Christine Moss has been associated with the Department of Physiology and Biophysics at the University of Washington since 1965, where she has been involved in the development of software for the department's research facilities. She received the BA degree from Wellesley College in 1959 and an MA in mathematics from the University of Washington in 1966, where she has done further graduate study in computer science. Ms. Moss is a member of the ACM and an associate member of the IEEE Computer Society.

Lawrence Dunkel is on the staff of the Physiology and Biophysics Department of the University of Washington. Since joining the department in 1969, he has been engaged in the design and development of digital hardware for microprogrammed real-time computer systems. Earlier, he was employed by the Boeing Company, where he wrote computer programs on wind tunnel data analysis. He received the BS in natural science from Seattle University in 1965.
