DMY 16-bit RISC Microprocessor
Cecilia Florescu, Mojdeh Makabi, Daniel Yee
December 2, 2002
CS M152B

Overview

• Purpose: design a pipelined RISC microprocessor
• Design platform: Xilinx ISE 4.1, ModelSim 5.6, Visual C++ 6.0, Windows 2000 Professional

Pipelining

• It acts like an assembly line

[Figure: Ford's auto assembly line, Station 1 → Station 2 → Station 3 → Station 4]

[Figure: Sequential auto production vs. pipelined auto production; cars 1 to 4 pass through the four stations over time]

Pipelined RISC

• RISC is an acronym for Reduced Instruction Set Computer
  • It has a reduced and simple instruction set
  • It has a large number of general-purpose registers
• In our pipelined RISC processor:
  • Each instruction takes 1 clock cycle for each stage
  • The processor can accept 1 new instruction per clock
  • Instructions are processed in stages as they pass down the pipeline
  • Multiple instructions are in some phase of execution concurrently
• Pipelining doesn't improve the latency of instructions (each instruction still requires the same amount of time to complete)
• It does improve the overall throughput
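To make the latency/throughput distinction concrete, the sketch below counts clock cycles for an ideal pipeline with no stalls or hazards. It assumes k stages of one cycle each (k = 5 for the IF, ID, EX, MEM, WB stages that follow, or k = 4 for the four assembly-line stations); the function names are illustrative only.

```python
def sequential_cycles(n: int, k: int = 5) -> int:
    """Cycles for n instructions when each one finishes all k stages before the next starts."""
    return n * k


def pipelined_cycles(n: int, k: int = 5) -> int:
    """Cycles for n instructions in an ideal k-stage pipeline: the first result
    appears after k cycles, then one more instruction completes every cycle."""
    return 0 if n == 0 else k + (n - 1)


if __name__ == "__main__":
    # Four cars through four stations, as in the assembly-line figure
    print(sequential_cycles(4, k=4), pipelined_cycles(4, k=4))  # 16 7
    # A single instruction still needs all k cycles: latency is unchanged
    print(pipelined_cycles(1))  # 5
```

For large n the pipelined count approaches one completed instruction per cycle, while each individual instruction still spends k cycles in flight.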

Pipelined RISC Design

Instruction Fetch Stage

[Figure: five-stage pipelined datapath (PC, instruction memory, IF/ID, registers, control unit, sign extend, ID/EX, adders, ALU, EX/MEM, data memory, MEM/WB) with the instruction fetch stage highlighted]

Instruction Decode Stage

[Figure: five-stage pipelined datapath with the instruction decode stage highlighted]

Execution Stage

[Figure: five-stage pipelined datapath with the execution stage highlighted]

Memory Access Stage

[Figure: five-stage pipelined datapath with the memory access stage highlighted]

Write Back Stage

[Figure: five-stage pipelined datapath with the write back stage highlighted]
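The five stages can be pictured as a table of which instruction occupies each stage on each clock cycle. This is a sketch of an ideal pipeline only (no stalls, hazards, or forwarding; one new instruction fetched per cycle), and the function and instruction labels are hypothetical.

```python
from typing import Dict, List, Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_occupancy(instructions: List[str]) -> List[Dict[str, Optional[str]]]:
    """For each clock cycle, map every stage to the instruction it holds (None = empty)."""
    if not instructions:
        return []
    n = len(instructions)
    table = []
    for cycle in range(n + len(STAGES) - 1):
        row: Dict[str, Optional[str]] = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction i enters stage `depth` at cycle i + depth
            row[stage] = instructions[index] if 0 <= index < n else None
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_occupancy(["add", "sub", "mult"]), start=1):
        cells = [f"{stage}:{row[stage] or '-'}" for stage in STAGES]
        print(cycle, " ".join(cells))
```

From the third cycle on, three different instructions are in flight at once, which is the concurrency described in the Pipelined RISC slide.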

Modified Pipelined RISC Design

16-bit ISA
• 16-bit fixed-length instructions, 16 registers
• No “funct” field for R-type, only an “op” field
• Limited number of operations
• 4-bit “opcode” field => maximum of 16 operations

Instruction formats (field widths in bits):

  R-type:            opcode (4) | rs (4) | rt (4) | rd (4)
  I-type:            opcode (4) | rs (4) | rt (4) | address (4)
  J-type:            opcode (4) | target address (12)
  Suggested R-type:  opcode (3) | rs (3) | rt (3) | rd (3) | funct (4)
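A decoder for these formats only needs shifts and masks. The sketch below splits a 16-bit word using the field widths from the table; it assumes the opcode occupies the most significant bits (it is drawn leftmost on the slide), and the helper name `split_fields` is hypothetical.

```python
from typing import List, Sequence

R_TYPE = (4, 4, 4, 4)               # opcode, rs, rt, rd
I_TYPE = (4, 4, 4, 4)               # opcode, rs, rt, address
J_TYPE = (4, 12)                    # opcode, target address
SUGGESTED_R_TYPE = (3, 3, 3, 3, 4)  # opcode, rs, rt, rd, funct


def split_fields(word: int, widths: Sequence[int]) -> List[int]:
    """Split a 16-bit instruction into fields, most significant field first."""
    if sum(widths) != 16:
        raise ValueError("field widths must total 16 bits")
    fields, shift = [], 16
    for width in widths:
        shift -= width
        fields.append((word >> shift) & ((1 << width) - 1))
    return fields


if __name__ == "__main__":
    print(split_fields(0x1234, R_TYPE))   # [1, 2, 3, 4]
    print(split_fields(0xA123, J_TYPE))   # [10, 291]
```

The 4-bit register fields can name all 16 registers, whereas the 3-bit fields of the suggested R-type format can name only 2^3 = 8.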

Multiplier Algorithms

“Pencil-and-paper method”

          1 0 1 0 1 0
  x           1 0 1 1
    -----------------
          1 0 1 0 1 0
        1 0 1 0 1 0
      0 0 0 0 0 0
  + 1 0 1 0 1 0
    -----------------
    1 1 1 0 0 1 1 1 0      (i.e. 42 × 11 = 462)

• Requires M cycles for one N x M multiplication (one partial product per multiplier bit)
• Implemented with an AND gate, an adder, and a shift register
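The same procedure in software form: a sketch of an unsigned shift-and-add multiplier that forms one partial product per cycle by ANDing the multiplicand with the current multiplier bit and accumulating it with an adder. In hardware the accumulator is usually shifted right each cycle; this sketch shifts the partial product left instead, which yields the same sum. The function name is illustrative.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, m: int) -> int:
    """Unsigned N x M multiplication using M add/shift cycles."""
    product = 0
    for cycle in range(m):                        # one cycle per multiplier bit
        bit = (multiplier >> cycle) & 1
        partial = multiplicand if bit else 0      # AND: multiplicand gated by the bit
        product += partial << cycle               # adder: accumulate the shifted partial product
    return product


if __name__ == "__main__":
    print(bin(shift_add_multiply(0b101010, 0b1011, m=4)))  # 0b111001110
```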

Multiplier Algorithms

Array Multiplier

Modified Booth Encoding (MBE)
• Reduces the number of partial products from N to N/2 for an M x N multiplication
• Performs parallel encoding, versus the serial encoding of the original Booth algorithm

  Y[2i+1]  Y[2i]  Y[2i-1]   Operation on X
     0       0       0         0 × X
     0       0       1        +1 × X
     0       1       0        +1 × X
     0       1       1        +2 × X
     1       0       0        -2 × X
     1       0       1        -1 × X
     1       1       0        -1 × X
     1       1       1         0 × X
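The table can be applied directly in software. The sketch below recodes an n-bit two's complement multiplier Y into n/2 radix-4 digits in {-2, -1, 0, +1, +2}, taking Y[-1] = 0, and sums the partial products, each weighted by 4^i. It assumes n is even (n = 8 matches the 8x8 multiplier described later); the names are illustrative, and a hardware encoder would produce all digits in parallel rather than in a loop.

```python
from typing import Dict, List, Tuple

# (Y[2i+1], Y[2i], Y[2i-1]) -> multiple of X, exactly as in the table above
BOOTH_TABLE: Dict[Tuple[int, int, int], int] = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y: int, n: int = 8) -> List[int]:
    """Radix-4 Booth digits of an n-bit two's complement multiplier (n even)."""
    y &= (1 << n) - 1
    bits = [(y >> k) & 1 for k in range(n)]
    digits = []
    for i in range(n // 2):
        y_prev = bits[2 * i - 1] if i > 0 else 0   # Y[-1] is defined as 0
        digits.append(BOOTH_TABLE[(bits[2 * i + 1], bits[2 * i], y_prev)])
    return digits


def booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Sum of the n/2 partial products digit_i * X * 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, n)))


if __name__ == "__main__":
    print(booth_digits(7))         # [-1, 2, 0, 0]  (7 = -1 + 2*4)
    print(booth_multiply(-5, 7))   # -35
```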

Wallace Tree
• Increases the speed of summing by increased parallelism
• All bits of the partial products (PPs) in each column are added independently and simultaneously
• x-2 compressor composed of CSAs (3-2 compressors), where x := the number of PPs in the column

[Figure: 9-2 compressor for column j built from 3-2 compressors; inputs PP0[j] to PP8[j] and carries c1[j-1] to c6[j-1] from column j-1; outputs carries c1[j] to c6[j] to the next column, plus Carry[j] and Sum[j]]
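A word-level sketch of the same idea: a 3-2 compressor (carry-save adder) turns three operands into a sum word and a carry word without propagating carries, and layers of them reduce the partial products to two rows that a single carry-propagate adder then adds. Unlike the 9-2 compressor in the figure, this operates on whole non-negative integers rather than on one column of bits, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """3-2 compressor applied to every bit position at once."""
    sum_bits = a ^ b ^ c                                # per-column sum bit
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1    # per-column carry, moved to column j+1
    return sum_bits, carry_bits


def wallace_sum(operands: Iterable[int]) -> int:
    """Reduce operands with layers of 3-2 compressors until two rows remain,
    then perform one carry-propagate addition."""
    rows: List[int] = list(operands)
    while len(rows) > 2:
        next_rows: List[int] = []
        grouped = len(rows) - len(rows) % 3
        for k in range(0, grouped, 3):
            next_rows.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        next_rows.extend(rows[grouped:])   # up to two leftover rows pass to the next layer
        rows = next_rows
    return sum(rows)                       # final carry-propagate adder


if __name__ == "__main__":
    # Shifted partial products of 101010 x 1011 from the pencil-and-paper example
    pps = [0b101010, 0b101010 << 1, 0, 0b101010 << 3]
    print(wallace_sum(pps))  # 462
```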

Multiplier Design

Issues and Solutions
• Limited opcode size
  • Made the NOP instruction ADD $0, $0, $0, which freed one opcode
  • The ADD instruction doesn't change register $0 (constant zero value)
• Latency vs. simplicity
  • The multiplier lies in the critical path; it must calculate the product in one cycle
  • Algorithms trade simplicity of control and/or wiring for faster speed
  • Multiplier latency is not detrimental if n is small enough
  • => 8x8 multiplier
• Negative and positive integer multiplication
  • The 8 LSBs of the 16-bit operand are taken as a two's complement number
  • A sign detection unit detects the signs of the operands and sets the product sign
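One way to read the last two bullets, as a sketch: take the low 8 bits of each 16-bit operand as two's complement, multiply the magnitudes with an unsigned 8x8 core, and let the sign detection unit set the product sign as the XOR of the operand signs. This assumes a sign-magnitude style flow and a 16-bit two's complement result; the function names are hypothetical, and the actual hardware may instead multiply the two's complement values directly (for example with Booth encoding).

```python
def low8_signed(operand16: int) -> int:
    """Interpret the 8 LSBs of a 16-bit operand as a two's complement number."""
    low = operand16 & 0xFF
    return low - 0x100 if low & 0x80 else low


def multiply_8x8(a16: int, b16: int) -> int:
    """Signed 8x8 multiply returning a 16-bit two's complement product."""
    a, b = low8_signed(a16), low8_signed(b16)
    negative = (a < 0) != (b < 0)          # sign detection: XOR of the operand signs
    magnitude = abs(a) * abs(b)            # unsigned core; at most 128 * 128 = 16384
    product = -magnitude if negative else magnitude
    return product & 0xFFFF                # encode as a 16-bit two's complement word


if __name__ == "__main__":
    print(hex(multiply_8x8(0x00FB, 0x0007)))  # 0xffdd, i.e. -5 * 7 = -35
```

Because the largest magnitude is 128 × 128 = 16384, every 8x8 product fits in the 16-bit result without overflow.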

Exception Managing Hardware

Pipeline Modifications
• EPC register tracks the problematic instruction
• EPC_2 register holds the instruction to return to, if allowed
• The control unit is expanded to detect the overflow signal and handle the exception

[Figure: pipelined datapath (PC, instruction memory, IF/ID, registers, control unit, sign extend, ID/EX, adders, ALU, EX/MEM, data memory, MEM/WB) extended with EPC and EPC 2 registers, an Overflow signal, clock and data inputs, and a subroutine address (SubrtAddr) input]

Arithmetic Overflow Handler

Flowchart:
• The ALU performs arithmetic operations. Is the Overflow signal high?
  • NO: the instruction continues to the MEM stage
  • YES: the control unit has been notified and takes corrective action:
    • The instruction in the MEM_WB latch will continue
    • Instructions in the IF_ID and ID_EXE latches will be flushed
    • The content of EPC will be stored in R$15
    • The PC will jump to the overflow handling subroutine

Software Support
• Assurance that the MEM and WB stages of the pipeline continue execution
• Interruption of the program
• Request to involve the operating system
• Enhancement of the ISA:
  • “MFCO”: move from coprocessor
  • “JR”: jump to the address stored in the reserved register
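The corrective action can be summarized as a state update on the pipeline latches. The sketch below is a simplified, hypothetical model of the flowchart's YES branch: it tracks only the latches, EPC, register file, and PC named above, and `HANDLER_ADDR` as well as all class and function names are placeholders, not values from the design.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HANDLER_ADDR = 0x0F0  # placeholder: start address of the overflow handling subroutine


@dataclass
class Pipeline:
    pc: int
    if_id: Optional[str]        # instruction held in the IF_ID latch (None = bubble)
    id_exe: Optional[str]       # instruction held in the ID_EXE latch
    mem_wb: Optional[str]       # instruction held in the MEM_WB latch
    epc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 16)


def on_overflow(p: Pipeline, faulting_addr: int) -> None:
    """Apply the YES branch of the overflow flowchart in place."""
    p.epc = faulting_addr   # EPC tracks the problematic instruction
    p.if_id = None          # flush the IF_ID and ID_EXE latches
    p.id_exe = None
    # p.mem_wb is left untouched so that instruction can finish write-back
    p.regs[15] = p.epc      # content of EPC stored in R$15
    p.pc = HANDLER_ADDR     # jump to the overflow handling subroutine


if __name__ == "__main__":
    p = Pipeline(pc=105, if_id="instr@104", id_exe="instr@103", mem_wb="instr@102")
    on_overflow(p, faulting_addr=103)
    print(p.regs[15], hex(p.pc), p.if_id, p.mem_wb)
```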

Overflow Example

Instruction stored at address 103: 32 + 65527 = 65559

Note:
• 2^16 = 65536
• 2^16 < 65559, so the sum cannot be represented in 16 bits (arithmetic overflow)
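The check in this example amounts to watching the carry out of bit 15. A minimal sketch, assuming the operands are treated as unsigned 16-bit values as in the example; the function names are illustrative. For comparison it also includes the usual two's complement (signed) overflow test.

```python
from typing import Tuple


def add16_unsigned(a: int, b: int) -> Tuple[int, bool]:
    """Add two 16-bit values; return the 16-bit result and the carry-out (unsigned overflow)."""
    full = (a & 0xFFFF) + (b & 0xFFFF)
    return full & 0xFFFF, full > 0xFFFF


def add16_signed_overflow(a: int, b: int) -> bool:
    """Two's complement overflow: operands share a sign that differs from the result's sign."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a + b) & 0xFFFF
    return bool(~(a ^ b) & (a ^ result) & 0x8000)


if __name__ == "__main__":
    print(add16_unsigned(32, 65527))          # (23, True): 65559 - 65536 = 23
    print(add16_signed_overflow(32, 65527))   # False: as signed values this is 32 + (-9)
```

Which of the two conditions should drive the Overflow signal depends on whether the ALU operands are meant to be unsigned or two's complement; the example above uses the unsigned interpretation.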

Conclusion
• 16-bit processor, enhanced with a multiplier and able to detect arithmetic overflow
• Harvard architecture model for memory management
• 14 multipurpose and 2 reserved registers
• Advantages and disadvantages of the designed 16-bit ISA

