DMY 16-bit RISC Microprocessor
Cecilia Florescu, Mojdeh Makabi, Daniel Yee
December 2, 2002
CS M152B
Mar 28, 2015

Overview
• Purpose: Design a pipelined RISC microprocessor
• Design Platform: Xilinx ISE 4.1, ModelSim 5.6, Visual C++ 6.0, Windows 2000 Professional

Pipelining
• It acts like an assembly line (Ford's auto assembly line)
[Figure: sequential auto production vs. pipelined auto production; three cars each pass through Station 1 to Station 4, plotted against time]

Pipelined RISC
• RISC is an acronym for Reduced Instruction Set Computer
  • It has a reduced and simple instruction set
  • It has a large number of general-purpose registers
• In our Pipelined RISC Processor:
  • Each instruction takes 1 clock cycle for each stage
  • The processor can accept 1 new instruction per clock
  • Instructions are processed in stages as they pass down the pipeline
  • Multiple instructions are in some phase of execution concurrently
  • Pipelining doesn't improve the latency of instructions (each instruction still requires the same amount of time to complete)
  • It does improve the overall throughput
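
The latency and throughput claims above can be checked with a small cycle count. This is a minimal sketch that assumes an ideal pipeline (no stalls or hazards, one clock cycle per stage); the function names are illustrative and do not come from the slides.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when one new instruction enters the pipeline every clock.

    Assumes an ideal pipeline: the first instruction needs num_stages cycles,
    and each later instruction completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    stages = 5
    for n in (1, 3, 100):
        print(n, sequential_cycles(n, stages), pipelined_cycles(n, stages))
```

For a single instruction both counts are equal (latency is unchanged); the gap only opens up as more instructions flow through, which is the throughput gain.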

Pipelined RISC Design
[Figure: five-stage pipeline datapath showing the PC, Instruction Memory, Registers, Control Unit, Sign Extend unit, ALU, two adders, Memory, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline latches. The diagram is repeated once per stage, in this order:]
• Instruction Fetch Stage
• Instruction Decode Stage
• Execution Stage
• Memory Access Stage
• Write Back Stage
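
To make the stage sequence concrete, the following sketch prints which stage each instruction occupies in each clock cycle. The abbreviations IF, ID, EX, MEM, and WB follow the latch names in the diagram; the schedule assumes no stalls, which the slides do not discuss.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions: int) -> list:
    """Return rows where rows[i][c] is the stage of instruction i in cycle c.

    An empty string means the instruction is not in the pipeline in that cycle.
    Instruction i enters IF in cycle i (ideal pipeline, no stalls assumed).
    """
    if num_instructions == 0:
        return []
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for offset, name in enumerate(STAGES):
            row[i + offset] = name
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(3)):
        print(f"instr {i}: " + " ".join(f"{cell:>3}" for cell in row))
```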

Modified Pipelined RISC Design
16-bit ISA
• 16-bit fixed-length instructions, 16 registers
• no "funct" field for R-type, only "op" field
• limited number of operations
• 4-bit "opcode" field => maximum 16 operations

Instruction formats (field widths in bits):
R-type: opcode (4) | rs (4) | rt (4) | rd (4)
I-type: opcode (4) | rs (4) | rt (4) | address (4)
J-type: opcode (4) | target address (12)

Suggested R-type: opcode (3) | rs (3) | rt (3) | rd (3) | funct (4)
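
The R-type and J-type layouts can be packed and unpacked with shifts and masks. This is a sketch only: it assumes the opcode sits in the most significant bits (following the left-to-right order on the slide), and the opcode values used in the example are hypothetical because the slides do not list them.

```python
def encode_r(opcode: int, rs: int, rt: int, rd: int) -> int:
    """Pack an R-type instruction: four 4-bit fields, opcode in bits 15..12 (assumed)."""
    for value in (opcode, rs, rt, rd):
        if not 0 <= value < 16:
            raise ValueError("each R-type field must fit in 4 bits")
    return (opcode << 12) | (rs << 8) | (rt << 4) | rd


def decode_r(word: int) -> tuple:
    """Unpack a 16-bit R-type word into (opcode, rs, rt, rd)."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def encode_j(opcode: int, target: int) -> int:
    """Pack a J-type instruction: 4-bit opcode and 12-bit target address."""
    if not 0 <= opcode < 16 or not 0 <= target < (1 << 12):
        raise ValueError("opcode needs 4 bits, target needs 12 bits")
    return (opcode << 12) | target


def decode_j(word: int) -> tuple:
    """Unpack a 16-bit J-type word into (opcode, target)."""
    return (word >> 12) & 0xF, word & 0xFFF


if __name__ == "__main__":
    word = encode_r(0x1, 2, 3, 4)  # opcode 0x1 is a made-up value
    print(hex(word), decode_r(word))
```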

Multiplier Algorithms
"Pencil-and-paper method"

        1 0 1 0 1 0
    x       1 0 1 1
  -----------------
        1 0 1 0 1 0
      1 0 1 0 1 0
    0 0 0 0 0 0
+ 1 0 1 0 1 0
  -----------------
  1 1 1 0 0 1 1 1 0

• requires M cycles for one NxM multiplication
• implemented with AND, adder, and shift register
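
The pencil-and-paper method maps directly onto a loop with one iteration per multiplier bit. The sketch below models that loop in software for unsigned operands; the function name and the `m_bits` parameter are illustrative, not taken from the slides.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, m_bits: int) -> int:
    """Unsigned shift-and-add multiplication, one multiplier bit per iteration.

    Each iteration stands for one cycle: AND the multiplicand with the current
    multiplier bit, shift the partial product into place, and accumulate it.
    """
    product = 0
    for i in range(m_bits):
        bit = (multiplier >> i) & 1
        partial = multiplicand if bit else 0  # AND with a single bit
        product += partial << i               # shift, then add
    return product


if __name__ == "__main__":
    # Same operands as the worked example: 101010 x 1011
    print(bin(shift_add_multiply(0b101010, 0b1011, 4)))
```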

Multiplier Algorithms
Array Multiplier

Multiplier Algorithms
Modified Booth Encoding (MBE)
• reduces the number of partial products from N to N/2 for an MxN multiplication
• performs parallel encoding v. serial encoding in original Booth

Y[2i+1]  Y[2i]  Y[2i-1]  Operation on X
   0       0       0        0 x X
   0       0       1       +1 x X
   0       1       0       +1 x X
   0       1       1       +2 x X
   1       0       0       -2 x X
   1       0       1       -1 x X
   1       1       0       -1 x X
   1       1       1        0 x X
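
The encoding table can be exercised in software. The sketch below recodes an N-bit two's complement multiplier into N/2 Booth digits using the table, with Y[-1] taken as 0, and then sums the selected multiples of X. It models the arithmetic only, not the parallel hardware encoder; all names are illustrative.

```python
# (Y[2i+1], Y[2i], Y[2i-1]) -> multiple of X, copied from the table
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y: int, n_bits: int) -> list:
    """Recode an n_bits two's complement multiplier into n_bits/2 Booth digits."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    padded = (y & ((1 << n_bits) - 1)) << 1  # append Y[-1] = 0 on the right
    digits = []
    for i in range(n_bits // 2):
        triple = (padded >> (2 * i)) & 0b111
        key = ((triple >> 2) & 1, (triple >> 1) & 1, triple & 1)
        digits.append(BOOTH_TABLE[key])
    return digits


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Sum one partial product per Booth digit; digit i has weight 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, n_bits)))


if __name__ == "__main__":
    print(booth_digits(11, 8), booth_multiply(42, 11, 8))
```

An 8-bit multiplier yields four digits, so four partial products are summed instead of eight.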

Multiplier Algorithms
Wallace Tree
• increases speed of summing by increased parallelism
• all bits of PP in each column are added independently and simultaneously
• x-2 compressor composed of CSAs; x := the number of PP's in column
[Figure: 9-2 Compressor for column j, built from five 3-2 compressors and one 4-2 compressor. Inputs: PP0[j] to PP8[j] and the carries c1[j-1] to c6[j-1] from column j-1. Outputs: the carries c1[j] to c6[j], Carry[j], and Sum[j].]
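
The idea of compressing many partial products down to two values can be sketched at word level with 3-2 compressors (carry-save adders). This is a simplified model for non-negative integers: it uses only 3-2 compressors applied across whole words, not the per-column mix of 3-2 and 4-2 compressors shown in the figure, and the function names are illustrative.

```python
def csa(a: int, b: int, c: int) -> tuple:
    """3-2 compressor applied bitwise: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def wallace_reduce(operands: list) -> list:
    """Repeatedly compress groups of three operands until at most two remain.

    Every group in a level is independent of the others, which is what lets
    hardware evaluate a whole level at the same time.
    """
    ops = list(operands)
    while len(ops) > 2:
        next_level = []
        for i in range(0, len(ops) - 2, 3):
            next_level.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        next_level.extend(ops[len(ops) - len(ops) % 3:])  # leftovers pass through
        ops = next_level
    return ops


if __name__ == "__main__":
    partial_products = [3, 14, 15, 92, 65, 35, 89, 79, 32]  # nine arbitrary values
    reduced = wallace_reduce(partial_products)
    print(reduced, sum(reduced), sum(partial_products))
```

Only the final two values need a conventional carry-propagating addition.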

Multiplier Design
Issues and Solutions
• limited opcode size
  • made NOP instruction ADD $0, $0, $0 => freed one opcode
  • ADD instruction doesn't change register $0 (constant zero value)
• latency v. simplicity
  • multiplier lies in critical path; must calculate product in one cycle
  • algorithms trade simplicity of control and/or wiring for faster speed
  • multiplier latency not detrimental if n is small enough
  • => 8x8 multiplier
• negative and positive integer multiplication
  • 8 LSB of 16-bit operand taken as a two's complement number
  • sign detection unit detects signs of operands and sets product sign
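
One way to read the signed 8x8 scheme is: take the low byte of each operand as a two's complement value, multiply the magnitudes, then apply the sign. The sketch below follows that reading; how the actual sign detection unit is built is not stated on the slides, so the sign-and-magnitude approach and all names here are assumptions.

```python
def to_signed8(word16: int) -> int:
    """Interpret the 8 least significant bits of a 16-bit word as two's complement."""
    low = word16 & 0xFF
    return low - 0x100 if low & 0x80 else low


def multiply_8x8(a16: int, b16: int) -> int:
    """Signed 8x8 multiply returning a 16-bit two's complement result.

    Assumed approach: detect the operand signs, multiply the magnitudes,
    then negate the product if exactly one operand was negative.
    """
    a, b = to_signed8(a16), to_signed8(b16)
    negative = (a < 0) != (b < 0)        # sign detection
    magnitude = abs(a) * abs(b)          # unsigned 8x8 product
    product = -magnitude if negative else magnitude
    return product & 0xFFFF              # wrap into a 16-bit register


if __name__ == "__main__":
    print(hex(multiply_8x8(0x00FB, 0x0003)))  # low bytes: -5 times 3
```

The largest magnitude, (-128) x (-128) = 16384, still fits in a signed 16-bit result, so the product itself cannot overflow.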

Exception Managing Hardware
Pipeline Modifications
• EPC register tracks the problematic instruction
• EPC_2 register holds the instruction to return to, if allowed
• Expansion of control unit to detect overflow signal and handle exception
[Figure: the pipeline datapath (PC, Instruction Memory, IF/ID, Registers, Control Unit, ID/EX, ALU, Memory, Sign Extend, EX/MEM, MEM/WB) extended with the EPC and EPC_2 registers, the Overflow signal, Clk, Data Input, and SubrtAddr.]

Arithmetic Overflow Handler
Flowchart:
1. ALU performs arithmetic operations
2. Is Overflow signal high?
   • NO: Instruction continues to MEM stage
   • YES: Control Unit has been notified, and takes corrective action:
     a. Instruction in MEM_WB latch will continue
     b. Instructions in IF_ID and ID_EXE latches will be flushed
     c. Content of EPC will be stored in R$15
     d. PC will jump to overflow handling subroutine

Software Support
• Assurance that MEM and WB stages of pipeline continue execution
• Interruption of program
• Request to involve the operating system
• Enhancement of ISA
  • "MFCO" - move from coprocessor
  • "JR" - jump to address stored in reserved register
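
The corrective actions can be modeled as a small state update. This is a behavioral sketch, not the control unit's actual logic: the `PipelineState` class, the latch contents, and the handler address `OVERFLOW_HANDLER_ADDR` are all hypothetical, since the slides give no concrete values.

```python
from dataclasses import dataclass, field

OVERFLOW_HANDLER_ADDR = 0x0F00  # hypothetical; the slides do not give this address


@dataclass
class PipelineState:
    pc: int = 0
    epc: int = 0  # address of the instruction that raised the overflow
    regs: list = field(default_factory=lambda: [0] * 16)
    latches: dict = field(default_factory=lambda: {
        "IF_ID": None, "ID_EXE": None, "MEM_WB": None})


def handle_overflow(state: PipelineState, overflow: bool) -> PipelineState:
    """Apply the corrective actions when the Overflow signal is high."""
    if not overflow:
        return state                      # instruction continues to MEM stage
    # The instruction in MEM_WB is left alone so it can finish.
    state.latches["IF_ID"] = None         # flush younger instructions
    state.latches["ID_EXE"] = None
    state.regs[15] = state.epc            # content of EPC stored in R$15
    state.pc = OVERFLOW_HANDLER_ADDR      # jump to overflow handling subroutine
    return state


if __name__ == "__main__":
    s = PipelineState(pc=106, epc=103)
    s.latches.update(IF_ID="instr@105", ID_EXE="instr@104", MEM_WB="instr@102")
    print(handle_overflow(s, overflow=True))
```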

Overflow Example
Instruction stored at address 103: 32 + 65527 = 65559
Note:
• 2^16 = 65536
• 2^16 < 65559
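
The arithmetic in this example can be reproduced with a 16-bit add that reports whether the true sum exceeds the register width. The sketch treats both operands as unsigned 16-bit values, which is what the note about 2^16 implies; the function name is illustrative.

```python
WORD_BITS = 16


def add16(a: int, b: int) -> tuple:
    """Add two unsigned 16-bit values; return (wrapped result, overflow flag)."""
    total = a + b
    overflow = total >= (1 << WORD_BITS)   # true sum does not fit in 16 bits
    return total & ((1 << WORD_BITS) - 1), overflow


if __name__ == "__main__":
    result, overflow = add16(32, 65527)
    print(result, overflow)
```

Here the true sum 65559 exceeds 65536, so the stored result wraps around to 23 and the overflow flag is raised.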

Conclusion
• 16-bit processor, enhanced with a multiplier and able to detect arithmetic overflow
• Harvard Architecture model for memory management
• 14 multipurpose, 2 reserved registers
• Advantages and disadvantages of designed 16-bit ISA