Phong Nguyen
1
MIPS CPU: Core Instruction Set Implementation
Purpose
This machine was built to demonstrate how the core instructions of the MIPS instruction set are
implemented in a multi‐cycle CPU. The data path was designed following the “black box” modular design
methodology to demonstrate how complex logic can be simplified into individual modules. In addition,
the CPU was built and simulated using Verilog in order to familiarize the student with modern synthesis
techniques.
By implementing the instruction set in a multi-cycle CPU, the designer is able to reduce the
average time it takes to execute an instruction. If the execution of an instruction is not split across
multiple clock cycles, the designer must ensure that the clock period is long enough for the slowest
instruction to complete. This results in large overhead whenever an instruction could execute in much
less time than the clock period allows. A multi-cycle design lets an instruction that needs less work
finish in fewer cycles, while an instruction that takes longer simply uses more cycles rather than forcing
a longer clock period on everything else. As a result of letting less complex instructions finish sooner,
the average time it takes to run a program is lower than in the single-cycle case, where every instruction
takes as long as the slowest instruction to execute. This multi-cycle approach demonstrates just one way
to improve CPU performance.
To simplify synthesis of the machine, a hardware description language was used and a modular
design methodology was adopted. A modular design methodology allows complex structures and logic,
such as the control unit, to be self-contained. Organizing the CPU into self-contained logic modules
enables the CPU to be represented as a data path diagram. This visual representation reduces
complexity and enables designers to make more intuitive design decisions. For example, by adopting
this visual representation the designer is able to follow a signal line as it enters and exits a logic module.
By understanding the different paths that each instruction takes, the designer can optimize those paths
by replacing or removing a logic module. Accessing a module costs clock cycles, so replacing a slow
module with a faster one, or reorganizing a path to eliminate an access entirely, directly improves CPU
performance. These insights are why the modular design methodology is so powerful.
Instruction Set Definition
In this implementation, fifteen instructions from the MIPS core instruction set were
implemented: a mix of R, I, and J type instructions. Particular attention was paid to the branch and
jump instructions so that programming control structures could be built. Memory access instructions,
such as load word and store word, were also implemented to aid programming tasks. The remaining
instructions are arithmetic or logical in nature and require ALU access. Refer to the table below for
details on how the instructions are encoded.
Instruction Format
R type (fields: OPCODE, RS, RT, RD, SHAMT, FUNCT; opcode and funct values in decimal):

TYPE  NAME  OPERATION                                      OPCODE  FUNCT
R     sll   R[rd] = R[rt] << SHAMT                         0       0
R     srl   R[rd] = R[rt] >> SHAMT                         0       2
R     add   R[rd] = R[rs] + R[rt]                          0       32
R     sub   R[rd] = R[rs] - R[rt]                          0       34
R     and   R[rd] = R[rs] & R[rt]                          0       36
R     or    R[rd] = R[rs] | R[rt]                          0       37
R     nor   R[rd] = ~(R[rs] | R[rt])                       0       39
R     slt   R[rd] = (R[rs] < R[rt]) ? 1 : 0                0       42

I type (fields: OPCODE, RS, RT, IMMEDIATE):

TYPE  NAME  OPERATION                                      OPCODE
I     beq   if (R[rs] == R[rt]) PC = PC + 1 + SignExtImm   4
I     addi  R[rt] = R[rs] + SignExtImm                     8
I     slti  R[rt] = (R[rs] < SignExtImm) ? 1 : 0           10
I     lui   R[rt] = {Imm, 16'b0}                           15
I     lw    R[rt] = M[R[rs] + SignExtImm]                  35
I     sw    M[R[rs] + SignExtImm] = R[rt]                  43

J type (fields: OPCODE, IMMEDIATE):

TYPE  NAME  OPERATION                                      OPCODE
J     j     PC = Imm                                       2
Architecture
A multi‐cycle MIPS data path was chosen as the target architecture for its uniformity and simplicity. The
three different instruction types outline a template that all instructions must follow. This allows the
designer to take advantage of shared control signals and greatly reduce the complexity of the control
logic. The implementation of the immediate arithmetic instructions is a case in point.
On the surface, immediate arithmetic instructions operate very much like R type instructions,
but they also contain features of I type instructions. This similarity enables us to reduce the control logic
by copying the memory address computation state, with one minor change. Instead of an ALUOp of 0,
which instructs the ALU to perform addition, an ALUOp of 3 is used. Much like how an ALUOp of 2
represents an R type arithmetic instruction, an ALUOp of 3 represents an immediate arithmetic instruction. By
reusing the ALUOp control signal we are able to reuse the ALU controller module, extending it with
minor modifications to support this new category of instructions. Since the immediate arithmetic
instructions do not contain a FUNCT field, an alternative tag must be used to indicate what operation
the ALU should perform. To support this, the OpCode field of the instruction is routed to the ALU
controller module as an additional input. This works because each immediate arithmetic instruction has
a unique OpCode, unlike R type instructions, which all have an OpCode of 0. R type instructions also
exhibit control signal sharing: instead of using the OpCode as the tag, all R type instructions have a
FUNCT field that is used to control the ALU. The uniformity of the MIPS instruction set allows such
optimizations to take place, which greatly reduces logic complexity. This optimization is used throughout
the design of the CPU.
Modules
The following figure is the data path implemented by this machine. It takes advantage of
modular design and module reuse optimizations to greatly reduce the complexity of the machine. Once
the organization of the data path is outlined the individual modules and their functionalities can be
described and synthesized using Verilog.
Memory
In the multi-cycle implementation the memory module is used for both data and instructions.
This eliminates one memory module relative to the single-cycle implementation, but requires more
complex control logic. This is a reasonable trade-off because combinational control logic is usually
cheaper to implement than a memory hierarchy.
To use one memory module for both data and instructions, a multiplexor is needed to determine
whether to read the next instruction or to read a piece of data from memory. In addition, two control
lines are needed to enable reading and writing. Closer inspection shows that this could be simplified
further by using the negation of either the read signal or the write signal, reducing the control lines
from two to one. That simplification is not made here, in order to keep the design easy to understand.
Another simplification is made to this particular memory module: since no byte-addressing
instructions are implemented, the memory is word addressable rather than byte addressable. This
removes the need to shift left by two when calculating branch addresses and jump addresses.
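The shared memory described above can be sketched in Verilog. This is an illustrative sketch rather than the project's exact Memory.v: the signal names (MemRead, MemWrite, IorD) follow this report's data path, but the memory depth, port widths, and module boundary are assumptions.

```verilog
// Illustrative sketch: one word-addressable memory shared between
// instruction fetch and data access. The IorD multiplexor (shown as a
// comment below) selects the address source; names are assumptions.
module SharedMemory (
    input         Clock,
    input         MemRead,     // read an instruction or a data word
    input         MemWrite,    // store a word
    input  [31:0] Address,     // already chosen by the IorD mux
    input  [31:0] WriteData,
    output [31:0] MemData
);
    reg [31:0] Memory [0:63];  // small word-addressable memory

    // reads are combinational, gated by MemRead
    assign MemData = MemRead ? Memory[Address] : 32'b0;

    // writes happen on the clock edge when MemWrite is asserted
    always @ (posedge Clock)
        if (MemWrite)
            Memory[Address] <= WriteData;
endmodule

// The IorD mux itself is a single line in the data path:
//   assign Address = IorD ? ALUOut : PC;
```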
Register File
The register file does not change very much from the single-cycle implementation. What does
change are its inputs and outputs. The write data input is connected to a multiplexor that takes a
register called Memory Data as one of its inputs; this register retains the value read from memory for
use in a later cycle. The two read data ports must likewise be latched so their values can be supplied
to the ALU in a later cycle. There must also be a write control signal to prevent invalid writes.
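A minimal sketch of the register file just described, assuming standard MIPS conventions (two combinational read ports, one clocked write port gated by RegWrite, register $0 hardwired to zero). The port names are illustrative, not necessarily the project's exact ones.

```verilog
// Illustrative register file sketch: RegWrite prevents invalid writes,
// and reads are combinational so both operands are available at once.
module RegisterFile (
    input         Clock,
    input         RegWrite,            // write enable
    input  [4:0]  ReadReg1, ReadReg2,  // RS and RT fields
    input  [4:0]  WriteReg,            // RD or RT, chosen by RegDst
    input  [31:0] WriteData,
    output [31:0] ReadData1, ReadData2
);
    reg [31:0] Registers [0:31];

    // register $0 always reads as zero (standard MIPS convention)
    assign ReadData1 = (ReadReg1 == 0) ? 32'b0 : Registers[ReadReg1];
    assign ReadData2 = (ReadReg2 == 0) ? 32'b0 : Registers[ReadReg2];

    always @ (posedge Clock)
        if (RegWrite)
            Registers[WriteReg] <= WriteData;
endmodule
```

In the multi-cycle data path, the two read values would then be latched into the A and B registers for use by the ALU in a later cycle.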
Instruction Register
The instruction register was created as its own module to enhance comprehension. Its behavior
could be defined in Verilog using simple reg constructs, but encapsulating it in its own module hides
the complexity of deriving the various instruction fields. The behavior is simple: when IRWrite is
asserted, the instruction register is written with the instruction from memory, and the module then
derives the various instruction fields and provides them as outputs for the rest of the data path to use.
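The behavior just described can be sketched as follows. The field positions follow the standard MIPS encoding shown in the instruction format table; the port names are assumptions for this example.

```verilog
// Illustrative instruction register sketch: latch the fetched word when
// IRWrite is asserted, then expose its fields as pure wiring.
module InstructionRegister (
    input         Clock,
    input         IRWrite,
    input  [31:0] MemData,       // instruction arriving from memory
    output [5:0]  OpCode,
    output [4:0]  RS, RT, RD, SHAMT,
    output [5:0]  FUNCT,
    output [15:0] Immediate
);
    reg [31:0] Instruction;

    always @ (posedge Clock)
        if (IRWrite)
            Instruction <= MemData;

    // field extraction requires no logic, only wiring
    assign OpCode    = Instruction[31:26];
    assign RS        = Instruction[25:21];
    assign RT        = Instruction[20:16];
    assign RD        = Instruction[15:11];
    assign SHAMT     = Instruction[10:6];
    assign FUNCT     = Instruction[5:0];
    assign Immediate = Instruction[15:0];
endmodule
```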
ALU
The operation of the ALU is entirely dependent upon the ALU control signal that comes from the
ALU controller module. The ALU control signal is 4 bits wide and tells the ALU what operation to
perform. In addition, this implementation of the ALU also accepts the SHAMT field as an input; this
field contains the amount to shift. Depending on the value of the FUNCT field, the shift is either a
logical left shift or a logical right shift. What makes this ALU notable is its ability to perform a variety
of operations, which reduces the number of ALUs from the single-cycle data path down to just one, greatly
reducing the cost. To perform both arithmetic and address computations, the inputs of the ALU are
chosen using a number of multiplexors, whose select signals are derived from the control module. This
enables the inputs to the ALU to come from different sources depending on the instruction.
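Such a multi-operation ALU might look like the sketch below. The 4-bit control encodings here are illustrative assumptions, not necessarily the project's actual values; the operation list matches the instructions implemented in this machine.

```verilog
// Illustrative ALU sketch covering the operations this machine needs.
// The 4-bit ALUControl encodings are assumed values for this example.
module ALU (
    input      [3:0]  ALUControl,
    input      [4:0]  SHAMT,      // shift amount for sll/srl
    input      [31:0] A, B,
    output reg [31:0] Result,
    output            Zero        // used by beq to gate PCWriteCond
);
    assign Zero = (Result == 32'b0);

    always @ (*) begin
        case (ALUControl)
            4'b0010: Result = A + B;                 // add / address computation
            4'b0110: Result = A - B;                 // sub / beq comparison
            4'b0000: Result = A & B;                 // and
            4'b0001: Result = A | B;                 // or
            4'b1100: Result = ~(A | B);              // nor
            4'b0111: Result = ($signed(A) < $signed(B)) ? 32'd1 : 32'd0; // slt
            4'b1000: Result = B << SHAMT;            // sll
            4'b1001: Result = B >> SHAMT;            // srl
            default: Result = 32'b0;
        endcase
    end
endmodule
```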
ALU Control
This ALU control module is slightly different, as mentioned earlier: it accepts the OpCode field of
the instruction as an additional input. This allows it to decode immediate arithmetic instructions and
produce the appropriate ALU control output signals. How the control module interprets its inputs
depends upon the ALUOp code; four codes are implemented in this machine. An ALUOp of 0 tells the
ALU control to make the ALU perform addition. An ALUOp of 1 tells the ALU control to make the ALU
perform subtraction. An ALUOp of 2 tells the control module to use the FUNCT field to generate the
control outputs. Finally, an ALUOp of 3 tells the ALU control to use the OpCode to derive the
appropriate ALU control signals.
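The four ALUOp codes can be sketched as a nested case statement. The FUNCT and OpCode values come from the instruction format table; the 4-bit output encodings are illustrative assumptions for this example, and lui (opcode 15) is omitted for brevity.

```verilog
// Illustrative ALU controller sketch implementing the four ALUOp codes
// described above. Output encodings are assumed example values.
module ALUController (
    input      [1:0] ALUOp,
    input      [5:0] FUNCT,
    input      [5:0] OpCode,
    output reg [3:0] ALUControl
);
    always @ (*) begin
        case (ALUOp)
            2'b00: ALUControl = 4'b0010;              // force addition
            2'b01: ALUControl = 4'b0110;              // force subtraction (beq)
            2'b10: case (FUNCT)                       // R type: decode FUNCT
                       6'd0:  ALUControl = 4'b1000;   // sll
                       6'd2:  ALUControl = 4'b1001;   // srl
                       6'd32: ALUControl = 4'b0010;   // add
                       6'd34: ALUControl = 4'b0110;   // sub
                       6'd36: ALUControl = 4'b0000;   // and
                       6'd37: ALUControl = 4'b0001;   // or
                       6'd39: ALUControl = 4'b1100;   // nor
                       6'd42: ALUControl = 4'b0111;   // slt
                       default: ALUControl = 4'b0000;
                   endcase
            2'b11: case (OpCode)                      // immediate: decode OpCode
                       6'd8:  ALUControl = 4'b0010;   // addi
                       6'd10: ALUControl = 4'b0111;   // slti
                       default: ALUControl = 4'b0000;
                   endcase
        endcase
    end
endmodule
```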
Control
The control is the most complex module to be synthesized, and it is also the most crucial to the
proper functioning of the CPU. The module is essentially a state machine that first fetches the
instruction and decodes its OpCode, then proceeds through the proper states depending on that
OpCode. During each state the appropriate control signals are set for the current instruction. The
control also decides which module to access during the current clock cycle. As mentioned earlier, the
complexity of the control logic is already greatly reduced because the MIPS architecture was adopted.
Implementing new instructions is also greatly simplified by adopting the MIPS architecture. By
identifying the format of the new instruction, many of the states in the control can be reused with only
slight modification. An example is the ComputeImm state, a variation of the ComputeAddr state
designed specifically for the immediate arithmetic instructions by setting the
ALUOp to 3. Another example is the ImmCompletion state, a variation of the RTYPECompletion state,
except that RegDst is set to 0 to select RT as the write address rather than RD.
Testing
A Fibonacci term calculator was written using the MIPS instructions implemented by the
machine. This program could not test every instruction, but it provided a good start. Given N, the
program computes the (N-1)th term of the Fibonacci sequence. First, the initial values are loaded into
registers from memory. Memory addresses starting at 0 are reserved for program instructions, and
addresses from 50 onward are reserved for program data. Once calculated, the result is stored into the
appropriate space in memory; for the Fibonacci calculator it is stored at memory address 53. To
accomplish this behavior, the program must thoroughly exercise the memory access instructions and
the branch and jump instructions. It must also test a few of the arithmetic instructions; a mix of
immediate arithmetic and R type arithmetic instructions was used to broaden coverage.
The next program tests the rest of the instructions, the majority of which are R type arithmetic
and immediate arithmetic instructions. These tests are simple: they load initial signed decimal values,
perform the arithmetic operation on them, and then store the result back into the data memory space.
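As a concrete illustration, two words from the machine's memory initialization (reproduced from the Memory.v excerpt at the end of this report) encode a store word and a set-less-than according to the instruction format table:

```verilog
// sw $3, 56($0) : M[R[0] + 56] = R[3]
// fields: opcode=43, rs=0, rt=3, immediate=56
Memory[19] = {6'd43, 5'd0, 5'd3, 16'd56};

// slt $3, $1, $2 : R[3] = (R[1] < R[2]) ? 1 : 0
// fields: opcode=0, rs=1, rt=2, rd=3, shamt=0, funct=42
Memory[20] = {6'd0, 5'd1, 5'd2, 5'd3, 5'd0, 6'd42};
```

Note that the sw offset of 56 is a word address, since this machine's memory is word addressable.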
Conclusion
The MIPS instruction set is designed to be very easy to implement in hardware. Everything from
the encoding of the instructions to the design of the data path takes advantage of spatial and temporal
locality. The uniformity allowed signals and modules to be shared, which reduced the amount of logic
needed to execute an instruction. Splitting instruction execution into multiple clock cycles takes
advantage of temporal locality by allowing faster instructions to complete in fewer cycles. The MIPS instruction set
is simple to understand, yet it demonstrates the basic concepts of CPU design well. Presented here was
just a subset of the MIPS instruction set; further exploration would yield additional design techniques
for optimizing the CPU. Pipelining and floating point instructions would be worth exploring, but they
are beyond the scope of this implementation.
Lessons Learned
Tools for debugging Verilog are very archaic. Compared to a modern IDE such as Eclipse,
ModelSim's Verilog debugger leaves much to be desired. As a result, it took hours to debug a simple
mistake buried in layers of code. The design concepts were not hard to understand, but expressing
them in Verilog was frustrating, and this was the most time-consuming part of the project.
There were design lessons to be learned as well. Through in-depth examination of the different
aspects of the MIPS instruction set, it is obvious that spatial and temporal locality are taken full
advantage of. Also, the principle of making the common case fast is applied frequently. It would not be
surprising if further inspection of modern processors yielded similar optimizations based upon those
fundamental concepts.
D:/Classes/CPR E 305/FinalProject/Control.v (excerpt; printed by user Bao Nguyen, December 04, 2007)

reg [11:0] State;

// sequential logic
always @ (posedge Clock) begin
  case (State)
    InstrFetch: State <= InstrDecode;
    InstrDecode: begin
      case (OpCode)
        RTYPE: State <= Execution;
        LW:    State <= ComputeAddr;
        LUI:   State <= ComputeImm;
        SW:    State <= ComputeAddr;
        BEQ:   State <= BranchCompletion;
        ADDI:  State <= ComputeAddr;
        SLTI:  State <= ComputeImm;
        J:     State <= JumpCompletion;
      endcase
    end
    ComputeAddr: begin
      case (OpCode)
        LW:   State <= MemReadAccess;
        SW:   State <= MemWriteAccess;
        ADDI: State <= ImmCompletion;
      endcase
    end
    ComputeImm:       State <= ImmCompletion;
    MemReadAccess:    State <= WriteBack;
    MemWriteAccess:   State <= InstrFetch;
    WriteBack:        State <= InstrFetch;
    Execution:        State <= RTYPECompletion;
    ImmCompletion:    State <= InstrFetch;
    RTYPECompletion:  State <= InstrFetch;
    BranchCompletion: State <= InstrFetch;
    JumpCompletion:   State <= InstrFetch;
    default:          State <= InstrFetch;
  endcase
end

// combinational logic
always @ (State) begin
  // we want everything to be zero if it is not explicitly set in each state
  // ...
    ComputeImm: begin
      // muxes
      ALUSrcA = 1;
      ALUSrcB = 2'b10;
      ALUOp = 2'b11;
    end
    MemReadAccess: begin
      // control signals
      MemRead = 1;
      IorD = 1;
    end
    MemWriteAccess: begin
      // control signals
      MemWrite = 1;
      IorD = 1;
    end
    WriteBack: begin
      // control signals
      RegWrite = 1;
      // muxes
      RegDst = 0;
      MemToReg = 1;
    end
    ImmCompletion: begin
      // control signals
      RegWrite = 1;
      // muxes
      RegDst = 0;
      MemToReg = 0;
    end
    Execution: begin
      // muxes
      ALUSrcA = 1;
      ALUSrcB = 2'b00;
      ALUOp = 2'b10;
    end
    RTYPECompletion: begin
      // control signals
      RegWrite = 1;
      // muxes
      RegDst = 1;
      MemToReg = 0;
    end
    BranchCompletion: begin
      // control signals
      PCWriteCond = 1;
      // muxes
      ALUSrcA = 1;
      ALUSrcB = 2'b00;
      ALUOp = 2'b01;
      PCSource = 2'b01;
    end
    JumpCompletion: begin
      // control signals
      PCWrite = 1;
      // muxes
      PCSource = 2'b10;
    end
    default: begin
      // control signals
      MemRead = 1;
      IRWrite = 1;
      PCWrite = 1;
      // muxes
      ALUSrcA = 0;
      IorD = 0;
      ALUSrcB = 2'b01;
      ALUOp = 2'b00;
      PCSource = 2'b00;
    end
  endcase
end
endmodule
D:/Classes/CPR E 305/FinalProject/Memory.v (excerpt)

  Memory[19] = {6'd43, 5'd0, 5'd3, 16'd56};
  Memory[20] = {6'd0, 5'd1, 5'd2, 5'd3, 5'd0, 6'd42};
  Memory[21] = {6'd43, 5'd0, 5'd3, 16'd57};
  Memory[22] = {6'd0, 5'd2, 5'd1, 5'd3, 5'd0, 6'd42};
  Memory[23] = {6'd43, 5'd0, 5'd3, 16'd58};
  Memory[24] = {6'd10, 5'd1, 5'd3, 16'd37};
  Memory[25] = {6'd43, 5'd0, 5'd3, 16'd59};
  Memory[26] = {6'd10, 5'd1, 5'd3, 16'd64};
  Memory[27] = {6'd43, 5'd0, 5'd3, 16'd60};
  Memory[50] = 32'd1;  // the constant 1 to load into $1 (f1)
  Memory[51] = -32'd1; // the constant -1 to load into $2 (f2)
  Memory[52] = 32'd8;  // the (n-1)th Fibonacci number to calculate goes into $3
end

// read from memory
assign MemData = MemRead ? Memory[Address] : 0;

// The PC is normally incremented by 4 bytes, which is the next word,
// because the system is byte addressable. We ignore this, which
// simplifies the design by not requiring the immediate value to be
// shifted left by 2 on a branch.

// write to memory
always @ (posedge Clock) begin
  if (MemWrite) begin
    Memory[Address] <= WriteData;
  end
end
endmodule
Memory Output.txt (12/4/2007)

// memory data file (do not edit the following line - required for mem load use)