IMPLEMENTATION OF A "DIRECT MAPPED CACHE" IN BEHAVIORAL VERILOG
ECE254 ASSIGNMENT 1 NEERAJ DHOTRE (perm:5483615)
1. Introduction:
A memory hierarchy is imperative due to the prevalence of highly pipelined and superscalar architectures. As main memory
access is a lot slower than the other tasks in the pipeline, data and instructions are stored close to the processor in
a small and comparatively faster memory called a cache. The main aspects of cache design are cache size, memory
mapping function, write policy, and replacement algorithm. In a direct mapped cache, each block of memory is
mapped to one particular row in the cache. The mapping function is simple to implement, but the performance of this type
of cache is not the best. The synchronization of the cache with main memory, the handling of read and write misses, etc.,
make the direct mapped cache a good candidate for this assignment, the aim of which is to learn Verilog modeling and
design simulation with ModelSim.
2. Cache Design:
2.1 Assumptions: The cache is designed with the following assumptions.
The cache lies between the processor and the main memory.
The cache and the processor run on the same fast clock.
Main memory is a single-port synchronous DRAM running on a slower clock (4 times slower).
The processor sends a physical address to the cache.
The processor sends/requests one word of data (32 bits wide) at a time.
The cache implements a "write through" write policy with "no write allocate", i.e. on a write miss
data is written only to main memory.
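Under write-through with no write allocate, a processor write is always forwarded to main memory, and the cached copy is updated only on a hit. A minimal sketch of this policy follows, using the port names defined later in Table 1; data_array, line, and word are illustrative names for the line storage and address fields, not taken from Appendix A:

```verilog
// Write-through, no-write-allocate (sketch; storage names illustrative)
always @(posedge clk) begin
  if (!rd_en && !busy) begin            // processor write request
    if (cache_hit_reg)
      data_array[line][word] <= data;   // write hit: update the cached word too
    // hit or miss: always forward the write to main memory
    mem_addr    <= addr;
    mem_wr      <= 1'b0;                // LOW = write to main memory (Table 1)
    chip_select <= 1'b1;
  end
end
```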
2.2 Cache and Main Memory Size:
To keep the cache size, memory size, data width, etc. flexible for any cache module instance, parameters are used.
Parameters define constants which can be changed during instantiation. The following parameters, with the values specified, were used for the simulations in this assignment.
parameter ADDR_SIZE = 8; meaning an 8-bit address and 2^8 = 256 main memory locations.
parameter DATA_SIZE = 32; meaning the processor is 32-bit and each main memory location is 32 bits wide, making it a
256 x 4 byte = 1024 B memory.
parameter LINE_BITS = 2; meaning 4 lines in the cache.
parameter LINES = 1 << LINE_BITS;
parameter WORD_BITS = 2; meaning 4 words per line in the cache, making the cache a 4 x 4 x 4 byte = 64 B memory.
parameter WORDS = 1 << WORD_BITS;
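These parameters and the resulting address break-up can be sketched as a parameterized module header; the wire names for the address fields are illustrative, and Appendix A is authoritative:

```verilog
module cache #(
  parameter ADDR_SIZE = 8,               // 8-bit address -> 2^8 = 256 locations
  parameter DATA_SIZE = 32,              // 32-bit words
  parameter LINE_BITS = 2,               // 2^2 = 4 cache lines
  parameter LINES     = 1 << LINE_BITS,
  parameter WORD_BITS = 2,               // 2^2 = 4 words per line
  parameter WORDS     = 1 << WORD_BITS,
  parameter TAG_BITS  = ADDR_SIZE - LINE_BITS - WORD_BITS  // 8 - 2 - 2 = 4
) (
  input [ADDR_SIZE-1:0] addr
  /* remaining ports omitted for brevity */
);
  // Direct-mapped address break-up (Figure 1): TAG | LINE | WORD
  wire [WORD_BITS-1:0] word = addr[WORD_BITS-1:0];
  wire [LINE_BITS-1:0] line = addr[WORD_BITS +: LINE_BITS];
  wire [TAG_BITS-1:0]  tag  = addr[ADDR_SIZE-1 -: TAG_BITS];
endmodule
```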
According to these sizes, the address is broken up as shown in Figure 1 for direct mapping in the cache.
Figure 1. Address break-up for direct mapping: TAG (4 bits) | LINE (2 bits) | WORD (2 bits).
2.3 Block Diagram:
The block diagram is shown in Figure 2. A behavioral model is written for the cache and main memory blocks.
The signals from the processor are given as stimulus in the test bench. Table 1 lists the ports of the cache.
Figure 2. Block Diagram showing signal connections.
Port | Direction | Description
clk | input | Common clock between cache and processor
reset | input | Synchronous reset to the cache
rd_en | input | HIGH for read from cache, LOW for write to cache
data[31:0] | bidirectional | Data from/to the processor; direction determined by rd_en
addr[7:0] | input | Address from the processor
data_valid | output | Active-high signal indicating that the output data to the processor is valid
busy | output | Active-high signal indicating that the cache is busy; the processor will not send another request while the cache is busy
mem_addr[7:0] | output | Address bus to main memory
mem_wr | output | HIGH for read from main memory, LOW for write to main memory
chip_select | output | Signal to enable main memory access
mem_data[31:0] | bidirectional | Data from/to main memory
rd_done | input | Signal from main memory that the requested read operation is done
wr_done | input | Signal from main memory that the requested write operation is done
Table 1. Ports of the cache with direction and descriptions.
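The two bidirectional buses in Table 1 imply tri-state driving. One common pattern, sketched here with the signal names of Tables 1 and 2 (the exact conditions in Appendix A may differ), is:

```verilog
// Inside the cache module:
// Drive the processor data bus only on a valid read (rd_en HIGH), otherwise
// float it so the processor can drive it during writes.
reg [31:0] data_out;      // registered output, see Table 2
assign data = (rd_en && data_valid) ? data_out : 32'bz;

// Likewise for the memory bus: drive it only while writing to main memory
// (mem_wr LOW per Table 1), float it during memory reads.
reg [31:0] mem_data_out;
assign mem_data = (chip_select && !mem_wr) ? mem_data_out : 32'bz;
```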
(Figure 2 shows the processor, the cache, and main memory; each cache memory line holds a tag followed by words 1 to 4.)
Register | Size | Description
cache_hit_reg | 1 bit | Indicates a tag match, meaning the requested address is present in the cache
line | 2 bits | Stores the line index into the cache, taken from the input address
tag | 4 bits | Stores the location tag, taken from the input address
count | 2 bits | Keeps track of the number of main memory reads in case of a read miss
data_out | 32 bits | Registered data out before driving it onto the bidirectional data bus to the processor
mem_data_out | 32 bits | Registered data out before driving it onto the bidirectional data bus to memory
mem_data_reg0 to 3 | 32 bits | 4 registers to store the data words read in from main memory
Table 2. Internal registers used in the behavioral model.
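Table 2 lists only the control registers; the cache lines themselves also need storage. One way to declare the line storage, matching Figure 2's tag-plus-four-words layout (these names and the valid bit are assumptions, not taken from Appendix A), is:

```verilog
// 4 lines, each holding a 4-bit tag and 4 words of 32 bits
reg [3:0]  tag_array  [0:3];       // tag per line
reg [31:0] data_array [0:3][0:3];  // [line][word]
reg        valid      [0:3];       // per-line valid bit (assumed: lines start empty)
```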
3. Verilog Implementation
3.1 Verilog code
The Verilog code for the cache is given in Appendix A. The design is implemented in 6 always blocks which execute concurrently. There are 2 combinational blocks and 4 sequential blocks. These blocks perform
the following logical tasks and together model the cache behavior.
3.1.1 Combinational Blocks:
I. Tag comparison: This block continuously checks whether the tag stored in the line referenced by the input address
matches the tag field of that address. It sets cache_hit_reg if there is a tag match, irrespective of read
or write operation.
II. Memory select: This block controls the enabling of main memory. Main memory needs to
be enabled only when data needs to be transferred to/from it. This gives better control over the
rd_done and wr_done signals given out by main memory.
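The two combinational blocks above might look roughly as follows; tag_array and miss_in_progress are illustrative names assumed for this sketch (Appendix A is authoritative):

```verilog
// I. Tag comparison: combinational, independent of read/write
always @(*) begin
  line = addr[3:2];                        // line index field (Figure 1)
  tag  = addr[7:4];                        // tag field (Figure 1)
  cache_hit_reg = (tag_array[line] == tag);
end

// II. Memory select: enable main memory only while data must be
// transferred, giving clean control over its rd_done/wr_done handshakes.
always @(*) begin
  chip_select = miss_in_progress;          // 'miss_in_progress' is illustrative
end
```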
3.1.2 Sequential Blocks:
I. Cache Hit: Only if the tag comparison is successful does this block execute and perform the required data
manipulation.
II. Cache Miss: Only if the tag comparison is unsuccessful does this block execute and perform the required data
manipulation.
III. Data Synchronizing from Memory: There are two blocks, one running on posedge clk and the other on posedge
rd_done. These are required to synchronize the reads from memory in case of a read miss, as the cache
and memory run on different, asynchronous clocks.
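One plausible arrangement of these two synchronizing blocks is sketched below, with count and mem_data_reg0 to 3 as in Table 2; the exact handshake is in Appendix A:

```verilog
reg [1:0]  count;                       // Table 2: tracks the 4 reads of a miss
reg [31:0] mem_data_reg0, mem_data_reg1,
           mem_data_reg2, mem_data_reg3;

// Block 1: clocked by rd_done from the slow memory domain --
// latch each word as the memory reports it ready.
always @(posedge rd_done) begin
  case (count)
    2'd0: mem_data_reg0 <= mem_data;
    2'd1: mem_data_reg1 <= mem_data;
    2'd2: mem_data_reg2 <= mem_data;
    2'd3: mem_data_reg3 <= mem_data;
  endcase
end

// Block 2: on the fast cache clock, step the word counter and
// issue the address of the next word in the block.
always @(posedge clk) begin
  if (rd_done && count != 2'd3) begin
    count    <= count + 2'd1;
    mem_addr <= {tag, line, count + 2'd1};  // TAG | LINE | WORD per Figure 1
  end
end
```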
3.2 Test bench
The test bench code is given in Appendix C. The test bench runs 4 test cases to test the functionality of
the direct mapped cache. The clk signal is given a period of 10 ns and the mem_clk period is 40 ns.
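The two clocks described above can be generated in the test bench as follows (a standard pattern; Appendix C may differ in detail):

```verilog
reg clk, mem_clk;
initial begin
  clk     = 1'b0;
  mem_clk = 1'b0;
end
always #5  clk     = ~clk;      // 10 ns period: fast cache/processor clock
always #20 mem_clk = ~mem_clk;  // 40 ns period: 4x slower memory clock
```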
1) Write Miss: Initially there is nothing in the cache or memory. The processor issues 4 writes to
consecutive memory locations, all of which result in cache write misses. The data is written only to main
memory. As seen in the waveform, data values 56, 57, 58 and 59 were written to memory locations 120, 121, 122 and
123 respectively. The cache_hit_reg signal was always low, meaning a cache miss, and proper busy pulses
were given to the processor for every write.
Figure 3. Waveforms showing the cache write miss test case.
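The four consecutive writes of test case 1 could be issued with a small test-bench task like the one below; data_drv is an illustrative register behind the tri-stated data bus, and Appendix C is authoritative:

```verilog
task do_write(input [7:0] a, input [31:0] d);
  begin
    @(posedge clk);
    rd_en    <= 1'b0;        // LOW = write to cache (Table 1)
    addr     <= a;
    data_drv <= d;           // register driving the bidirectional data bus
    @(posedge clk);
    wait (!busy);            // cache raises busy while forwarding the write
  end
endtask

initial begin
  do_write(8'd120, 32'd56);
  do_write(8'd121, 32'd57);
  do_write(8'd122, 32'd58);
  do_write(8'd123, 32'd59);
end
```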
2) Read Miss: Now the test bench requests the data written in the previous step. This results
in a read miss, and the cache brings the data in from main memory. In this case, as the memory holds only
one word at each location, the cache has to do 4 reads to fetch a block of data and replace a line. As
seen in the waveform in Figure 4, the processor requests the data at location 120, resulting in a read
miss. This triggers 4 reads from main memory. The required data is given to the processor with
data_valid, and cache line 2 is written with the 4 words (56, 57, 58, 59).
Figure 4. Waveforms showing the cache read miss test case.
3) Read Hit: Again the processor requests the same data. This time it is a cache hit, as the data
was brought into the cache in the previous step. The data requested was at location 122, and as
seen in Figure 5, the correct data, 58, was returned.
4) Write Hit: Now the processor writes a word to the cache at the same address from which
it read in the last step. This results in a cache hit and the data is written properly. The data 60 is
requested to be written at location 122. As seen in the waveform in Figure 5, 60 is correctly
written to the cache. Following the write-through method, this data is written to main memory too.
Figure 5. Waveforms showing the cache read hit and write hit test cases.