Asynchronous FIFO

Asynchronous FIFO: Simulation using Modelsim

Note: Diagram numbers are continued from the previous post.

The test bench strategy is to generate all corner conditions, such as full and empty. Simulation waveforms are shown in Figure (11) to Figure (13). These waveforms were generated using the test bench program provided in the previous article. A read clock of 50 MHz and a write clock of 10 MHz are generated using initial procedural statements:

initial begin #10 r_clk=0; forever #10 r_clk=~r_clk; end
initial begin #5 w_clk=0; forever #50 w_clk=~w_clk; end

The half period of r_clk is 10 ns (period 20 ns) and that of w_clk is 50 ns (period 100 ns); both clocks have a 50% duty cycle.
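The relation between the toggle delay and the clock frequency can be checked with a small stand-alone simulation. The sketch below reuses the two clock statements and measures the time between rising edges; the module name and the names R_HALF and W_HALF are illustrative and do not appear in the original test bench.

```verilog
`timescale 1ns/1ns
// Sketch: same clocks as in the test bench, with the half periods named.
module clk_gen_demo;
  parameter R_HALF = 10;   // ns, so period = 20 ns  -> 50 MHz
  parameter W_HALF = 50;   // ns, so period = 100 ns -> 10 MHz

  reg  r_clk, w_clk;
  time r_last, w_last;     // time of the previous rising edge

  initial begin r_last = 0; w_last = 0; end

  initial begin #10 r_clk = 0; forever #R_HALF r_clk = ~r_clk; end
  initial begin #5  w_clk = 0; forever #W_HALF w_clk = ~w_clk; end

  // Report the distance between consecutive rising edges.
  always @(posedge r_clk) begin
    if (r_last != 0) $display("r_clk period = %0d ns", $time - r_last);
    r_last = $time;
  end

  always @(posedge w_clk) begin
    if (w_last != 0) $display("w_clk period = %0d ns", $time - w_last);
    w_last = $time;
  end

  initial #400 $finish;
endmodule
```

With these delays the simulation reports a period of 20 ns for r_clk and 100 ns for w_clk, which corresponds to 50 MHz and 10 MHz.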

Other conditions such as w_en, r_en etc. are generated using the set of statements below:

initial begin
  d_in=1;
  @(posedge w_en);
  repeat(20) @(posedge w_clk) d_in=d_in+2;
  repeat(20) @(posedge w_clk) d_in=d_in-1;
end
initial begin reset=1; #30 reset=0; end
initial begin fork #50 w_en=1; #1800 w_en=0; #2500 w_en=1; join end
initial begin fork #50 r_en=0; #1850 r_en=1; #2400 r_en=0; #2500 r_en=1; join end

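In the last two initial blocks, ‘fork’ and ‘join’ are used so that every delay is measured from zero simulation time instead of accumulating from one statement to the next. The reset signal is activated for the first 30 ns and then deactivated. Reset is already active during the first 5 ns, before either clock has been initialized, which tests the asynchronous reset condition.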
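The effect of ‘fork’ and ‘join’ on the delays can be seen by running the same three delayed assignments once in a sequential block and once in a parallel block. This is a minimal sketch; the module and the signals a and b are made up for the illustration.

```verilog
`timescale 1ns/1ns
module fork_join_demo;
  reg a, b;

  // Sequential block: delays accumulate,
  // so a changes at 50 ns, 1850 ns and 4350 ns.
  initial begin
    #50 a = 1; #1800 a = 0; #2500 a = 1;
  end

  // Parallel block: every delay is counted from the moment the fork
  // is entered (0 ns), so b changes at 50 ns, 1800 ns and 2500 ns.
  initial fork
    #50 b = 1; #1800 b = 0; #2500 b = 1;
  join

  initial $monitor("t=%0d a=%b b=%b", $time, a, b);
  initial #5000 $finish;
endmodule
```

The parallel form lets each event time be written directly as an absolute time, which is how the w_en and r_en stimulus is described in the test bench.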

Under the asynchronous reset condition all variables except d_out are initialized to their default states, including f_empty_flag. r_ptr, w_ptr and ptr_diff are initialized to zero, and all FIFO status flags take their default values. Since the reset signal is connected to the binary counters, resetting the counters resets both the read and write pointers, and with them the pointer difference and all status flags. This can be observed in the simulated waveform shown in Figure (11). The Verilog code still has to be improved to provide a complete asynchronous reset that includes d_out. When I tried to add an asynchronous reset to the RTL code of the dual port RAM, the synthesizer did not infer a dual port RAM; instead, it inferred a set of registers. This problem has to be sorted out.
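One direction that might be worth trying for this problem is to keep the memory array itself free of any reset and to apply the asynchronous reset only to the output register. The sketch below shows that coding style. It is an assumption, not a verified fix: the module name and port names are hypothetical, and whether a given synthesizer still infers dual port RAM from it has not been checked here.

```verilog
// Sketch only: reset touches the output register, never the memory array.
module ram_async_rst_out #(parameter WIDTH = 8, DEPTH = 16, AW = 4) (
  input  wire             w_clk, r_clk, reset,
  input  wire             w_we, r_re,       // write / read enables
  input  wire [AW-1:0]    w_addr, r_addr,
  input  wire [WIDTH-1:0] d_in,
  output reg  [WIDTH-1:0] d_out
);
  reg [WIDTH-1:0] mem [0:DEPTH-1];

  // Write port: no reset term appears in this block.
  always @(posedge w_clk)
    if (w_we) mem[w_addr] <= d_in;

  // Read port: only the output register is cleared asynchronously.
  always @(posedge r_clk or posedge reset)
    if (reset)     d_out <= {WIDTH{1'b0}};
    else if (r_re) d_out <= mem[r_addr];
endmodule
```

In simulation d_out clears as soon as reset rises, without waiting for an r_clk edge, while the stored words are left untouched.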

Figure (11) Simulation waveform 1

http://asic-soc.blogspot.com/2007/12/verilog-test-bench-for-asynchronous.html


http://asic-soc.blogspot.com/2007/12/new-asynchronous-fifo-design.html

http://4.bp.blogspot.com/_Se0VANaI9uM/R2wIFAo52vI/AAAAAAAAAM8/LWBkt--WYek/s1600-h/Simulation+waveform+1.gif

At 1800 ns w_en is disabled. This interval is chosen so that the FIFO full condition can be generated (see Figure (11)). When ptr_diff becomes equal to (f_depth-1), f_full_flag goes high and w_ptr stops counting. Further data on the d_in bus will overwrite the last location of the FIFO. This is unavoidable since there is no control over the d_in bus. When the FIFO is half filled (i.e. f_depth/2), f_half_full_flag is asserted, and in the next w_clk cycle it returns to its normal state. Similarly, when the FIFO reaches the almost full condition, f_almost_full_flag is asserted. Thus all status flags are activated with zero clock delay (see Figure (13)).

The signal r_en is then enabled to start the read operation. At this time w_en is disabled so that the empty condition can be generated. r_en is detected at the next positive edge of the clock, and hence data is read with zero r_clk cycle delay (see Figure (12)). When ptr_diff becomes zero, f_empty_flag is asserted and r_ptr stops incrementing. However, on every r_clk edge data is still read from the last location and put on the d_out bus.


When both r_en and w_en are enabled, the read clock domain has to wait until data has been written to the FIFO. Thus the empty flag goes low at the positive edge of w_clk. At the next positive edge of r_clk the data has been read out and put on the d_out bus. Since r_ptr has incremented, ptr_diff becomes zero and f_empty_flag goes high again. This status remains until the next positive edge of w_clk. Observe the asynchronous read and write operation in Figure (13). The f_empty_flag and r_next_en signals are complements of each other. As soon as data is written to the FIFO, the r_next_en signal is enabled. The read address is incremented and both read and write pointers become equal. This makes ptr_diff zero, and once again f_empty_flag is asserted. Thus there is no pessimistic reporting of the assertion or removal of FIFO status flags.

http://4.bp.blogspot.com/_Se0VANaI9uM/R2wGyAo52uI/AAAAAAAAAM0/twv9wyC0yOc/s1600-h/Simulation+waveform+2.gif

Thus the overall performance of the designed FIFO resembles the performance of the FIFO IP core provided by Xilinx, although the algorithm and methodology used are entirely different in the two designs. The IP core uses an acknowledgement signal to confirm read and write operations, whereas the proposed design does not have any such mechanism. It is assumed that the data sending and receiving hardware takes care of the data once the FIFO full and empty conditions are asserted.


Asynchronous FIFO: Synthesis using Xilinx ISE and Spartan 3

Synthesizing the design with two different optimization goals makes a difference in the usage of logic cells and in the maximum operating frequency of the design. With ‘speed’ as the optimization goal, the maximum achievable frequency is 113.830MHz.

----------------------------------------------
Timing Summary:
Speed Grade: -5
Minimum period: 8.785ns (Maximum Frequency: 113.830MHz)
Minimum input arrival time before clock: 4.692ns
Maximum output required time after clock: 12.049ns
Maximum combinational path delay: No path found
---------------------------------------------------------

With ‘area’ as the optimization goal, the maximum achievable frequency is 90.212MHz.

--------------------------------------------------------
Timing Summary:
Speed Grade: -5
Minimum period: 11.085ns (Maximum Frequency: 90.212MHz)
Minimum input arrival time before clock: 4.574ns
Maximum output required time after clock: 13.375ns
Maximum combinational path delay: No path found
---------------------------------------------------------------------

http://3.bp.blogspot.com/_Se0VANaI9uM/R2wFrwo52tI/AAAAAAAAAMs/7uGe7GmkcLA/s1600-h/Simulation+waveform+3.gif

The difference in the operating frequency can be attributed to the delay in the adder-subtractor circuit. Dual port distributed RAM is used for memory. The output data d_out is registered (the RTL schematic is shown in Figure (14)), which is one of the advantages of this design.

Figure (14) Registered output

The part of the synthesis report generated by Xilinx ISE shown below lists the inferred hardware, which remains the same for both optimization goals.

----------------------------------------------------------------------
Synthesizing Unit .
Related source file is a_fifo5.v.
Found 16x8-bit dual-port distributed RAM for signal .
-----------------------------------------------------------------------
| aspect ratio  | 16-word x 8-bit            |      |
| clock         | connected to signal        | rise |
| write enable  | connected to internal node | high |
| address       | connected to signal        |      |
| dual address  | connected to signal        |      |
| data in       | connected to signal        |      |
| data out      | not connected              |      |
| dual data out | connected to internal node |      |
| ram_style     | Auto                       |      |
-----------------------------------------------------------------------
INFO:Xst:1442 - HDL ADVISOR - The RAM contents appears to be read asynchronously. A synchronous read would allow you to take advantage of available block RAM resources, for optimized device usage and improved timings. Please refer to your documentation for coding guidelines.
Found 8-bit register for signal .
Found 4-bit addsub for signal <$n0003>.
Found 4-bit comparator greater for signal <$n0007> created at line 60.
Found 4-bit comparator less for signal <$n0008> created at line 62.
Found 4-bit adder for signal <$n0009> created at line 64.
Found 4 1-bit 2-to-1 multiplexers.
Summary:
inferred 1 RAM(s).
inferred 8 D-type flip-flop(s).
inferred 2 Adder/Subtracter(s).
inferred 2 Comparator(s).
inferred 4 Multiplexer(s).
Unit synthesized.
-----------------------------------------------------------------

The low level synthesis report, however, gives the device utilization summary, which does differ between the two goals. Device utilization with ‘speed’ as the optimization goal is as follows:

http://1.bp.blogspot.com/_Se0VANaI9uM/R2wEUQo52sI/AAAAAAAAAMk/nu3Mc5tr62o/s1600-h/registered+output.gif

-----------------------------------------
Device utilization summary:
---------------------------
Selected Device: 3s200ft256-5
Number of Slices:            36 out of 1920   1%
Number of Slice Flip Flops:  20 out of 3840   0%
Number of 4 input LUTs:      50 out of 3840   1%
Number of bonded IOBs:       24 out of 173   13%
Number of GCLKs:              2 out of 8     25%
-------------------------------------------------------------

Device utilization with ‘area’ as optimization goal is as follows:

-----------------------------------------
Device utilization summary:
-----------------------------------------
Selected Device: 3s200ft256-5
Number of Slices:            34 out of 1920   1%
Number of Slice Flip Flops:  16 out of 3840   0%
Number of 4 input LUTs:      47 out of 3840   1%
Number of bonded IOBs:       24 out of 173   13%
Number of GCLKs:              2 out of 8     25%
------------------------------------------------------------

In the case of ‘area’, slices are reduced by 2, slice flip-flops by 4 and 4 input LUTs by 3. In the present design the read clock runs at 50MHz. Hence the operating speed of the design is very important, and the optimization goal is kept as ‘speed’. Apart from the package pin constraints, no area or timing constraints are given to the design during synthesis. Since the design does not consume much of the FPGA's resources, area is not an important factor. Package pin constraints are given so that the design can be implemented on the Spartan 3 development board. This is done for the fifo_top.v code, which includes the clock generation code for the design.
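As a quick check of the reported figures, the maximum frequency is the reciprocal of the minimum period. Both results lie above the 50MHz read clock, whose period is 20 ns:

```latex
f_{\max} = \frac{1}{T_{\min}}, \qquad
\frac{1}{8.785\,\text{ns}} \approx 113.83\,\text{MHz} \;(\text{speed}), \qquad
\frac{1}{11.085\,\text{ns}} \approx 90.21\,\text{MHz} \;(\text{area})
```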

Asynchronous FIFO Design

Asynchronous FIFOs are used as buffers between two asynchronous clock domains to exchange data safely. Data is written into the FIFO from one clock domain and read from another clock domain. This requires a memory architecture in which two memory ports are available: one for the input (write or push) operation and another for the output (read or pop) operation. Generally FIFOs are used where the write operation is faster than the read operation. However, even with different speeds and access types, the average rate of data transfer remains constant. FIFO pointers keep track of the number of FIFO memory locations read and written, and the corresponding control logic prevents the FIFO from either underflowing or overflowing. FIFO architectures inherently face the challenge of synchronizing each pointer logic with that of the other clock domain and of controlling the read and write operations on the FIFO memory locations safely. A detailed and careful analysis of the synchronizer circuit along with the pointer logic is required to understand the synchronization of the two FIFO pointer logic circuits, which are responsible for accessing the FIFO read and write ports independently under the control of different clocks.

http://asic-soc.blogspot.com/2007/11/asynchronous-fifo-design.html

http://asic-soc.blogspot.com/2007/12/verilog-code-for-asynchronous-fifo.html

Why Synchronization?

It is very important to understand signal stability in multi clock domains, since for a traveling signal the new clock domain appears to be asynchronous. If the signal is not synchronized to the new clock, the first storage element of the new clock domain may go into a metastable state, and in the worst case the resolution time cannot be predicted. The metastable value can propagate through the new clock domain, resulting in functional failure. To prevent such failures, the setup time and hold time specifications have to be obeyed in the design. Manufacturers provide statistics on the probability of flip-flop failure due to metastability characteristics in terms of MTBF (Mean Time Between Failures). Synchronizers are used to prevent the downstream logic from entering a metastable state in multi clock domains with multibit data values.
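A common form of synchronizer is two flip-flops in series, both clocked by the destination domain. The sketch below is a generic illustration and is not taken from the a_fifo5 code; the module and port names are hypothetical. For multibit values it is only safe if at most one bit changes at a time, which is why pointers are usually converted to Gray code before crossing the clock domain.

```verilog
// Two-stage synchronizer sketch (generic; names are illustrative).
module sync_2ff #(parameter WIDTH = 1) (
  input  wire             clk,      // destination-domain clock
  input  wire             reset,    // asynchronous, active high
  input  wire [WIDTH-1:0] d_async,  // signal arriving from the other domain
  output reg  [WIDTH-1:0] q_sync    // value safe to use in this domain
);
  reg [WIDTH-1:0] meta;             // first stage, may go metastable

  always @(posedge clk or posedge reset)
    if (reset) begin
      meta   <= 0;
      q_sync <= 0;
    end else begin
      meta   <= d_async;            // first stage samples the async input
      q_sync <= meta;               // second stage gives it a full cycle to settle
    end
endmodule
```

The second stage adds one clock cycle of latency, which is the price paid for giving a metastable first stage time to resolve.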

Issues in Designing Asynchronous FIFO

It has been mentioned that designing the FIFO pointers for efficient operation is the key issue in designing a FIFO architecture. Let us look more closely at the FIFO read and write pointers. On reset both the read and write pointers point to the starting location of the FIFO. This location is the first location to which data has to be written and, at the same time, the first location to be read. Therefore, in general, we can say that the read pointer always points to the word to be read and the write pointer always points to the next location to which data has to be written.

Now let us examine the data write operation. When both read and write pointers point to the first location of the FIFO, the empty flag is asserted, indicating that the FIFO is empty. Data can now be written. Data is written to the location to which the write pointer points, and after the write operation the write pointer is incremented to point to the next location to be written. At the same time the empty flag is deasserted, which indicates that the FIFO is not empty and some data is available. One notable point regarding the read pointer is that while the empty flag is active, the data pointed to by the read pointer is always invalid. When the first data word is written and the empty flag is cleared (i.e. the empty flag is inactive), the read pointer logic immediately drives the data from the location to which it points onto the read port of the dual port RAM, ready to be read by the read logic. The biggest advantage of this implementation of the read logic is that only one clock pulse is required to read from the read port, since the previous clock cycle has already incremented the read pointer and driven the data to the read port. This helps reduce the latency in detecting the empty and full flag status. The empty flag can be asserted in one more condition. If n write operations are followed by the same number n of read operations, both pointers are equal again. Hence, if the two pointers “catch up” with each other, the empty flag is asserted.

Now let us examine the FIFO full status. When the write pointer reaches the top of the FIFO, it points to a location that can still be written and is the last location to be written. No read operation has been performed yet, and the read pointer still points to the first location. This is one way to generate the FIFO full condition. If the full flag is asserted when the write pointer reaches the top of the FIFO, this is not the actual FIFO full condition; it is only ‘almost full’, as there is one location that can still be written. Similarly, an almost empty condition can exist in the FIFO. Now a write operation causes that location to be written and the write pointer to be incremented. Since the location was the last one, the write pointer wraps around to the first location. Now both read and write pointers are equal, and hence the empty flag is asserted instead of the full flag, which is a fatal mistake. Hence a wrap around of the write pointer may indicate a FIFO full condition.

Suppose that after data has been written to the FIFO (with the write pointer at the top of the FIFO) some data has been read, so that the read pointer is somewhere in the middle of the FIFO. One more write operation causes the write pointer to wrap. Note that even though the write pointer now points to the first location of the FIFO, this is NOT a FIFO full condition, since the read pointer has moved up from the first location. Further writes push the write pointer up. Imagine that the read pointer also wraps around after some more read operations. The present condition is that both pointers have wrapped around, but there is neither a FIFO full nor a FIFO empty condition. Data can be written to the FIFO or read from it. This being the situation, how can the full and empty conditions be identified and generated? How can the two pointers be synchronized and compared to generate the full and empty status? While synchronizing, how can a possible metastable state and ‘pessimistic reporting’ (i.e. a harmless wrong report; this will be discussed later) be avoided? These are some of the key issues in designing an asynchronous FIFO.
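One widely used way to tell full from empty after a wrap around is to give each pointer one extra most significant bit that toggles on every wrap. The sketch below illustrates that idea in a single simulation process; it is a general technique shown for comparison and not the method used in a_fifo5, which compares a pointer difference instead. The module name and AW are illustrative, and a real asynchronous FIFO would additionally have to synchronize the pointers between clock domains.

```verilog
// Illustration only: pointers carry one extra MSB as a wrap indicator.
module ptr_flags_demo;
  parameter AW = 4;                  // address width for a 16-deep FIFO

  reg [AW:0] w_ptr, r_ptr;           // AW+1 bits: {wrap bit, address}

  // Empty: pointers identical, including the wrap bit.
  wire empty = (w_ptr == r_ptr);
  // Full: same address, but the write pointer has wrapped once more.
  wire full  = (w_ptr[AW] != r_ptr[AW]) &&
               (w_ptr[AW-1:0] == r_ptr[AW-1:0]);

  integer i;
  initial begin
    w_ptr = 0; r_ptr = 0;
    #1 $display("after reset     : empty=%b full=%b", empty, full);
    for (i = 0; i < 16; i = i + 1) w_ptr = w_ptr + 1;   // 16 writes, no reads
    #1 $display("after 16 writes : empty=%b full=%b", empty, full);
    for (i = 0; i < 16; i = i + 1) r_ptr = r_ptr + 1;   // 16 reads
    #1 $display("after 16 reads  : empty=%b full=%b", empty, full);
  end
endmodule
```

After 16 writes the address bits of both pointers are equal again, yet the differing wrap bits mark the FIFO as full rather than empty, which avoids the fatal mistake described above.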

Verilog code for asynchronous FIFO

Verilog code for the asynchronous FIFO is given below. The module “a_fifo5” should be used for Modelsim (or any other HDL simulator) simulation. The module “fifo_top” is used to synthesize the design on the Spartan 3 board. Before invoking this module in ISE you should add Digital Clock Manager (DCM) code to your project. This can be done using the Xilinx ISE wizard.

//==================================================
//a_fifo5.v; verilog code for asynchronous FIFO
//This module describes FIFO
//===================================================
module a_fifo5(d_out,f_full_flag,f_half_full_flag,f_empty_flag,f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,r_clk,w_clk,reset);
parameter f_width=8; //FIFO width
parameter f_depth=16; //FIFO depth
parameter f_ptr_width=4; //because depth =16;
parameter f_half_full_value=8;
parameter f_almost_full_value=14;
parameter f_almost_empty_value=2;
output [f_width-1:0] d_out; reg [f_width-1:0] d_out; //outputs
output f_full_flag,f_half_full_flag,f_almost_full_flag,f_empty_flag,f_almost_empty_flag;
input [f_width-1:0] d_in;
input r_en,w_en,r_clk,w_clk;
input reset;
//internal registers,wires
wire [f_ptr_width-1:0] r_ptr,w_ptr;
reg r_next_en,w_next_en;
reg [f_ptr_width-1:0] ptr_diff;
reg [f_width-1:0] f_memory[f_depth-1:0];
assign f_full_flag=(ptr_diff==(f_depth-1)); //assign FIFO status
assign f_empty_flag=(ptr_diff==0);
assign f_half_full_flag=(ptr_diff==f_half_full_value);
assign f_almost_full_flag=(ptr_diff==f_almost_full_value);
assign f_almost_empty_flag=(ptr_diff==f_almost_empty_value);
//---------------------------------------------------------
always @(posedge w_clk) //write to memory
begin
  if(w_en) begin
    if(!f_full_flag)
      f_memory[w_ptr]<=d_in;
  end
end
//---------------------------------------------------------
always @(posedge r_clk) //read from memory
begin
  if(reset)
    d_out<=0; //f_memory[r_ptr];
  else if(r_en) begin
    if(!f_empty_flag)
      d_out<=f_memory[r_ptr];
  end
  else d_out<=0;
end
//---------------------------------------------------------
always @(*) //ptr_diff changes as read or write clock change
begin
  //blocking assignments are used here because this is combinational logic
  if(w_ptr>r_ptr)
    ptr_diff=w_ptr-r_ptr;
  else if(w_ptr<r_ptr)
  begin
    ptr_diff=((f_depth-r_ptr)+w_ptr);
  end
  else ptr_diff=0;
end
//---------------------------------------------------------
always @(*) //after empty flag activated fifo read counter should not increment;
begin
  if(r_en && (!f_empty_flag))
    r_next_en=1;
  else r_next_en=0;
end
//--------------------------------------------------------
always @(*) //after full flag activated fifo write counter should not increment;
begin
  if(w_en && (!f_full_flag))
    w_next_en=1;
  else w_next_en=0;
end
//---------------------------------------------------------
//instantiate address counters
b_counter r_b_counter(.c_out(r_ptr),.c_reset(reset),.c_clk(r_clk),.en(r_next_en));
b_counter w_b_counter(.c_out(w_ptr),.c_reset(reset),.c_clk(w_clk),.en(w_next_en));
endmodule
//==============================================================
//b_counter.v; 4 bit binary up counter with asynchronous reset
//==============================================================
module b_counter(c_out,c_reset,c_clk,en);
parameter c_width=4; //counter width
output [c_width-1:0] c_out; reg [c_width-1:0] c_out;
input c_reset,c_clk,en;
always @(posedge c_clk or posedge c_reset)
  if (c_reset)
    c_out <= 0;
  else if(en)
    c_out <= c_out + 1;
endmodule
//===========================================================

//===========================================================
//fifo_top.v; top level verilog code of FIFO
//To be used with Xilinx ISE-simulation and synthesis
//For functional simulation this module is not necessary
//============================================================
module fifo_top(x,y,z,d_out,f_full_flag,f_half_full_flag,f_empty_flag,f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,CLKIN_IN,RST_IN,reset);
parameter f_width=8;
parameter f_depth=16;
parameter f_ptr_width=4;
parameter f_half_full_value=8;
parameter f_almost_full_value=14;
parameter f_almost_empty_value=2;
output [f_width-1:0] d_out; //reg [f_width-1:0] d_out; //outputs
output f_full_flag,f_half_full_flag,f_almost_full_flag,f_empty_flag,f_almost_empty_flag;
output x,y,z;
input [f_width-1:0] d_in;
input r_en,w_en,CLKIN_IN,RST_IN;
input reset;
a_fifo5 a_fifo55(d_out,f_full_flag,f_half_full_flag,f_empty_flag,f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,CLK0_OUT,CLKDV_OUT,reset); //instantiate fifo
dcm_fifo dcm_fifo1(CLKIN_IN,RST_IN,CLKDV_OUT,CLKFX_OUT,CLKIN_IBUFG_OUT,CLK0_OUT, LOCKED_OUT); //instantiate DCM
assign x=CLKIN_IBUFG_OUT; //simply to avoid error
assign y=LOCKED_OUT;
assign z=CLKFX_OUT;
endmodule
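The three-way comparison that computes ptr_diff in a_fifo5 can be compared against a plain 4-bit subtraction, which wraps modulo 16 on its own. The self-checking sketch below runs through all 256 pointer combinations; the module name and the variables diff_if and diff_sub are made up for this check. It relies on the depth being exactly 2 to the power of the pointer width, as it is for f_depth=16 and f_ptr_width=4.

```verilog
// Exhaustive comparison of the if/else pointer difference with a
// plain 4-bit subtraction (illustrative check, not part of a_fifo5).
module ptr_diff_check;
  parameter f_depth = 16;

  reg [3:0] w_ptr, r_ptr;
  reg [3:0] diff_if, diff_sub;
  integer w, r, errors;

  initial begin
    errors = 0;
    for (w = 0; w < 16; w = w + 1)
      for (r = 0; r < 16; r = r + 1) begin
        w_ptr = w; r_ptr = r;

        // Same three-way comparison as in a_fifo5
        if (w_ptr > r_ptr)      diff_if = w_ptr - r_ptr;
        else if (w_ptr < r_ptr) diff_if = (f_depth - r_ptr) + w_ptr;
        else                    diff_if = 0;

        // Plain subtraction, truncated to 4 bits (modulo 16)
        diff_sub = w_ptr - r_ptr;

        if (diff_if !== diff_sub) errors = errors + 1;
      end
    $display("mismatches: %0d", errors);
  end
endmodule
```

The check reports no mismatches, so for this depth the two forms describe the same function; whether the simpler form changes the synthesized adder and comparator count has not been evaluated here.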

Verilog Test Bench for Asynchronous FIFO

Here is the verilog test bench for the asynchronous FIFO code already published. Simulation results of the asynchronous FIFO will be discussed in coming articles.

//===========================================================
//ta_fifo5.v; test bench for the module a_fifo5
//===========================================================
module ta_fifo5();

parameter f_width=8;
parameter f_depth=16;

wire [f_width-1:0] d_out;
wire f_full_flag,f_half_full_flag,f_almost_full_flag,f_empty_flag,f_almost_empty_flag;
reg [f_width-1:0] d_in;
reg r_en,w_en,r_clk,w_clk;
reg reset;
wire [3:0] r_ptr,w_ptr,ptr_diff;
wire r_next_en,w_next_en; //declared explicitly instead of relying on implicit nets

//bring internal signals of the FIFO instance to the top level for viewing
assign r_ptr=ta_fifo5.r_ptr;
assign w_ptr=ta_fifo5.w_ptr;
assign ptr_diff=ta_fifo5.ptr_diff;
assign r_next_en=ta_fifo5.r_next_en;
assign w_next_en=ta_fifo5.w_next_en;

a_fifo5 ta_fifo5(d_out,f_full_flag,f_half_full_flag,f_empty_flag,f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,r_clk,w_clk,reset);

initial #5000 $stop;
initial begin #10 r_clk=0; forever #10 r_clk=~r_clk; end
initial begin #5 w_clk=0; forever #50 w_clk=~w_clk; end
//initial begin #10 r_clk=0; forever #50 r_clk=~r_clk; end //test for synchronous operation
//initial begin #10 w_clk=0; forever #50 w_clk=~w_clk; end //test for synchronous operation

initial begin
  d_in=1;
  @(posedge w_en);
  repeat(20) @(posedge w_clk) d_in=d_in+2;
  repeat(20) @(posedge w_clk) d_in=d_in-1;
end

initial begin reset=1; #30 reset=0; end
initial begin fork #50 w_en=1; #1800 w_en=0; #2500 w_en=1; join end
initial begin fork #50 r_en=0; #1850 r_en=1; #2400 r_en=0; #2500 r_en=1; join end
endmodule

#======================================================
# DO file for Modelsim simulator
# This may not be applicable for other simulators
#======================================================
vlog a_fifo5.v
vsim a_fifo5
add wave *
force -freeze sim:/a_fifo5/reset 1 0
run
force -freeze sim:/a_fifo5/d_in 00000001 0
force -freeze sim:/a_fifo5/r_en 0 0
force -freeze sim:/a_fifo5/w_en 1 0
force -freeze sim:/a_fifo5/reset 0 0
run
force -freeze sim:/a_fifo5/r_clk 1 0, 0 {10 ns} -r 20
force -freeze sim:/a_fifo5/w_clk 1 0, 0 {50 ns} -r 100
run
force -freeze sim:/a_fifo5/d_in 00000011 0
run
force -freeze sim:/a_fifo5/d_in 00000111 0
run
force -freeze sim:/a_fifo5/d_in 00001111 0
run
force -freeze sim:/a_fifo5/d_in 00011111 0
run
force -freeze sim:/a_fifo5/d_in 00111111 0
run
force -freeze sim:/a_fifo5/d_in 01111111 0
run
force -freeze sim:/a_fifo5/d_in 11111111 0
run
force -freeze sim:/a_fifo5/d_in 11111110 0
run
force -freeze sim:/a_fifo5/d_in 11111100 0
run
force -freeze sim:/a_fifo5/d_in 11111000 0
run
force -freeze sim:/a_fifo5/d_in 11110000 0
run
force -freeze sim:/a_fifo5/d_in 11100000 0
run
force -freeze sim:/a_fifo5/d_in 11000000 0
run
force -freeze sim:/a_fifo5/d_in 10000000 0
run
force -freeze sim:/a_fifo5/d_in 00000000 0
run
force -freeze sim:/a_fifo5/d_in 00000001 0
run
force -freeze sim:/a_fifo5/d_in 00000010 0
run
force -freeze sim:/a_fifo5/w_en 0 0
force -freeze sim:/a_fifo5/r_en 1 0
force -freeze sim:/a_fifo5/d_in 00000011 0
run
force -freeze sim:/a_fifo5/d_in 00000100 0
run
force -freeze sim:/a_fifo5/d_in 00000101 0
run
force -freeze sim:/a_fifo5/d_in 00000110 0
run
force -freeze sim:/a_fifo5/d_in 00000111 0
run
force -freeze sim:/a_fifo5/d_in 8'd8 0
run
force -freeze sim:/a_fifo5/w_en 1 0
force -freeze sim:/a_fifo5/d_in 00001001 0
run
force -freeze sim:/a_fifo5/d_in 00001010 0
run
force -freeze sim:/a_fifo5/d_in 00001011 0
run
force -freeze sim:/a_fifo5/d_in 00001100 0
run
force -freeze sim:/a_fifo5/d_in 00001101 0
run
run
#============================================


