Application Note AC177
Implementing Multi-Port Memories in Axcelerator Devices
Introduction
This application note describes a user-configurable VHDL wrapper for implementing dual-port and quad-port memory structures using a small number of programmable logic tiles and the embedded memory blocks in Actel’s Axcelerator Field Programmable Gate Array (FPGA) devices.
The Axcelerator device architecture provides dedicated blocks of RAM whose read and write ports are completely independent and fully synchronous. Each memory block consists of 4,608 bits with independent read and write port configuration, and can be organized as 128x36, 256x18, 512x9, 1kx4, 2kx2, or 4kx1 to allow for built-in bus width conversion. Blocks can also be cascaded to create larger memories. For additional details on embedded memory blocks in Axcelerator devices, refer to the Axcelerator Family FPGAs datasheet or the application note Axcelerator Family Memory Blocks. A block diagram of the basic memory block is shown in Figure 1.
The memory blocks in the Axcelerator devices can be used to implement multi-port memories with the addition of some simple multiplex logic and an extra clock operating at double the read and write clock frequency.
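As a quick check of the organizations listed above, the following Python sketch tabulates the address width and the number of bits each organization addresses. The helper names are illustrative and are not part of the wrapper or of any Actel tool.

```python
# Organizations listed in the text, as (depth, width) pairs.
BLOCK_BITS = 4608
ASPECT_RATIOS = [(128, 36), (256, 18), (512, 9), (1024, 4), (2048, 2), (4096, 1)]


def address_bits(depth: int) -> int:
    """Number of address lines needed to select one of `depth` words."""
    return (depth - 1).bit_length()


def summarize(ratios=ASPECT_RATIOS):
    """Return (depth, width, address bits, addressed bits) for each organization."""
    return [(d, w, address_bits(d), d * w) for d, w in ratios]


for depth, width, abits, bits in summarize():
    print(f"{depth:>4} x {width:<2}  address bits = {abits:>2}  "
          f"addressed bits = {bits}  of {BLOCK_BITS}")
```

By this arithmetic, the 128x36, 256x18, and 512x9 organizations address all 4,608 bits, while the 1kx4, 2kx2, and 4kx1 organizations address 4,096 bits.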
Basics of Multi-Port Memories
This application note discusses two types of multi-port memories: dual-port and quad-port. In both configurations, two data access ports (data port A and data port B) can be used to perform read and write operations into the Axcelerator RAMs. Each data port has its own data bus, address bus, read enable, and write enable signals. The basic principle of implementing multi-port memories in Axcelerator devices involves the use of an additional clock operating at double the read and write clock frequency to access the memory space through some multiplex logic and arbitrate between data access ports. The overall bandwidth of the memory (bit/s) remains the same; the only difference between the single-port and multi-port memory is the trade-off between read/write frequency and data width.
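The bandwidth argument can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the 100 MHz data clock are assumptions chosen for the example, not values from this application note.

```python
def block_bandwidth(width_bits: int, mem_clk_hz: int) -> int:
    """Raw bandwidth of one memory port in bit/s: one word per memory clock."""
    return width_bits * mem_clk_hz


def per_port_bandwidth(width_bits: int, data_clk_hz: int, ports: int = 2) -> int:
    """Bandwidth seen by each data access port when the memory is clocked at
    `ports` times the data clock and the ports take turns."""
    mem_clk_hz = ports * data_clk_hz
    return block_bandwidth(width_bits, mem_clk_hz) // ports


DATA_CLK_HZ = 100_000_000  # hypothetical data clock, not from the source
WIDTH = 36                 # bits per word

total = block_bandwidth(WIDTH, 2 * DATA_CLK_HZ)
each = per_port_bandwidth(WIDTH, DATA_CLK_HZ)
print(f"memory at 2x clock: {total} bit/s, each of two ports: {each} bit/s")
```

The memory clocked at twice the data rate moves the same number of bits per second in total; the multiplexing only divides that total between port A and port B.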
Although more than one data access port is now available, the ports share the same memory space. Since Axcelerator memories are synchronous, a simultaneous read and write to the same memory address during a single data cycle results in the data being written correctly, but the value output by the read port is undefined.
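The same-address behavior described above can be captured in a small behavioral model, which may be useful in a testbench reference. This is a sketch under assumptions: the class name `SharedBlock` is hypothetical, and `None` is used here to stand for an undefined read value.

```python
from typing import List, Optional


class SharedBlock:
    """Behavioral model of one memory block with a write port and a read port."""

    def __init__(self, depth: int) -> None:
        self.cells: List[int] = [0] * depth

    def cycle(self, write_addr: Optional[int], write_data: int,
              read_addr: Optional[int]) -> Optional[int]:
        """Model one memory clock cycle.

        Returns the read data, or None when no read was requested or when
        the read collides with a write to the same address (value unknown).
        """
        collision = write_addr is not None and write_addr == read_addr
        if read_addr is None or collision:
            read_data = None
        else:
            read_data = self.cells[read_addr]
        if write_addr is not None:
            # The write itself always completes correctly.
            self.cells[write_addr] = write_data
        return read_data


block = SharedBlock(256)
block.cycle(write_addr=5, write_data=0xAB, read_addr=None)
print(block.cycle(write_addr=None, write_data=0, read_addr=5))   # stored word
print(block.cycle(write_addr=5, write_data=0xCD, read_addr=5))   # undefined read
```

A design that cannot tolerate the undefined read needs to avoid addressing the same location from both sides in the same data cycle.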
Dual-Port Memory
The dual-port memory configuration consists of two data access ports (two read/write ports) sharing a single clock domain (wr_clk). The write address bus from each data access port (a_wadr and b_wadr) is used for both read and write operations. The read enable (a_rdblk, a_rdb, b_rdblk, and b_rdb) and write enable (a_wrblk, a_wrb, b_wrblk, and b_wrb) signals are used to select between read and write operation for each data access port. Figure 2 on page 2 shows the corresponding ports of a dual-port memory block.
Quad-Port Memory
The quad-port memory configuration consists of two data access ports, each with a separate write port and read port, clocked by separate write (wr_clk) and read (rd_clk) clocks. For each data access port, there are separate address buses used to perform read (a_radr and b_radr) and write (a_wadr and b_wadr) operations. The read enable (a_rdblk, a_rdb, b_rdblk, and b_rdb) and write enable (a_wrblk, a_wrb, b_wrblk, and b_wrb) signals are used to activate the read and write operations for each data access port. Figure 3 on page 2 shows the corresponding ports of a quad-port memory block.
Table 1 on page 3 summarizes the interface signals of the memory block.
Implementing Multi-Port Memories
In the referenced example (see the “Appendix – Design Example” on page 8), the multi-port memory wrapper can be implemented in two configurations as described above: dual-port memory (PMODE=0) and quad-port memory (PMODE=1). The depth of the implemented multi-port memories is limited to a single memory block, but the width is variable and supports the variable aspect ratio for the quad-port memory configuration. Axcelerator's RAM64K36 macro is used as the basic memory block for this wrapper.
Since the implementation of multi-port memories relies on the embedded memory block being clocked at twice the data clock rate, a double-frequency clock must be generated. The original data clock input is easily doubled in frequency to generate the required wr_2xclk and rd_2xclk signals using the PLLs in the Axcelerator architecture. For additional details on how to generate a PLL for Axcelerator devices, please refer to A Guide to ACTgen Macros or the application note Axcelerator Family PLL and Clock Management.
Table 2 lists the configurable parameters for the reference design in the “Appendix – Design Example” on page 8, which will be explained in the following sections.
Table 1 • Multi-Port Memory Interface Signals
Signal     Bits      In/Out  Description
a_wdata    variable  IN      Write data bus
a_wadr     8         IN      Write / dual-port memory address bus
a_wren     1         IN      Active high data enable
a_rdata    variable  OUT     Output data bus
a_radr     8         IN      Output address bus (quad-port memory mode only)
a_rden     1         IN      Output register enable A
b_wdata    variable  IN      Write data bus
b_wadr     8         IN      Write / dual-port memory address bus
b_wren     1         IN      Active high data enable
b_rdata    variable  OUT     Output data bus
b_radr     8         IN      Output address bus (quad-port memory mode only)
b_rden     1         IN      Output register enable B
wr_2xclk   1         IN      2x write clock
wr_clk     1         IN      Write port data clock / multiplexer select
rd_2xclk   1         IN      2x read clock (quad-port memory mode only)
rd_clk     1         IN      Read data clock (quad-port memory mode only)
reset_n    1         IN      Reset signal (active low)
Table 2 • Configurable Parameters for Design Example in Appendix
Parameter  Value                     Description
PMODE      0 (default)               Dual-port memory configuration
           1                         Quad-port memory configuration
WMODE      0 (default)               No register, only MUX logic for waddr
           1                         Register, then multiplex waddr, wenb, and wdata
RMODE      0 (default)               No register, only MUX logic for raddr
           1                         Register, then multiplex raddr
OREG       0 (default)               Transparent output mode
           1                         Registered output mode
WR_DEPTH   128:4096 (default = 256)  Write port depth
WR_WIDTH   1:288 (default = 36)      Write port width
RD_DEPTH   128:4096 (default = 512)  Read port depth
Read Ports – Dual-Port Memory
Figure 4 shows a block diagram of the dual-port memory implementation.
In this configuration, the write addresses and write clock are used to read the memory, while the read address inputs and read clock remain unused in the code. The values for read port depth (parameter: RD_DEPTH) and write port depth (parameter: WR_DEPTH) have no specific required relationship, and each can be 128, 256, 512, 1,024, 2,048, or 4,096 words.
If OREG = '0', the data outputs propagate directly to the data output ports; otherwise, when OREG = '1', the output is valid with the next rising edge of wr_clk. Data for Read/Write Port A is registered on the falling edge of wr_clk, while data for Read/Write Port B is registered on the rising edge of wr_clk. The read operation supported by this wrapper is synchronous. The read enable inputs are used to enable the output registers.
Read Ports – Quad-Port Memory
Figure 5 on page 5 shows a block diagram of the quad-port memory implementation.
If RMODE is set to '1', the address and enables for both read ports are registered with the rising edge of the read clock rd_clk. Then the read address and enable inputs for Port A and Port B are multiplexed to the memory and settle while the rd_clk signal is high and low, respectively. Data is read from the memory on the next rising edge of rd_2xclk. If RMODE is set to '0', the read address and enables are not registered, and the read address and read enable inputs for Port A and Port B are simply multiplexed to access the Axcelerator memory.
If OREG = '0', the data outputs propagate directly to the data output ports; otherwise, when OREG = '1', the output is valid with the rising edge of rd_clk. Read data for Port A is registered on the falling edge of rd_clk, while read data for Port B is registered on the rising edge of rd_clk. The read operation supported by this wrapper is synchronous. The read enable inputs are used to enable the output registers.
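The effect of OREG and the read enables can be illustrated with a behavioral model of one rd_clk period. This is a simplified sketch, not the wrapper's VHDL: the class `QuadReadPorts` is hypothetical, clock edges are collapsed into a single method call, and in transparent mode the model ignores the read enables, which is an assumption made to keep the example short.

```python
from typing import List, Tuple


class QuadReadPorts:
    """Two read ports time-sharing one memory read port (behavioral sketch)."""

    def __init__(self, mem: List[int], oreg: int) -> None:
        self.mem = mem
        self.oreg = oreg
        self.a_rdata = 0  # output of port A
        self.b_rdata = 0  # output of port B

    def rd_clk_period(self, a_rden: bool, a_radr: int,
                      b_rden: bool, b_radr: int) -> Tuple[int, int]:
        """Model one rd_clk period, i.e. two rd_2xclk read slots."""
        # Slot 1: rd_clk is high, the multiplexer presents port A's address.
        a_word = self.mem[a_radr]
        # Slot 2: rd_clk is low, the multiplexer presents port B's address.
        b_word = self.mem[b_radr]
        if self.oreg == 0:
            # Transparent mode: memory data passes straight to the outputs.
            self.a_rdata, self.b_rdata = a_word, b_word
        else:
            # Registered mode: each output register updates only when its
            # read enable is asserted; otherwise it holds its old value.
            if a_rden:
                self.a_rdata = a_word
            if b_rden:
                self.b_rdata = b_word
        return self.a_rdata, self.b_rdata


mem = list(range(16))
ports = QuadReadPorts(mem, oreg=1)
print(ports.rd_clk_period(True, 3, True, 5))    # both registers load
print(ports.rd_clk_period(False, 7, True, 9))   # port A holds, port B loads
```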
Write Ports
The write port implementation is the same for both dual-port and quad-port memory. Data, address, and enables for both write ports are optionally registered with the rising edge of wr_clk when WMODE = 1. Data, address, and enable signals for Port A and Port B are multiplexed to the memory and settle while the wr_clk signal is high and low, respectively. Then data is written into the memory on the next rising edge of wr_2xclk (the next falling or rising edge of wr_clk).
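The write sequence above can be sketched as a behavioral model in which one wr_clk period contains two write slots. The names `WriteRequest` and `write_period` are hypothetical, and the model abstracts away setup and hold timing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteRequest:
    enable: bool  # write enable for this port
    addr: int     # write address
    data: int     # write data


def write_period(mem: List[int], port_a: WriteRequest,
                 port_b: WriteRequest) -> List[str]:
    """Model one wr_clk period as two consecutive wr_2xclk write slots.

    Port A is selected while wr_clk is high and Port B while it is low,
    so Port A's write is committed first in this model.
    """
    log = []
    for level, name, req in (("high", "A", port_a), ("low", "B", port_b)):
        if req.enable:
            mem[req.addr] = req.data
            log.append(f"wr_clk {level}: port {name} wrote "
                       f"{req.data:#x} to address {req.addr}")
    return log


mem = [0] * 256
for line in write_period(mem,
                         WriteRequest(True, 3, 0x11),
                         WriteRequest(True, 7, 0x22)):
    print(line)
```

One consequence of this ordering, at least in the model, is that if both ports write the same address in the same wr_clk period, Port B's data is the value left in memory.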
Width / Depth
Because the embedded memories in the Axcelerator family have variable aspect ratio, this wrapper supports independent width and depth for write and read ports in the quad-port memory configuration. The user specifies the depth and width for the write ports, and depth only for the read ports; read port width is derived from the other values in the wrapper code.
Figure 4 • Dual-Port Memory Implementation
[Block diagram not reproduced. Recoverable labels: optional input registers (WMODE=1) on a_wdata, a_wren, a_wadr and b_wdata, b_wren, b_wadr; a multiplexer selecting between Port A and Port B, driven by wr_clk; a memory block with ports WD, WEN, WRA, WCLK, REN, RRA, and RCLK, clocked by wr_2xclk; optional registers (RMODE=1) on a_rden, a_radr and b_rden, b_radr; optional output registers (OREG=1) driving a_rdata and b_rdata. Figure notes: 2 data flows, 2 addresses for write, (PMODE=1) 2 addresses for read. If the read address is the same as the write address, data out is unknown.]
Write data width is specified in integer multiples of the word width associated with the specified write depth (refer to the Axcelerator Family FPGAs datasheet). For example, if the specified write depth is 256, the width may be any multiple of 18. Care must be taken to specify correct width and depth parameters, as the user-specified GENERIC values are not validated by the wrapper.
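Because the wrapper does not validate its generics, a small pre-synthesis check can catch inconsistent values. The Python sketch below is one possible check, not part of the reference design. The function `check_generics` is hypothetical, and the read-width rule it uses (number of blocks times the word width for the read depth) is an assumption about how the wrapper derives the value.

```python
# Word width associated with each depth, from the block organizations
# 128x36, 256x18, 512x9, 1kx4, 2kx2, and 4kx1.
WORD_WIDTH = {128: 36, 256: 18, 512: 9, 1024: 4, 2048: 2, 4096: 1}


def check_generics(wr_depth: int = 256, wr_width: int = 36,
                   rd_depth: int = 512):
    """Validate depth/width generics and return (blocks, read width).

    Raises ValueError if a depth is unsupported or if the write width is
    not an integer multiple of the word width for the write depth.
    """
    for name, depth in (("WR_DEPTH", wr_depth), ("RD_DEPTH", rd_depth)):
        if depth not in WORD_WIDTH:
            raise ValueError(f"{name}={depth} is not a supported depth")
    word = WORD_WIDTH[wr_depth]
    if wr_width < 1 or wr_width % word != 0:
        raise ValueError(
            f"WR_WIDTH={wr_width} is not a multiple of {word} "
            f"(word width for WR_DEPTH={wr_depth})")
    blocks = wr_width // word                  # memory blocks placed side by side
    rd_width = blocks * WORD_WIDTH[rd_depth]   # assumed derivation of read width
    return blocks, rd_width


print(check_generics())              # defaults from Table 2
print(check_generics(128, 72, 4096))
```

With the Table 2 defaults (write depth 256, write width 36, read depth 512), this check reports two blocks and a read width of 18.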
Timing Diagrams
Figure 6 and Figure 7 on page 6 illustrate the relationships of the signals during write and read cycles for both dual-port and quad-port memories.
Design Considerations
The implementation of both dual-port memory and quad-port memory involves doubling the clock frequency at which data is clocked into the Axcelerator embedded memory and using multiplex logic to arbitrate between Port A and Port B. The simplest way to implement the doubled frequency is to make use of the on-chip PLL, with the exact configuration generated using the ACTgen Macro Builder. If the configuration has the outputs registered (OREG = '1'), this generates flip-flops that will be part of the critical path of the design.
Utilization
Using the reference design example in the “Appendix – Design Example” on page 8, the following tables quantify the additional logic overhead introduced by the gates, flip-flops, and PLL used in the dual-port and quad-port memory configurations, comparing unregistered with registered inputs and outputs, with a read depth equal to the write depth.
Dual-Port Memory with Unregistered Inputs and Outputs
Dual-Port Memory with Registered Inputs and Outputs
Quad-Port Memory with Unregistered Inputs and Outputs
Quad-Port Memory with Registered Inputs and Outputs
Notice that the utilization increases significantly from the unregistered inputs and outputs to the registered configuration. This results from the additional flip-flops (sequential R-cells) necessary to generate the registered inputs and outputs for each bit of the data and address buses as well as the enable signals. The utilization also shows a slight increase from the dual-port memory to the quad-port memory configuration.
Conclusion
Implementing multi-port memories by using wrapper source code to interface to the basic Axcelerator memory block is straightforward and intuitive. Although both dual-port and quad-port memories require additional logic overhead, including extra multiplexers and flip-flops, they still prove useful in certain designs.
Related Documents
For more information, see the following documents:
Appendix – Design Example
This design example implements a variable-width dual-port or quad-port memory with variable aspect ratio support, based on the memory blocks available in Actel’s Axcelerator devices. For deeper memories, the user must cascade blocks together or modify this design example.
Below is a sample instantiation of the multi-port memory wrapper, which may be cut and pasted into the higher-level VHDL code: