Brandon Ade and Srikanth Srinivasan
• The DPAA represents a significant advance in microcontroller architecture for Freescale. Along with the valuable capabilities comes a vast increase in complexity and details critical to proper operation. The information herein will unravel this complexity and will reduce the amount of time and effort during software development.
• This information is relevant to any customer implementing QorIQ products with the DPAA.
• This presentation contains the knowledge gleaned from an effort to translate documentation into real world working DPAA driver code.
• After completing this session you will be able to implement a simple DPAA Normal Mode configuration that will allow application layer software to move traffic across the 1G dTSECs and 10G TGEC interfaces.
• Note that the following code examples are pared down and the full source code should be referenced for a complete solution. http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=DINK32 Follow DINK32FULL under the Downloads tab.
• Note that all references to the Reference Manual refer to P4080RM Rev 0 and DPAARM Rev 0, the latest available reference content as of June 2011.
• The QorIQ DPAA is a comprehensive architecture which integrates all aspects of packet processing in the SoC, addressing issues and requirements resulting from the multicore nature of QorIQ SoCs.
• The DPAA includes:
− Cores
− Network and packet I/O
− Hardware offload accelerators
− The infrastructure required to facilitate the flow of packets between the above
• The DPAA also addresses various performance-related requirements, especially those created by the high-speed network I/O found on multicore SoCs such as the P4080.
• Multicore SoCs, like the P4080, have a number of new requirements related to packet processing when compared to single core SoCs:
− Load spreading of arriving packets across pools of cores for parallel processing
− Packet ordering issues after processing
− Pipelined processing of packets using cores
− Sharing of network I/O between cores
− “Virtualization” of hardware accelerators
− Inter-core communication
• On previous platforms, access to the CCSR mapped registers was done in the same way across the entire cache-inhibited CCSR space:
− Example:
#define CCSRBAR 0xE0000000
#define CCSR(offset) (*(ULONG*)(CCSRBAR | (offset)))
CCSR(BMAN_LIODNR) = 0x20;
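The CCSR macro above dereferences a plain pointer, so a compiler is free to cache or drop the access. The following is a minimal sketch of the same offset-based access pattern with volatile-qualified accessors. The names reg_write32, reg_read32 and fake_ccsr are hypothetical and not part of the original code, and the register window is simulated with an array so that the sketch can run on a host.

```c
#include <stdint.h>

/* Hypothetical stand-in for the CCSR window so the sketch runs on a host.
 * On real hardware the base would be the CCSRBAR address. */
static uint32_t fake_ccsr[16];

/* Write a 32-bit register at a byte offset from the base.
 * The volatile qualifier keeps the compiler from eliminating the store. */
static inline void reg_write32(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4u] = value;
}

/* Read a 32-bit register at a byte offset from the base. */
static inline uint32_t reg_read32(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4u];
}
```

With these helpers, a write such as reg_write32(fake_ccsr, 0x8, 0x20) plays the role of CCSR(offset) = 0x20 in the example above.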
• P4080 introduces a new concept of cacheable register space for performance gains.
• The CoreNet software portals of BMan and QMan are divided into a cache-enabled area and a cache-inhibited area which are mapped in a 2MB area of system memory space, aligned on a 2MB boundary, and are accessible as a target across the CoreNet fabric. One of the LAWs must be programmed by software with the target ID, base address, and size (2MB) of the window in system memory occupied by [Q/B]Man’s CoreNet software portals.
• Cacheable registers require data cache zeroing and flushing, adding complexity and code overhead, but increasing performance. Cache-inhibited registers are simple to use with no dcbz or cache flushing required, but require memory access outside of the core caches. If you are concerned with performance then use the cacheable registers. If you do not care about performance and prefer simplicity then use the cache-inhibited registers.
• See DPAARM Rev 0 Section 6.4.6 Software Portals for more detail on how this works. These steps must be followed for all cacheable software portal registers!
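The 2MB size and 2MB alignment requirement for the portal window can be checked in software before the LAW is programmed. This is a minimal sketch only. The names PORTAL_WINDOW_SIZE and portal_window_base_ok are hypothetical, and any base address passed in is an example value rather than one taken from the reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the software portal window as stated in the text: 2MB. */
#define PORTAL_WINDOW_SIZE 0x200000ull

/* Returns true when the candidate base address sits on a 2MB boundary.
 * Works because the window size is a power of two. */
static bool portal_window_base_ok(uint64_t base)
{
    return (base & (PORTAL_WINDOW_SIZE - 1ull)) == 0ull;
}
```

A base that fails this check would need to be rounded to the next 2MB boundary before being written to the LAW.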
2. Frame Processing Manager Init
− See DPAARM Rev 0 Section 8.4.5 FPM Initialization and Configuration.
− The FPM allocates and de-allocates TNUMs automatically, so there is no need to change the default settings of the FPM registers except for debug purposes. However, per an erratum, the FMan halt command causes the FMan to hang after reset. To work around this, set the RFM bit to ‘1’, along with the other bits that release the FMan upon error.
Set the events and masks associated with the FPM error interrupt in FMFP_EE.
CCSR(fman_base | FMFP_EE) = FMFP_EE_DECC | FMFP_EE_STL | FMFP_EE_SECC | FMFP_EE_RFM | FMFP_EE_DECC_EN | FMFP_EE_STL_EN | FMFP_EE_SECC_EN | FMFP_EE_EHM | FMFP_EE_UEC | FMFP_EE_CER | FMFP_EE_DER;
3. Buffer Manager Interface Common Registers
− Initialization of the common registers must happen prior to initialization of any specific RX/TX port. Since the common registers affect all ports, they must be initialized by the software entity that is aware of the activity and demands of all BMI ports, including different software partitions. This entity is usually the hypervisor. See DPAARM Rev 0 Section 8.5.6 Frame Manager BMI Initialization/Application Information. For simplicity, assume that only FMan1 dTSEC1 will be configured for use.
Set the total size of the Free Buffer Pool and its offset by writing to FMBM_CFG1. This is the size of the internal RAM, aka “FMan internal RAM” or just “FMan memory”.
#define FMBM_FBP_SIZE 0x28000 //160k limit based on P4080RM Rev 0 Section 21.5.3
CCSR(fman_base | FMBM_CFG1) = ((FMBM_FBP_SIZE / 256) - 1) << 16; //Setting offset to '0'.
Set the total allowed tasks and DMA slots to be used by BMI in FMBM_CFG2.
#define FMBM_TNTSKS 128 //Maximum allowed open tasks is 128
#define FMBM_TDMA 32 //Maximum allowed DMA transfers is 32
CCSR(fman_base | FMBM_CFG2) = (FMBM_TDMA - 1) | ((FMBM_TNTSKS - 1) << 16);
Set LIODN per port in FMBM_SPLIODN registers.
for(i = 1; i < NUM_OF_BMI_PORTS; i++) {
    CCSR((fman_base | FMBM_SPLIODN_1) + ((i-1) * 0x4)) = i;
}
Initialize BMI linked list by writing '1' to FMBM_INIT[STR].
CCSR(fman_base | FMBM_INIT) = 0x80000000;
Signal the specific port drivers that the common parameters are initialized, and that they can go ahead and initialize the port. Both TX and RX are in Normal Mode.
FMan_BMI_RX_Init(fman_base);
FMan_BMI_TX_Init(fman_base);
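The FMBM_CFG1 and FMBM_CFG2 encodings above are easy to get wrong by one, so it can help to compute the register values on a host first. The sketch below repeats the arithmetic from the snippet with the same three constants. The helper names fmbm_cfg1_value and fmbm_cfg2_value are hypothetical and are not part of the original driver.

```c
#include <stdint.h>

#define FMBM_FBP_SIZE 0x28000u /* 160k FMan internal RAM, as in the text */
#define FMBM_TNTSKS   128u     /* maximum allowed open tasks */
#define FMBM_TDMA     32u      /* maximum allowed DMA transfers */

/* FMBM_CFG1: the size is expressed in 256-byte units minus one and placed
 * in the upper half-word. The offset field is left at zero. */
static uint32_t fmbm_cfg1_value(uint32_t fbp_size_bytes)
{
    return ((fbp_size_bytes / 256u) - 1u) << 16;
}

/* FMBM_CFG2: total tasks minus one in the upper half-word,
 * total DMA slots minus one in the lower half-word. */
static uint32_t fmbm_cfg2_value(uint32_t total_tasks, uint32_t total_dma)
{
    return (total_dma - 1u) | ((total_tasks - 1u) << 16);
}
```

For the constants used here, the 160k pool is 640 units of 256 bytes, so the size field holds 639 (0x27F).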
4. Buffer Manager Interface RX Port Registers
− This function initializes the RX specific ports of the Buffer Manager Interface sub-block of the FMan. It indicates to FMan how large the FMan Memory space is, as well as how many Buffer Pools (and associated sizes) are available from BMan. It also indicates the NIAs for ingress frames. For simplicity only the registers pertaining to FMan1 dTSEC1 are shown here.
STATUS FMan_BMI_RX_Init(ULONG fman_base) {
a. Disable Rx port by clearing FMBM_RCFG[EN]. Verify it is disabled by reading FMBM_RST[BSY].
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RCFG) &= ~FMBM_RCFG_EN;
while(CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RST) & FMBM_RST_BSY) { asm_sync(); }
b. Set DMA attributes to be used in FMBM_RDA. For now leave stashing disabled since PAMU is bypassed.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RDA) = 0x00000000;
c. Set FIFO parameters in FMBM_RFP. Set the Priority Elevation Levels and FIFO Thresholds to the number of allocated buffers per port. If the number of buffers in FMBM_PFS[IFSZ] is consumed then this port's buffer space is full and we know we need to increase priority and decrease ingress frames via PAUSE frames.
CCSR((fman_base | FMBM_1G_RX0_BASE | FMBM_RFP)) = ((FMBM_PORT_FIFO_SIZE - 1) << 16) | (FMBM_PORT_FIFO_SIZE - 1);
d. Set desired frame margin parameters in FMBM_RFED, FMBM_RIM, FMBM_REBM. Chop the last 4 bytes of the frame, which hold the appended CRC. This could break a software CRC check.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RFED) = 0x00040000;
Set frame internal offset to 0x00. The frame data will start at the REBM + this offset.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RIM) = 0x00000000;
32 extra bytes will be added to the start of every frame. This will hold the IC and other possible data. Actual frame data starts at offset 0x20.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_REBM) = FMBM_REBM_BSM << 16;
e. Set Internal Context parameters in FMBM_RICP. Transfer the first 16 bytes of the IC to offset 0x10 in the external buffer. This means that the first 16 bytes of a buffer are null, the next 16 are the IC, and the actual frame data begins at offset 0x20.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RICP) = (IC_COPY_OFFSET << 16) | (IC_COPY_SIZE);
f. Set BMI Enqueue as the next engine in FMBM_RFNE. For now, we will bypass the Parser and KeyGen and go straight to the BMI Enqueue and then to QMI Enqueue via RFENE.
//CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RFNE) = 0x00440000; //This is for use with Parser.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RFNE) = 0x00500002; //Skip straight to BMI Enqueue
g. Set FQID and EFQID by writing to FMBM_RFQID, FMBM_REFQID registers. Provide a default FQ ID in case Parse/Classify fails, or in the case that BMI goes straight to QMI and skips PCD. In order to identify which port has received a frame, we need to keep track of the default FQIDs for each port.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RFQID) = ports[PORT1].rx_fqid;
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_REFQID) = RX_ERROR_FQID; //RX Error FQID = 0xEE.
h. Configure the mask vectors in FMBM_RFSEM according to the desired action of BMI. Enqueue any frame with error bits set in the received status word to the Error Frame Queue ID set by REFQID. See DPAARM Rev 0 Section 8.5.3.3.20. We are not checking the FCL (Frame Color) field which is asserted by the Policer because we are not using the Policer.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RFSEM) = 0x170EEFF0;
i. Configure QMI Enqueue as next engine in FMBM_RFENE and clear ORR bit.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RFENE) = 0x00540000;
j. Configure FMBM_REBMPI registers according to the external BM pools allocated to the port. The valid pools should be programmed in ascending order. The Buffer Pool with smallest buffers should be written into FMBM_REBMPI_1. Iterate through only 3 of the REBMPI registers as only 3 BMan pools exist to initialize. Make sure that all ports have access to these 3 pools.
CCSR((fman_base | FMBM_1G_RX0_BASE | FMBM_REBMPI_1)) = FMBM_REBMPI_VAL | (BMAN_POOL_1_ID << 16) | BMAN_POOL_1_BUF_SIZE;
CCSR((fman_base | FMBM_1G_RX0_BASE | FMBM_REBMPI_2)) = FMBM_REBMPI_VAL | (BMAN_POOL_2_ID << 16) | BMAN_POOL_2_BUF_SIZE;
CCSR((fman_base | FMBM_1G_RX0_BASE | FMBM_REBMPI_3)) = FMBM_REBMPI_VAL | (BMAN_POOL_3_ID << 16) | BMAN_POOL_3_BUF_SIZE;
k. Enable a PAUSE frame signal to the MAC if 2 buffer pools are depleted, but do not send a PAUSE frame signal on any single pool being depleted.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_MPD) = 0xE0010000;
l. Enable statistic counters if desired via FMBM_RSTC[EN].
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RSTC) = 0x80000000;
m. Enable Rx port by setting FMBM_RCFG[EN]. FMBM_RCFG[IM] bit should be set to '0' (cleared) which places Rx port in Normal mode. Also set the FDOVR bit which causes "would be discarded" frames to actually enqueue to the Error FQ.
CCSR(fman_base | FMBM_1G_RX0_BASE | FMBM_RCFG) = FMBM_RCFG_EN | FMBM_RCFG_FDOVR;
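With the REBM and RICP settings described in steps d and e, software that receives a buffer needs to skip the first 32 bytes to reach the frame. The following is a minimal sketch of that buffer layout, assuming the byte offsets stated in the text (IC at 0x10, frame data at 0x20). The names BUF_IC_OFFSET, BUF_IC_SIZE, BUF_FRAME_OFFSET, rx_buf_ic and rx_buf_frame are hypothetical and are not the driver's own macros.

```c
#include <stdint.h>

/* Byte offsets inside a received external buffer, taken from the text. */
#define BUF_IC_OFFSET    0x10u /* first 16 bytes are null, IC copy starts here */
#define BUF_IC_SIZE      0x10u /* first 16 bytes of the IC are copied */
#define BUF_FRAME_OFFSET 0x20u /* frame data starts after the 32-byte margin */

/* Pointer to the copied Internal Context inside a buffer. */
static inline uint8_t *rx_buf_ic(uint8_t *buf)
{
    return buf + BUF_IC_OFFSET;
}

/* Pointer to the first byte of frame data inside a buffer. */
static inline uint8_t *rx_buf_frame(uint8_t *buf)
{
    return buf + BUF_FRAME_OFFSET;
}
```

If the RICP or REBM values are changed, these offsets would have to change with them, which is why they are kept in one place here.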
5. Buffer Manager Interface TX Port Registers
This function initializes the TX specific ports of the Buffer Manager Interface sub-block of the FMan. It indicates the NIAs for egress frames, as well as the frame meta-data to include. The 10G TX port is disabled. For simplicity only the registers pertaining to FMan1 dTSEC1 are shown here. See DPAARM Rev 0 Section 8.5.6.4.
STATUS FMan_BMI_TX_Init(ULONG fman_base) {
a. Disable Tx port by clearing FMBM_TCFG[EN]. Verify it is disabled by reading the FMBM_TST[BSY] bit until cleared.
b. Prior to BMI TX port initialization, it is required that the MAC registers be initialized and TX operation in the MAC be enabled. Refer to DPAARM Rev 0 Chapter 3 and Chapter 4.
c. Set FIFO parameters in FMBM_TFP.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TFP) = 0x00000013;
d. Set Internal Context parameters in FMBM_TICP. Transfer the first 16 bytes of the IC to offset 0x10 in the external buffer. This means that the first 16 bytes of a buffer are null, the next 16 are the IC, and the actual frame data begins at offset 0x20.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TICP) = (IC_COPY_OFFSET << 16) | (IC_COPY_SIZE);
e. Set QMI dequeue as next module in FMBM_TFNE. Set the NIA module as the QMI dequeue.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TFNE) = 0x00580000;
f. Set CFQID and EFQID by writing to FMBM_TCFQID, FMBM_TEFQID registers. The default Tx Confirmation FQID is unused. This is to save software from the responsibility of dequeuing confirmation frames. The default Tx Error FQID is 0xFF.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TCFQID) = 0x00000000;
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TEFQID) = 0x000000FF;
g. Set QMI Enqueue as next module in FMBM_TFENE and clear ORR bit. This is in case of Tx Confirmation frames.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TFENE) = 0x00540000;
h. Enable statistics counters if desired via FMBM_TSTC.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TSTC) = 0x80000000;
i. Enable Tx port by setting FMBM_TCFG[EN]. FMBM_TCFG[IM] bit should be cleared ('0') for Normal Mode.
CCSR(fman_base | FMBM_1G_TX0_BASE | FMBM_TCFG) = FMBM_TCFG_EN;
10. Queue Manager Interface Init
Assign sub-portal IDs to each MAC and indicate to each MAC which sub-portal they can enqueue/dequeue to/from. See DPAARM Rev 0 Section 8.10.4. The QMI is ready for operation after reset and there is no need for special configuration (only for the common part). If not using the default configuration the user may follow these initialization steps:
Common Registers
a. Enable global QMI statistic counters via FMQM_GC.
CCSR(fman_base | FMQM_GC) |= FMQM_GC_STEN;
b. Enable the assertion of the interrupt and the error interrupt lines for the wanted events, using FMQM_IEN and FMQM_EIEN registers. For debug purposes the FMQM_EIF and FMQM_IF force registers can be used to test the interrupt path. See DPAARM Rev 0 Section 8.10.2.1.2.
CCSR(fman_base | FMQM_EIE) = 0xC0000000; //Clear register
CCSR(fman_base | FMQM_EIEN) = 0xC0000000; //Enable all interrupts
CCSR(fman_base | FMQM_IE) = 0x80000000; //Clear register
CCSR(fman_base | FMQM_IEN) = 0x80000000; //Enable all interrupts
RX Ports
c. Optionally, the following initialization step can be performed (in this section n=0x8-0xB,0x10):
The user can change the NIA that the QMI sends to the FPM after the enqueue operation. This can be done using the FMQM_PnEN registers.
Tx/Host Command/Offline Parsing Ports
The following initialization steps must be performed (in this section n=0x1-0x7,0x28-0x2B,0x30):
d. The user should configure the priority level of this portID and the prefetch operation. In addition, the user must configure the dequeue frame amount, the mapping of this port to a sub-portal to dequeue from QMan, the dequeue options, and the byte count level control. This can be done using the FMQM_PnDC registers.
CCSR((fman_base | FMQM_P1DC) + (PORTID_BASE_SPACE * (PORTID_1G_TX0))) = FMQM_PDC_FRM | (PORTID_1G_TX0_SUBP << 20) | 0x1000FFFE;
e. Finally, the user should enable this port by writing one to the FMQM_PnC[EN] bit. Also enable statistics.
CCSR((fman_base | FMQM_P1C) + (PORTID_BASE_SPACE * (PORTID_1G_RX0))) = FMQM_PC_EN | FMQM_PC_STEN;
CCSR((fman_base | FMQM_P1C) + (PORTID_BASE_SPACE * (PORTID_1G_TX0))) = FMQM_PC_EN | FMQM_PC_STEN;
• dTSEC and GEC MAC initialization must occur before initialization of the FMan BMI
• The FPM only needs programming changes for debug or errata workaround purposes
• By default the Free Buffer Pool size programmed in the BMI is not set to the design limit of 160k
• For ingress frames, the 4 byte CRC should be removed so check sum calculations will pass
• The RICP IC copy settings must be heeded in REBM in order to know where the frame data begins
• PCD rules can be bypassed by going straight to BMI Enqueue via FMBM_RFNE, note the Action Code
• A default RX FQID is provided for when PCD rules are bypassed. In this case all ingress frames from this port go to the same FQID, which can be different for each port
• FMBM_RFENE does not need to be modified from its default setting of QMI Enqueue
• Knowledge of what BMan Buffer Pools are available must be programmed in to the BMI in ascending order
• To place an RX port in Normal Mode the FMBM_RCFG[IM] bit must be ‘0’
• For egress packets, software must keep special track of the TICP parameters to know where the frame data begins
• The TX Confirmation frame can be disabled to save overhead of software dequeuing these frames
• FMBM_TFENE is set to QMI Enqueue as the next path in case TX Confirmation frames are used
• To place a TX port in Normal Mode the FMBM_TCFG[IM] bit must be ‘0’
• The entire PCD path can be bypassed by placing each PCD block in bypass or disabled mode
• The P4080 provides support for more Partition ID LIODNs than there are actual partitions on the device. This is for future support of more partitions, and each Partition ID LIODN should be unique and consistent throughout the system
• The Common and RX port registers for the QMI are ready for use by default, and only need programming changes for debug or interrupt capabilities
• The FMQM_PnDC registers must be programmed to connect a dTSEC port to a dequeue sub-portal
• Buffer pools are lists of available buffers which have the same characteristics
− Size
− Addressability/accessibility
− The characteristics which are important are determined by the user(s) of the pool
• Buffer pools are identified using pool ids (BPID)
− Used in the programming model for the various BMan enabled blocks to specify which pool(s) to acquire/release buffers from/to
− Supported by FMan’s and QMan’s programming/interface model
• BMan Command Type
− BMan Command Registers are 64B long
− Command Verb (1B) + Buffer Pool ID (1B)
Bit 1-3: Response Type. Valid encodings are:
• 1h = Acquire buffers (Acquire)
• 2h = Release buffers to the pool identified in byte field 1 (Release)
• 3h = Release each buffer to the pool identified in byte field immediately preceding its buffer field (Release)
• 6h = Invalid command (Response)
• 7h = Stockpile ECC Error (Response)
Bit 4-7: Number of buffers associated with command type, maximum 8
• 0h = Zero buffers
• 1h = One buffer
• ....
• 8h = Eight buffers
Returns up to eight 48-bit buffer addresses
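The verb byte layout above can be sketched as a small packing helper. This is an illustration under two assumptions that are not stated in the slide: bit numbering follows the Power convention where bit 0 is the most significant bit of the byte, and bit 0 carries the valid bit. The names bman_cmd_type and bman_verb are hypothetical.

```c
#include <stdint.h>

/* Command encodings for bits 1-3, taken from the list above. */
enum bman_cmd_type {
    BMAN_CMD_ACQUIRE          = 0x1, /* acquire buffers */
    BMAN_CMD_RELEASE_ONE_POOL = 0x2, /* release to the pool in byte field 1 */
    BMAN_CMD_RELEASE_PER_BUF  = 0x3  /* release each buffer to its own pool */
};

/* Pack a verb byte.
 * ASSUMPTION: MSB-0 bit numbering, so bit 0 is the top bit (valid bit),
 * bits 1-3 hold the type and bits 4-7 hold the buffer count. */
static uint8_t bman_verb(uint8_t valid_bit, enum bman_cmd_type type, uint8_t num_bufs)
{
    return (uint8_t)(((valid_bit & 0x1u) << 7) |
                     (((unsigned)type & 0x7u) << 4) |
                     (num_bufs & 0xFu));
}
```

Under these assumptions, an acquire of eight buffers with the valid bit set packs to 0x98. The reference manual should be checked before relying on this layout.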
• Rings are finite size FIFO structures indexed in a circular manner
• The entity placing data into the ring uses a Producer Index (PI) to indicate the entry it will place data in
• The entity reading data from the ring uses a Consumer Index (CI) to indicate the next entry it will read data from
• Math is modulo the size of the ring
• PI == CI means the ring is empty
• (PI – CI) == (“size of ring” – 1) means the ring is full
− A ring holds at most (“size of ring” - 1) entries
• Software can update PI/CI directly or indirectly
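The empty and full conditions above can be written as a few index helpers. This is a minimal sketch, assuming a ring of 8 entries so that the modulo reduces to a bit mask. RING_SIZE and the function names are hypothetical and are not taken from the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring depth; must be a power of two */

/* Number of entries currently held: (PI - CI) modulo the ring size.
 * Unsigned wrap-around makes the subtraction safe when PI < CI. */
static inline uint32_t ring_fill(uint32_t pi, uint32_t ci)
{
    return (pi - ci) & (RING_SIZE - 1u);
}

/* PI == CI means the ring is empty. */
static inline bool ring_empty(uint32_t pi, uint32_t ci)
{
    return ring_fill(pi, ci) == 0u;
}

/* (PI - CI) == (size - 1) means the ring is full. */
static inline bool ring_full(uint32_t pi, uint32_t ci)
{
    return ring_fill(pi, ci) == RING_SIZE - 1u;
}

/* Advance an index by one entry, wrapping at the end of the ring. */
static inline uint32_t ring_next(uint32_t idx)
{
    return (idx + 1u) & (RING_SIZE - 1u);
}
```

One entry is always left unused, which is what lets PI == CI mean empty without a separate count.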
1. Initialize software structures mapped to memory and related portal information
2. Set up Free Buffer Proxy Record memory (FBPR)
a. Select a power of 2 memory size for FBPRs
b. Assign 64-bit Physical Address
c. Write the base address into FBPR_BARE and FBPR_BAR registers
d. Write the size in the FBPR_AR register
3. Software Portal configuration
a. RCR Interrupt Threshold
b. PI/CI (or Valid Bit) for release command ring (RCR)
c. Initialize Buffer Pools and release to BMan via RCR rings
1. Initialize Software Structures Mapped to Memory and Related Portal Information
Connect the RCR rings and the CR/RR registers with the physical LAW mapped BM_CENA region. The individual entries in each RCR ring should be implicitly defined during the allocation of the encapsulating BMAN_RCR_RING. For simplicity it is assumed that only software portal zero is used in this example. See the full code for a more robust solution.
    static BMAN_CR* bman_cr[NUM_OF_PORTALS]; //CR for each portal
    static BMAN_PORTAL bman_portals[NUM_OF_PORTALS]; //Software construct for tracking portal information
Connect CR to mapped memory space.
    bman_cr[0] = (BMAN_CR*) (&BCSP_CENA(BCSP_CR));
LIODN for this portal.
    bman_portals[0].liodn = BMAN_LIODN;
Partition number of this portal.
    bman_portals[0].pid = 0;
See DPAARM Rev 0 Section 7.3.1.2 Command Register (CR) Functionality. The very first command written after reset must have a 1 in its valid bit, since this is the valid bit polarity after reset.
    bman_portals[0].rr_vbit = 0x01;
See DPAARM Rev 0 Section 7.3.4 RCR Production Notification in Software Portals (RCR_PI). The very first command written after reset must have a 1 in its valid bit, since this is the valid bit polarity after reset. The RCR valid bit should only flip after 7 entries of the RCR ring have been used and we wrap around to entry 0.
    bman_portals[0].pi_vbit = (BCSP_CINH(BCSP_RCR_PI_CINH) & BCSP_RCR_PI_CINH_VP) >> 3;
    bman_portals[0].ci_vbit = (BCSP_CINH(BCSP_RCR_CI_CINH) & BCSP_RCR_CI_CINH_VC) >> 3;
Initialize Producer and Consumer indexes to what HW expects. These are software copies only and do not directly modify the HW copies found in the BCSPn_RCR_PI/CI registers.
    bman_portals[0].pi = (BCSP_CINH(BCSP_RCR_PI_CINH) & BCSP_RCR_PI_CINH_PI);
    bman_portals[0].ci = (BCSP_CINH(BCSP_RCR_CI_CINH) & BCSP_RCR_CI_CINH_CI);
Example of writing to software portal index i:
    //bman_portals[i].ci = (BCSP_CINH((BCSP_RCR_CI_CINH) | (i * BCSP_CINH_BASE_SPACE)) & BCSP_RCR_CI_CINH_CI);
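The mask-and-shift pattern used above to seed the software copies of the index and valid bit can be tried in isolation on a host machine. The following is a minimal sketch, not the driver code: the field layout (a 3-bit index with the valid bit at bit 3) is an assumption inferred from the `>> 3` shift in the text, and `ring_state` and `decode_ring_reg` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout: 3-bit ring index in bits 2..0, valid bit at bit 3.
 * Check the register description in the reference manual before use. */
#define RING_INDEX_MASK 0x07u
#define RING_VBIT_MASK  0x08u
#define RING_VBIT_SHIFT 3

typedef struct {
    uint8_t index; /* software copy of the ring index */
    uint8_t vbit;  /* software copy of the valid bit polarity (0 or 1) */
} ring_state;

/* Split a raw PI/CI register value into its index and valid bit. */
static ring_state decode_ring_reg(uint32_t raw)
{
    ring_state s;
    s.index = (uint8_t)(raw & RING_INDEX_MASK);
    s.vbit  = (uint8_t)((raw & RING_VBIT_MASK) >> RING_VBIT_SHIFT);
    assert(s.vbit <= 1u);
    return s;
}
```

Under these assumed masks, a raw value of 0x0D decodes to index 5 with a valid bit of 1.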
2. Set Up Free Buffer Proxy Record Memory (FBPR)
This struct holds general BMan information for use during initialization.
    typedef struct bman_init_struct {
        ULONG fbpr_bare; //FBPR Memory Base Physical Address, upper 32 bits
        ULONG fbpr_bar; //FBPR Memory Base Physical Address, lower 32 bits
        ULONG fbpr_exponent; //FBPR Power of 2 Size Exponent for FBPR_AR[SIZE]
    } BMAN_INIT;
    static BMAN_INIT bman_init;
a. Select a power of 2 memory size for FBPRs.
    bman_init.fbpr_exponent = ilog2(BMAN_FBPR_MEM_SIZE) - 1;
b. Assign 64-bit Physical Address.
    bman_init.fbpr_bare = 0x00000000; //Still operating in 32bit space
    bman_init.fbpr_bar = BMAN_FBPR_MEM_REAL;
c. Write the base address into FBPR_BARE and FBPR_BAR registers.
    CCSR(FBPR_BARE) = bman_init.fbpr_bare;
    CCSR(FBPR_BAR) = bman_init.fbpr_bar;
d. Write the size in the FBPR_AR register.
    CCSR(FBPR_AR) = bman_init.fbpr_exponent;
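The size exponent computed in step 2a can be illustrated with a small host-side sketch. `ilog2_u32` and `fbpr_size_exponent` are hypothetical stand-ins for the driver's `ilog2` helper, which is not shown in the text, and the sketch simply rejects sizes that are not a power of 2.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 of an exact power of two (hypothetical stand-in for ilog2). */
static uint32_t ilog2_u32(uint32_t v)
{
    uint32_t n = 0;
    assert(v != 0 && (v & (v - 1u)) == 0); /* must be a power of two */
    while (v > 1u) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Size exponent as computed in step 2a: ilog2(size) - 1. */
static uint32_t fbpr_size_exponent(uint32_t mem_size)
{
    assert(mem_size >= 2u); /* avoid underflow of the "- 1" */
    return ilog2_u32(mem_size) - 1u;
}
```

For example, a 1 MB region (0x100000 bytes) gives ilog2 = 20 and therefore an exponent of 19.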
3. Software Portal Configuration
a. Use valid bit mode since it is straightforward and the RCR entry tracking is implied via the valid bit of each RCR (which is written by software). Software must keep track of the RCR index for each software portal.
    for(i = 0; i < NUM_OF_PORTALS; i++) {
        BCSP_CINH((BCSP_CFG) | (i * BCSP_CINH_BASE_SPACE)) = 0x00000003;
    } //end for
b. RCR Interrupt Thresholds. The Interrupt Threshold Registers for each portal will interrupt the core when the # of RCR entries consumed is less than the # in RCR_ITR. If zero then this interrupt mechanism is disabled and polling of the RCR entries will need to be done by software.
    for(i = 0; i < NUM_OF_PORTALS; i++) {
        BCSP_CINH((BCSP_RCR_ITR) | (i * BCSP_CINH_BASE_SPACE)) = BMAN_RCR_IT;
    } //end for
c. PI/CI or Valid Bit for release command ring (RCR). Since we are in Valid Bit mode the BCSPn_RCR_PI_CENA registers are read only and do not need to be initialized.
d. Initialize Buffer Pools and release to BMan via RCR rings.
    static BMAN_POOL bman_pools[BMAN_NUM_OF_POOLS]; //3 Buffer Pools
    static BMAN_RCR_RING* bman_rcr_rings[NUM_OF_PORTALS]; //RCR ring for each portal
    static BMAN_RR* bman_rr0[NUM_OF_PORTALS]; //RR0 for each portal
    static BMAN_RR* bman_rr1[NUM_OF_PORTALS]; //RR1 for each portal
A Buffer Pool ID of ‘0’ is valid.
    bman_pools[0].id = BMAN_POOL_1_ID;
    bman_pools[0].buf_size = BMAN_POOL_1_BUF_SIZE;
The buf_count is rounded down to a multiple of RCR_BUFFS_PER_RELEASE so that initialization releases only complete groups of buffers. For example, if RCR_BUFFS_PER_RELEASE is 5 we do not want the total buf_count to be 6, because we would release 5 in the first group and have a remainder of 1 buffer.
    bman_pools[0].buf_count = BMAN_POOL_1_BUF_COUNT - (BMAN_POOL_1_BUF_COUNT % RCR_BUFFS_PER_RELEASE);
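The rounding applied to buf_count can be checked with a few numbers. This is a minimal sketch of the same arithmetic; `round_down_to_group` is a hypothetical helper name and the values in the usage note are examples only.

```c
#include <assert.h>
#include <stdint.h>

/* Round buf_count down to a whole number of release groups. */
static uint32_t round_down_to_group(uint32_t buf_count, uint32_t group)
{
    assert(group != 0u); /* a group must contain at least one buffer */
    return buf_count - (buf_count % group);
}
```

With a group size of 5, a requested count of 6 becomes 5, a count of 10 stays 10, and a count of 4 becomes 0, so a pool smaller than one group releases nothing.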
Malloc space for buffer pool.
    bman_pools[0].buf_addr = (ULONG) malloc_aligned(bman_pools[0].buf_size * bman_pools[0].buf_count, 64);
See DPAARM Rev 0 Section 7.3.3 Software Portal Release Command Ring (RCR). Populate RCR ring entries with buffer pool addresses. At the end of each group the RCR will be written to signal to BMan to release these buffers.
    BMan_Release(portal_num, bman_pools[0].id, bman_pools[0].buf_count, bman_pools[0].buf_addr);
This function uses the software copy of the Consumer Index (CI) to go through the RCR ring and release buffers for BMan control via the RELEASE verb. A total of num_to_release buffers will be attempted and the CI will be incremented accordingly. For simplicity this routine has been modified to assume only up to 8 entries are released at once. See the full code for a more robust solution.

STATUS BMan_Release(BYTE portal_num, BYTE bpid, ULONG num_to_release, ULONG start_address) {
    BMAN_PORTAL *portal = &bman_portals[portal_num]; //The Software Portal
    BMAN_POOL* bp = &bman_pools[bpid]; //Buffer Pool to release in to
    BMAN_RCR_ENTRY* entry = NULL; //RCR Ring entry to use

    entry = &bman_rcr_rings[portal_num]->entries[portal->ci]; //Get current consumable entry.
    asm volatile("lwsync"); //Perform a lwsync in order to ensure cache writes are completed

    //When operating on multiple buffers only the first buffer entry in each RCR ring entry needs to have the vbit written to.
    entry->buffer[0].verb = (portal->ci_vbit << 7) | RCR_VERB_SINGLE_RELEASE | num_to_release;

    asm_dcbf(entry); //Flush the ring out of the cache using dcbf

    //Increment the software copy of the Consumer Index
    portal->ci = (portal->ci + 1) % RCR_PER_RING;

    //Change valid bit polarity if the CI has wrapped around the RCR ring
    if(portal->ci == 0) {
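The index increment and valid bit flip at the end of the release routine follow a pattern that can be exercised on its own. The sketch below is a host-side illustration only: `ring_cursor` and `ring_advance` are hypothetical names, and the ring size of 8 is an assumption based on the "wrap around to entry 0 after 7 entries" remark in the text.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* assumption: 8 entries per ring */

typedef struct {
    uint8_t index; /* software copy of the ring index */
    uint8_t vbit;  /* current valid bit polarity (0 or 1) */
} ring_cursor;

/* Advance to the next entry; flip the polarity only on wrap to entry 0. */
static void ring_advance(ring_cursor *c)
{
    assert(c->index < RING_ENTRIES);
    c->index = (uint8_t)((c->index + 1u) % RING_ENTRIES);
    if (c->index == 0u)
        c->vbit = (uint8_t)(c->vbit ^ 1u);
}
```

Starting from index 0 with polarity 1, seven advances leave the polarity unchanged and the eighth advance wraps to entry 0 and flips it to 0.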
This function will initiate an ACQUIRE verb to a BMan software portal (portal_num) to request a certain # of buffers (num_to_acquire) of a specific size (requested_size). The BMan will return the buffer pointers in the RR registers. The requested_size will be used to determine which buffer pool to allocate pointers from. If no pool has large enough buffers then return NULL. Note that buffer position zero in the RR registers will contain the last buffer pointer since BMan returns buffer pointers in reverse order.
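The pool selection described above (pick a pool whose buffers are large enough, otherwise return NULL) might look like the following sketch. The best-fit policy, the `pool_info` type, and the `select_pool` name are assumptions made for illustration; the full source may choose among suitable pools differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down view of a buffer pool. */
typedef struct {
    uint8_t  id;       /* Buffer Pool ID (0 is valid) */
    uint32_t buf_size; /* size of each buffer in bytes */
} pool_info;

/* Return the pool with the smallest buffers that still fit requested_size,
 * or NULL if no pool has large enough buffers. */
static const pool_info *select_pool(const pool_info *pools, size_t n,
                                    uint32_t requested_size)
{
    const pool_info *best = NULL;
    assert(pools != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pools[i].buf_size >= requested_size &&
            (best == NULL || pools[i].buf_size < best->buf_size))
            best = &pools[i];
    }
    return best;
}
```

Note that the result is a pool entry, not an array index, which keeps the Buffer Pool ID separate from the position of the pool in the software array.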
• The Response Register valid bit polarity after reset is ‘1’
• The Release Command Ring valid bit polarity after reset is ‘1’
• The RCR valid bit flips after 7 entries of the RCR ring have been used, not every use
• Software must track the RCR Producer and Consumer Indices separately from hardware
• The FBPR space must be a power of 2
• In Valid Bit mode software must keep track of the RCR index for each software portal
• The BCSP_RCR_ITR Interrupt Threshold Registers will interrupt the core when the # of RCR entries consumed is less than the # in RCR_ITR
• In Valid Bit mode the BCSPn_RCR_PI_CENA registers are read only and do not need to be initialized
• A Buffer Pool ID of ‘0’ is valid
• The LIODN ID of BMan is arbitrary but must be unique and consistent in the system
• When software issues a BMan Release it must provide the Buffer Pool ID. Remember that buffer pool IDs start at ‘0’, but the first index in to the buffer pool array may be different
• When releasing multiple buffers only the first buffer entry in each RCR ring entry needs to have the vbit written to
• BMan returns buffer pointers in the RR registers in reverse order
• Software must keep track of the RR vbit in order to know which RR[0/1] register is valid
• Software must determine the Buffer Pool ID to use for BMan Acquire based on the required buffer size
• After issuing an Acquire command, software must bring in the RR response in to the cache by invalidating RR and polling on the RR verb
• QMan provides a means to inter-connect between other DPAA components
− Cores (including IPC)
− Hardware offload accelerators
− Network interfaces – Frame Manager
• Frame queues are “logical” queues
− QMan queues “frame descriptors”, not frames
− Facilitates load spreading
− Lockless shared queues for load spreading and device “virtualization”
• QMan offers
− Low latency, prioritized queuing of descriptors between cores, network I/O and accelerators
− Both preservation of order of queued data (i.e. non-parallel processing) and restoration of order of queued data (after parallel processing)
− Active queue management (WRED)
− Optimized core interface which can pre-position data/context/descriptors in core’s cache
− Delivery of per-queue accelerator specific commands and context information to offload accelerators along with dequeued descriptors
Frame Descriptor (FD)
− The basic queue element that describes a frame
− Usually a single I/O packet will use a single frame
− Other scenarios: commands with no buffer
Frame Queue Descriptor (FQD)
− A linked list of FDs
− Usually a frame queue is associated with a flow
− Head of frame queue can be associated to ODP
− Enqueue operation must include the target FQ as a parameter
− Dequeue operation may use FQ as a parameter for operation
Work Queue Structure
− Linked list of FQDs
− Holds flows of the same priority and designation
− Dequeue operation may use WQ as a parameter for operation
Channel
− Set of eight WQs
− Channel served by a single type of entity
− Dequeue from channel can be configured to be strict priority or round robin (Simple, Weighted or Deficit)
− Dequeue may use Channel as a parameter for operation
• Portals are the interface between QMan and the blocks which use them
− Direct Connect Portals have direct connect signals to a Dedicated Channel
− Software Portals use CoreNet as the physical interconnect
   It services both Dedicated and Pool Channels
   Software and QMan interact by “reading” and “writing” data across CoreNet
• The information which passes across portals is the FDs and queuing information, not the actual frame data
• Blocks issue commands and receive QMan’s responses across portals
• Dequeues are performed as a result of commands issued to QMan
• QMan supports 2 modes on software portals: push mode and pull mode
− Push mode
   In this mode, QMan will continue to push entries into DQRR in an attempt to keep it “full”
   QMan provides 2 command registers
   • One register is “static” and QMan repeatedly executes this command
   • One register is “volatile” and QMan executes that command a limited number of times
   Push mode is “just like” a BD ring
− Pull mode
   QMan provides a single command register
   Software must issue a new command for each dequeue operation
• Push mode will be the most common mode. eDINK uses push mode.
• Pull mode is provided because it addresses certain QoS issues
• The Context A/B fields are provided for use by the portal/consumer
− They contain or point to associated information (“context”) which can be used to describe how the consumer should process the frames in a FQ
− These fields are used differently depending on the consumer
− Software portals use them to:
   Control cache stashing of frame data and queue context
   Provide a per queue software defined identifier
− Hardware portals use them to:
   Pass context information to the accelerator
   Specify the “return” FQ on which to enqueue results
NOTE: CONTEXT_A is not available to Software Portals, however it is used for stashing and is covered in Part 2.
In eDINK, CONTEXT_B is used to identify the Rx port on receipt of an ingress frame. Ingress frames on each FMan port are directed to a different software portal based on the Rx FQ ID.
1. Initialize software structures mapped to memory and related portal information
Connect the EQCR, DQRR, and MR rings with the physical LAW mapped QM_CENA region. The individual entries in each EQCR ring should be implicitly defined during the allocation of the encapsulating QMAN_EQCR_RING.
For simplicity it is assumed that only software portal zero is used in this example. See the full code for a more robust solution.
    qman_eqcr_rings[0] = (QMAN_EQCR_RING*) (&QCSP_CENA(QCSP_EQCR0));
See DPAARM Rev 0 Section 6.4.6.2.1 DQRR Production Notification and Section 6.4.6.2.2 DQRR Consumption Notification. Initialize Producer and Consumer indexes to what HW expects. These are software copies only and do not directly modify the HW copies found in the QCSPn_EQCR_PI/CI, QCSPn_DQRR_PI/CI, and QCSPn_MR_PI/CI registers.
    qman_portals[0].dqrr_pi = (QCSP_CINH((QCSP_DQRR_PI_CINH)) & QCSP_DQRR_PI_CINH_PI);
    qman_portals[0].dqrr_pi_vbit = (QCSP_CINH((QCSP_DQRR_PI_CINH)) & QCSP_DQRR_PI_CINH_VP) >> 4;
    qman_portals[0].dqrr_ci = (QCSP_CINH((QCSP_DQRR_CI_CINH)) & QCSP_DQRR_CI_CINH_CI);
See DPAARM Rev 0 Section 6.4.6.3.1 MR Production Notification and Section 6.4.6.3.2 MR Consumption Notification.
    qman_portals[0].mr_pi = (QCSP_CINH((QCSP_MR_PI_CINH)) & QCSP_MR_PI_CINH_PI);
    qman_portals[0].mr_pi_vbit = (QCSP_CINH((QCSP_MR_PI_CINH)) & QCSP_MR_PI_CINH_VP) >> 3;
    qman_portals[0].mr_ci = (QCSP_CINH((QCSP_MR_CI_CINH)) & QCSP_MR_CI_CINH_CI);
Assign SDQCR tokens to each portal starting at 0x10.
    qman_portals[0].token = 0x10;
Enable portal to dequeue by configuring DQRR in a static dequeue state via SDQCR. Enable the Dedicated Channel for this software portal and Pool Channel #1 for dequeue via DQ_SRC (Pool Channels 2-15 are disabled). Dequeue up to 3 frames at a time in response to a dequeue command. Dequeue with priority precedence, and Intra-Class Scheduling respected (DCT = 1).
    QCSP_CINH(QCSP_DQRR_SDQCR | (QCSP0_CINH_BASE)) = 0x3100C000 | (qman_portals[0].token << 16);
Example of writing to software portal index i:
    //qman_portals[i].dqrr_ci = (QCSP_CINH((QCSP_DQRR_CI_CINH) | (i * QCSP_CINH_BASE_SPACE)) & QCSP_DQRR_CI_CINH_CI);
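The SDQCR value written in step 1 is the fixed configuration word OR-ed with the portal token shifted into place. A minimal host-side sketch of that composition follows; `SDQCR_BASE` and `sdqcr_value` are hypothetical names, and only the constant 0x3100C000 and the 16-bit shift come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SDQCR_BASE 0x3100C000u /* configuration word used in the text */

/* Combine the static dequeue configuration with a portal token. */
static uint32_t sdqcr_value(uint8_t token)
{
    uint32_t v = SDQCR_BASE | ((uint32_t)token << 16);
    assert((v & 0xFF00FFFFu) == SDQCR_BASE); /* token only touches bits 23..16 */
    return v;
}
```

With the first token of 0x10 the register value becomes 0x3110C000.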
2. Set up Frame Queue Descriptors (FQD)
This struct holds general QMan information for use during initialization.
    typedef struct qman_init_struct {
        ULONG fqd_bare; //FQD Memory Base Physical Address, upper 32 bits
        ULONG fqd_bar; //FQD Memory Base Physical Address, lower 32 bits
        ULONG fqd_exponent; //FQD Power of 2 Size Exponent for FQD_AR[SIZE]
        ULONG pfdr_bare; //PFDR Memory Base Physical Address, upper 32 bits
        ULONG pfdr_bar; //PFDR Memory Base Physical Address, lower 32 bits
        ULONG pfdr_exponent; //PFDR Power of 2 Size Exponent for PFDR_AR[SIZE]
    } QMAN_INIT;
a. Select a power of 2 memory size for FQDs.
    qman_init.fqd_exponent = ilog2(QMAN_FQD_MEM_SIZE) - 1;
b. Assign 64-bit Physical Address.
    qman_init.fqd_bare = 0x00000000; //Still operating in 32bit space
    qman_init.fqd_bar = QMAN_FQD_MEM_BASE;
c. Write the base address into FQD_BARE and FQD_BAR registers.
    CCSR(FQD_BARE) = qman_init.fqd_bare;
    CCSR(FQD_BAR) = qman_init.fqd_bar;
d. Write the size and set the EN bit in the FQD_AR register.
    CCSR(FQD_AR) = FQD_AR_EN | qman_init.fqd_exponent;
e. Zero out the FQD memory. The Initialize FQ command relies on software clearing the FQD memory such that QMan will find all FQDs to be in the Out Of Service state (State field = 0 in the FQD).
    for(i = 0; i < QMAN_FQD_MEM_SIZE; i+=CACHE_LINE_SIZE) {
        asm_dcbz((void*)(QMAN_FQD_MEM_BASE + i));
        //Perform a lwsync in order to ensure cache writes are completed
        asm volatile("lwsync");
        asm_dcbf((void*)(QMAN_FQD_MEM_BASE + i));
    } //end for
3. Set up Packed Frame Descriptor Record Memory (PFDR)
The PFDR fields (pfdr_bare, pfdr_bar, pfdr_exponent) of the QMAN_INIT struct defined in step 2 are used here.
a. Select a power of 2 memory size for PFDRs.
    qman_init.pfdr_exponent = ilog2(QMAN_PFDR_MEM_SIZE) - 1;
b. Assign 64-bit Physical Address.
    qman_init.pfdr_bare = 0x00000000; //Still operating in 32bit space
    qman_init.pfdr_bar = QMAN_PFDR_MEM_BASE;
c. Write the base address into PFDR_BARE and PFDR_BAR registers.
    CCSR(PFDR_BARE) = qman_init.pfdr_bare;
    CCSR(PFDR_BAR) = qman_init.pfdr_bar;
d. Write the size and set the EN bit in the PFDR_AR register.
    CCSR(PFDR_AR) = PFDR_AR_EN | qman_init.pfdr_exponent;
e. Zero out the PFDR memory.
    for(i = 0; i < QMAN_PFDR_MEM_SIZE; i+=CACHE_LINE_SIZE) {
        asm_dcbz((void*)(QMAN_PFDR_MEM_BASE + i));
        //Perform a lwsync in order to ensure cache writes are completed
        asm volatile("lwsync");
        asm_dcbf((void*)(QMAN_PFDR_MEM_BASE + i));
    } //end for
4. Populate the QMan CoreNet Software Portal Base Address
Write the base address into QCSP_BARE and QCSP_BAR registers.
    CCSR(QCSP_BARE) = 0x00000000; //Still operating in 32bit space
    CCSR(QCSP_BAR) = QM_CENA_MEM_BASE;
5. LAW space interrupt configuration
Trigger interrupt when DQRR ring is non-empty, or when one or more frames are available for dequeue in Pool Channel 1.
    QCSP_CINH((QCSP_IER)) = 0x00024000;
Clear status register.
    QCSP_CINH((QCSP_ISR)) = 0x001FFFFF;
Disable all but DQRI bit and the pool channel bit.
    QCSP_CINH((QCSP_ISDR)) = 0xFFFDBFFF;
Do not inhibit interrupt sources.
    QCSP_CINH((QCSP_IIR)) = 0x00000000;
Set interrupt time out period. If interrupt thresholds are used this timeout is necessary to detect single FDs.
    QCSP_CINH((QCSP_ITPR)) = 0x00000001;
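The ISDR value in step 5 is the bitwise complement of the IER value, so every source that is not enabled is also disabled at the status level. The short sketch below checks that relationship on a host; `isdr_from_enabled` is a hypothetical helper, and which of the two set bits is DQRI and which is the pool channel bit is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

#define QCSP_IRQ_ENABLED 0x00024000u /* IER value used in the text */

/* Derive a status-disable mask that leaves only the enabled sources active. */
static uint32_t isdr_from_enabled(uint32_t enabled)
{
    uint32_t disabled = ~enabled;
    assert((disabled & enabled) == 0u); /* no enabled source is disabled */
    return disabled;
}
```

Deriving one mask from the other keeps the two registers consistent if the set of enabled sources changes later.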
6. Work Queue Semaphore and Context Manager Registers
a. WQ Class Scheduler Config.
WQ_CS_CFG0: Configuration for all software portal dedicated channels (10 channels).
    CCSR(WQ_CS_CFG0) = WQ_CS_ELEV | 0x00765432;
WQ_CS_CFG1: Configuration for all software portal pool channels (15 channels).
    CCSR(WQ_CS_CFG1) = WQ_CS_ELEV | 0x00765432;
WQ_CS_CFG2: Configuration for all dedicated channels used by FMan1 (DCP0, 12 channels).
    CCSR(WQ_CS_CFG2) = WQ_CS_ELEV | 0x00765432;
WQ_CS_CFG3: Configuration for all dedicated channels used by FMan2 (DCP1, 12 channels).
    CCSR(WQ_CS_CFG3) = WQ_CS_ELEV | 0x00765432;
WQ_CS_CFG4: Configuration for the dedicated channel used by SEC 4.0 (DCP2, 1 channel).
    CCSR(WQ_CS_CFG4) = WQ_CS_ELEV | 0x00765432;
WQ_CS_CFG5: Configuration for the dedicated channel used by PME (DCP3, 1 channel).
    CCSR(WQ_CS_CFG5) = WQ_CS_ELEV | 0x00765432;
b. WQ Default Enqueue WQID Register
During an enqueue operation, if QMan encounters a FQ whose destination WQ specifies an invalid (reserved) channel, the FQ will be enqueued onto the default WQ specified in this register, and an error interrupt will be asserted (if enabled). If an invalid WQID is written to this register, the register contents are cleared to 0, specifying WQ 0 of channel 0 as the default WQ. In this case send all misqueued FQs to Channel 0 WQID 5.
    CCSR(WQ_DEF_ENQ_WQID) = 0x00000005;
7. Direct Connect Portal Configuration
a. Direct Enqueue Rejection Notifications to hardware for DCP0 and DCP1. These are the Direct Connect Portals for FMan1 and FMan2. See DPAARM Rev 0 Section 6.4.11.5 Enqueue Rejections.
    CCSR(DCP0_CFG) = 0x00000100;
    CCSR(DCP1_CFG) = 0x00000100;
b. In DCP2 and DCP3, the ERN Destination bit is not present because the SEC 4.0 and PME blocks do not receive enqueue rejections. Rejected enqueues from these hardware blocks are always directed to software. For now send all ERNs to software portal 0.
    CCSR(DCP2_CFG) = QCSP0 & 0x1F;
    CCSR(DCP3_CFG) = QCSP0 & 0x1F;
8. CoreNet Initiator Scheduling Configuration
Initially stash flow control will be disabled (SRCCIV = 0). However, this will need to be fixed for use with PAMU stashing. The initial credit value used here should match the number of stash snoop queue resources available in the processor core which will attempt to snarf the stash transactions.
    //CCSR(CI_SCHED_CFG) = 0x80000111;
9. SFDR Configuration Register
Threshold value of '0' means no reservation, and SFDRs are allocated on a first come first served basis.
    CCSR(SFDR_CFG) = 0x00000000;
10. PFDR Configuration and Low Water Mark
According to DPAARM Rev 0 Section 6.3.4.19 PFDR Configuration (PFDR_CFG) this should be 64.
    CCSR(PFDR_CFG) = 0x00000040;
Set Low Water Mark at 10% of the # of PFDRs. This can be used to trigger an interrupt due to PFDR overflow.
    CCSR(PFDR_FP_LWIT) = (ULONG) (NUM_OF_PFDR / 10);
11. Configure Software Portals via QCSP registers
a. Assign each software portal to a different Stashing Request Queue (SRQ). However, assign the 2 Direct Connect Portals to SRQ 0. See DPAARM Rev 0 Section 6.4.6.9 Stash Transaction Flow Control and Scheduling. If all stash transactions from QMan are intended for the cache within a single processor, then only one SRQ should be used and all software portals must be configured to use the same SRQ. If all 8 processor cores are intended to receive QMan stash transactions, then all 8 SRQs would be used, and each software portal would be configured to use the SRQ targeted at the processor core that will receive that portal's stash transactions.
    qman_portals[0].stash_dest = 0x00; //Software portal 0
    qman_portals[1].stash_dest = 0x01; //Software portal 1
    ...
    qman_portals[8].stash_dest = 0x00; //Direct portal
    qman_portals[9].stash_dest = 0x00; //Direct portal
b. DQRR entry Logical I/O Device Number for CoreNet software portals. These are arbitrary but must be unique and consistent throughout the system.
    qman_portals[0].dliodn = QCSP0_DLIODN;
    ...
    qman_portals[9].dliodn = QCSP9_DLIODN;
c. Frame data Logical I/O Device Number for CoreNet software portals. These are arbitrary but must be unique and consistent throughout the system.
    qman_portals[0].fliodn = QCSP0_FLIODN;
    ...
    qman_portals[9].fliodn = QCSP9_FLIODN;
d. Write the actual registers.
    CCSR(QCSP0_LIO_CFG) = (qman_portals[0].liodn << 16) | (qman_portals[0].dliodn);
    CCSR(QCSP0_IO_CFG) = (qman_portals[0].stash_dest << 16) | (qman_portals[0].fliodn);
e. Disable dynamic debug tracing.
    CCSR(QCSP0_DD_CFG) = 0;
b. Write MCP0 with 0x8 to add PFDR 8-16 to the list first. Note: the first 8 PFDRs are always reserved so 0x8 is the first available.
    CCSR(QMAN_MCP0) = 8;
c. Write the last PFDR ID. Note: the last 256 PFDR indices are always reserved so 0xFF_FEFF is the last available. If the number of PFDRs used is greater than this last index then cap it. Subtract 1 from NUM_OF_PFDR since QMan counts from 0 and NUM_OF_PFDR is invalid.
    CCSR(QMAN_MCP1) = (NUM_OF_PFDR > 0x00FFFEFF) ? 0x00FFFEFF : (NUM_OF_PFDR - 1);
d. Issue 0x01 command to MCR to initialize PFDR Free Pool.
    CCSR(QMAN_MCR) = 0x01000000;
e. Wait for MCR to finish.
    do { mcr_result = CCSR(QMAN_MCR) >> 24;
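The cap applied to the last PFDR ID in step c can be checked with a few values. The following is a minimal host-side sketch of the same expression; `pfdr_last_id` and the macro names are hypothetical, while the limits 8 and 0x00FFFEFF come from the notes above.

```c
#include <assert.h>
#include <stdint.h>

#define PFDR_FIRST_USABLE 8u          /* first 8 PFDRs are reserved */
#define PFDR_LAST_USABLE  0x00FFFEFFu /* last 256 indices are reserved */

/* Last PFDR ID to hand to QMan for a region holding num_of_pfdr records. */
static uint32_t pfdr_last_id(uint32_t num_of_pfdr)
{
    assert(num_of_pfdr > PFDR_FIRST_USABLE); /* need at least one usable PFDR */
    return (num_of_pfdr > PFDR_LAST_USABLE) ? PFDR_LAST_USABLE
                                            : (num_of_pfdr - 1u);
}
```

A region of 1024 PFDRs yields a last ID of 1023, while a region of 0x01000000 PFDRs is capped at 0x00FFFEFF.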
13. Software Portal Configuration
Use EQCR valid bit mode since it is straightforward and the EQCR entry tracking is implied via the valid bit of each EQCR (which is written by software). Software must keep track of the EQCR index for each software portal. Set DQRR Max Fill to 15. Place DQRR in Push Mode. DQRR Consumption Notification Mode is CI write mode, cache-inhibited. MR is in CI write mode, cache-inhibited. Enable all stashing with high priority.
    #define DQRR_ENTRIES_PER_RING 16 //By design.
16. CCSR space error interrupt configuration
CCSR(QMAN_ERR_ISR) = 0xFFFFFFFF; //Clear all bits
CCSR(QMAN_ERR_IER) = 0x3F810F0F; //Enable all interrupts
CCSR(QMAN_ERR_ISDR) = 0x00000000; //Do not disable interrupt sources
CCSR(QMAN_ERR_IIR) = 0x00000000; //Do not inhibit interrupts
CCSR(QMAN_ERR_HER) = 0x3F810F0F; //Halt on any error
17. Initialize Interrupt Threshold Registers for each portal
The Interrupt Threshold Registers for each portal will interrupt the core when the # of entries consumed is LESS THAN the # in the ITR register (EQCR) or when the # of entries remaining is GREATER THAN the # in the ITR register (DQRR & MR). If zero then this interrupt mechanism is disabled and polling of the entries will need to be done by software.
#define QMAN_EQCR_IT 0x00000000 //Will interrupt when the ring contains fewer than EQCR_IT entries
#define QMAN_DQRR_IT 0x00000007 //Will interrupt when the ring contains greater than DQRR_IT entries
#define QMAN_MR_IT 0x00000007 //Will interrupt when the ring contains greater than MR_IT entries
18. CCSR space error thresholds
CCSR(QMAN_ECSR) = 0x83010F0F; //Clear status register
CCSR(QMAN_SBET) = 0xFE000001; //Enable ECC with an error threshold of 10
19. Initialize the Frame Queues (FQ)
The use of the Initialize FQ command relies on software clearing the FQD memory such that QMan will find all FQDs to be in the Out Of Service state (State field = 0 in the FQD). See DPAARM Rev 0 Section 6.4.8.5.1 Initialize Frame Queues (FQ).
• Direct Connect Portal 0 (FMan 1) Channels for egress traffic. The remaining DCP0 channels are determined via the Sub-portal ID offsets.
    #define QMAN_CHANNEL_DCP0 0x40
This function will take a FQ from the Out of Service state and place in a Scheduled state by performing an Initialize FQ command. From a Scheduled state these FQs will then be able to receive Enqueue commands. See DPAARM Rev 0 Section 6.4.1.5 Frame Queue State for the FQ state diagram. Note that count + 1 FQs in total will be scheduled, thus count = 0 means 1 FQ will be Scheduled. In eDINK, for ingress frames, each QCSP receives a bundle of 16k FQs. For egress frames, each dTSEC and 10G MAC of FMan1 and FMan2 receives a bundle of 16k FQs. For simplicity, all error checking code has been removed from the following, and it is assumed that only 1 FQ is released. See the full source code for a more robust solution.
STATUS QMan_FQ_Init(BYTE portal_num, ULONG fqid, int count) {
    QMAN_PORTAL *portal = &qman_portals[portal_num]; //The Software Portal
    QMAN_CR *cr = qman_cr[portal_num]; //The Command Register
    QMAN_RR *rr = (portal->rr_vbit) ? qman_rr1[portal_num] : qman_rr0[portal_num]; //The Response Register
    ULONG wq_num = 0; //WQ#0 has the highest priority
    //Use exactly one of the two chan_num options below.
    /*Option 1*/ BYTE chan_num = QMAN_CHANNEL_SP0; //For ingress frames go to QCSP0
    /*Option 2*/ //BYTE chan_num = QMAN_CHANNEL_DCP0 + PORTID_1G_TX0_SUBP; //For egress frames go to FMan1 dTSEC1

    asm_dcbz(cr); //Zero out a cache entry for the CR register.

    cr->cmd.fq_init.we_mask = 0xFF; //WE_MASK field -- Enable mask bits in order to update the FQD fields.
    cr->cmd.fq_init.fqid = fqid; //FQID field
    cr->cmd.fq_init.count = count; //COUNT field
    cr->cmd.fq_init.orpc = cr->cmd.fq_init.cgid = 0x00; //ORPC field and CGID field
    cr->cmd.fq_init.fq_ctrl = 0x0000; //FQ_CTRL field
    cr->cmd.fq_init.dest_wq.field.chan_num = chan_num; //DEST_WQ field DPAARM Rev 0 Table 6-106 ORs and shifts the chan_num and wq_num
    cr->cmd.fq_init.dest_wq.field.wq_num = wq_num; //to form the dest_wq field: dest_wq = (chan_num << 3) | (wq_num).
    cr->cmd.fq_init.ics_cred = cr->cmd.fq_init.td_thresh = 0; //ICS_CRED field and TD_THRESH field
    cr->cmd.fq_init.context_a = 0; //CONTEXT_A field
    cr->cmd.fq_init.context_b = (ULONG) PORT1; //CONTEXT_B field. Can be used to attach custom tag or critical info.

    asm volatile("lwsync"); //Perform a lwsync in order to ensure cache writes are completed.

    cr->verb.field.command = FQ_INIT_SCHEDULE; //VERB field -- Write word 0 (which contains the command VERB and valid bit)
    cr->verb.field.vbit = portal->rr_vbit; //VERB field -- Write word 0 (which contains the command VERB and valid bit)

    asm_dcbf((void*) cr); //dcbf to flush the command from the cache to QMan.

    //Bring in the RR response in to the cache. Poll here until the CR has completed and the subsequent RR register has been updated.
    do {
        asm_dcbi(rr);
        asm volatile("lwsync");
    } while(rr->verb.full == RR_IN_PROGRESS);
    portal->rr_vbit = (~portal->rr_vbit) & 0x01; //Flip the RR Valid Bit
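The DEST_WQ packing noted in the comments above, dest_wq = (chan_num << 3) | wq_num, can be checked on a host with a few values. This is a sketch under stated assumptions: `pack_dest_wq` is a hypothetical helper, and the 16-bit field width (hence a 13-bit channel number) is assumed rather than taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a channel number and work queue number into a DEST_WQ value. */
static uint16_t pack_dest_wq(uint16_t chan_num, uint8_t wq_num)
{
    assert(chan_num < 0x2000u); /* assumed 13-bit channel number */
    assert(wq_num < 8u);        /* eight work queues per channel */
    return (uint16_t)(((unsigned)chan_num << 3) | wq_num);
}
```

For QMAN_CHANNEL_DCP0 (0x40) and WQ 0 this gives 0x0200. For channel 0 and WQ 5 it gives 0x0005, which is consistent with the value written to WQ_DEF_ENQ_WQID in step 6b if that register uses the same layout.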
This function will poll the DQRR ring of the requesting portal and return a pointer to a DQRR entry if such an entry exists, is valid, and is ready to be dequeued by software.

QMAN_DQRR_ENTRY* QMan_FQD_Dequeue(ULONG portal_num) {
    QMAN_PORTAL* portal = &qman_portals[portal_num]; //The Software Portal
    QMAN_DQRR_ENTRY* entry = NULL; //The return pointer to the dequeued DQRR entry
    BYTE pi = 0, ci = 0; //Temporary copies of the SW indices

    //Make sure that the SW copies of the PI/CI indices match the HW copies.
    QMan_Sync(portal_num);
    //Grab software copies of the current PI and CI indices and valid bits.
    pi = portal->dqrr_pi;
    ci = portal->dqrr_ci;

    //If the Consumer Index is not equal to the Producer Index then there is at least one FQD to dequeue.
    if(ci != pi) {
        //Update the HW and SW copies of the Consumer Index.
        portal->dqrr_ci = ci;
        QCSP_CINH((QCSP_DQRR_CI_CINH) | (portal_num * QCSP_CINH_BASE_SPACE)) = ci;
    } //end if

    //Check for error bits in the status word.
    stat = entry->stat.full;
    fqid = entry->fqid & 0x00FFFFFF;
    //Make sure that the FQ dequeued is valid.
    if(!(DQRR_STAT_VALID & stat) || (DQRR_STAT_EXPIRED & stat)) {
        PRINT("\nFAILURE: [QMAN] The dequeue operation returned an INVALID or EXPIRED status.\n");
        return(NULL);
This function accepts a pointer to a Frame Descriptor (FD) and enqueues this FD to QMan via the EQCR ring and the ENQUEUE command.
STATUS QMan_FD_Enqueue(ULONG portal_num, QMAN_FD *fd, ULONG fqid) {QMAN_PORTAL* portal = &qman_portals[portal_num]; //The Software PortalQMAN_EQCR_ENTRY* entry = NULL; //The EQCR entry to write toBYTE pi = 0, ci = 0, pi_vbit = 0, ci_vbit = 0; //Temporary copies of the SW indices and Valid Bits
Make sure that the SW copies of the PI/CI indices matches the HW copies.QMan_Sync(portal_num);Grab software copies of the current PI and CI indices and valid bitspi = portal->eqcr_pi; ci = portal->eqcr_ci;pi_vbit = portal->eqcr_pi_vbit; ci_vbit = portal->eqcr_ci_vbit;
Since the PI and CI indexes begin at 0, then the EQCR ring being full is indicated by PI - CI >= (EQCR_ENTRIES_PER_RING - 1). The ring is also determined to be full if PI = CI and the PI/CI valid bits are different.if( (pi - ci >= (EQCR_ENTRIES_PER_RING - 1)) || ((pi == ci) && (pi_vbit != ci_vbit)) ) return(FAILURE);
    entry = &qman_eqcr_rings[portal_num]->entries[pi]; //Assign the entry address to mapped QMan space.
    asm_dcbz(entry);

    entry->dca = entry->seqnum = entry->orp = 0; //For simplicity we do not use Order Restoration or DCA.
    entry->fqid = fqid & 0xFFFFFF;
    entry->tag = 0xBADEFACE;

    //Assign Frame Descriptor fields based on passed in pointer to FD.
    entry->fd.fd.fields.dd = fd->fd.fields.dd;
    entry->fd.fd.fields.pid = fd->fd.fields.pid;
    entry->fd.fd.fields.bpid = fd->fd.fields.bpid;
    entry->fd.fd.fields.addr_hi = fd->fd.fields.addr_hi;
    entry->fd.fd.fields.addr_lo = fd->fd.fields.addr_lo;
    entry->fd.fd.fields.format = fd->fd.fields.format;
    entry->fd.fd.fields.offset = fd->fd.fields.offset;
    entry->fd.fd.fields.length = fd->fd.fields.length;
    entry->fd.fd.fields.status_cmd = fd->fd.fields.status_cmd;
    entry->fd.fd.fields.liodn_offset = fd->fd.fields.liodn_offset;

    asm volatile("lwsync");
    entry->verb.field.vbit = pi_vbit;
    entry->verb.field.command = EQCR_ENQUEUE;
    asm_dcbf((void*) entry); //Flush the entry from the cache out to QMan space

    portal->eqcr_pi = (portal->eqcr_pi + 1) % EQCR_ENTRIES_PER_RING; //Increment the software copy of the Producer Index

    //The EQCR valid bit should only flip after 7 entries of the EQCR ring have been used and we wrap around to entry 0.
    if(portal->eqcr_pi == 0)
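The ring-full test and producer index update in QMan_FD_Enqueue can be exercised without hardware using a small software model. The sketch below keeps the convention that the ring is treated as full once EQCR_ENTRIES_PER_RING - 1 entries are outstanding, but it computes occupancy from the indices and valid bits together, so a producer that has wrapped (PI numerically below CI) is also counted. The ring depth of 8, the eqcr_model type, and the helper names are assumptions for illustration and are not taken from the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

#define EQCR_ENTRIES_PER_RING 8u  /* assumed ring depth for this sketch */

/* Software-only model of the EQCR bookkeeping (hypothetical type). */
typedef struct {
    uint8_t pi, ci;            /* producer / consumer index, 0..N-1 */
    uint8_t pi_vbit, ci_vbit;  /* valid-bit polarity of each side's current pass */
} eqcr_model;

/* Number of entries produced but not yet consumed. */
unsigned eqcr_used(const eqcr_model *r)
{
    if (r->pi_vbit == r->ci_vbit)
        return (unsigned)(r->pi - r->ci);          /* same pass: pi >= ci */
    return EQCR_ENTRIES_PER_RING - r->ci + r->pi;  /* producer has wrapped */
}

/* Treat the ring as full once N-1 entries are outstanding. */
bool eqcr_full(const eqcr_model *r)
{
    return eqcr_used(r) >= EQCR_ENTRIES_PER_RING - 1u;
}

/* Claim the next slot. Returns the index to fill, or -1 when the ring is full. */
int eqcr_produce(eqcr_model *r)
{
    if (eqcr_full(r))
        return -1;
    int slot = r->pi;
    r->pi = (uint8_t)((r->pi + 1u) % EQCR_ENTRIES_PER_RING);
    if (r->pi == 0)
        r->pi_vbit ^= 1u;  /* polarity flips on wrapping back to entry 0 */
    return slot;
}

/* Model the consumer retiring one entry. Returns false if the ring is empty. */
bool eqcr_consume(eqcr_model *r)
{
    if (eqcr_used(r) == 0)
        return false;
    r->ci = (uint8_t)((r->ci + 1u) % EQCR_ENTRIES_PER_RING);
    if (r->ci == 0)
        r->ci_vbit ^= 1u;
    return true;
}
```

In the real driver the consumer side is QMan itself, so eqcr_consume() only stands in for the point where software refreshes its copy of EQCR_CI.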
• Production notification is through DQRR_PI and an alternating polarity valid bit in each entry
− As QMan places new entries in the ring it asserts a “valid” bit in the entry
− The asserted or “valid” polarity (‘0’ or ‘1’) changes each pass through the ring
− Software can poll valid bits with very low overhead and latency
− Valid bit mode must be used when DQRR stashing is being used
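Valid-bit polling as described above can be sketched as a software model. The ring depth of 16, the position of the valid bit in the verb byte, and the dqrr_slot / dqrr_model types are assumptions made for this example; only the alternating-polarity logic is the point.

```c
#include <stddef.h>
#include <stdint.h>

#define DQRR_ENTRIES   16u    /* assumed ring depth */
#define DQRR_VERB_VBIT 0x80u  /* assumed position of the valid bit in the verb */

/* Hypothetical, simplified DQRR entry. */
typedef struct {
    uint8_t  verb;
    uint32_t fqid;
} dqrr_slot;

typedef struct {
    dqrr_slot ring[DQRR_ENTRIES];
    uint8_t   ci;    /* next entry software expects to be produced */
    uint8_t   vbit;  /* polarity that means "valid" on the current pass */
} dqrr_model;

/* Returns the next valid entry, or NULL if nothing new has been produced. */
const dqrr_slot *dqrr_poll(dqrr_model *d)
{
    const dqrr_slot *e = &d->ring[d->ci];
    uint8_t seen = (e->verb & DQRR_VERB_VBIT) ? 1u : 0u;

    if (seen != d->vbit)
        return NULL;  /* entry still carries the previous pass's polarity */

    d->ci = (uint8_t)((d->ci + 1u) % DQRR_ENTRIES);
    if (d->ci == 0)
        d->vbit ^= 1u;  /* expected polarity flips on every pass */
    return e;
}
```

Because the expected polarity flips on every pass, software never has to clear an entry after reading it: a stale entry simply fails the polarity comparison.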
• Software can notify QMan that it has consumed data by writing DQRR_CI or using Discrete Consumption Acknowledgment (DCA) with enqueue commands
− Both cache-inhibited and cache-enabled versions of DQRR_CI are supported
− DCA is optimized for datapath when frames will be RXed, processed, then forwarded
− Each dequeue has a corresponding enqueue
− It is also important for “order restoration and atomicity” since it allows software to explicitly indicate that it is finished processing a frame
− DCA updates DQRR_CI: QMan “score boards” acknowledged entries and only changes DQRR_CI when contiguous entries at the “head” of the FIFO have been acknowledged
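The scoreboarding behavior can be illustrated with a short software model. QMan performs this in hardware; the sketch only shows why an out-of-order acknowledgment does not move DQRR_CI until the entries ahead of it are also acknowledged. The ring depth of 16 and the dca_scoreboard type are assumptions for illustration.

```c
#include <stdint.h>

#define DQRR_ENTRIES 16u  /* assumed ring depth */

/* Hypothetical model of the acknowledgment scoreboard. */
typedef struct {
    uint16_t acked;  /* bit i set: entry i acknowledged, CI not yet past it */
    uint8_t  ci;     /* consumer index at the head of the FIFO */
} dca_scoreboard;

/* Record an acknowledgment for entry idx, then advance CI across every
 * contiguous acknowledged entry at the head. Returns the resulting CI. */
uint8_t dca_ack(dca_scoreboard *s, uint8_t idx)
{
    s->acked |= (uint16_t)(1u << (idx % DQRR_ENTRIES));

    while (s->acked & (1u << s->ci)) {
        s->acked &= (uint16_t)~(1u << s->ci);  /* retire the head entry */
        s->ci = (uint8_t)((s->ci + 1u) % DQRR_ENTRIES);
    }
    return s->ci;
}
```

For example, acknowledging entries 2 and 1 leaves CI at 0; acknowledging entry 0 then moves CI straight to 3.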
• Software can notify QMan that enqueues are available by writing EQCR_PI or using a non-zero verb with valid bit
− Both cache enabled and cache inhibited versions of EQCR_PI exist
− Polarity of valid bit alternates each time through the ring
− Software must flush entries out of its cache when using valid bits so that QMan sees them
• QMan notifies software that it has consumed entries by updating EQCR_CI
− EQCR_CI is available in cache enabled and cache inhibited versions
//QMan_FQ_Query()
//
//Inputs: BYTE portal_num -- The software portal ID that is querying the FQs
//        ULONG fqid -- The first FQ ID in the queue
//        ULONG num_to_query -- The # of FQs to Query, starting with fqid.
//        BYTE query_verb -- The verb to apply to each FQ
//
//Returns: SUCCESS/FAILURE
//
//This function will issue a QUERY command for a specified FQ and print results of the query.
//***************************************************************************/
STATUS QMan_FQ_Query(BYTE portal_num, ULONG fqid, ULONG num_to_query, BYTE query_verb)
//***************************************************************************/
//QMan_FQ_Alter()
//
//Inputs: BYTE portal_num -- The software portal ID that is altering the FQs
//        ULONG fqid -- The first FQ ID in the queue
//        ULONG num_to_alter -- The # of FQs to Alter, starting with fqid.
//        BYTE alter_verb -- The verb to apply to each FQ
//
//Returns: SUCCESS/FAILURE
//
//This function will take a set of FQs and issue an FQ Alter command on them.
//See DPAARM Rev 0 Section 6.4.1.5 for the FQ state diagram.
//***************************************************************************/
STATUS QMan_FQ_Alter(BYTE portal_num, ULONG fqid, ULONG num_to_alter, BYTE alter_verb)
//***************************************************************************/
//QMan_Sync()
//
//Inputs: BYTE portal_num -- The portal ID of the calling portal
//
//Returns: SUCCESS
//
//This function realigns the SW copies of the Producer and Consumer Indices
//with those found in HW. It also realigns the PI/CI Current Valid Bit polarities.
//These are the indexes that are used to Enqueue and Dequeue FQDs to QMan.
//***************************************************************************/
STATUS QMan_Sync(BYTE portal_num)
1. The CR, DQRR, EQCR, and MR Valid Bit polarities are ‘1’ after reset
2. Software must track the EQCR, DQRR, and MR Producer and Consumer Indices separately from hardware
3. Each software portal can dequeue from its own dedicated channel, or from 1 of the 15 pool channels
4. The FQD space must be a power of 2
5. Software must zero out the FQD memory for use with the Initialize FQ command
6. The PFDR space must be a power of 2
7. Software should zero out the PFDR memory
8. To notify a core when a frame is available enable an interrupt when the DQRR ring is non-empty
9. If interrupt thresholds are used then an interrupt time out period is necessary
10. FQs whose destination WQ specifies an invalid channel will be enqueued onto the default WQ
11. ERNs for the FMan Direct Connect Portals should be sent to hardware
12. ERNs for the SEC and PME Direct Connect Portals should be sent to a software portal
13. The PFDR Low Water Mark can be used to trigger an interrupt due to PFDR overflow
14. Stash transactions for each core require a unique Stash Request Queue ID
15. DLIODNs for software portals must be unique and consistent throughout the system
16. FLIODNs for software portals must be unique and consistent throughout the system
17. The first 8 PFDR indices are always reserved so 0x8 is the first available
18. The last 256 PFDR indices are always reserved so 0xFF_FEFF is the last available
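Caveats 4, 6, 17 and 18 are simple arithmetic, which the following sketch makes explicit. The 24-bit PFDR index space is an assumption inferred from the 0xFF_FEFF figure quoted above, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PFDR_INDEX_SPACE   0x1000000u  /* assumed 24-bit index space */
#define PFDR_RESERVED_LOW  8u          /* first 8 indices are reserved */
#define PFDR_RESERVED_HIGH 256u        /* last 256 indices are reserved */

/* True when n is a non-zero power of 2, as required of FQD/PFDR sizes. */
bool is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* First PFDR index software may hand to QMan. */
uint32_t pfdr_first_index(void)
{
    return PFDR_RESERVED_LOW;
}

/* Last PFDR index software may hand to QMan. */
uint32_t pfdr_last_index(void)
{
    return PFDR_INDEX_SPACE - PFDR_RESERVED_HIGH - 1u;
}
```

A size check of this kind is cheap to run before programming the FQD and PFDR base/size registers, which is where a non-power-of-2 value would otherwise go unnoticed.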
• Before PAMU drivers are developed, the PAMU can be placed in bypass mode. Note that stashing will not work when in bypass mode. This can lead to severe performance degradation. Part 2 covers PAMU usage.
• The Bypass Enable Register (PAMUBYPENR) indicates whether PAMU should be placed in bypass mode or not. This 32-bit value has a bit for each of the 16 PAMUs available to be placed in bypass mode.
• Note that the P4080 implements only 5 PAMUs; the register nevertheless has room for expansion up to 16.
#define PAMU_BYPASS 0xFFFF0000 //Place all 16 PAMUs in bypass mode
CCSR(PAMUBYPENR) = PAMU_BYPASS;
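The 0xFFFF0000 constant can be derived rather than hard-coded. The sketch below assumes that PAMU n is controlled by Power Architecture bit n of PAMUBYPENR (bit 0 being the most significant bit), which is consistent with the value above but should be confirmed against the reference manual. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: PAMU n maps to Power Architecture bit n (bit 0 = MSB). */
uint32_t pamu_bypass_bit(unsigned pamu)
{
    return 0x80000000u >> pamu;
}

/* Build a bypass mask for PAMUs 0..count-1, capped at the 16 bits the
 * register provides. */
uint32_t pamu_bypass_mask(unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count && i < 16u; i++)
        mask |= pamu_bypass_bit(i);
    return mask;
}
```

Under that assumption, a count of 16 reproduces 0xFFFF0000, and a count of 5 (the PAMUs implemented on P4080) gives 0xF8000000.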
• Each FD has a format field that must be read to determine the frame type.
− Short Single
− Long Single
− Short Multiple
− Long Multiple
− Compound
− P4080 only supports SS and SM
• Any frame that is larger than any of your BMan Buffer Pools must be broken into multiple buffers. This will create a Short Multiple Frame type. Software must piece together these buffers.
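Piecing a Short Multiple frame back together amounts to walking the list of buffers and copying each one in order. The sketch below uses a deliberately simplified, hypothetical scatter/gather entry (sg_entry_model) rather than the hardware layout, and the frame_gather() helper is an illustration, not part of the reference code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified scatter/gather entry; not the hardware layout. */
typedef struct {
    const uint8_t *addr;    /* start of this buffer's data */
    uint32_t       length;  /* number of valid bytes in this buffer */
    int            final;   /* non-zero on the last buffer of the frame */
} sg_entry_model;

/* Copy every buffer of a multi-buffer frame into dst, in order.
 * Returns the total frame length, or 0 if dst is too small or no final
 * entry is found within max_entries. */
size_t frame_gather(const sg_entry_model *sg, size_t max_entries,
                    uint8_t *dst, size_t dst_len)
{
    size_t total = 0;

    for (size_t i = 0; i < max_entries; i++) {
        if (sg[i].length > dst_len - total)
            return 0;  /* destination cannot hold the whole frame */
        memcpy(dst + total, sg[i].addr, sg[i].length);
        total += sg[i].length;
        if (sg[i].final)
            return total;
    }
    return 0;  /* table ended without a final entry */
}
```

Copying is the simplest approach for a bring-up test; a performance-oriented driver would more likely process the buffers in place.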
• Is Configuration Data mandatory for Normal Mode?
− Strictly speaking no, but it is standard in Freescale BSP releases and is used under U-Boot and Linux® OS. For customers using Normal Mode, the default Configuration Data will be provided by Freescale.
• Is Configuration Data mandatory for Independent Mode and/or Coarse Classification?
− Yes.
• As a customer, do I need to translate 1900+ pages of DPAA chapters into code?
− No!
− There are many resources available to save customers from reinventing the wheel:
  Freescale’s reference code (SDK, eDINK)
  Training material
  Expanded DPAA init documentation (P4080RM Rev G Chapter 30, DPAA Configuration and Initialization, moved into the separate DPAARM Rev 0)
  In-house Apps expertise for DPAA
Session materials will be posted @ www.freescale.com/FTF
• We’ve learned about the basic components of the DPAA.
• We’ve learned how to configure BMan, QMan, FMan, and PAMU at a register level.
• We’ve learned about key caveats when working with the DPAA.
• We’ve made the connection between high level theoretical workings of the DPAA with real world low level implementation details.
• For more advanced topics attend sessions
− FTF-NET-F0526 Programming the Data Path Acceleration Architecture (DPAA) Drivers on QorIQ Communications Processors (Part 2): Optimization Techniques
− FTF-NET-F0414 Programming the Data Path Acceleration Architecture (DPAA) Drivers on QorIQ Communications Processors (Part 3): Program and Use the Parser and Keygen