Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links. Stefan Haas, Markus Joos (CERN); Wieslaw Iwanski (Henryk Niewodniczański Institute of Nuclear Physics). LECC, 13.-17. Sept. 2004, Boston.

Jan 19, 2016

Page 1: Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links

Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links

Stefan Haas, Markus Joos

CERN

Wieslaw Iwanski

Henryk Niewodniczański Institute of Nuclear Physics

LECC, 13.-17. Sept. 2004, Boston


S. Haas, 14. Sept. '04LECC 2004 - 2 -

Outline

● Introduction
● Interface Card Hardware
● Firmware Description
● Software
● Performance Measurements
● Summary


Introduction

● DAQ systems for current and future experiments depend on reliable high-speed data transmission

● S-LINK specification addresses this type of application:
► Point-to-point data link, bandwidth 160 MB/s (32-bit @ 40 MHz)
► Flow control (XON/XOFF)
► Error detection (e.g. CRC)
► Self-test mode & return line signals
► CMC mezzanine card format

● ATLAS Read-Out Link (ROL)
► ROL implementation is based on S-LINK
► Connects front-end electronics interface modules (Read-Out Drivers) to the Read-Out System (ROS)
► ROS is based on commodity PCs and custom PCI interface cards (ROBin)
► ~1650 ROLs will be used in ATLAS


ROL Source Card

● High-speed Optical Link for ATLAS (HOLA)
● Standard S-LINK mezzanine card
● Industry standard pluggable (SFP) 850 nm F/O transceiver
● Serial link speed 2 Gb/s with 8B/10B line encoding
● Low-power: ~2 W typical

[Figure: HOLA card, labeled: SERDES; S-LINK protocol FPGA; CMC mezzanine connector (160 MB/s, 32-bit @ 40 MHz); cage for SFP F/O transceiver]
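The two bandwidth figures quoted for the S-LINK interface and the HOLA serial link can be checked with simple arithmetic. The sketch below assumes 1 MB = 10^6 bytes and accounts only for the 8B/10B line encoding; any further framing or control-word overhead on the serial link is not stated on the slides and is ignored here. The function names are illustrative only.

```python
# Back-of-the-envelope link bandwidths, using only figures quoted on the slides.

def parallel_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel interface in MB/s (1 MB = 10**6 bytes)."""
    return width_bits / 8 * clock_mhz


def serial_payload_mb_s(line_rate_gbit_s: float,
                        data_bits: int = 8,
                        line_bits: int = 10) -> float:
    """Payload bandwidth of a serial link after line encoding, in MB/s."""
    payload_bits_per_s = line_rate_gbit_s * 1e9 * data_bits / line_bits
    return payload_bits_per_s / 8 / 1e6


slink = parallel_bandwidth_mb_s(32, 40)   # S-LINK: 32-bit @ 40 MHz
hola = serial_payload_mb_s(2.0)           # HOLA: 2 Gb/s with 8B/10B encoding

print(f"S-LINK interface:    {slink:.0f} MB/s")
print(f"HOLA serial payload: {hola:.0f} MB/s")
```

Under these assumptions the serial link carries somewhat more payload than the 160 MB/s S-LINK interface requires.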


Quad S-LINK PCI Interface (FILAR)

● FILAR Features:
► Four 2 Gb/s HOLA link channels integrated on-board
► 64-bit/66 MHz PCI interface (3.3 V slots only)
► Move data between 4 link interfaces and the host PC memory
► Based on S32PCI64 interface design: one slot for S-LINK mezzanine card

● Applications: small readout systems for lab & test beam

● FPGA-based (in-system reconfigurable)
► PCI I/F implemented using a commercial PCI IP core

● Firmware versions:
► Quad S-LINK receiver (S-LINK to PCI)
► Quad S-LINK transmitter (PCI to S-LINK)
► Quad S-LINK data source (for performance measurements)


FILAR Hardware

[Figure: FILAR card, labeled: HOLA interface FPGA; SFP fiber optic transceiver; SERDES; PCI interface FPGA; 64-bit/66 MHz PCI interface, 3.3 V only (!)]


Receiver Firmware Operation

● Host processor:
► 1) Fills a request FIFO on the interface card with addresses of free memory buffer pages
► 5) Reads the results from the acknowledge FIFO and processes the data

● Interface card:
► 2) Transfers data fragments from S-LINK to host memory as bus master using PCI bursts of up to 1 kB for maximum performance
► 3) Stores length, status and control words for received fragments in an acknowledge FIFO
► 4) Asserts an interrupt (optional)

● Protocol overhead of ~2 PCI single-cycles (SC) per data fragment and channel:
► Write address of buffer memory page
► Read length and status of received fragment
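The request/acknowledge handshake in steps 1 to 5 can be illustrated with a small software model. This is only a sketch of the data flow for one channel, not the firmware or driver code: the class and method names, the status value 0 and the use of a dictionary as "host memory" are all assumptions made for illustration.

```python
from collections import deque


class ChannelModel:
    """Toy model of one receive channel: request FIFO in, acknowledge FIFO out."""

    def __init__(self):
        self.request_fifo = deque()  # addresses of free host buffer pages
        self.ack_fifo = deque()      # (address, length, status) per fragment

    def host_post_buffer(self, address: int) -> None:
        # Step 1: the host writes the address of a free page
        # (one PCI single cycle per fragment).
        self.request_fifo.append(address)

    def card_receive(self, fragment: bytes, host_memory: dict) -> bool:
        # Steps 2-3: the card takes the next free page, writes the fragment
        # into host memory as bus master, then records length and status.
        if not self.request_fifo:
            return False             # no free buffer posted: fragment must wait
        address = self.request_fifo.popleft()
        host_memory[address] = fragment
        self.ack_fifo.append((address, len(fragment), 0))  # status 0 = OK (assumed)
        return True

    def host_read_ack(self):
        # Step 5: the host reads length and status
        # (the second PCI single cycle per fragment).
        return self.ack_fifo.popleft() if self.ack_fifo else None


host_memory = {}
chan = ChannelModel()
chan.host_post_buffer(0x10000)
chan.card_receive(b"\x01\x02\x03\x04", host_memory)
print(chan.host_read_ack())
```

The model makes the per-fragment overhead visible: one write to post a buffer and one read to collect the result, which is what the DMA protocol firmware later batches.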


Receiver Firmware Block Diagram

[Block diagram. Blocks shown: four S-LINK inputs, each with an input buffer FIFO, a PCI burst FIFO, a request FIFO and an acknowledge FIFO; DMA engine; back-end control logic; control & status registers; 64-bit PCI IP core; 66 MHz 64-bit PCI bus (528 MB/s)]
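The 528 MB/s figure is the theoretical peak of a 64-bit bus clocked at a nominal 66 MHz. The short calculation below compares it with four channels running at the full 160 MB/s S-LINK rate; it assumes 1 MB = 10^6 bytes and ignores all PCI protocol overhead, so it is an upper bound rather than an achievable rate.

```python
PCI_WIDTH_BITS = 64
PCI_CLOCK_MHZ = 66            # nominal clock, as quoted on the slide
SLINK_MB_S = 160              # S-LINK interface bandwidth per channel
CHANNELS = 4

pci_peak_mb_s = PCI_WIDTH_BITS // 8 * PCI_CLOCK_MHZ
all_channels_mb_s = CHANNELS * SLINK_MB_S

print(f"PCI peak:        {pci_peak_mb_s} MB/s")
print(f"4 x S-LINK rate: {all_channels_mb_s} MB/s")
print(f"PCI is the bottleneck: {all_channels_mb_s > pci_peak_mb_s}")
```

Even before overhead is considered, four fully loaded channels would exceed what the PCI bus can carry.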


Firmware Optimization

● Single cycles do not use the PCI bus efficiently
● A performance-optimized version of the receiver firmware was developed (DMA protocol firmware):
► Interface card transfers request and acknowledge data using DMA
► CPU prepares a descriptor block with buffer addresses for one or more channels in system memory
► Firmware fetches the block using DMA and fills the on-board request FIFOs
► Firmware transfers a block with the length and status information from the acknowledge FIFOs to the system memory using DMA when a threshold is reached
► Requires additional memory resources in the FPGA; only 3 receive channels can be implemented on the current hardware

● Reduces PCI bus overhead and CPU load
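To make the descriptor-block idea concrete, here is a minimal sketch of how a host might pack buffer addresses for several channels into one contiguous block and how the receiving side might unpack it into per-channel request queues. The word layout (entry count followed by channel/address pairs of 32-bit little-endian words) is purely hypothetical; the slides do not specify the actual format used by the FILAR firmware.

```python
import struct


def build_descriptor_block(buffers):
    """Pack (channel, buffer_address) pairs into a flat block of 32-bit words.

    Hypothetical layout: word 0 holds the number of entries, followed by one
    channel word and one address word per entry.
    """
    words = [len(buffers)]
    for channel, address in buffers:
        words += [channel, address]
    return struct.pack(f"<{len(words)}I", *words)


def parse_descriptor_block(block):
    """Split a block into per-channel lists of buffer addresses."""
    (count,) = struct.unpack_from("<I", block, 0)
    queues = {}
    for i in range(count):
        channel, address = struct.unpack_from("<2I", block, 4 + 8 * i)
        queues.setdefault(channel, []).append(address)
    return queues


block = build_descriptor_block([(0, 0x100000), (1, 0x200000), (0, 0x101000)])
print(len(block), parse_descriptor_block(block))
```

The point of the design is that a single DMA fetch of such a block replaces one single-cycle write per buffer address.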


Software

● FILAR software package:
► Linux device driver (loadable module)
► Library provides easy to use programming API for applications
► Test and benchmarking programs

● Software written in C
● Separate drivers for the different receiver firmware versions
● Supports multiple channels & PCI cards
● Interrupt driven: device driver is called when a predefined number of fragments are available in any channel
● Code optimised for maximising throughput
► Manage the card with minimal attention from the application layer
► Reduce the number of context switches

● Fully integrated into the ATLAS DataFlow software
● Requires cmem driver/library for allocation of contiguous memory
● Similar package available for the transmitter firmware


Measurement Setup

● PC with Supermicro server motherboard (ServerWorks GC-LE chipset)
● 4 independent 64-bit PCI bus segments
● Intel Xeon CPU (3 GHz)
● S-LINK input channels driven by HOLA data sources

[Diagram: FILAR cards on 64-bit/66 MHz PCI segments behind I/O bridges; north bridge linking the CPU (Xeon, 3 GHz) and memory (DDR 266); HOLA sources driving the S-LINK inputs]

● Chipset architecture is important to obtain the maximum performance


Performance: Single-Cycle Firmware

● FILAR receiver with SC firmware

● Sawtooth structure due to overhead for setting up a PCI burst (1kB)

● Performance for one channel is limited by link bandwidth

● Throughput with 3 channels is limited by PCI interface

● Maximum throughput is ~450MB/s

[Plot: aggregate throughput [Mbyte/s] vs. length [byte] for 1, 2 and 3 channels, SC firmware. Annotations: 145 MB/s per channel; 187 MB/s per channel; 360 MB/s @ 1 kB]
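The sawtooth shape attributed to the burst setup overhead can be reproduced with a very simple timing model: each fragment pays a fixed cost, plus one setup cost per PCI burst of up to 1 kB, plus the raw transfer time. The two overhead values below are hypothetical placeholders chosen only to make the effect visible; they are not fitted to the measured curves.

```python
import math

BURST_BYTES = 1024           # maximum PCI burst length quoted on the slides
PCI_PEAK_MB_S = 528.0        # 64-bit @ 66 MHz; numerically equal to bytes per microsecond
BURST_SETUP_US = 0.5         # hypothetical cost of starting one burst
FRAGMENT_OVERHEAD_US = 1.0   # hypothetical fixed cost per fragment


def throughput_mb_s(length_bytes: int) -> float:
    """Modelled throughput for back-to-back fragments of the given length."""
    bursts = math.ceil(length_bytes / BURST_BYTES)
    transfer_us = length_bytes / PCI_PEAK_MB_S
    total_us = FRAGMENT_OVERHEAD_US + bursts * BURST_SETUP_US + transfer_us
    return length_bytes / total_us


for length in (512, 1024, 1028, 2048, 2052):
    print(f"{length:5d} B -> {throughput_mb_s(length):6.1f} MB/s")
```

In this model the throughput drops each time the fragment length crosses a multiple of 1 kB, because one extra burst has to be set up for only a few additional bytes.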


Performance: DMA Protocol Firmware

● FILAR receiver with DMA firmware

● Better performance than SC firmware, in particular for short fragments

● 25% improvement for 3 channels at 1kB fragment length

● Performance for long fragments is similar for both firmware versions

[Plot: aggregate throughput [Mbyte/s] vs. length [byte] for 1, 2 and 3 channels, DMA firmware. Annotation: 440 MB/s @ 1 kB]


Throughput: Multiple FILAR cards

[Plot: bandwidth [Mbyte/s] vs. length [byte] for 3, 6 and 8 channels. Annotations: 147 MB/s per channel; 145 MB/s per channel; 140 MB/s per channel]

● DMA protocol F/W
● Maximum throughput of 1.1 GB/s with 3 receiver cards
● Throughput scales with the number of channels for fragments of 2 kB and more
● For fragments of 500 B and less the system is rate limited
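The per-channel annotations can be cross-checked against the aggregate figures. The pairing used below (147 MB/s with 3 channels, 145 MB/s with 6, 140 MB/s with 8) is an assumption based on the order of the labels; the transcript does not state explicitly which annotation belongs to which curve.

```python
# (number of channels, per-channel throughput in MB/s); pairing is assumed
plateaus = [(3, 147), (6, 145), (8, 140)]

aggregates = [channels * per_channel for channels, per_channel in plateaus]

for (channels, per_channel), total in zip(plateaus, aggregates):
    print(f"{channels} channels x {per_channel} MB/s = {total} MB/s aggregate")
```

With this pairing the 8-channel case comes out close to the quoted maximum of 1.1 GB/s, and the 3-channel case close to the single-card figure.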


Fragment Rate: Multiple FILAR cards

[Plot: fragment frequency [kHz] vs. fragment length [byte] for 3, 6 and 8 channels. Annotation: 100 kHz @ 1 kB]

● Received data fragment frequency per channel vs. fragment length
● Fragment rates of 100 kHz can be sustained with 3 cards for fragments of less than 1 kB
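Fragment rate and throughput are related through the fragment length, so the 100 kHz figure can be converted into a data rate. The sketch assumes 1 kB = 1024 bytes and 1 MB = 10^6 bytes, and takes 8 active channels for the aggregate; the helper names are illustrative.

```python
def per_channel_mb_s(rate_khz: float, fragment_bytes: int) -> float:
    """Data rate of one channel for a given fragment rate and length."""
    return rate_khz * 1e3 * fragment_bytes / 1e6


def fragment_rate_khz(mb_s: float, fragment_bytes: int) -> float:
    """Fragment rate of one channel for a given data rate and length."""
    return mb_s * 1e6 / fragment_bytes / 1e3


rate_at_1kb = per_channel_mb_s(100, 1024)
print(f"100 kHz @ 1 kB  -> {rate_at_1kb:.1f} MB/s per channel")
print(f"8 channels      -> {8 * rate_at_1kb:.1f} MB/s aggregate")
print(f"140 MB/s @ 2 kB -> {fragment_rate_khz(140, 2048):.1f} kHz per channel")
```

For long fragments the bandwidth plateau caps the achievable rate; for short ones the fragment rate itself becomes the limit.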


S-LINK Transmitter Performance

[Plot: aggregate throughput [Mbyte/s] vs. fragment size [byte] for 1, 2 and 3 channels]

● Transmitter connected to a FILAR receiver in another PC

● PCI interface is saturated with 2 active channels

● Maximum throughput obtained is 360MB/s

● PCI memory read performance is not as good as write


Summary

● FILAR high-performance PCI interface card with 4 on-board 2 Gb/s S-LINK channels (HOLA) has been designed
● Quad S-LINK receiver, transmitter and data source firmware versions have been developed and optimized
● Software package with Linux device driver and API library is integrated in the ATLAS DataFlow software
● Maximum throughput for one receiver card ~450 MB/s
● Aggregate data rate of > 1 GB/s to system memory has been measured with 3 receiver cards
● Event rates of over 100 kHz can be achieved for 1 kB fragments
● FILAR applications and users:
► Test readout of front-end electronics interface modules
► ATLAS subdetector groups (LAr, SCT, TileCal, TRT, Pixel, LVL1 Calo), DAQ & ROBin, MDT chamber tests
► Readout system for the ATLAS combined test beam
► Stable design, ~50 cards produced so far