Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links. Stefan Haas, Markus Joos (CERN); Wieslaw Iwanski (Henryk Niewodniczański Institute of Nuclear Physics). LECC, 13.-17. Sept. 2004, Boston

Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links

Stefan Haas, Markus Joos
CERN

Wieslaw Iwanski
Henryk Niewodniczański Institute of Nuclear Physics

LECC, 13.-17. Sept. 2004, Boston


S. Haas, 14. Sept. '04, LECC 2004

Outline

● Introduction
● Interface Card Hardware
● Firmware Description
● Software
● Performance Measurements
● Summary


Introduction

● DAQ systems for current and future experiments depend on reliable high-speed data transmission

● S-LINK specification addresses this type of application:
► Point-to-point data link, bandwidth 160 MB/s (32-bit @ 40 MHz)
► Flow control (XON/XOFF)
► Error detection (e.g. CRC)
► Self-test mode & return line signals
► CMC mezzanine card format

● ATLAS Read-Out Link (ROL)
► ROL implementation is based on S-LINK
► Connects front-end electronics interface modules (Read-Out Drivers) to the Read-Out System (ROS)
► ROS is based on commodity PCs and custom PCI interface cards (ROBin)
► ~1650 ROLs will be used in ATLAS


ROL Source Card

● High-speed Optical Link for ATLAS (HOLA)
● Standard S-LINK mezzanine card
● Industry standard pluggable (SFP) 850 nm F/O transceiver
● Serial link speed 2 Gb/s with 8B/10B line encoding
● Low-power: ~2 W typical

[Figure: photograph of the HOLA source card. Labels: SERDES; S-LINK protocol FPGA; CMC mezzanine connector (160 MB/s, 32-bit @ 40 MHz); cage for SFP F/O transceiver.]
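
The bandwidth figures quoted for S-LINK and HOLA follow from simple arithmetic, which the short sketch below reproduces. The function names are illustrative only. Treating the quoted 2 Gb/s as the serial line rate (before 8B/10B decoding) is an assumption: the slides do not say whether the figure refers to the line rate or to the payload rate.

```python
def parallel_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel interface in MB/s (1 MB = 10**6 bytes)."""
    return width_bits / 8 * clock_mhz


def payload_after_8b10b_mb_s(line_rate_gbit_s: float) -> float:
    """Payload bandwidth in MB/s once the 8B/10B overhead is removed.

    8B/10B sends 10 line bits for every 8 data bits, so only 80% of the
    line rate carries data.
    """
    data_bits_per_s = line_rate_gbit_s * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6


# S-LINK parallel interface: 32-bit @ 40 MHz
slink_mb_s = parallel_bandwidth_mb_s(32, 40)

# ASSUMPTION: the quoted 2 Gb/s is the line rate, not the payload rate.
serial_payload_mb_s = payload_after_8b10b_mb_s(2.0)

print(f"S-LINK parallel interface:   {slink_mb_s:.0f} MB/s")
print(f"Serial payload after 8B/10B: {serial_payload_mb_s:.0f} MB/s")
```

Under that assumption the serial link carries somewhat more than the 160 MB/s that the S-LINK connector can accept.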


Quad S-LINK PCI Interface (FILAR)

● FILAR Features:
► Four 2 Gb/s HOLA link channels integrated on-board
► 64-bit/66 MHz PCI interface (3.3 V slots only)
► Move data between 4 link interfaces and the host PC memory
► Based on S32PCI64 interface design: one slot for S-LINK mezzanine card

● Applications: small readout systems for lab & test beam

● FPGA-based (in-system reconfigurable)
► PCI I/F implemented using a commercial PCI IP core

● Firmware versions:
► Quad S-LINK receiver (S-LINK to PCI)
► Quad S-LINK transmitter (PCI to S-LINK)
► Quad S-LINK data source (for performance measurements)


FILAR Hardware

[Figure: photograph of the FILAR board. Labels: HOLA interface FPGA; SFP fiber optic transceiver; SERDES; PCI interface FPGA; 64-bit/66 MHz PCI interface, 3.3 V only (!).]


Receiver Firmware Operation

● Host processor:
► 1) Fills a request FIFO on the interface card with addresses of free memory buffer pages
► 5) Reads the results from the acknowledge FIFO and processes the data

● Interface card:
► 2) Transfers data fragments from S-LINK to host memory as bus master using PCI bursts of up to 1 kB for maximum performance
► 3) Stores length, status and control words for received fragments in an acknowledge FIFO
► 4) Asserts an interrupt (optional)

● Protocol overhead of ~2 PCI single-cycles (SC) per data fragment and channel:
► Write address of buffer memory page
► Read length and status of received fragment
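
The five numbered steps can be sketched as a toy software model of one receive channel. This is only an illustration of the request/acknowledge handshake: the class, method and field names are hypothetical and do not correspond to the real FILAR registers or driver API, and the DMA transfer is replaced by a dictionary write.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ack:
    """Entry in the acknowledge FIFO (field names are hypothetical)."""
    address: int  # buffer page the fragment was written to
    length: int   # fragment length in bytes
    status: int   # 0 means "no error" in this sketch


class ReceiverChannel:
    """Toy model of one receive channel, not the real register interface."""

    def __init__(self):
        self.request_fifo = deque()  # addresses of free buffer pages
        self.ack_fifo = deque()      # Ack entries for received fragments
        self.host_memory = {}        # stands in for DMA into host memory

    def post_free_page(self, address: int) -> None:
        # Step 1: host writes the address of a free buffer page.
        self.request_fifo.append(address)

    def fragment_arrives(self, data: bytes) -> bool:
        # Steps 2 and 3: card takes a page, "transfers" the data, stores an ack.
        if not self.request_fifo:
            # No free page. This sketch simply refuses the fragment; real
            # hardware would presumably hold off the source via flow control.
            return False
        address = self.request_fifo.popleft()
        self.host_memory[address] = data
        self.ack_fifo.append(Ack(address, len(data), status=0))
        return True

    def read_ack(self) -> Optional[Ack]:
        # Step 5: host reads length and status of a received fragment.
        return self.ack_fifo.popleft() if self.ack_fifo else None


chan = ReceiverChannel()
chan.post_free_page(0x1000)
accepted = chan.fragment_arrives(b"\x01\x02\x03\x04")
ack = chan.read_ack()
print(accepted, hex(ack.address), ack.length, ack.status)
```

In this model the host touches the card twice per fragment (one write in step 1, one read in step 5), which corresponds to the two single-cycle accesses counted as protocol overhead.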


Receiver Firmware Block Diagram

[Figure: receiver firmware block diagram. Blocks shown: four S-LINK inputs, each with an input buffer FIFO, a PCI burst FIFO, a request FIFO and an acknowledge FIFO; DMA engine; backend control logic; control & status registers; 64-bit PCI IP core; 66 MHz 64-bit PCI bus (528 MB/s).]
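
The 528 MB/s figure in the diagram is the theoretical peak of a 64-bit, 66 MHz PCI bus. As a rough sketch of the bandwidth budget, the snippet below compares that peak with four S-LINK channels running at the 160 MB/s quoted in the introduction. The constant names are illustrative, and real PCI throughput is lower than the peak because of arbitration and burst set-up.

```python
PCI_WIDTH_BITS = 64
PCI_CLOCK_MHZ = 66    # nominal value as quoted on the slides
SLINK_MB_S = 160      # per channel, 32-bit @ 40 MHz
CHANNELS = 4

# Peak PCI bandwidth: bytes per clock times clocks per microsecond.
pci_peak_mb_s = PCI_WIDTH_BITS // 8 * PCI_CLOCK_MHZ

# Load offered by all four channels running at full S-LINK rate.
offered_mb_s = CHANNELS * SLINK_MB_S

print(f"PCI peak:                   {pci_peak_mb_s} MB/s")
print(f"Four channels at full rate: {offered_mb_s} MB/s")
print(f"PCI bus is the bottleneck:  {offered_mb_s > pci_peak_mb_s}")
```

With all four channels active the PCI bus, not the links, sets the ceiling.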


Firmware Optimization

● Single-cycles do not use the PCI bus efficiently

● A performance-optimized version of the receiver firmware was developed (DMA protocol firmware):
► Interface card transfers request and acknowledge data using DMA
► CPU prepares a descriptor block with buffer addresses for one or more channels in system memory
► Firmware fetches the block using DMA and fills the on-board request FIFOs
► Firmware transfers a block with the length and status information from the acknowledge FIFOs to the system memory using DMA when a threshold is reached
► Requires additional memory resources in the FPGA; only 3 receive channels can be implemented on the current hardware

● Reduces PCI bus overhead and CPU load
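
To make the descriptor-block idea concrete, here is a minimal sketch of how a host might pack buffer addresses for several channels into one contiguous block that can be fetched in a single DMA read. The block layout, the word size and both function names are hypothetical; the slides do not describe the actual format used by the FILAR firmware.

```python
import struct
from typing import Dict, List, Tuple


def pack_descriptor_block(free_pages: Dict[int, List[int]]) -> bytes:
    """Pack buffer addresses for one or more channels into one block.

    HYPOTHETICAL layout (little-endian 32-bit words):
      word 0        number of (channel, address) entries
      words 1..2n   n pairs of (channel number, buffer page address)
    """
    entries = [(channel, address)
               for channel, pages in sorted(free_pages.items())
               for address in pages]
    block = struct.pack("<I", len(entries))
    for channel, address in entries:
        block += struct.pack("<II", channel, address)
    return block


def unpack_descriptor_block(block: bytes) -> List[Tuple[int, int]]:
    """Card-side view: recover the (channel, address) pairs from the block."""
    (count,) = struct.unpack_from("<I", block, 0)
    return [struct.unpack_from("<II", block, 4 + 8 * i) for i in range(count)]


block = pack_descriptor_block({0: [0x10000, 0x11000], 2: [0x20000]})
entries = unpack_descriptor_block(block)
print(len(block), [(ch, hex(addr)) for ch, addr in entries])
```

The point of the design is that one burst read of the block replaces one single-cycle write per buffer address.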


Software

● FILAR software package:
► Linux device driver (loadable module)
► Library provides an easy-to-use programming API for applications
► Test and benchmarking programs

● Software written in C
● Separate drivers for the different receiver firmware versions
● Supports multiple channels & PCI cards
● Interrupt driven: device driver is called when a predefined number of fragments are available in any channel
● Code optimised for maximising throughput:
► Manage the card with minimal attention from the application layer
► Reduce the number of context switches
● Fully integrated into the ATLAS DataFlow software
● Requires cmem driver/library for allocation of contiguous memory
● Similar package available for the transmitter firmware
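
The effect of calling the driver only after a predefined number of fragments can be illustrated with a small counting model. It is written in Python for brevity rather than in the C of the actual package, the function name is made up, and the assumption that one driver call drains every channel is ours, not something stated on the slides.

```python
from typing import Iterable


def count_interrupts(arrivals: Iterable[int], n_channels: int,
                     threshold: int) -> int:
    """Count driver invocations for a stream of fragment arrivals.

    `arrivals` yields the channel number of each arriving fragment.
    An interrupt fires when any channel holds `threshold` fragments.
    ASSUMPTION: when the driver runs, it drains every channel.
    """
    pending = [0] * n_channels
    interrupts = 0
    for channel in arrivals:
        pending[channel] += 1
        if pending[channel] >= threshold:
            interrupts += 1
            pending = [0] * n_channels
    return interrupts


# 300 fragments arriving round-robin on 3 channels
arrivals = [i % 3 for i in range(300)]
per_fragment = count_interrupts(arrivals, n_channels=3, threshold=1)
coalesced = count_interrupts(arrivals, n_channels=3, threshold=10)
print(per_fragment, coalesced)
```

Raising the threshold trades a little latency for far fewer interrupts and context switches, which is the optimisation the list above describes.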


Measurement Setup

● PC with Supermicro server motherboard (ServerWorks GC-LE chipset)

● 4 independent 64-bit PCI bus segments

● Intel Xeon CPU (3 GHz)

● S-LINK input channels driven by HOLA data sources

[Figure: block diagram of the test system. Blocks shown: CPU (Xeon, 3 GHz); memory (DDR 266); north bridge; I/O bridges; 64-bit/66 MHz PCI segments carrying the FILAR cards; HOLA S-LINK sources.]

● Chipset architecture is important to obtain the maximum performance


Performance: Single-Cycle Firmware

● FILAR receiver with SC firmware

● Sawtooth structure due to overhead for setting up a PCI burst (1kB)

● Performance for one channel is limited by link bandwidth

● Throughput with 3 channels is limited by PCI interface

● Maximum throughput is ~450MB/s

[Figure: aggregate throughput [Mbyte/s] (0 to 500) vs. fragment length [byte] (0 to 4500) for 1, 2 and 3 channels with SC firmware. Annotations: 145 MB/s per channel; 187 MB/s per channel; 360 MB/s @ 1 kB.]
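
The sawtooth can be understood with a toy timing model in which every PCI burst of up to 1 kB pays a fixed set-up cost. The sketch below is not fitted to the measurements: the two overhead values are hypothetical numbers chosen only to make the shape visible, and the function name is illustrative.

```python
import math

BURST_BYTES = 1024         # maximum PCI burst length quoted on the slides
PCI_PEAK_MB_S = 528.0      # 64-bit @ 66 MHz
# HYPOTHETICAL overheads, chosen only to make the effect visible:
BURST_SETUP_US = 0.5       # cost of starting each PCI burst
FRAGMENT_OVERHEAD_US = 1.0 # per-fragment protocol cost (single cycles)


def modelled_throughput_mb_s(length_bytes: int) -> float:
    """Modelled throughput in MB/s for fragments of the given length."""
    bursts = math.ceil(length_bytes / BURST_BYTES)
    transfer_us = length_bytes / PCI_PEAK_MB_S  # bytes / (MB/s) gives µs
    total_us = FRAGMENT_OVERHEAD_US + bursts * BURST_SETUP_US + transfer_us
    return length_bytes / total_us              # bytes / µs gives MB/s


for length in (1020, 1024, 1028, 2048, 2052):
    print(f"{length:5d} B -> {modelled_throughput_mb_s(length):6.1f} MB/s")
```

In this model the throughput climbs up to each multiple of 1024 bytes and drops just after it, because the few extra bytes cost a whole additional burst set-up.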


Performance: DMA Protocol Firmware

● FILAR receiver with DMA firmware

● Better performance than SC firmware, in particular for short fragments

● 25% improvement for 3 channels at 1kB fragment length

● Performance for long fragments is similar for both firmware versions

[Figure: aggregate throughput [Mbyte/s] (0 to 500) vs. fragment length [byte] (0 to 4500) for 1, 2 and 3 channels with DMA firmware. Annotation: 440 MB/s @ 1 kB.]


Throughput: Multiple FILAR cards

[Figure: bandwidth [Mbyte/s] (0 to 1200) vs. fragment length [byte] (0 to 4500) for 3, 6 and 8 channels. Annotations: 147 MB/s per channel; 145 MB/s per channel; 140 MB/s per channel.]

● DMA protocol F/W

● Maximum throughput of 1.1 GB/s with 3 receiver cards

● Throughput scales with the number of channels for fragments of 2 kB and more

● For fragments of 500 B and less the system is rate limited
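
The per-channel annotations can be cross-checked against the quoted maximum with a few multiplications. The sketch below assumes that the three annotations belong to the 3-, 6- and 8-channel curves in that order; the transcript does not state which annotation goes with which curve.

```python
# Per-channel rates annotated on the plot, in MB/s.
# ASSUMPTION: the annotations map to 3, 6 and 8 channels in this order.
per_channel_mb_s = {3: 147, 6: 145, 8: 140}

aggregate_mb_s = {}
for channels, rate in per_channel_mb_s.items():
    aggregate_mb_s[channels] = channels * rate
    print(f"{channels} channels x {rate} MB/s = {aggregate_mb_s[channels]} MB/s")

max_gb_s = aggregate_mb_s[8] / 1000
print(f"8 channels: {max_gb_s:.2f} GB/s")
```

Under this reading the 8-channel total is consistent with the 1.1 GB/s maximum, and the per-channel rate falls only slightly as channels are added, which is what scaling with the number of channels means here.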


Fragment Rate: Multiple FILAR cards

[Figure: fragment frequency [kHz] (0 to 250) vs. fragment length [byte] (0 to 4500) for 3, 6 and 8 channels. Annotation: 100 kHz @ 1 kB.]

● Received data fragment frequency per channel vs. fragment length

● Fragment rates of 100 kHz can be sustained with 3 cards for fragments of less than 1 kB
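
Fragment rate and throughput are two views of the same measurement, related by the fragment length. The sketch below converts between them; the function names are illustrative, and taking 1 kB as 1024 bytes is an assumption.

```python
def rate_to_throughput_mb_s(rate_khz: float, fragment_bytes: int) -> float:
    """Per-channel throughput in MB/s for a given fragment rate and size."""
    return rate_khz * 1e3 * fragment_bytes / 1e6


def throughput_to_rate_khz(bandwidth_mb_s: float, fragment_bytes: int) -> float:
    """Fragment rate in kHz that a given bandwidth can carry."""
    return bandwidth_mb_s * 1e6 / fragment_bytes / 1e3


FRAGMENT_BYTES = 1024  # ASSUMPTION: "1 kB" means 1024 bytes

per_channel = rate_to_throughput_mb_s(100, FRAGMENT_BYTES)
total_8_channels = 8 * per_channel
print(f"100 kHz @ 1 kB: {per_channel:.1f} MB/s per channel")
print(f"8 channels:     {total_8_channels:.1f} MB/s in total")
```

At 100 kHz and 1 kB each channel therefore carries roughly 100 MB/s, so eight channels together move about 0.8 GB/s into system memory.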


S-LINK Transmitter Performance

[Figure: aggregate throughput [Mbyte/s] (0 to 400) vs. fragment size [byte] (0 to 4500) for 1, 2 and 3 transmitter channels.]

● Transmitter connected to a FILAR receiver in another PC

● PCI interface is saturated with 2 active channels

● Maximum throughput obtained is 360MB/s

● PCI memory read performance is not as good as write


Summary

● FILAR high-performance PCI interface card with 4 on-board 2 Gb/s S-LINK channels (HOLA) has been designed

● Quad S-LINK receiver, transmitter and data source firmware versions have been developed and optimized

● Software package with Linux device driver and API library is integrated in the ATLAS DataFlow software

● Maximum throughput for one receiver card ~450 MB/s

● Aggregate data rate of > 1 GB/s to system memory has been measured with 3 receiver cards

● Event rates of over 100 kHz can be achieved for 1 kB fragments

● FILAR applications and users:
► Test readout of front-end electronics interface modules
► ATLAS subdetector groups (LAr, SCT, TileCal, TRT, Pixel, LVL1 Calo), DAQ & ROBin, MDT chamber tests
► Readout system for the ATLAS combined test beam
► Stable design, ~50 cards produced so far