1
Network Processors for a 1 MHz Trigger-DAQ System
RT2003, Montreal
Artur Barczyk, Jean-Pierre Dufey, Beat Jost and Niko Neufeld
CERN-EP & Université de Lausanne
Jan 14, 2016
Niko NEUFELD, CERN, EP
2
Network Processors
• Developed for high-end routers, on the market since 1999
• Dedicated processors optimised for high-speed packet processing
• Large I/O capabilities (up to 10 Gbit/s and up to 10 million packets/s)
• Large and fast buffer memories
• Software programmable
Use them in a network-based DAQ system wherever PCs can’t do the job (easily)
3
Anatomy of a Network Processor
[Block diagram: Search Engine; Processor Complex; General-Purpose CPU; Scheduler; HW Assist; Routing and Bridging Tables; Packet Buffer Memory; MAC/Frame Processor (to and from PHYs); Control and Monitoring]
Integrated Network Interfaces
On-chip Memory + Interfaces for external memories
Multiple RISC processor cores
Several hardware threads
Coprocessors for many common networking tasks
4
NP module as PCI card
• All infrastructure to operate one IBM PowerNP NP4GS3
• 3 x 1000BASE-T ports
• One port converted into PCI, for development purposes
• 2 NPs can be connected via a special cable
• Built by S3 Corp., Ireland
5
Network Processors in a 1 MHz DAQ
[Figure: readout architecture. Labelled elements: Front-end Electronics (FE); Multiplexing Layer (Switches and NPs); Readout Network (multi-stage switching network); Event Builder (NPs); SFCs; L1-Decision; Sorter; TRM; TFC System. NP functions marked in the figure: Frame Merging, Decision Sorting, Event Building.]
Figure annotations:
• 125-239 Links
• 1.1 MHz, 8.8-16.9 GB/s
• 349 Links
• 40 kHz, 2.3 GB/s
• 30 Switches
• 24 NPs, 77-135 NPs
• 77-135 Links, 6.4-13.6 GB/s
• 24 Links, 1.5 GB/s
• 73-140 Links, 7.9-15.1 GB/s
• 50-100 SFCs
• 37-70 NPs
• 50-100 Links, 5.5-10 GB/s
6
[Diagram: RU/FEM Application (two instances) and EB Application (two instances), each with Input and Output sides and Event Builder blocks]
Frame Merging
Works up to 4 MHz of incoming packets (see A. Barczyk’s presentation)
Works for at least 2 x 100 MB streams
7
Frame Merging
• Helps to optimise link usage
• Reduces the number of links into the readout network
• Can re-format data
– e.g. protocol adaptation (raw Ethernet → IP)
• Can change the Maximum Transmission Unit (MTU)
– some Ethernet segments provide a payload > 1500 bytes
• Reduces the packet rate at the output, which is important for receiving PCs (interrupt rate!)
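The merging idea listed above can be sketched in a few lines. This is an illustration in Python only, not the NP4GS3 code (which is written in a proprietary assembly language); the function names `merge_fragments` and `wire_bytes`, the 14-byte header constant, and the greedy packing policy are assumptions made for the example. A real merged-frame format would also need per-fragment length fields so that the receiver can split the payload again; that is omitted here.

```python
from typing import Iterable, List

ETH_HEADER = 14  # bytes; assumed Ethernet II header without VLAN tag


def merge_fragments(fragments: Iterable[bytes], mtu: int = 1500) -> List[bytes]:
    """Greedily pack consecutive small payloads into as few frames as possible.

    Each returned payload is a concatenation of consecutive input fragments
    and never exceeds `mtu` bytes. Fragments larger than `mtu` are rejected,
    because splitting is outside the scope of this sketch.
    """
    merged: List[bytes] = []
    current = bytearray()
    for frag in fragments:
        if len(frag) > mtu:
            raise ValueError("fragment larger than MTU")
        # Flush the frame under construction if this fragment would not fit.
        if len(current) + len(frag) > mtu:
            merged.append(bytes(current))
            current = bytearray()
        current += frag
    if current:
        merged.append(bytes(current))
    return merged


def wire_bytes(payloads: List[bytes]) -> int:
    """Total bytes sent, counting one (assumed) Ethernet header per frame."""
    return sum(len(p) + ETH_HEADER for p in payloads)


if __name__ == "__main__":
    small = [b"x" * 100] * 30            # 30 fragments of 100 bytes each
    frames = merge_fragments(small, mtu=1500)
    print(len(small), "packets in,", len(frames), "packets out")
    print(wire_bytes(small), "bytes before,", wire_bytes(frames), "bytes after")
```

With these made-up numbers, 30 input packets leave as 2 output packets, which is the packet-rate reduction that matters for the interrupt load on receiving PCs; raising `mtu` above 1500 models segments that accept larger payloads.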
8
Building your own switching network from NP modules
• Using NP modules gives you full freedom in doing the switching
• Large output buffers
• Disadvantage:
– A module has only eight ports (otherwise a switch chip is needed), so a large number of modules is needed to build a big network
• Solution:
– Use optimised connection topologies to reduce the number of elementary modules, while keeping the load on interconnecting links acceptable
9
Network Topologies
[Figure: two network topologies connecting sources (S) to destinations (D)]
• Banyan Topology
• “Fully Connected” Topology
Figure labels: 63 Sources x 72 Destinations; Basic Structure (nodes numbered 0-15, in groups of four); 64 x 64 port configuration; Sources; Destinations
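For a rough feel of the module counts involved, the textbook banyan (delta) construction can be evaluated numerically. The sketch below assumes that each eight-port NP module acts as a 4 x 4 switching element (four inputs, four outputs) and that the port count is an exact power of four; neither assumption is stated on the slide, and the optimised topologies mentioned earlier may well differ. The function name `banyan_module_count` is made up for the example.

```python
from typing import Tuple


def banyan_module_count(ports: int, radix: int = 4) -> Tuple[int, int]:
    """Return (stages, total_modules) for an N x N banyan network.

    The network is built from radix x radix switching elements with
    N = radix ** stages, so every stage holds N / radix elements.
    """
    if radix < 2 or ports < radix:
        raise ValueError("need radix >= 2 and ports >= radix")
    stages = 0
    n = ports
    while n > 1:
        if n % radix != 0:
            raise ValueError("ports must be an exact power of radix")
        n //= radix
        stages += 1
    return stages, stages * (ports // radix)


if __name__ == "__main__":
    stages, modules = banyan_module_count(64, radix=4)
    print(stages, "stages,", modules, "modules")  # 3 stages, 48 modules
```

Under these assumptions a 64 x 64 network needs three stages of 16 elements each, which illustrates why reducing the number of elementary modules is worth the effort.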
10
Decision Sorting
• In the LHCb trigger, decisions are generated as small Ethernet packets in one of 1400 PCs → 1 MHz of unordered decisions in
• Processing time is limited but unknown → decisions are taken and sent in arbitrary order
• Front-end electronics require decisions to be ordered before they are sent to the trigger distribution system → 1 MHz of ordered decisions out
• Limited buffer size entails a maximum trigger latency
• Each event entering is made known to a central entity (the Decision Sorter) → 1 MHz of frames
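The sorting requirement above amounts to a reorder buffer. The following Python model is a sketch only: the class name `DecisionSorter`, the use of consecutive integer event numbers as a stand-in for the per-event announcements, and the choice to raise an error when the buffer is full are all assumptions, not the implementation running on the NP.

```python
from typing import Dict, List, Tuple


class DecisionSorter:
    """Release trigger decisions strictly in event-number order."""

    def __init__(self, first_event: int = 0, max_pending: int = 1024) -> None:
        self.next_event = first_event        # next event number to release
        self.max_pending = max_pending       # models the limited buffer size
        self.pending: Dict[int, bytes] = {}  # decisions waiting for earlier ones

    def push(self, event: int, decision: bytes) -> List[Tuple[int, bytes]]:
        """Accept one decision; return every decision that is now in order."""
        if event < self.next_event or event in self.pending:
            raise ValueError("stale or duplicate decision")
        # A full buffer while the next expected decision is still missing
        # corresponds to exceeding the maximum trigger latency.
        if event != self.next_event and len(self.pending) >= self.max_pending:
            raise OverflowError("buffer full: maximum latency exceeded")
        self.pending[event] = decision
        released: List[Tuple[int, bytes]] = []
        while self.next_event in self.pending:
            released.append((self.next_event, self.pending.pop(self.next_event)))
            self.next_event += 1
        return released


if __name__ == "__main__":
    sorter = DecisionSorter()
    for ev, dec in [(2, b"yes"), (0, b"no"), (1, b"yes")]:
        print(ev, "->", sorter.push(ev, dec))
```

Decisions for events 2, 0, 1 arrive out of order here; event 0 is released as soon as it arrives, and events 1 and 2 follow together once the decision for event 1 is in.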
11
Decision Sorting
[Figure: decision sorting in the readout system. Labelled elements: Front-end Electronics (FE); Readout Network; 90-153 Links, 5.5-10 GB/s; 90-153 SFCs, each with a Switch and CPUs; CPU Farm (~1400 CPUs); L1-Decision; Sorter; TRM; TFC System. The figure also carries numbered markers (1 to 6).]
12
Test Set-up
Hardware
• IBM NP4GS3 R2.0 Reference Kit
• 8 x 1000 SX full-duplex ports
• PPC 750
• 4 x Tigon2 1000 SX NICs
• RISC Watch = JTAG via Ethernet
Tigon2 NIC Features
• Up to 620 kHz fragment rate
• 1 µs resolution timer
Measurement Procedure
• Dual NP connected via back-plane to form an 8-port module
• Download code into the NP4GS3 via RISC Watch (JTAG) or via PCI from the PPC control point
• Generate traffic either via Gigabit Ethernet NICs (Tigon) or by using one NP to feed the other
• Can use the internal timers of the NP and/or the NICs
13
Network Processors
☺ Packet processing for several million packets per second
☺ Fast and big buffer memories
☺ Hardware assists for many common tasks, like check-summing, re-framing, tree look-ups
☺ Software programmable
☹ Processing power optimised for header region of packets
☹ Memory model optimised for the hardware (no linear addressing)
☹ Programs need to be written in proprietary assembly language
14
Conclusions
• Network Processors are a powerful tool for packet processing
• They are especially useful wherever very high packet rates must be handled
• We have found many useful applications, all of which can be implemented on the same standard NP module; the software defines the functionality
• But what if PCs can do it too…?
15
Backup Slides
16
Data flow in the NP4GS3
[Figure: two data-flow variants, Ingress Event Building and Egress Event Building, each showing the DASL and the point of access to frame data]