IPbus: A flexible Ethernet-based control system for xTCA hardware
Tom Williams
Rutherford Appleton Laboratory
On behalf of the IPbus team (Bristol, Imperial, RAL, CERN)
24/09/2014 IPbus -- Tom Williams -- TWEPP 2014
Content
• What is IPbus?
• IPbus firmware and software suite:
  • Firmware core
  • uHAL library
  • ControlHub
• Control system topology
• Reliability testing
• Performance measurements
• Lessons learnt & future directions
Control system requirements
• Reliable
  • Main link for configuring, monitoring & debugging hardware
  • Control system must have reliable and predictable behaviour under all conditions
• Scalable
  • Hundreds of new xTCA electronics boards in CMS Phase-1 upgrades
• Simple
  • Ideally, the same ease of setup and use from a single ‘board on benchtop’ scenario to the final system
• Long maintainable lifetime
  • Industry-standard technologies
• Complexity in software rather than firmware
  • Software on commercial PC hardware is easier to debug than firmware
• Preferably low latency and high bandwidth
What is IPbus?
• Previously: VME standard used
  • Dedicated signalling, arbitration and hardware access protocols
• xTCA standards (uTCA & ATCA)
  • Include industry-standard communication technologies: GbE & PCIe
  • Used in, e.g., LHC experiment upgrades
• Ethernet & IP:
  • Highly flexible, ubiquitous technology (incl. Gigabit Ethernet)
  • IP-based networks are easily and cheaply scalable
• IPbus: a simple IP-based control protocol for xTCA
  • Designed for controlling xTCA hardware, i.e. reading/writing registers, etc.
  • Originally created by Jeremy Mans et al. in 2009/2010
  • Main development now by a UK collaboration (CMS upgrades)
  • Recent focus on production-level reliability, performance and scalability
The IPbus protocol
• A simple IP-based control protocol
  • Read & write (single register, block RAM, FIFOs)
  • Atomic read-modify-write
  • A32/D32 (32-bit addresses, 32-bit data words)
• Lies in the application layer of the networking model
  • Transport-protocol agnostic
  • Contains a recovery mechanism for dropped/reordered/duplicated UDP packets
  • Extensively tested SoC implementation in the default IPbus firmware core
• Current version: 2.0
  • Released early 2013; focus on reliability and bandwidth
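To make the 32-bit word format more concrete, here is a minimal C++ sketch that packs an IPbus 2.0 packet header and transaction header, and applies the two read-modify-write operations. The bit positions and type codes reflect our reading of the IPbus 2.0 specification and should be checked against it; all function and enum names are hypothetical and are not part of the IPbus firmware or uHAL.

```cpp
#include <cstdint>

// Transaction type codes (assumed from the IPbus 2.0 specification).
enum class TxType : std::uint32_t {
  Read = 0x0,         // incrementing-address read
  Write = 0x1,        // incrementing-address write
  NonIncRead = 0x2,   // fixed-address read (e.g. a FIFO port)
  NonIncWrite = 0x3,  // fixed-address write
  RmwBits = 0x4,      // read-modify-write: bit mask operation
  RmwSum = 0x5        // read-modify-write: addition
};

// Packet header: version(4) | reserved(4) | packet ID(16) | byte-order qualifier(4) | type(4).
// Version 2, byte-order qualifier 0xF; packet type 0x0 is assumed to mean "control".
std::uint32_t packetHeader(std::uint16_t packetId, std::uint32_t packetType = 0x0) {
  return (0x2u << 28) | (static_cast<std::uint32_t>(packetId) << 8) | (0xFu << 4) |
         (packetType & 0xFu);
}

// Transaction header: version(4) | transaction ID(12) | words(8) | type(4) | info code(4).
// Info code 0xF is assumed to mark a request sent by the client.
std::uint32_t transactionHeader(std::uint16_t txId, std::uint8_t words, TxType type,
                                std::uint32_t infoCode = 0xF) {
  return (0x2u << 28) | ((static_cast<std::uint32_t>(txId) & 0xFFFu) << 16) |
         (static_cast<std::uint32_t>(words) << 8) | (static_cast<std::uint32_t>(type) << 4) |
         (infoCode & 0xFu);
}

// RMW "bits": new value = (old AND andTerm) OR orTerm, performed atomically in the target.
std::uint32_t rmwBits(std::uint32_t oldValue, std::uint32_t andTerm, std::uint32_t orTerm) {
  return (oldValue & andTerm) | orTerm;
}

// RMW "sum": new value = old + addend, wrapping modulo 2^32.
std::uint32_t rmwSum(std::uint32_t oldValue, std::uint32_t addend) { return oldValue + addend; }
```

For example, `packetHeader(1)` gives `0x200001F0`, and a 4-word read with transaction ID 0 has the header `transactionHeader(0, 4, TxType::Read) == 0x2000040F`. Atomicity matters here because the target applies the modify step itself, so a second client cannot interleave a write between the read and the write-back.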
The IPbus suite
• Defining a protocol is useful, but implementations are really needed …
1. IPbus firmware
  • Reference VHDL implementation of an IPbus 2.0 UDP server
  • Complete system-on-chip implementation
  • Interprets and implements IPbus transactions (read, write, …) on the FPGA
2. uHAL library
  • C++ and Python end-user Hardware Access Library
  • Design mimics the recursive modularity of firmware blocks
3. ControlHub
  • Software application analogous to a VME crate controller
  • Mediates/arbitrates simultaneous hardware access from multiple clients
  • Implements the IPbus reliability mechanism
• Documentation, installation instructions, etc.: http://cactus.web.cern.ch
IPbus firmware core
• Reference VHDL SoC implementation of an IPbus 2.0 UDP server
  • Currently Xilinx-specific, but successfully adapted for Altera devices & custom ASICs
  • Interprets IPbus transactions (read, write, …) on the FPGA
• Transport protocol: UDP vs TCP
  • Main processing logic firmware (e.g. trigger algorithms) must fit on the same FPGA
  • TCP: complex algorithm
    • Implementing the full protocol in an FPGA => high resource usage
  • UDP: much simpler algorithm
    • Can be implemented in firmware with low resource usage
  • Use UDP, and correct for packet loss with the IPbus-level reliability mechanism
• Also ICMP (unix ping command), ARP, and RARP (IP address assignment)
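The idea behind correcting UDP packet loss at the IPbus level can be sketched as follows: packets carry sequential IDs, the target only executes the packet it expects next, and it keeps recent replies so a lost reply can be re-sent instead of re-executing the request. This matters because writes and read-modify-writes are not idempotent. The C++ below is a simplified, hypothetical model (class and method names are ours), assuming IDs run from 1 to 0xFFFF and wrap back to 1 with ID 0 reserved for non-reliable traffic; it is not the firmware's actual logic.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// Next reliable packet ID: 1..0xFFFF, wrapping back to 1 (assumption: 0 is reserved).
std::uint16_t nextPacketId(std::uint16_t id) {
  return id == 0xFFFF ? 1 : static_cast<std::uint16_t>(id + 1);
}

// Hypothetical, simplified target: executes only the expected packet ID and keeps
// the last few replies so that a lost reply can be re-sent on request.
class TargetSketch {
 public:
  explicit TargetSketch(std::size_t historyDepth) : depth_(historyDepth) {}

  // Returns the reply for the expected packet; drops duplicates and out-of-order packets.
  std::optional<std::vector<std::uint32_t>> handleControl(
      std::uint16_t id, const std::vector<std::uint32_t>& body) {
    if (id != expected_) return std::nullopt;  // never re-execute or execute early
    std::vector<std::uint32_t> reply = body;   // stand-in for executing the transactions
    history_.emplace_back(id, reply);
    if (history_.size() > depth_) history_.pop_front();
    expected_ = nextPacketId(expected_);
    return reply;
  }

  // Serve a re-send request for a packet whose reply the client never received.
  std::optional<std::vector<std::uint32_t>> resend(std::uint16_t id) const {
    for (const auto& [storedId, reply] : history_)
      if (storedId == id) return reply;
    return std::nullopt;
  }

  std::uint16_t expectedId() const { return expected_; }

 private:
  std::size_t depth_;
  std::uint16_t expected_ = 1;
  std::deque<std::pair<std::uint16_t, std::vector<std::uint32_t>>> history_;
};
```

In this model, if the reply to packet 1 is lost, the client asks for `resend(1)` rather than sending packet 1 again; a retransmitted packet 1 would simply be dropped, so a write is never applied twice.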
Resource usage    FFs    Slices   BRAMs
Fully-featured    3500   2900     17
Minimal config    2000   1000     5
(Minimal config: 25% of smallest Spartan-6 chip)
uHAL
• C++ library providing an end-user API for reads, writes, etc.
  • Also has Python bindings
• Register layout specified in XML files
  • Reflects the hierarchical and modular nature of firmware
  • Promotes reuse and modularity of address table files
• Includes example GUI for hardware development
• Fast and scalable in conjunction with ControlHub
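The benefit of a hierarchical register layout is that a module's sub-tree can be reused at different base addresses, and registers are reached by a path through the hierarchy. The C++ below is a hypothetical sketch of that idea, not uHAL's implementation or its XML schema: each node stores an address relative to its parent, a dotted path such as "chan0.status" is resolved by walking the tree, and (as a simplifying assumption) the absolute address is the sum of the offsets along the path.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical register-map node: an offset relative to its parent plus named children.
struct Node {
  std::uint32_t offset = 0;
  std::map<std::string, std::unique_ptr<Node>> children;

  // Add a child at a relative offset and return a reference to it for further nesting.
  Node& add(const std::string& id, std::uint32_t relOffset) {
    auto child = std::make_unique<Node>();
    child->offset = relOffset;
    Node& ref = *child;  // heap object stays put after the move into the map
    children[id] = std::move(child);
    return ref;
  }
};

// Resolve a dotted path (e.g. "chan1.status") to an absolute address by summing
// offsets from the top node down; throws if any path element is missing.
std::uint32_t resolve(const Node& top, const std::string& path) {
  std::uint32_t address = top.offset;
  const Node* current = &top;
  std::stringstream ss(path);
  std::string id;
  while (std::getline(ss, id, '.')) {
    auto it = current->children.find(id);
    if (it == current->children.end()) throw std::out_of_range("no such node: " + id);
    current = it->second.get();
    address += current->offset;
  }
  return address;
}
```

For example, if the same "channel" module (with a `status` register at relative offset 0x1) is instantiated as `chan0` at 0x100 and `chan1` at 0x200, the two status registers resolve to 0x101 and 0x201 without duplicating the module description.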
ControlHub
• Software application analogous to a VME crate controller
• Purpose:
  • Route uHAL IPbus traffic from multiple control applications to a single board
  • Implement packet-loss recovery over UDP
• The implementation must also allow multiple clients to communicate with multiple targets reliably, efficiently & independently
• Implemented in Erlang:
  • Concurrent programming language developed by Ericsson (J. Armstrong et al.)
  • Scales transparently across multiple CPU cores
  • Standard libraries for creating high-availability, fault-tolerant applications
  • Efficient, mature network protocol implementations
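The routing requirement (many clients, many targets, each handled independently) can be illustrated with a per-target queue: requests from several clients to the same board are serialised in arrival order, replies are tagged with the originating client, and traffic to other boards is never blocked by it. The single-threaded C++ below is only a hypothetical illustration of that routing idea; ControlHub itself is a concurrent Erlang application, and the class and method names here are ours.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request/reply records: originating client plus IPbus payload words.
struct Request {
  int clientId;
  std::vector<std::uint32_t> payload;
};

struct Reply {
  int clientId;
  std::vector<std::uint32_t> payload;
};

// Toy hub: one FIFO per target, so each board's traffic is serialised independently.
class HubSketch {
 public:
  void submit(const std::string& target, Request req) {
    queues_[target].push_back(std::move(req));
  }

  // Forward the oldest request for one target; echoing the payload stands in for the
  // board's reply, which is routed back to the client that sent the request.
  bool serviceOne(const std::string& target, std::vector<Reply>& replies) {
    auto it = queues_.find(target);
    if (it == queues_.end() || it->second.empty()) return false;
    Request req = std::move(it->second.front());
    it->second.pop_front();
    replies.push_back(Reply{req.clientId, std::move(req.payload)});
    return true;
  }

  std::size_t pending(const std::string& target) const {
    auto it = queues_.find(target);
    return it == queues_.end() ? 0 : it->second.size();
  }

 private:
  std::map<std::string, std::deque<Request>> queues_;
};
```

Keeping one queue per target is what lets a slow or unresponsive board delay only its own clients, which is the "independently" requirement above.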
Example system topologies (1)
• Simplest scenario: 1 board, 1 computer
  • Simple network topology; may not need ControlHub
• Several IPbus targets
  • E.g. integration tests, test beam, …
  • Multiple control/monitoring applications
  • ControlHub arbitrates hardware access
Example system topologies (2)
• Full-scale system, large experiment
Testing with realistic layout
• Extensive program of reliability testing & performance measurements carried out with uTCA hardware for CMS upgrades
• One uTCA crate, 2 rack PCs:
  • 12 AMCs (GLIBs & Mini-T5s)
• Both Vadatech & NAT MCHs tested
  • MCH contains an Ethernet switch
• Same network components as planned for the final system
System reliability
• Tested full chain: uHAL – ControlHub – firmware
  • Including recovery from software-induced packet loss
• Continuous testing over many hours
  • Random sequences of read/write & read-modify-write transactions
  • Continually verifying that registers have the correct values
  • O(10^10) transactions to various boards (GLIB, MP7, Mini-T5)
  • No errors!
• Software also tested nightly
  • Wide range of unit test executables
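The soak-test strategy described above (random reads, writes and read-modify-writes, with every read checked) can be sketched by keeping a shadow copy of the expected register contents in software. In this hypothetical C++ sketch, `FakeBoard` is an in-memory stand-in for real hardware (in the actual tests the operations go through uHAL and the ControlHub to the boards), and the read-modify-write is assumed to return the register's value before modification.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical in-memory stand-in for a board's register file.
struct FakeBoard {
  std::vector<std::uint32_t> regs;
  explicit FakeBoard(std::size_t n) : regs(n, 0) {}
  std::uint32_t read(std::size_t a) const { return regs[a]; }
  void write(std::size_t a, std::uint32_t v) { regs[a] = v; }
  // Returns the pre-modification value (assumption, mirroring an IPbus RMW reply).
  std::uint32_t rmwBits(std::size_t a, std::uint32_t andTerm, std::uint32_t orTerm) {
    std::uint32_t old = regs[a];
    regs[a] = (old & andTerm) | orTerm;
    return old;
  }
};

// Apply nOps random operations, checking every value read back against a shadow copy.
// Returns the number of mismatches found (0 means the chain behaved correctly).
std::size_t randomSoak(FakeBoard& board, std::size_t nOps, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> addrDist(0, board.regs.size() - 1);
  std::uniform_int_distribution<std::uint32_t> valDist;  // full 32-bit range
  std::uniform_int_distribution<int> opDist(0, 2);
  std::vector<std::uint32_t> shadow(board.regs.size(), 0);
  std::size_t errors = 0;
  for (std::size_t i = 0; i < nOps; ++i) {
    std::size_t a = addrDist(rng);
    switch (opDist(rng)) {
      case 0: {  // write, and record the expected value
        std::uint32_t v = valDist(rng);
        board.write(a, v);
        shadow[a] = v;
        break;
      }
      case 1: {  // read-modify-write: check the returned old value, then update shadow
        std::uint32_t andTerm = valDist(rng);
        std::uint32_t orTerm = valDist(rng);
        if (board.rmwBits(a, andTerm, orTerm) != shadow[a]) ++errors;
        shadow[a] = (shadow[a] & andTerm) | orTerm;
        break;
      }
      default:  // read and verify
        if (board.read(a) != shadow[a]) ++errors;
    }
  }
  return errors;
}
```

Checking the value returned by each read-modify-write, not only plain reads, also catches a transaction that was silently executed twice during packet-loss recovery.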
Performance (1)
• Definitions:
  • Latency = time taken for a uHAL client to perform an IPbus transaction
  • Throughput = data transferred / latency
• 1 client, 1 target:
  • The larger single-word latency compared with VME/PCIe is compensated by multiple reads/writes per packet, and multiple packets in flight
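// E.g. for a block read; `timer` is any wall-clock timer, `client` points to a uhal::ClientInterface
timer.start();
// Queue a block read of `size` words, all from the same address `addr` (e.g. a FIFO port)
uhal::ValVector<uint32_t> data = client->readBlock(addr, size, uhal::defs::NON_INCREMENTAL);
// Send the queued transaction(s) and wait for the reply; `data` is only valid after this
client->dispatch();
timer.stop();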
[Plots vs number of words, with 10kB, 100kB and 1MB marked]
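The throughput definition above turns a measured latency into MB/s once the word count is known. The C++ sketch below does that arithmetic (D32, so 4 bytes per word; 1 MB taken as 10^6 bytes) and adds a toy model, which is our assumption rather than a fit to the measurements: a fixed per-dispatch overhead t0 plus transfer at a link-limited rate B. It shows why a large single-word latency matters less for large blocks, since throughput approaches B as the block grows.

```cpp
#include <cstdint>

// Throughput = data transferred / latency, in MB/s (1 MB = 10^6 bytes).
// words: number of 32-bit words read or written; latencySeconds: measured time for the
// uHAL client to complete the transaction(s), e.g. the timer around dispatch().
double throughputMBps(std::uint64_t words, double latencySeconds) {
  const double bytes = static_cast<double>(words) * 4.0;  // D32: 4 bytes per word
  return bytes / latencySeconds / 1.0e6;
}

// Toy latency model (assumption): fixed overhead t0 plus transfer at bytesPerSecond.
// For small blocks t0 dominates; for large blocks throughput tends to bytesPerSecond.
double modelThroughputMBps(std::uint64_t words, double t0Seconds, double bytesPerSecond) {
  const double latency = t0Seconds + static_cast<double>(words) * 4.0 / bytesPerSecond;
  return throughputMBps(words, latency);
}
```

For example, 250 000 words (1 MB) read in 0.5 s gives `throughputMBps(250000, 0.5) == 2.0` MB/s; the numbers are purely illustrative, not taken from the plots.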
Performance (2)
• Polling register in targets
  • Each client continuously polls 1 target
  • 1 to 12 targets; 1, 2 or 4 clients per target …
Performance (3)
• Block writes/reads, multiple targets
  • For reads, congestion at the MCH switch could induce packet loss
Performance (4)
• Block writes/reads, multiple targets
  • 1 client per target; 600 MB read from / written to the crate
  • Default IPbus software setup
[Plots: NAT MCH | Vadatech MCH]
Performance (5)
• Block writes/reads, multiple targets
  • 1 client per target; 600 MB read from / written to the crate
  • Reducing the number of IPbus packets in flight (by editing the ControlHub config file)
    • Lowers 1 client ↔ 1 target throughput by 12%
[Plots: NAT MCH | Vadatech MCH]
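The trade-off behind the number of packets in flight can be reasoned about with a generic bandwidth-delay toy model. This is our assumption for illustration, not ControlHub's algorithm: with W packets in flight, each carrying P bytes, and round-trip time R, one client-target pair cannot exceed W·P/R (nor the link rate), while the burst that the crate switch may have to buffer when N targets reply at once grows as N·W·P. All function names and parameters are hypothetical.

```cpp
#include <algorithm>

// Toy bandwidth-delay bound (assumption): one client-target pair with packetsInFlight
// packets of payloadBytes outstanding and round-trip time rttSeconds cannot exceed
// packetsInFlight * payloadBytes / rttSeconds, nor the link rate (bytes/s).
double pairThroughput(unsigned packetsInFlight, double payloadBytes, double rttSeconds,
                      double linkBytesPerSecond) {
  return std::min(linkBytesPerSecond, packetsInFlight * payloadBytes / rttSeconds);
}

// Worst-case burst (bytes) arriving at the switch when nTargets reply simultaneously,
// e.g. during block reads, each with packetsInFlight packets of payloadBytes outstanding.
double worstCaseBurstBytes(unsigned nTargets, unsigned packetsInFlight, double payloadBytes) {
  return static_cast<double>(nTargets) * packetsInFlight * payloadBytes;
}
```

In this model, fewer packets in flight shrinks the simultaneous burst from 12 boards and so eases switch congestion, at the price of a lower single-pair bound whenever that pair is window-limited rather than link-limited; the measured 12% reduction above is the real-world figure, and the model makes no attempt to reproduce it.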
Lessons learnt
• You can (almost) never have too much testing
  • Attack the problem (system reliability) from as many angles as possible
  • Software unit tests, full-chain tests with real hardware, …
• If possible, test early with the planned hardware
  • Hardware vendors may have different interpretations of standards
    • Affecting seemingly trivial day-to-day operational tasks
    • E.g. MAC & IP address management in CMS via IPMI
  • Need a robust, failsafe system that works in all scenarios
  • … have found subtly differing behaviours of MCHs from different vendors
Conclusions
• IPbus protocol and suite
  • Being integrated into CMS for LHC Run 2
  • Also in ATLAS & ALICE upgrades; FNAL g-2, SOLID
• Past few years: significant progress on reliability and performance
  • Solving subtle, rare bugs
  • Improving system scalability
  • Utilising the bandwidth of Gigabit Ethernet
  • In (pre-)production environment
• Future plans:
  • Code consolidation; improve debugging of error cases in complex systems
  • IPbus locking mechanism for exclusive access to hardware from a single client
  • Update IPbus suite for 10 Gigabit Ethernet
More information
• IPbus protocol and suite
  • Extensively tested, tightly integrated suite with Gigabit performance
  • Easily scalable control system
  • Applicable to any hardware with an Ethernet interface
• For more information …
  • Main page:
    http://cactus.web.cern.ch/
  • Firmware:
    https://svnweb.cern.ch/trac/cactus/wiki/IPbusFirmware
  • Software (uHAL + ControlHub):
    Quick start tutorial (easy installation on SL(C) 5/6)
    https://svnweb.cern.ch/trac/cactus/wiki/uhalQuickTutorial
  • Bug reports, feature requests, clarifications:
    https://svnweb.cern.ch/trac/cactus/newticket