DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009
Jan 05, 2016
DiFX history
• Developed by Adam Deller at Swinburne University of Technology (now at NRAO) to replace the LBA S2 correlator and allow disk-based correlation
• Production correlator of the LBA (Australia) since 2007
• Verified against LBA, VLBA and Bonn hardware correlators
DiFX overview
• FX-style correlator implemented in C++
• 95% optimised C vector function calls (heavy reliance on Intel IPP libraries)
• Non-clocked system, unlike hardware correlators (HWCs)
• Maximum performance without compromising generality or ease of maintenance
• Modular design to support generality and enable “3rd party” contributors and local system optimisation
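The "FX" in "FX-style" refers to the order of operations: Fourier transform each station's data first (F), then cross-multiply the spectra and accumulate (X). The following is a minimal pure-Python sketch of that idea only. The function names `dft` and `fx_correlate` and the parameter `nfft` are hypothetical, and the sketch omits everything a real correlator such as DiFX needs (delay compensation, fringe rotation, optimised vector libraries).

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a sequence (O(n^2), for illustration)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fx_correlate(a, b, nfft):
    """Cross-correlate two sample streams FX-style.

    F step: transform each nfft-sample segment of both streams.
    X step: multiply A's spectrum by the conjugate of B's and accumulate.
    Returns the time-averaged cross-power spectrum (one value per channel).
    """
    nseg = min(len(a), len(b)) // nfft
    acc = [0j] * nfft
    for s in range(nseg):
        fa = dft(a[s * nfft:(s + 1) * nfft])
        fb = dft(b[s * nfft:(s + 1) * nfft])
        for k in range(nfft):
            acc[k] += fa[k] * fb[k].conjugate()
    return [v / nseg for v in acc]


# Two made-up "stations" observing the same tone; station B sees it one sample later.
nfft = 8
a = [math.cos(2 * math.pi * 2 * t / nfft) for t in range(16)]
b = [math.cos(2 * math.pi * 2 * (t - 1) / nfft) for t in range(16)]
vis = fx_correlate(a, b, nfft)
# The power sits in the channel of the tone, and the phase of that channel
# encodes the relative delay between the two streams.
peak_channel = max(range(nfft // 2), key=lambda k: abs(vis[k]))
```

Here `peak_channel` picks out the channel containing the tone, and `cmath.phase(vis[peak_channel])` gives the phase offset caused by the one-sample delay.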
Capabilities
• Near-arbitrary time and frequency resolution
• Advanced pulsar gating
• eVLBI (LBA has done 1 Gbps eVLBI)
• Correlate anything it can unpack (1/2/4/X Gbps)
  • Most new formats easy to implement
Supported formats
• Input
  • LBA
  • Mk5A (Mk4/VLBA)
  • K5 (via translation)
  • Mk5B
  • VDIF (end 2009)
• Output
  • RPFITS, FITS-IDI
Current users
• Long Baseline Array (Australia)
• VLBA (USA)
• MPIfR (Bonn, Germany)
• AuScope geodetic array (Australia/NZ, 2009)
• E-LOFAR (EU)
Future/Imminent Capabilities
• Single pass, multiple phase centres
• Improved (faster) fringe rotation
• Band matching
  • e.g. 2x64 MHz with 1x128 MHz
• Baseband pulsar "folder"
• Native geodetic output format
• Phase cal extraction
• Frequency division multiplexing of VDIF
• Polyphase filterbank
DiFX architecture
[Diagram: Master Node; DataStream 1, DataStream 2, … DataStream N; Core 1, Core 2, … Core M. Flows labelled "Timerange, destination", "Source data", "Baseband data" and "Visibilities".]
• MPI is used for inter-process communications
• Each data transfer is double buffered
• Buffers shown: large, segmented ring buffer (up to 100s of MB, i.e. a few or more seconds of data); visibility buffers; processing buffers
Computational Distribution
• Currently: only time division multiplexing
• VDIF will allow frequency division multiplexing: implementation style?
• As currently implemented all baselines must still be correlated on one Core
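Time division multiplexing as described above can be sketched as follows: the observation is cut into time segments, each segment goes to one core, and that core computes every baseline for its segment. This is an illustration only. The round-robin policy and the names `assign_segments` and `baselines` are assumptions, not DiFX's actual scheduler or API.

```python
from itertools import cycle


def assign_segments(start_s, stop_s, segment_s, n_cores):
    """Split [start_s, stop_s) into segments and hand them to cores in turn.

    Returns a list of (segment_start, segment_end, core_id) tuples.
    Round-robin is an assumed policy chosen for simplicity.
    """
    core_ids = cycle(range(1, n_cores + 1))
    plan = []
    t = start_s
    while t < stop_s:
        end = min(t + segment_s, stop_s)
        plan.append((t, end, next(core_ids)))
        t = end
    return plan


def baselines(n_stations):
    """All station pairs; under time division multiplexing one core handles all of them."""
    return [(i, j) for i in range(n_stations) for j in range(i + 1, n_stations)]


# One second of data in 0.25 s segments over three cores (made-up numbers).
plan = assign_segments(0.0, 1.0, 0.25, n_cores=3)
work_per_segment = len(baselines(6))  # baselines each core must correlate
```

Because the work is divided only in time, adding cores increases throughput but does not reduce the per-segment load on any one core, which still grows with the number of baselines.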
Benchmarking
• Need to eliminate disk I/O to get a clear indication of the potential speed of a specific setup
• eVLBI!
• Live eVLBI not suitable, as it runs at a fixed data rate
• VLBIFAKE program generates an eVLBI data stream
  • LBADR, Mark5B and VDIF
  • TCP and UDP
  • Only TCP usable for benchmarking
• Shell script to run the correlator and save logs
• Rate determined by the median transfer rate from VLBIFAKE
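The choice of the median for the benchmark rate can be illustrated with a short sketch. The sample values below are invented placeholders, not measurements, and the function name `benchmark_rate_mbps` is hypothetical; the format of the VLBIFAKE logs is not described here, so the sketch starts from an already-parsed list of per-interval rates.

```python
from statistics import mean, median


def benchmark_rate_mbps(samples_mbps):
    """Summarise per-interval transfer rates by their median.

    The median is insensitive to a few atypical intervals, for example a
    slow first interval while buffers fill.
    """
    return median(samples_mbps)


# Placeholder per-interval rates in Mbps (illustrative only, not real data).
samples = [12.0, 498.0, 505.0, 501.0, 512.0]
rate = benchmark_rate_mbps(samples)
average = mean(samples)  # pulled down by the single slow interval
```

With these placeholder values the median stays near the steady-state rate, while the mean is dragged well below it by the one slow interval.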
CSIRO. eVLBI-Aus
Cuppa
• 20 nodes, dual CPU, quad core
• 6 stations
• Up to 12 processing nodes
• Testing number of threads and processing cores
Scaling with Cores [plot]
Data Rate Per Compute Node [plot]
Scaling with Threads [two plots]
Scaling with Spectral Points [plot]
Scaling with Stations [plot]
APSR
• 18 compute nodes, dual CPU, quad core
• 5 I/O nodes, dual CPU, dual core
• 4 stations
• Up to 18 processing nodes
Data Rate Per Compute Node [plot]
Code collaboration status
• Entire codebase has been organised on SVN (hosted by ATNF)
• DiFX wiki (hosted by Curtin): http://cira.ivec.org/dokuwiki/doku.php/difx/index
• Mailing list: [email protected]
• To get on the difx-users list, search for difx-users on Google Groups and request access, or email me
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au
Thank you
ATNF
Chris Phillips
eVLBI Project Scientist
Phone: +61 2 9372 4608
Email: [email protected]
Web: www.atnf.csiro.au/vlbi
Benchmarks
• Non-clocked system, unlike HWCs
• Indicative number of CPU cores required to correlate in real time:
  • LBA @ 1 Gbps (256 MHz aggregate bandwidth, 2 bit): 100
  • VLBA @ 4 Gbps (1 GHz aggregate bandwidth, 2 bit): 800
• Weak dependencies on e.g. number of channels
• 160 CPU core system (exceeding VLBA HWC capacity) costs <$100k including networking; annual electricity ~$10k
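The quoted data rates follow from the aggregate bandwidth and bit depth if one assumes Nyquist sampling of real-valued signals (two samples per hertz of bandwidth); that assumption is not stated on the slide but reproduces both figures. The sketch below works through the arithmetic, along with the upper bound on cost per core implied by the 160-core figure. The function name `nyquist_rate_mbps` is hypothetical.

```python
def nyquist_rate_mbps(agg_bandwidth_mhz, bits_per_sample):
    """Data rate in Mbps, assuming 2 real samples per Hz of bandwidth."""
    return 2 * agg_bandwidth_mhz * bits_per_sample


lba_rate = nyquist_rate_mbps(256, 2)    # 1024 Mbps, quoted as 1 Gbps
vlba_rate = nyquist_rate_mbps(1000, 2)  # 4000 Mbps, quoted as 4 Gbps

# Indicative core counts from the slide.
lba_cores, vlba_cores = 100, 800
core_ratio = vlba_cores / lba_cores     # 8x the cores
rate_ratio = vlba_rate / lba_rate       # for roughly 4x the data rate

# The system cost is quoted as "<$100k" for 160 cores, so this is an upper bound.
max_cost_per_core = 100_000 / 160       # 625 dollars per core
```

The core count grows faster than the per-station data rate between the two cases; the slide gives only the totals, so the sketch does not attempt to attribute the difference.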