DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009.

DiFX Performance Testing

Chris Phillips

eVLBI Project Scientist

25 June 2009

DiFX history

• Developed by Adam Deller at Swinburne University of Technology (now at NRAO) to replace the LBA S2 correlator and allow disk-based correlation

• Production correlator of the LBA (Australia) since 2007

• Verified against LBA, VLBA and Bonn hardware correlators

DiFX overview

• FX-style correlator implemented in C++

• 95% optimised C vector function calls (heavy reliance on Intel IPP libraries)

• Non-clocked system, unlike hardware correlators (HWCs)

• Maximum performance without compromising generality or ease of maintenance

• Modular design to support generality and enable “3rd party” contributors and local system optimisation
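To make "FX-style" concrete, here is a minimal sketch of the idea: each station's samples are first Fourier transformed (F), then the spectra of a station pair are cross-multiplied and accumulated (X). This is not DiFX code. It uses a naive DFT in pure Python in place of the optimised vector library calls, and it omits delay compensation and fringe rotation. The names `dft` and `fx_correlate` are hypothetical.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform (the 'F' step); O(n^2), illustration only."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fx_correlate(stream_a, stream_b, nchan):
    """Accumulate the cross-power spectrum of two sampled streams.

    Each stream is cut into segments of nchan samples. Every segment is
    transformed to the frequency domain (F), then the two spectra are
    cross-multiplied channel by channel and averaged over segments (X).
    """
    nseg = min(len(stream_a), len(stream_b)) // nchan
    acc = [0j] * nchan
    for s in range(nseg):
        spec_a = dft(stream_a[s * nchan:(s + 1) * nchan])
        spec_b = dft(stream_b[s * nchan:(s + 1) * nchan])
        for k in range(nchan):
            acc[k] += spec_a[k] * spec_b[k].conjugate()
    return [v / nseg for v in acc] if nseg else acc

# Two identical synthetic tones (2 cycles per 8 samples): the cross-power
# spectrum should peak in channel 2.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
vis = fx_correlate(tone, tone, nchan=8)
peak_channel = max(range(4), key=lambda k: abs(vis[k]))
```

The per-segment transforms and multiplications are independent vector operations, which is why this style of correlator maps well onto optimised vector libraries.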

Capabilities

• Near-arbitrary time and frequency resolution

• Advanced pulsar gating

• eVLBI (LBA has done 1 Gbps eVLBI)

• Correlates anything it can unpack (1/2/4/X Gbps); most new formats are easy to implement

Supported formats

• Input: LBA, Mk5A (Mk4/VLBA), K5 (via translation), Mk5B, VDIF (end 2009)

• Output: RPFITS, FITS-IDI

Current users

• Long Baseline Array (Australia)

• VLBA (USA)

• MPIfR (Bonn, Germany)

• AuScope geodetic array (Australia/NZ, 2009)

• E-LOFAR (EU)

Future/Imminent Capabilities

• Single pass, multiple phase centres

• Improved (faster) fringe rotation

• Band matching, e.g. 2 × 64 MHz with 1 × 128 MHz

• Baseband pulsar "folder"

• Native geodetic output format

• Phase cal extraction

• Frequency division multiplexing of VDIF

• Polyphase filterbank

DiFX architecture

[Diagram: a Master Node, DataStreams 1…N and Cores 1…M. The Master Node sends each DataStream a time range and destination; the DataStreams read source data and send baseband data to the Cores; the Cores return visibilities to the Master Node. The diagram also marks visibility buffers and processing buffers.]

• MPI is used for inter-process communication

• Each data transfer is double buffered

• Large, segmented ring buffer: up to 100s of MB, i.e. a few seconds or more of data
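Double buffering can be sketched as follows: the sender fills one slot while the receiver is still working on the other, so transfer and processing overlap. This is only an illustration of the technique. It uses Python threads and semaphores where DiFX uses MPI, and the `DoubleBuffer`, `producer` and `consumer` names are hypothetical.

```python
import threading

class DoubleBuffer:
    """Two slots, each guarded by a free/full semaphore pair."""
    def __init__(self):
        self.slots = [None, None]
        self.free = [threading.Semaphore(1), threading.Semaphore(1)]
        self.full = [threading.Semaphore(0), threading.Semaphore(0)]

def producer(buf, blocks):
    """Fill slots alternately; block only if the target slot is still in use."""
    for i, block in enumerate(blocks):
        slot = i % 2
        buf.free[slot].acquire()   # wait until the consumer has released this slot
        buf.slots[slot] = block
        buf.full[slot].release()   # hand the filled slot to the consumer

def consumer(buf, nblocks, out):
    """Process slots in the same alternating order."""
    for i in range(nblocks):
        slot = i % 2
        buf.full[slot].acquire()   # wait until this slot has been filled
        out.append(sum(buf.slots[slot]))  # stand-in for real processing
        buf.free[slot].release()   # slot may be refilled

blocks = [[i] * 4 for i in range(6)]   # synthetic data blocks
results = []
buf = DoubleBuffer()
t_prod = threading.Thread(target=producer, args=(buf, blocks))
t_cons = threading.Thread(target=consumer, args=(buf, len(blocks), results))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
```

With two slots the producer can always run one block ahead of the consumer, which hides transfer latency as long as processing and transfer take comparable time.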

Computational Distribution

• Currently: only time division multiplexing

• VDIF will allow frequency division multiplexing: implementation style?

• As currently implemented, all baselines must still be correlated on one Core
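Time division multiplexing can be pictured as cutting the observation into short time slices and handing each slice, with the data from all stations, to one core. The sketch below assumes a simple round-robin assignment. The actual DiFX scheduling policy is not described here, and `assign_time_slices` is a hypothetical helper.

```python
def assign_time_slices(start, stop, slice_len, ncores):
    """Return (t0, t1, core) tuples covering [start, stop).

    Each slice carries every station's data for that time range, so all
    baselines for the slice are correlated on the single core it is sent to.
    Round-robin core selection is an assumption made for illustration.
    """
    assignments = []
    t0 = start
    index = 0
    while t0 < stop:
        t1 = min(t0 + slice_len, stop)
        assignments.append((t0, t1, index % ncores))
        t0 = t1
        index += 1
    return assignments

# One second of data in 0.25 s slices spread over three cores.
plan = assign_time_slices(0.0, 1.0, 0.25, ncores=3)
```

Under this scheme, adding cores increases throughput but does not reduce the work per slice, which still grows with the number of baselines.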

Benchmarking

• Need to eliminate disk I/O to get a clear indication of the potential speed of a specific setup

• Use eVLBI! Live eVLBI is not suitable, as its data rate is fixed

• VLBIFAKE program generates an eVLBI data stream: LBADR, Mark5B and VDIF formats; TCP and UDP, but only TCP is usable for benchmarking

• Shell script runs the correlator and saves logs

• Rate determined by the median transfer rate from VLBIFAKE
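Taking the median of the reported transfer rates, rather than the mean, keeps a few slow samples (for example during start-up) from skewing the result. The sketch below shows the calculation only. The VLBIFAKE log format is not given here, so the rates are passed in as a plain list, and the sample values are invented for illustration, not measurements.

```python
import statistics

def median_rate_mbps(samples_mbps):
    """Median of per-interval transfer rates, in Mbps."""
    if not samples_mbps:
        raise ValueError("no rate samples")
    return statistics.median(samples_mbps)

# Invented example values; one slow start-up sample does not move the median much.
rates = [510.0, 498.5, 512.2, 120.0, 509.1]
benchmark_rate = median_rate_mbps(rates)
```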

CSIRO. eVLBI-Aus

Cuppa

• 20 nodes, dual-CPU quad-core

• 6 stations

• Up to 12 processing nodes

• Testing number of threads and processing cores


[Benchmark plots (Cuppa), not reproduced: Scaling with Cores; Data Rate Per Compute Node; Scaling with Threads (two slides); Scaling with Spectral Points; Scaling with Stations]

APSR

• 18 compute nodes, dual-CPU quad-core

• 5 I/O nodes, dual-CPU dual-core

• 4 stations

• Up to 18 processing nodes


Data Rate Per Compute Node

Code collaboration status

• Entire codebase has been organised on SVN (hosted by ATNF)

• DiFX wiki (hosted by Curtin): http://cira.ivec.org/dokuwiki/doku.php/difx/index

• Mailing list: difx-users (Google Groups)

• To join the difx-users list, search for difx-users on Google Groups and request access, or email me


Contact Us

Phone: 1300 363 400 or +61 3 9545 2176

Email: [email protected]

Web: www.csiro.au

Thank you

ATNF

Chris Phillips

eVLBI Project Scientist

Phone: +61 2 9372 4608

Email: [email protected]

Web: www.atnf.csiro.au/vlbi

Benchmarks

• Non-clocked system, unlike HWCs

• Indicative number of CPU cores required to correlate in real time: LBA @ 1 Gbps (256 MHz aggregate bandwidth, 2 bit): 100; VLBA @ 4 Gbps (1 GHz aggregate bandwidth, 2 bit): 800

• Weak dependence on e.g. number of channels

• A 160 CPU core system (exceeding VLBA HWC capacity) costs <$100k including networking; annual electricity ~$10k
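The quoted data rates follow from Nyquist sampling: a band of width B sampled at 2B samples per second with n bits per sample gives 2·B·n bits per second. A quick check of the two cases above, assuming the aggregate bandwidth is summed over all bands and polarisations at one station (`data_rate_mbps` is a hypothetical helper):

```python
def data_rate_mbps(agg_bandwidth_mhz, bits_per_sample):
    """Nyquist-sampled data rate in Mbps: 2 samples per Hz of bandwidth."""
    return 2 * agg_bandwidth_mhz * bits_per_sample

lba_rate = data_rate_mbps(256, 2)     # 1024 Mbps, i.e. about 1 Gbps
vlba_rate = data_rate_mbps(1000, 2)   # 4000 Mbps, i.e. 4 Gbps
```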
