New Features in DiFX2.0
Adam Deller, NRAO
3rd DiFX workshop, Curtin University, Perth
Mar 18, 2016
Outline
- What is DiFX2.0?
- New features:
  - Spectral channel selection (“zoom bands”)
  - In-correlator averaging
  - Multiple phase centres (MPCs)
  - Local oscillator (LO) offsets
  - Skipping over useless data rather than reading it
- Under the hood:
  - Station-based improvements in Mode
  - Baseline-based improvements in Core
What is DiFX2.0?
- DiFX2.0 is an evolution of the DiFX code base
- It adds new features and changes the way some of the internal information is maintained (e.g. time is now managed by scan)
- It required a big break with the existing code because of changes in file formats, which were necessary to provide control information for the new features
New features: channel select
- Define a new “Frequency” band which encompasses a subset of an existing band
- This “zoom” frequency can be selected as the one to correlate on a baseline, in place of the full bandwidth
- Applications:
  - Wide recorded band, narrow maser emission (throw away the useless channels, save network capacity)
  - Correlate e.g. 1 x 16 MHz band with 2 x 8 MHz bands
New features: channel select
[Figure: a Datastream 1 band and a Datastream 2 band, cross-multiplied (x) on a baseline]
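The following is a minimal Python sketch, not DiFX code, showing how a zoom band maps onto a range of channels in its parent band. The function names are hypothetical. The sketch also assumes that both bands are upper sideband, that the frequencies given are lower band edges, and that the zoom band edges fall on parent channel boundaries.

```python
def zoom_channel_range(parent_freq_mhz, parent_bw_mhz, parent_nchan,
                       zoom_freq_mhz, zoom_bw_mhz, tol=1e-6):
    """Return (first_channel, num_channels) of a zoom band within its parent band.

    Hypothetical helper. Assumes upper-sideband bands, lower-edge frequencies,
    and zoom edges that align with parent channel boundaries.
    """
    chan_width = parent_bw_mhz / parent_nchan
    offset = zoom_freq_mhz - parent_freq_mhz
    if offset < 0 or offset + zoom_bw_mhz > parent_bw_mhz + tol:
        raise ValueError("zoom band must lie entirely within the parent band")
    first_f = offset / chan_width
    count_f = zoom_bw_mhz / chan_width
    first, count = round(first_f), round(count_f)
    if abs(first_f - first) > tol or abs(count_f - count) > tol:
        raise ValueError("zoom band edges must align with parent channel boundaries")
    return first, count


def select_zoom(spectrum, first, count):
    """Keep only the zoom channels; the rest need not be correlated or sent."""
    return spectrum[first:first + count]


# A 16 MHz band at 1650 MHz with 128 channels (125 kHz each), split into two
# 8 MHz zoom bands to match a station that recorded 2 x 8 MHz.
lower = zoom_channel_range(1650.0, 16.0, 128, 1650.0, 8.0)
upper = zoom_channel_range(1650.0, 16.0, 128, 1658.0, 8.0)
print(lower, upper)
```

Each 8 MHz band from the second station would then be correlated against its own slice of the 16 MHz spectrum.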
New features: averaging
- Narrow-field VLBI only requires coarse spectral resolution, e.g. 0.5 MHz
- But taking e.g. a 16-point FFT is not efficient! The minimum desirable FFT size is about 128
- For coarser spectral resolution, visibilities had previously been averaged in difx2fits, which wastes intermediate disk space
- Now they are averaged in the correlator, saving network capacity (enabling MPCs) and disk space
New features: averaging
[Figure: Datastream 1 and Datastream 2 bands are cross-multiplied (x) into a thread visibility, which is averaged (avg) into the core visibility]
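In-correlator averaging amounts to correlating with an efficient FFT size and then averaging adjacent channels before the data leave the correlator. Below is a minimal NumPy sketch of that step. The function name is hypothetical, and the real DiFX implementation may differ.

```python
import numpy as np

def average_channels(vis, factor):
    """Average each run of `factor` adjacent channels along the last axis.

    Hypothetical helper: a sketch of spectral averaging, not DiFX code.
    """
    vis = np.asarray(vis)
    nchan = vis.shape[-1]
    if nchan % factor != 0:
        raise ValueError("channel count must be a multiple of the averaging factor")
    return vis.reshape(*vis.shape[:-1], nchan // factor, factor).mean(axis=-1)


# A 16 MHz band correlated with 128 channels (125 kHz each): averaging by 4
# gives 0.5 MHz resolution and a quarter of the output data volume.
fine = np.exp(1j * np.linspace(0.0, 0.2, 128))
coarse = average_channels(fine, 4)
print(fine.shape, coarse.shape)
```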
New features: Multiple PCs
- At any given instant, the phase centre of correlation can be changed by rotating the visibilities by a phase equal to the LO frequency x delay (the delay between the desired phase centre and the current phase centre)
- This is a station-based effect, but if it is done after some accumulation it must be applied separately to each baseline
[Figure: uv-shifted “pencil” fields within the primary beam]
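Why is the shift station-based before accumulation but per-baseline after it? A short derivation helps. The notation here is an assumption, not DiFX's own. X_A(ν) is station A's spectrum and Δτ_A is the extra geometric delay at station A between the new and current phase centres. The phase is evaluated at frequency ν for each channel. The sign convention is also assumed.

```latex
\begin{aligned}
V_{AB}(\nu) &= X_A(\nu)\,X_B^{*}(\nu) \\
V'_{AB}(\nu) &= \bigl[X_A(\nu)\,e^{-2\pi i \nu \Delta\tau_A}\bigr]\bigl[X_B(\nu)\,e^{-2\pi i \nu \Delta\tau_B}\bigr]^{*}
              = V_{AB}(\nu)\,e^{-2\pi i \nu \Delta\tau_{AB}},
\qquad \Delta\tau_{AB} = \Delta\tau_A - \Delta\tau_B \\
\overline{V'_{AB}}(\nu) &= \sum_{t} V_{AB}(\nu, t)\,e^{-2\pi i \nu \Delta\tau_{AB}(t)}
\approx e^{-2\pi i \nu \Delta\tau_{AB}(t_0)} \sum_{t} V_{AB}(\nu, t)
\end{aligned}
```

The second line factors into one term per station, so before accumulation the rotation could be applied once per station. After accumulation, only the baseline term Δτ_AB survives. The approximation in the last line holds only while Δτ_AB is nearly constant over the accumulation.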
New features: Multiple PCs
- Multiple phase centres were the main driver behind the compatibility-breaking upgrades for DiFX2.0
- A separate geometric model must be provided for each phase centre (calcif2, vex2difx)
- The initial correlation is directed at the pointing centre, with high spectral resolution; typically once per subint (it can be more frequent), the shift is applied and the channels are averaged
New features: Multiple PCs
[Figure: the subint visibility (thread amplitude and phase) is phase-rotated, then averaged into the core visibility (core amplitude and phase); this is repeated for each phase centre]
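The rotate-then-average loop above can be sketched in NumPy as shown below. This is an illustration under stated assumptions, not DiFX's implementation. The function name is hypothetical, and so are the per-channel rotation frequencies and the sign convention. The one delay difference per phase centre per subint comes from the slide.

```python
import numpy as np

def shift_and_average(thread_vis, chan_freqs_hz, delay_diffs_s, factor):
    """Return one averaged ("core") spectrum per phase centre.

    Hypothetical helper.
    thread_vis    : complex (nchan,) visibility for one baseline/band/subint,
                    correlated at the pointing centre with fine channels
    chan_freqs_hz : frequency at which each channel's rotation is evaluated
    delay_diffs_s : baseline delay difference (phase centre minus pointing
                    centre) for each phase centre, evaluated once per subint
    factor        : number of adjacent fine channels averaged together
    """
    thread_vis = np.asarray(thread_vis, dtype=complex)
    freqs = np.asarray(chan_freqs_hz, dtype=float)
    nchan = thread_vis.shape[0]
    if nchan % factor != 0:
        raise ValueError("channel count must be a multiple of the averaging factor")
    core = np.empty((len(delay_diffs_s), nchan // factor), dtype=complex)
    for k, dtau in enumerate(delay_diffs_s):
        # Rotate phase: remove the delay gradient towards this phase centre
        # (sign convention assumed).
        rotated = thread_vis * np.exp(-2j * np.pi * freqs * dtau)
        # Average: only now reduce to coarse spectral resolution.
        core[k] = rotated.reshape(nchan // factor, factor).mean(axis=1)
    return core


# A unit point source whose baseline delay relative to the pointing centre is
# 1 us appears as a phase slope across a 16 MHz band of 2048 channels.
freqs = 1.65e9 + np.arange(2048) * (16e6 / 2048)
tau = 1e-6
vis = np.exp(2j * np.pi * freqs * tau)
core = shift_and_average(vis, freqs, [0.0, tau], factor=64)
print(np.abs(core[0]).mean(), np.abs(core[1]).mean())
```

Averaging without the shift (first phase centre) smears the phase slope and loses amplitude. Rotating first (second phase centre) recovers the full amplitude. This is why the rotation has to come before the average.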
New features: LO offsets
- An improperly set LO at a station yields a wrapping phase
- This can now be corrected for in DiFX
- It is implemented post-FFT, so it is limited to maximum offset rates of a few Hz to a few kHz, depending on FFT size
- It could be done pre-FFT if people thought it was really needed (discussion?)
- It also required a new entry in the input file
New features: LO offsets
[Figure: phase vs. time, wrapping between -180° and 180°, with the span of one FFT marked]
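A post-FFT correction applies one phase to each whole FFT. The offset phase therefore still ramps across the FFT, and the amplitude falls as the offset grows relative to 1/(FFT duration). The sketch below makes that limit concrete. Three things in it are assumptions rather than DiFX's documented behaviour: Nyquist-sampled real data, a correction evaluated at the FFT midpoint, and the sign convention. The function names are hypothetical.

```python
import cmath
import math

def fft_duration_s(nchan, bandwidth_hz):
    """One FFT of Nyquist-sampled real data: 2*nchan samples at 2*bandwidth samples/s."""
    return (2 * nchan) / (2.0 * bandwidth_hz)

def post_fft_phasor(lo_offset_hz, fft_start_s, t_fft_s):
    """Single correction for a whole FFT's spectrum: the LO-offset phase at the
    FFT midpoint (hypothetical choice of evaluation time and sign)."""
    t_mid = fft_start_s + 0.5 * t_fft_s
    return cmath.exp(-2j * math.pi * lo_offset_hz * t_mid)

def amplitude_retained(lo_offset_hz, t_fft_s):
    """After the midpoint correction the residual phase still ramps by
    2*pi*offset*T across the FFT; averaging that ramp leaves |sinc(offset*T)|."""
    x = math.pi * lo_offset_hz * t_fft_s
    return 1.0 if x == 0 else abs(math.sin(x) / x)

for nchan, bw in [(128, 16e6), (2048, 16e6), (32768, 1e6)]:
    t = fft_duration_s(nchan, bw)
    for offset in (10.0, 1e3):
        print(f"{nchan:6d} ch / {bw / 1e6:4.0f} MHz, T = {t * 1e6:8.1f} us, "
              f"offset {offset:6.0f} Hz -> amplitude {amplitude_retained(offset, t):.3f}")
```

Short FFTs tolerate kHz offsets. The long FFTs used for fine spectral resolution tolerate only Hz-level offsets. This matches the few-Hz-to-few-kHz range on the slide.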
New feature: data skipping
- In file-based mode in DiFX1.5, a Datastream will read all of every file you give it
- This is annoying if you want to correlate a subset of an experiment: the file list must be cropped, and some files are so big that just reading from the start of one takes ages
- In DiFX2.0, the read thread checks the time of the last send request and attempts to reposition the file pointer appropriately
New feature: data skipping
[Figure: timeline of File 1 and File 2 relative to the latest FxManager request. The read thread opens File 1 and attempts to skip past its EOF, then opens the next file and skips to the preceding integer second]
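The skip sequence in the figure can be sketched as a small search over the file list. It steps back to the integer second preceding the request, and if that position lies past a file's EOF it moves on to the next file. This is a hypothetical simplification: it assumes fixed-size frames at a constant rate, with each file starting on a frame boundary at a known time. Real formats and the actual DiFX read thread will differ, and all names are illustrative.

```python
import math

def locate_read_position(file_starts_s, file_sizes_bytes, request_time_s,
                         frame_bytes, frames_per_second):
    """Return (file_index, byte_offset) where reading should resume, or None.

    Hypothetical helper: assumes fixed-size frames at a constant rate, each
    file starting on a frame boundary at a known time.
    """
    target_s = math.floor(request_time_s)           # skip to preceding integer second
    for index, (start_s, size) in enumerate(zip(file_starts_s, file_sizes_bytes)):
        frames = max(0, math.floor((target_s - start_s) * frames_per_second))
        offset = frames * frame_bytes
        if offset < size:                            # target lies inside this file
            return index, offset
        # Otherwise the skip would go past EOF: try the next file.
    return None                                      # request is beyond the last file


# Two 100 s files of 10,000-byte frames at 1000 frames/s (1 GB each).
starts = [0.0, 100.0]
sizes = [10**9, 10**9]
print(locate_read_position(starts, sizes, 150.7, 10_000, 1000))
```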
Station-based efficiency gains
- The majority of the station-based cost was not in the FFT, but in the sin/cos used to calculate the phase of the (pre-F) fringe rotation
- Where the phase changes linearly from sample to sample (always true), sin/cos can be calculated for the first M samples and then for every Mth sample, and complex multiplies give the full N x M sample result
- This saves about 20% of the overall execution time for 10 stations (~25% of the station-based time)
Station-based efficiency gains
[Figure: fringe-rotation phase wrapping between -180° and 180° across one FFT of data. Previously, sin/cos was evaluated for every sample]
Station-based efficiency gains
[Figure: fringe-rotation phase wrapping between -180° and 180° across one FFT of data. Now, sin/cos is evaluated for the first M samples and for every Mth sample after that]
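The trick relies on the phase being linear. In that case, phase(j·M + i) = (phase0 + j·M·Δ) + i·Δ, so each rotator is the product of a "coarse" exponential and a "fine" exponential. The NumPy sketch below compares that factorisation with the direct per-sample evaluation. The function names and parameter values are illustrative, not taken from DiFX.

```python
import numpy as np

def rotators_direct(phase0, dphase, nsamp):
    """Previously: one complex exponential (sin/cos pair) per sample."""
    return np.exp(1j * (phase0 + dphase * np.arange(nsamp)))

def rotators_factored(phase0, dphase, nsamp, m):
    """Now: m exponentials for the first m samples plus nsamp/m for every
    m-th sample, combined with complex multiplies.

    For a linear phase, phase(j*m + i) = (phase0 + j*m*dphase) + i*dphase,
    so exp(1j*phase) = coarse[j] * fine[i].
    """
    if nsamp % m != 0:
        raise ValueError("nsamp must be a multiple of m")
    fine = np.exp(1j * dphase * np.arange(m))
    coarse = np.exp(1j * (phase0 + dphase * m * np.arange(nsamp // m)))
    # The row-major outer product places coarse[j] * fine[i] at index j*m + i.
    return (coarse[:, None] * fine[None, :]).ravel()


nsamp, m = 256, 16
direct = rotators_direct(0.3, 0.01, nsamp)
factored = rotators_factored(0.3, 0.01, nsamp, m)
print(np.max(np.abs(direct - factored)), "exponentials:", nsamp, "->", m + nsamp // m)
```

For 256 samples with M = 16, the number of sin/cos evaluations drops from 256 to 32. The remaining work is cheap complex multiplies.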
Baseline-based efficiency gain
- For many baselines or large numbers of channels, the entire output accumulator no longer fits in the CPU cache, causing a massive slowdown
- Looping over baseline/freq/polarisation once per FFT is inefficient in this situation
- Solution: calculate more than one FFT for each datastream, then cross-multiply and accumulate (XMAC) the same baseline/freq for more than one FFT
- This reduces the overhead of going from 128 to 2048 channels per band from ~5x to ~2x
Baseline-based efficiency gain: before
[Figure: Mode 1, Mode 2, Mode 3 … Mode N feed a visibility buffer that is too big for the cache]
Baseline-based efficiency gain: after
[Figure: Mode 1, Mode 2, Mode 3 … Mode N feed the same visibility buffer (too big for the cache), but one slot of it fits in cache]
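The change is a loop reordering, sketched here in NumPy. The sketch is simplified to baselines only, with no frequency or polarisation loops, and the function names are hypothetical. Python cannot show the cache benefit itself. The sketch only shows the two loop orders and checks that they accumulate the same result.

```python
import numpy as np

def xmac_per_fft(spectra, baselines):
    """Before: every FFT sweeps over all baselines, so the whole
    accumulator is touched once per FFT."""
    nfft, _, nchan = spectra.shape
    acc = np.zeros((len(baselines), nchan), dtype=complex)
    for f in range(nfft):
        for b, (s1, s2) in enumerate(baselines):
            acc[b] += spectra[f, s1] * np.conj(spectra[f, s2])
    return acc

def xmac_blocked(spectra, baselines, block):
    """After: take `block` FFTs per datastream, then accumulate all of them
    into one baseline's slot before moving on to the next baseline."""
    nfft, _, nchan = spectra.shape
    acc = np.zeros((len(baselines), nchan), dtype=complex)
    for f0 in range(0, nfft, block):
        chunk = spectra[f0:f0 + block]
        for b, (s1, s2) in enumerate(baselines):
            slot = acc[b]                # view into acc: the part that stays hot in cache
            for f in range(chunk.shape[0]):
                slot += chunk[f, s1] * np.conj(chunk[f, s2])
    return acc


rng = np.random.default_rng(0)
nfft, nstation, nchan = 12, 4, 64
spectra = (rng.standard_normal((nfft, nstation, nchan))
           + 1j * rng.standard_normal((nfft, nstation, nchan)))
baselines = [(a, b) for a in range(nstation) for b in range(a + 1, nstation)]
before = xmac_per_fft(spectra, baselines)
after = xmac_blocked(spectra, baselines, block=5)
print(np.allclose(before, after))
```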
Summary of input file changes
- .input file moved:
  - Num channels, oversample, decimation (from config entries to frequency entries)
- .input file changed:
  - Post-F fringe rotation, quadratic interpolation -> fringe rotation order
  - Blocks per send/guard blocks -> subintNS/guardNS
  - Delay/uvw files -> im file
- .input file new:
  - LO offset, zoom band entries (datastream)
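Moving from a block count to nanoseconds is simple arithmetic once the FFT duration is known. The sketch below assumes that one block is one FFT of Nyquist-sampled real data with no oversampling or decimation. That is an assumption here, not a statement of how DiFX defines a block, and the function name is hypothetical. Under the same assumption, guard blocks would convert to guardNS in the same way.

```python
def subint_ns_from_blocks(blocks, nchan, bandwidth_hz):
    """Convert a DiFX1.5-style block count into nanoseconds (hypothetical helper).

    Assumes one block is one FFT of Nyquist-sampled real data, i.e.
    2*nchan samples at 2*bandwidth samples/s, with no oversampling or decimation.
    """
    fft_ns = (2 * nchan) * 1e9 / (2 * bandwidth_hz)
    return round(blocks * fft_ns)


# 3125 blocks of 128-channel FFTs on a 16 MHz band: 3125 x 8 us = 25 ms.
print(subint_ns_from_blocks(3125, 128, 16e6))
```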
Summary of other file changes
- Calc file changes:
  - Added a source table, which is referenced in the scan table
  - Scans now have a pointing centre and one or more phase centres
- IM file changes:
  - Extra entries for the phase centres, as well as the pointing centre
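The new calc file structure can be pictured as a source table referenced by index from each scan. The dataclasses below are a hypothetical model of that relationship only. The class names, field names, and source names are invented for illustration and are not the actual .calc keywords.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Source:
    """Hypothetical source-table row."""
    name: str
    ra_rad: float
    dec_rad: float

@dataclass
class Scan:
    """Hypothetical scan-table row referring to the source table by index."""
    identifier: str
    pointing_source: int          # index of the pointing centre in the source table
    phase_sources: List[int]      # one or more phase-centre indices

    def __post_init__(self):
        if not self.phase_sources:
            raise ValueError("a scan needs at least one phase centre")

def resolve(scan, sources):
    """Look up a scan's pointing centre and phase centres in the source table."""
    return sources[scan.pointing_source], [sources[i] for i in scan.phase_sources]


sources = [Source("TARGET_FIELD", 1.0, 0.5),
           Source("INBEAM_A", 1.0001, 0.5002),
           Source("INBEAM_B", 0.9998, 0.4999)]
scan = Scan("No0001", pointing_source=0, phase_sources=[1, 2])
pointing, centres = resolve(scan, sources)
print(pointing.name, [c.name for c in centres])
```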
Some quick benchmark results
- For correlations dominated by station-based costs (<~10 stations), DiFX2.0 should be ~20% faster
- To add phase centres out to the edge of the primary beam, one must go to 2k or 4k channels, which is 2-3x slower than 128-channel continuum (it was more like 4-5x slower before the changes)
- But then adding phase centres is basically free: doing 100 phase centres is only about 1.2x slower than doing 1 phase centre