Reduction data treatment for ARCS Tim Kelley, Caltech Materials Science DANSE workshop—June 22, 2004.

Dec 19, 2015

Page 1: Reduction data treatment for ARCS Tim Kelley, Caltech Materials Science DANSE workshop—June 22, 2004.

reduction data treatment for ARCS

Tim Kelley, Caltech Materials Science

DANSE workshop—June 22, 2004

Page 2

reduction

• components to reduce raw, inelastic data from time-of-flight chopper spectrometer

• essential processing—must be done for every experiment

• validation is essential

• descended from Pharos IDL procedures

Page 3

why change?

• information sources badly mixed

– data formats, instrument specs, algorithms

– not reusable

• monolithic

– unit testing difficult

– difficult/impossible to tinker with

• slow

Page 4

gross testing

recognizable?

• Yes: Success!

• No: oh, $%^!

gross testing useful only as long as it is not needed!

Page 5

unit testing

• test every execution path

• programmer provides

– standard inputs, expected results

– means of comparing actual and expected results to determine pass/fail

• combine unit tests to create regression tests

• unit tests part of writing code!

Page 6

example: execution paths

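    # Two execution paths: one per branch of the if/else.
    def theAnswer( yourInt):
        if yourInt == 42:
            return True
        else:
            return False

    import os

    # Three execution paths: the host check adds one that raises.
    def theAnswer2( yourInt):
        if 'Tim' in os.uname():
            raise RuntimeError("I hate that guy!")
        if yourInt == 42:
            return True
        else:
            return False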
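Covering every execution path of theAnswer2 takes one test per path. The following is a minimal sketch, assuming Python 3 and the standard unittest.mock module to simulate the host check. The test class, the run_path_tests helper, and the fake uname tuples are illustrative names, not part of the talk.

```python
import os
import unittest
from unittest import mock


def theAnswer2(yourInt):
    if 'Tim' in os.uname():
        raise RuntimeError("I hate that guy!")
    if yourInt == 42:
        return True
    else:
        return False


# Hypothetical uname tuples: (sysname, nodename, release, version, machine).
OTHER_HOST = ("Linux", "arcs", "2.6", "#1", "x86")
TIMS_HOST = ("Linux", "Tim", "2.6", "#1", "x86")


class TheAnswer2Paths(unittest.TestCase):
    """One test per execution path."""

    def test_raises_on_tims_machine(self):
        with mock.patch("os.uname", return_value=TIMS_HOST, create=True):
            self.assertRaises(RuntimeError, theAnswer2, 42)

    def test_true_for_42(self):
        with mock.patch("os.uname", return_value=OTHER_HOST, create=True):
            self.assertTrue(theAnswer2(42))

    def test_false_otherwise(self):
        with mock.patch("os.uname", return_value=OTHER_HOST, create=True):
            self.assertFalse(theAnswer2(7))


def run_path_tests():
    """Run the three path tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TheAnswer2Paths)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Patching os.uname keeps the tests independent of the machine they run on, so the raising path can be exercised anywhere.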

Page 7

why change?

• information sources badly mixed

– data formats, instrument specs, algorithms

– not reusable

• monolithic

– unit testing difficult

– difficult/impossible to tinker with

• slow

Page 8

two approaches

Page 9

why change?

• information sources badly mixed

– data formats, instrument specs, algorithms

– not reusable

• monolithic

– unit testing difficult

– difficult/impossible to tinker with

• slow

Page 10

change to what?

• Object oriented

– separate, equalize information sources

• histograms, instrument data, transformations

• Finer granularity, more layered

– simpler components, easier to test, replace

– add scientific context to generic components

• Unit tests, integrate NeXus, Pyre

Page 11

example: layering

[Figure: layer diagram spanning C++ and Python, with boxes labeled std::vector, StdVector, histograms, and S_QE]

basic layers have no scientific content: can be reused for any science

higher layers add context
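The layering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names Histogram and SQE, their methods, and the unit strings are hypothetical and are not the ARCS classes. The generic layer knows nothing about neutrons, and the subclass only adds names and units.

```python
class Histogram:
    """Generic layer: binned values with no scientific meaning attached."""

    def __init__(self, values, sizes):
        expected = 1
        for n in sizes:
            expected *= n
        if len(values) != expected:
            raise ValueError("number of values does not match sizes")
        self.values = list(values)
        self.sizes = tuple(sizes)

    def total(self):
        """Sum of all bins; meaningful for any kind of histogram."""
        return sum(self.values)


class SQE(Histogram):
    """Higher layer: the same storage, now labelled as S(|Q|, E)."""

    # Assumed axis labels and units, for illustration only.
    axis_names = ("|Q|", "E")
    axis_units = ("1/Angstrom", "meV")

    def __init__(self, values, num_q, num_e):
        super().__init__(values, (num_q, num_e))

    def numQ(self):
        return self.sizes[0]

    def numE(self):
        return self.sizes[1]
```

Because SQE adds no storage logic of its own, the generic Histogram could be reused unchanged for data from a different kind of experiment.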

Page 12

reduction structure

[Figure: three packages (histograms, instruments, transformations), each shown with a C++ layer and a Python layer]

Page 13

Histograms

Store and access neutron scattering data

• multidimensional data sets

• 200-1400 MB

Need

• ability to associate metadata

• efficient iteration

• availability, stability

Page 14

Histograms: C++

Storage based on STL vector class

• ANSI standard C++, universally available

• Optimized by compilers

Adaptor classes extend to

• Different storage schemes

• Different dimensionality

Low-level routines: arithmetic, average, etc.

Page 15

Histograms: Python

Group C++ containers to model complex data sets

• associate signal & error histograms, ...

Add context, preferred behaviors

• names, axes, units

• error propagation?
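One way a Python-level class might associate signal and error histograms and carry a preferred behavior is sketched below. The Dataset class and its fields are hypothetical, and the propagation rule (independent errors added in quadrature) is an assumption chosen for illustration; the slide leaves error propagation as an open question.

```python
import math


class Dataset:
    """Pairs a signal histogram with its error histogram, plus context."""

    def __init__(self, name, unit, signal, error):
        if len(signal) != len(error):
            raise ValueError("signal and error must have the same length")
        self.name = name
        self.unit = unit
        self.signal = list(signal)
        self.error = list(error)

    def __add__(self, other):
        """Add bin by bin; assumes the two sets of errors are independent."""
        if self.unit != other.unit:
            raise ValueError("cannot add datasets with different units")
        if len(self.signal) != len(other.signal):
            raise ValueError("datasets must have the same number of bins")
        signal = [a + b for a, b in zip(self.signal, other.signal)]
        # Independent errors combine in quadrature.
        error = [math.sqrt(a * a + b * b)
                 for a, b in zip(self.error, other.error)]
        return Dataset(self.name, self.unit, signal, error)
```

Keeping the rule inside the class means callers add datasets with a plain `a + b` and cannot forget to update the errors.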

Page 16

Instruments

Find, store, and serve instrument data

• detector & monitor configuration, properties

– pixel-sample distances, angles, dimensions, ...

• moderator, chopper, sample positions...

Need

• comprehensive description of instrument

• flexibility to describe different instruments

Page 17

Instruments: C++

Abstract base classes parallel NeXus

Concrete subclasses specialize to given instrument, detector type, etc.

STL-based containers implement sorting

• e.g. pixels sorted by

– pixel-sample distance

– scattering angle(s)
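The sorting described here lives in C++ STL containers. As a rough Python analogue only, the sketch below orders a small pixel list by either key. The Pixel record, its field names, and the units are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pixel:
    """Hypothetical pixel record; field names and units are illustrative."""
    pixel_id: int
    distance_m: float            # pixel-sample distance
    scattering_angle_deg: float  # scattering angle


def sorted_by_distance(pixels):
    """Return the pixels ordered by increasing pixel-sample distance."""
    return sorted(pixels, key=lambda p: p.distance_m)


def sorted_by_angle(pixels):
    """Return the pixels ordered by increasing scattering angle."""
    return sorted(pixels, key=lambda p: p.scattering_angle_deg)
```

Both functions return new lists, so the same pixel collection can be viewed in either order without being modified.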

Page 18

Instruments: Python

Interface to instrument descriptions

• Hard-coded

• NeXus or other files

Pass info to C++ layer

• Python is better suited to parsing tasks

• Also useful for instrument diagnostics and simulation

Page 19

Transformations

Variable changes, reductions, slicing

• time-of-flight to energy

• detector coord. to scattering angle (powder)

• S(Qi, Qj) for Qi in (Qx, Qy, Qz, E), ...

Need

• scientific integrity

• efficient (fast) operations

• flexibility
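The time-of-flight to energy change of variable follows from the neutron's kinetic energy, E = m v^2 / 2 with v = distance / time. A minimal sketch in Python is given below; the function names and argument conventions are hypothetical, and it ignores corrections such as timing offsets that a real reduction would need.

```python
NEUTRON_MASS_KG = 1.67492749804e-27   # CODATA neutron mass
JOULES_PER_MEV = 1.602176634e-22      # 1 meV expressed in joules


def energy_mev(distance_m, time_s):
    """Kinetic energy in meV of a neutron that covers distance_m in time_s."""
    if time_s <= 0:
        raise ValueError("flight time must be positive")
    speed = distance_m / time_s
    return 0.5 * NEUTRON_MASS_KG * speed * speed / JOULES_PER_MEV


def energy_transfer_mev(incident_mev, sample_to_pixel_m, flight_time_s):
    """Energy lost by the neutron in the sample: incident minus final energy.

    flight_time_s is the sample-to-pixel flight time, which is assumed
    here to have been separated from the total time of flight already.
    """
    return incident_mev - energy_mev(sample_to_pixel_m, flight_time_s)
```

As a sanity check, a neutron travelling at 2200 m/s comes out at roughly 25.3 meV, the usual thermal reference value.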

Page 20

Transformations: C++

Modular design

• Building blocks perform simple tasks

– BinOverlapCalculator

– BinOverlapMultiplier

Generic (iterator) interface

• Independent of underlying container

• Independent of instrument

Compiled, optimized for maximum speed
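The slides name BinOverlapCalculator and BinOverlapMultiplier but do not show their internals. The Python sketch below is therefore a guess at the kind of work such building blocks could split between them: one function computes what fraction of each old bin falls into each new bin, and a second applies those fractions to the counts. The function names and the uniform-density-within-a-bin assumption are mine.

```python
def bin_overlaps(old_edges, new_edges):
    """Return overlaps[i][j], the fraction of old bin i inside new bin j."""
    overlaps = []
    for i in range(len(old_edges) - 1):
        lo, hi = old_edges[i], old_edges[i + 1]
        row = []
        for j in range(len(new_edges) - 1):
            shared = min(hi, new_edges[j + 1]) - max(lo, new_edges[j])
            row.append(max(shared, 0.0) / (hi - lo))
        overlaps.append(row)
    return overlaps


def rebin(counts, overlaps):
    """Redistribute counts into the new bins using the overlap fractions.

    Assumes counts are spread uniformly within each old bin.
    """
    num_new = len(overlaps[0])
    new_counts = [0.0] * num_new
    for i, count in enumerate(counts):
        for j in range(num_new):
            new_counts[j] += count * overlaps[i][j]
    return new_counts
```

Separating the geometry (overlaps) from the arithmetic (rebin) means the overlap table can be computed once and reused for every spectrum that shares the same binning.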

Page 21

Transformations: Python

Driver routines

• coordinate simpler objects to perform complex tasks

• EnergyRebinDriver, QRebinDriver

• easily modified by user

Experiment with pipeline strategies

• e.g. reduce data one pixel at a time

– Trivially parallel for much of process

– Real-time refinement of ARCS data!
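The pixel-at-a-time strategy can be illustrated with a Python generator that folds in one pixel's spectrum per step and yields the running result. This is a stand-in, not EnergyRebinDriver or QRebinDriver, whose code is not shown in the slides; the function name and the plain-list spectra are assumptions.

```python
def reduce_pixel_by_pixel(pixel_spectra):
    """Yield the running sum after each pixel's spectrum is folded in.

    pixel_spectra is any iterable of equal-length lists of counts, so
    the spectra can arrive one at a time, for example during a run.
    """
    total = None
    for spectrum in pixel_spectra:
        if total is None:
            total = [0.0] * len(spectrum)
        if len(spectrum) != len(total):
            raise ValueError("all spectra must have the same length")
        for k, value in enumerate(spectrum):
            total[k] += value
        # A copy is yielded so callers cannot alter the running total.
        yield list(total)
```

Each yielded list is a usable partial result, which is what makes refinement while data are still arriving possible. Because each pixel's contribution is independent, the per-pixel step could also be distributed across processes.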

Page 22

example: a C++ transform

    namespace PharosMath{
        template <typename NumT>
        void reduceSum2d( std::vector<NumT> const & vec2d,
                          std::vector<NumT> & vec1d,
                          std::vector<size_t> const & sizes,
                          size_t axis)
        ...

[Figure: a 3 x 4 array of ones reduces to (4, 4, 4) when each row is summed, or to (3, 3, 3, 3) when each column is summed]

Reduce a 2d array to 1d by summing rows or columns.
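For readers who want to experiment with the operation without the C++ library, here is a pure-Python sketch of the same reduction on a flat vector. It is not the Pharos implementation: the function name, the row-major storage, and the convention that axis names the index being summed over are all assumptions.

```python
def reduce_sum_2d(vec2d, sizes, axis):
    """Sum a flat, row-major 2d array along one axis.

    sizes is (rows, cols). axis=0 sums over the row index and returns
    one value per column; axis=1 sums over the column index and returns
    one value per row.
    """
    rows, cols = sizes
    if len(vec2d) != rows * cols:
        raise ValueError("vector length does not match sizes")
    if axis == 0:
        return [sum(vec2d[r * cols + c] for r in range(rows))
                for c in range(cols)]
    if axis == 1:
        return [sum(vec2d[r * cols:(r + 1) * cols]) for r in range(rows)]
    raise ValueError("axis must be 0 or 1")
```

With a 3 x 4 array of ones, `reduce_sum_2d([1] * 12, (3, 4), 1)` gives `[4, 4, 4]` and the same call with `axis=0` gives `[3, 3, 3, 3]`.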

Page 23

reduceSum2d unit tests

[Figure: structure of the test_reduceSum2d suite: tests test_1 through test_5, helper routines _runTest and _compareVecs, and the inputs (input #1, input #2, input sizes) passed to reduceSum2d]

Page 24

exposing it to Python

    template <typename NumT>
    static void _callReduceSum2d( PyObject *py2dv, PyObject *py1dv,
                                  std::vector<size_t> const & dimSizes,
                                  size_t axis)
    {
        std::vector<NumT> *p2dv =
            static_cast<std::vector<NumT>*>(PyCObject_AsVoidPtr(py2dv));
        std::vector<NumT> *p1dv =
            static_cast<std::vector<NumT>*>(PyCObject_AsVoidPtr(py1dv));
        Pharos::reduceSum2d<NumT>( *p2dv, *p1dv, dimSizes, axis);
        return;
    }

Page 25

calling it from Python

    reduction.reduceSum2d( vector2d.handle(), vector2d.datatype(),
                           vector1d.handle(), dimSizes, whichAxis)

Page 26

a more interesting Python call

    # sQE: instance of class that encapsulates S(|Q|,E).
    # SE: class that encapsulates S(E)
    def S_QE_To_S_E( sQE):
        sE = SE( sQE.datatype(), sQE.numE())
        reduction.reduceSum2d( sQE.data(), sQE.datatype(), sE.data(),
                               sQE.sizes(), sQE.QAxis() )
        return sE

Page 27

Implementation

    Category      Progress                        To Do

    Histograms    STL routines +90%               "high-level" classes
                  Pharos adaptors 100%            abstract Adaptor hierarchy
                  unit testing ~50%

    Instrument    Pharos/ARCS C++ classes 100%    abstract "NeX-ified" hierarchy
                                                  unit testing ~20%

    Transforms    S(|Q|, E) powders 98%           revise interface
                  S(Q, E) single crystal 30%      unit testing ~40%

Primary: C++ ~4000 lines; Python ~1800 lines

Auxiliary: Python-C bindings ~4500 lines; tests ~4800+ lines

Page 28

how much effort?

• “topline” original IDL → C/C++ conversion: 2-3 months

• topline reduction: 10-12 weeks

Page 29

Still to do

• Complete unit tests ~ 3-4 weeks

• Another iteration ~ 4-6 weeks

– code cleaning/reorg (1)

– iterator interfaces for transformations (1-2)

– Python dataset classes (1-2)

– Performance questions (1)

• Test/integrate single crystal ~ 2 weeks

• Pyre integration ~ 3+ weeks

• Related, general tasks ~ 4-5 weeks

– NeXus C++ hierarchy (2)

– Strengthening Python-C++ integration (2-3)