Reproducible Computational Experiments Using MADAGASCAR Software Package
Sergey Fomel
Bureau of Economic Geology, University of Texas at Austin
Applied Inverse Problems, Vancouver BC, June 29, 2007
http://rsf.sf.net/
Jan 14, 2016
http://rsf.sourceforge.net/
Principles of Scientific Software
– Encapsulation
– File Formats
– Testing
– Reproducibility
– Maintenance
Encapsulation
Information hiding (Parnas, 1972)
Separation of concerns (Dijkstra, 1974)
Separate physics from mathematics
– A is physics
– Going from b to $\hat{x}$ is mathematics
$$\hat{x} = \arg\min_x \left( \|A x - b\| + R(x) \right)$$
Example: Velocity Transform
Physics of Velocity Transform
Encapsulation in Programming
Separation of concerns
– Classes or templates (C++)
– Function pointers (C)
– Function interfaces (Fortran-90)
/* initialize velocity transform (A) */
veltran_init (true, x0, dx, nx, s0, ds, nv, o1, d1, nt, s02, anti, psun1, psun2);
/* least-squares minimization of |A x - b|^2, x=vscan, b=cmp */
sf_solver (veltran_lop, sf_cgstep, ntv, ntx, vscan, cmp, niter,
           "err", error, "nmem", 0, "nfreq", miter, "mwt", mask, "end");
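The same separation can be sketched in Python, where plain callables play the role of C function pointers. This is a minimal illustration, not Madagascar's sf_solver: the names cgls, forward, and adjoint and the small matrix A are hypothetical, and the solver is a textbook conjugate-gradient iteration for least squares without the regularization term R.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgls(forward, adjoint, b, nx, niter):
    """Minimize |A x - b|^2 by conjugate gradients on the normal equations.

    forward(x) applies A and adjoint(r) applies A'; the solver never
    looks inside either of them.
    """
    x = [0.0] * nx
    r = list(b)                 # residual b - A x for the starting model x = 0
    s = adjoint(r)              # A' r
    p = list(s)                 # search direction
    gamma = dot(s, s)
    for _ in range(niter):
        if gamma == 0.0:
            break
        q = forward(p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = adjoint(r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x

# Hypothetical "physics": a 3x2 matrix standing in for the velocity transform.
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

def forward(x):
    return [dot(row, x) for row in A]

def adjoint(r):
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

b = forward([2.0, -1.0])        # data generated from a known model
x_hat = cgls(forward, adjoint, b, nx=2, niter=2)
print(x_hat)
```

Swapping in a different forward/adjoint pair changes the physics without touching the solver, which is the point of the slide.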
Encapsulation in UNIX
Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.
Encapsulation in UNIX Shell
bash$ sfveltran < cmp.rsf > vtran.rsf adj=y v0=1 dv=0.025 nv=60
bash$ sfdottest sfveltran mod=vtran.rsf dat=cmp.rsf v0=1 dv=0.025 nv=60
sfdottest: L[m]*d=21665.9
sfdottest: L'[d]*m=21665.9
bash$ sfdottest sfveltran mod=vtran.rsf dat=cmp.rsf v0=1 dv=0.025 nv=60
sfdottest: L[m]*d=21906.2
sfdottest: L'[d]*m=21906.2
bash$ sfconjgrad sfveltran < cmp.rsf > vtran.rsf niter=3 v0=1 dv=0.025 nv=60
sfconjgrad: iter 1 of 3
sfconjgrad: grad=6.36797e+09
sfconjgrad: iter 2 of 3
sfconjgrad: grad=1.39068e+09
sfconjgrad: iter 3 of 3
sfconjgrad: grad=7.50257e+08
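The idea behind a dot-product test can be sketched as follows: for random m and d, the numbers L[m]*d and L'[d]*m agree when the second operator really is the adjoint of the first. The sketch assumes random test vectors, which would also explain why two runs of the same command above print different values. The operator pair (a causal running sum and its adjoint) and the function names are hypothetical, and this is not the sfdottest implementation.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dot_test(forward, adjoint, nm, nd, seed=None):
    """Return (<L m, d>, <m, L' d>) for random vectors m and d."""
    rng = random.Random(seed)
    m = [rng.uniform(-1.0, 1.0) for _ in range(nm)]
    d = [rng.uniform(-1.0, 1.0) for _ in range(nd)]
    return dot(forward(m), d), dot(m, adjoint(d))

# Hypothetical operator pair: causal running sum and its adjoint.
def causal_sum(m):
    out, total = [], 0.0
    for value in m:
        total += value
        out.append(total)
    return out

def anticausal_sum(d):
    out, total = [], 0.0
    for value in reversed(d):
        total += value
        out.append(total)
    return out[::-1]

lhs, rhs = dot_test(causal_sum, anticausal_sum, nm=50, nd=50, seed=2007)
print("L[m]*d=%g" % lhs)
print("L'[d]*m=%g" % rhs)
```

The test needs only the two callables, so the same harness works for any operator pair.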
The Art of UNIX Programming
(Raymond, 2004)
– To design a perfect anti-Unix, make all file formats binary and opaque, and require heavyweight tools to read and edit them.
– If you feel an urge to design a complex binary file format, or a complex binary application protocol, it is generally wise to lie down until the feeling passes.
RSF (Regularly Sampled Format)
SEPlib (Stanford Exploration Project)
Data separated from text headers
Conceptually N-dimensional hypercubes
Multiple files for complex geometries
Not application specific
Example header (the in= parameter points to the binary data file):
n1=1000 in="/path/data.rsf@" n2=500 n3=100 d1=0.001 d2=0.1 o2=1
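Because the header is plain text, it can be read with a few lines of code. The sketch below parses the example header into key=value pairs and derives the hypercube shape. The helper names are hypothetical, and the 4-byte sample size is an assumption, since the slide does not state the data type.

```python
import shlex

def parse_header(text):
    """Parse key=value pairs from an RSF-style text header into a dict of strings."""
    params = {}
    for token in shlex.split(text):
        if "=" in token:
            key, value = token.split("=", 1)
            params[key] = value      # a later value replaces an earlier one
    return params

def cube_shape(params):
    """Collect n1, n2, ... until the first missing axis."""
    shape = []
    axis = 1
    while "n%d" % axis in params:
        shape.append(int(params["n%d" % axis]))
        axis += 1
    return shape

header = 'n1=1000 in="/path/data.rsf@" n2=500 n3=100 d1=0.001 d2=0.1 o2=1'
params = parse_header(header)
shape = cube_shape(params)

samples = 1
for n in shape:
    samples *= n

ESIZE = 4  # assumption: 4-byte samples; the slide does not state the data type
print(params["in"], shape, samples * ESIZE)
```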
Testing
Test-driven development (Beck, 2003)
YAGNI principle
– Always implement things when you actually need them, never when you just foresee that you need them.
In scientific software development, tests are computational experiments
Testing with SCons
Software Construction
Replacement for “make”
– reliable and extensible dependency analysis
– configuration files are Python scripts
– cross-platform
– open-source
http://www.scons.org
SConstruct File
from rsf.proj import *

# Mobil AVO CMP gather 807 at well4 location
Fetch('cmp807_raw.HH','rad')

# Preprocessing
Flow('cmp','cmp807_raw.HH',
     'dd form=native | tpow tpow=2 | mutter half=n v0=1.3 tp=0.2')
Plot('cmp','grey title="Input CMP Gather" ')

# Velocity Transform
Flow('veltran','cmp','veltran s02=0.25 v0=1.250 dv=0.025 nv=60 adj=y')
Plot('veltran','grey title="Velocity Scan" ')

# Display Side by Side
Result('veltran','cmp veltran','SideBySideAniso')

End()
Experimenting with SCons
bash$ scons
retrieve(["cmp807_raw.HH"], [])
< cmp807_raw.HH sfdd form=native | sftpow tpow=2 | sfmutter half=n v0=1.3 tp=0.2 > cmp.rsf
< cmp.rsf sfgrey title="Input CMP Gather" > cmp.vpl
< cmp.rsf sfveltran s02=0.25 v0=1.250 dv=0.025 nv=60 adj=y > veltran.rsf
< veltran.rsf sfgrey title="Velocity Scan" > veltran.vpl
vppen yscale=2 vpstyle=n gridnum=2,1 cmp.vpl veltran.vpl > Fig/veltran.vpl
bash$ sed s/Velocity/Slowness/ < SConstruct > SConstruct2
bash$ mv SConstruct2 SConstruct
bash$ scons
< veltran.rsf sfgrey title="Slowness Scan" > veltran.vpl
vppen yscale=2 vpstyle=n gridnum=2,1 cmp.vpl veltran.vpl > Fig/veltran.vpl
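The second scons run reruns only the two commands affected by the title change. A toy model of that behavior is sketched below: a target is rebuilt when the signature of its command or of any of its sources changes. The MiniBuild class and its method names are hypothetical and much simpler than SCons itself; rules are assumed to be declared in dependency order.

```python
import hashlib

def signature(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

class MiniBuild:
    """Toy dependency tracker, not SCons: rebuild when a command or source signature changes."""

    def __init__(self):
        self.rules = {}     # target -> (sources, command), in declaration order
        self.stored = {}    # target -> signature recorded at the last build

    def rule(self, target, sources, command):
        self.rules[target] = (list(sources), command)

    def build(self):
        """Return the list of targets whose commands would be rerun."""
        current, rerun = {}, []
        for target, (sources, command) in self.rules.items():
            # Sources without a rule (raw input files) get a fixed signature here.
            parts = [current.get(s, "input:" + s) for s in sources]
            current[target] = signature(command + "|" + "|".join(parts))
            if self.stored.get(target) != current[target]:
                rerun.append(target)
        self.stored = current
        return rerun

def declare(build, title):
    build.rule("cmp.rsf", ["cmp807_raw.HH"],
               "sfdd form=native | sftpow tpow=2 | sfmutter half=n v0=1.3 tp=0.2")
    build.rule("cmp.vpl", ["cmp.rsf"], 'sfgrey title="Input CMP Gather"')
    build.rule("veltran.rsf", ["cmp.rsf"],
               "sfveltran s02=0.25 v0=1.250 dv=0.025 nv=60 adj=y")
    build.rule("veltran.vpl", ["veltran.rsf"], 'sfgrey title="%s"' % title)
    build.rule("Fig/veltran.vpl", ["cmp.vpl", "veltran.vpl"],
               "vppen yscale=2 vpstyle=n gridnum=2,1")

build = MiniBuild()
declare(build, "Velocity Scan")
first = build.build()       # nothing stored yet, so every target is rerun
declare(build, "Slowness Scan")
second = build.build()      # only the retitled plot and the figure that includes it
print(second)
```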
Reproducible Research at Stanford
(Knuth, 1992)
– A computer program should be written with human readability as a primary goal.
(Claerbout and Karrenbach, 1992)
– The purpose of reproducible research is to facilitate someone going a step further by changing something.
(Buckheit and Donoho, 1995)
– An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship.
Reproducible Experiments
Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel […] Where else in science can one get away with publishing observations that are claimed to prove a theory or illustrate the success of a technique without having to give a careful description of the methods used, in sufficient detail that others can attempt to repeat the experiment? (LeVeque, 2006)
Maintenance
Computational experiments that are not continuously maintained lose reproducibility.
– Regression testing (Brooks, 1975)
Contribute computational software and experiments to a community-maintained repository to enable research productivity.
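One way to picture regression testing of a computational experiment is to store a few summary numbers from a trusted run and compare every later run against them. The sketch below assumes that approach; the experiment function, the JSON reference file, and the tolerance are all hypothetical and are not part of Madagascar.

```python
import json
import math
import os
import tempfile

def experiment():
    """Hypothetical stand-in for an experiment: returns a few summary numbers."""
    samples = [math.sin(0.01 * i) for i in range(1000)]
    return {"energy": sum(s * s for s in samples), "peak": max(samples)}

def regression_check(result, reference_path, rtol=1e-6):
    """Store the result on the first run; afterwards list keys that drifted."""
    if not os.path.exists(reference_path):
        with open(reference_path, "w") as handle:
            json.dump(result, handle)
        return []
    with open(reference_path) as handle:
        reference = json.load(handle)
    return [key for key in sorted(reference)
            if key not in result
            or not math.isclose(result[key], reference[key], rel_tol=rtol)]

workdir = tempfile.mkdtemp()
reference_path = os.path.join(workdir, "reference.json")

regression_check(experiment(), reference_path)              # first run stores the reference
unchanged = regression_check(experiment(), reference_path)  # same code, same numbers

perturbed = dict(experiment())
perturbed["energy"] *= 1.01                                  # simulate a change in behavior
drifted = regression_check(perturbed, reference_path)
print(unchanged, drifted)
```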
Open Science
Conclusions
Principles of Scientific Software
– Encapsulation
– File Formats
– Testing
– Reproducibility
– Maintenance
Madagascar software package
– Open source, open community, open science
http://rsf.sf.net/