Pan-STARRS Seminar IfA 2012.09.21 Pan-STARRS Seminar 5 Parallel Computing Concepts and the Pan-STARRS Image Processing Pipeline
IPP in PS1
[Figure: IPP in PS1. Components: Telescope, Camera, IPP, OTIS, PSPS, MOPS Science Client, Preferred Science Clients, PS1 Community, Solar System Community. Data flows: photons, raw images, commands, metadata, detections, filtered detections & metadata, static sky images, orbits & identifications, queries. Preferred science clients are served through DVO, cmf / smf files, the distribution system and the postage stamp server. Legend: pixel data, meta & object data, commands, PS subsystem, external system.]
Analysis Strategies
[Figure: The Static Sky (Image Combine): warp & stack produce a cleaned stacked image. Image Difference: a warp image minus a reference image (the static sky or another warp image) gives a difference image.]
IPP Flowchart (simplified)
[Figure: processing stages. Image registration: summitcopy, register. Detrend creation: process, stack, normalize, residuals, reject, producing the detrend image. Single-image science analysis: chip, camera, fake, warp. Image combinations: stack, diff, magic. DVO: photometry calibration, astrometry calibration. Data release: destreak, distribution, publish, feeding PS1SC clients and PSPS.]
IPP Architecture
[Figure: IPP Architecture. Components: Image Server (Nebulous), Metadata DB (ippdb), Object DB (DVO), IPP Controller (pcontrol), IPP Scheduler (pantasks), analysis tasks (IPP processes) and the publishing process; external systems: Camera, OTIS, PSPS, client science pipelines. Flows: images, static sky images, metadata, Q/A, detections, filtered detections, commands & messages. An observing operations / processing boundary is marked. Legend: pixel data, meta & object data, commands & messages.]
A short digression on parallel processing
● The scale of the problem for IPP:
  – Full 3pi Survey = 300,000 exposures x 60 chips = 18M images
  – or: 300,000 exp x 1.4 Gpix = 840 Terabytes (raw)
● Goal: full reprocess in ~6 months
● Single-threaded thought experiment:
  – Some minimal analysis might take ~1 minute per chip
  – 1 core would take 34 years for 3pi!
  – In reality, I want to do more work than that (full IPP processing requires a total of ~900 sec per chip per core)
● Need to parallelize to make the problem tractable
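The numbers above can be checked with a few lines of arithmetic. This is a sketch: the 2 bytes per raw pixel (16-bit pixels) and the "perfect scaling" in `cores_needed` are assumptions made for the calculation, not figures from the IPP.

```python
# Back-of-envelope scale of a full 3pi reprocessing, using the slide's figures.
EXPOSURES = 300_000            # full 3pi survey
CHIPS_PER_EXPOSURE = 60        # chips processed per exposure
PIXELS_PER_EXPOSURE = 1.4e9    # 1.4 Gpix per exposure
BYTES_PER_PIXEL = 2            # assumption: 16-bit raw pixels
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_images = EXPOSURES * CHIPS_PER_EXPOSURE
raw_tb = EXPOSURES * PIXELS_PER_EXPOSURE * BYTES_PER_PIXEL / 1e12


def core_years(seconds_per_chip):
    """Wall-clock years for one core to process every chip image once."""
    return n_images * seconds_per_chip / SECONDS_PER_YEAR


def cores_needed(seconds_per_chip, target_years=0.5):
    """Cores needed to finish within target_years, assuming perfect scaling."""
    return core_years(seconds_per_chip) / target_years


print(f"{n_images:,} chip images, {raw_tb:.0f} TB raw")
print(f"~1 min/chip: {core_years(60):.1f} core-years")
print(f"~900 s/chip: {core_years(900):.0f} core-years, "
      f"~{cores_needed(900):.0f} cores for a 6-month reprocess")
```

With ~900 s per chip the estimate comes to roughly 500 core-years, i.e. on the order of a thousand cores kept busy for six months even with perfect scaling.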
basic parallel processing
● a simple concept...
[Figure: jobs 1-4 run one after another as serial jobs on one computer / core, or side by side as parallel jobs on N computers = N x as fast.]
some jargon:
● job : some specific thing to be done
● thread : a job or part of a job running in serial
● lock : a tool to allow one thread to block other threads
● message : information passed between threads or jobs
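The four terms fit together in a small sketch. Python's `threading` and `queue` modules stand in here for pthreads-style tools, and the job body (squaring a number) is a placeholder for real work: jobs are posted as messages on a queue, worker threads pull them off, and a lock guards the shared results. In CPython the global interpreter lock limits CPU-bound speed-up, so this shows the structure, not the performance.

```python
import queue
import threading


def run_jobs(job_ids, n_threads=4):
    """Run each job on a pool of worker threads; return {job_id: result}."""
    jobs = queue.Queue()              # messages: job descriptions sent to threads
    results = {}
    results_lock = threading.Lock()   # lock: protects the shared results dict

    def worker():                     # thread: runs jobs one after another
        while True:
            job_id = jobs.get()
            if job_id is None:        # sentinel message: no more work
                break
            value = job_id * job_id   # placeholder for the real job
            with results_lock:
                results[job_id] = value

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for job_id in job_ids:
        jobs.put(job_id)
    for _ in threads:
        jobs.put(None)                # one stop message per thread
    for t in threads:
        t.join()
    return results


print(run_jobs([1, 2, 3, 4]))
```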
some caveats...
● job dependency and sequencing
[Figure: job 1 runs first, then jobs 2a, 2b and 2c in parallel, then job 3.]
● if some jobs depend on the results of other jobs:
  – we need to manage the sequence
  – total speed-up is < N (here the run takes 3/5, not 1/5, of the serial time)
  – (Amdahl's law, 1967)
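The 3/5 figure can be reproduced two ways, assuming every job takes the same unit of time: by measuring the longest dependency chain, or from Amdahl's law with parallel fraction p = 3/5 (jobs 2a-2c are 3 of the 5 jobs) and N = 3, since at most three machines can be busy at once. The function names are illustrative.

```python
def amdahl_time_fraction(parallel_fraction, n_workers):
    """Fraction of the serial run time that remains (Amdahl's law)."""
    return (1.0 - parallel_fraction) + parallel_fraction / n_workers


def dependency_steps(deps):
    """Parallel run time, in unit-time steps, of a set of dependent jobs.

    deps maps each job to the jobs it must wait for. With unit-time jobs and
    enough machines, the run time is the length of the longest dependency chain.
    """
    depth = {}

    def level(job):
        if job not in depth:
            depth[job] = 1 + max((level(d) for d in deps[job]), default=0)
        return depth[job]

    return max(level(job) for job in deps)


# The slide's example: job 1, then jobs 2a, 2b, 2c in parallel, then job 3.
deps = {"1": [], "2a": ["1"], "2b": ["1"], "2c": ["1"], "3": ["2a", "2b", "2c"]}
print(dependency_steps(deps) / len(deps))   # 3 steps vs 5 serial: 0.6
print(amdahl_time_fraction(3 / 5, 3))       # approximately 0.6
```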
some caveats...
● what is the limiting resource?
[Figure: jobs 1-4 spread across several computers vs. run one after another.]
● if the limiting resource is not computation, there may be no gain from more computers...
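A minimal model of this caveat, under the simplifying assumption that the cluster's rate is the smaller of its total compute rate and the capacity of one shared resource (for example a file server or network link). The rates used are hypothetical.

```python
def throughput(n_nodes, jobs_per_node_hour, shared_limit):
    """Jobs per hour for a cluster whose nodes share one bottleneck.

    Compute capacity grows with n_nodes, but the shared resource caps the
    total rate no matter how many nodes are added.
    """
    return min(n_nodes * jobs_per_node_hour, shared_limit)


# Hypothetical numbers: 10 jobs/hour per node, shared I/O good for 25 jobs/hour.
print([throughput(n, 10, 25) for n in (1, 2, 4, 8)])   # [10, 20, 25, 25]
```

Beyond three nodes the extra machines sit idle waiting on I/O, which is the situation the slide warns about.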
some caveats...
● locks
[Figure: jobs 1-3 sharing a common resource, with coarse vs. fine-grained locking.]
● if some jobs need to modify a common resource, they need to set a lock to avoid conflicts.
● careful: locks block processing and can kill your throughput!
● fine-grained locking allows higher throughput (but can be harder to program)
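The coarse vs. fine-grained trade-off can be sketched with Python's `threading.Lock`. The two counter classes and the chip-name keys are hypothetical: the coarse version serializes every update behind one lock, while the fine version gives each key its own lock, so updates to different keys never wait on each other. Both give the same correct counts; only the amount of blocking differs.

```python
import threading
from collections import defaultdict


class CoarseCounts:
    """One lock for the whole table: simple, but every update is serialized."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = defaultdict(int)

    def add(self, key):
        with self._lock:
            self.counts[key] += 1


class FineCounts:
    """One lock per key: updates to different keys never wait on each other."""

    def __init__(self, keys):
        # Keys are created up front, so the dict's structure never changes
        # and a per-key lock is enough to protect each counter.
        self._locks = {k: threading.Lock() for k in keys}
        self.counts = {k: 0 for k in keys}

    def add(self, key):
        with self._locks[key]:
            self.counts[key] += 1


def hammer(table, keys, n_threads=8, n_updates=1000):
    """Have n_threads threads each add n_updates counts to one key."""
    def work(i):
        key = keys[i % len(keys)]
        for _ in range(n_updates):
            table.add(key)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(table.counts)


keys = ["XY01", "XY02"]       # hypothetical shared resources
print(hammer(CoarseCounts(), keys))
print(hammer(FineCounts(keys), keys))
```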
other caveats
● beware of deadlocks...
[Figure: deadlock: job 1 holds object A and waits for object B, while job 2 holds object B and waits for object A. Safe implementation: job 1 and job 2 lock objects A and B in the same order.]
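A common way to get the "safe implementation" is to make every job acquire shared objects in one agreed order. The sketch below, in Python, orders locks by `id()`; that ordering key is an arbitrary choice for illustration, and any fixed total order works as long as every job uses it. Job 2 deliberately asks for the locks in the opposite order, which would risk deadlock without `ordered_locks`.

```python
import threading
from contextlib import contextmanager


@contextmanager
def ordered_locks(*locks):
    """Hold several locks, always acquiring them in one global order."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    try:
        yield
    finally:
        for lock in reversed(ordered):
            lock.release()


object_a, object_b = threading.Lock(), threading.Lock()
log = []


def job(name, first, second, n=1000):
    for _ in range(n):
        # job 1 asks for (A, B) and job 2 for (B, A); ordered_locks makes both
        # acquire them in the same order, so neither can hold one lock while
        # waiting forever for the other.
        with ordered_locks(first, second):
            log.append(name)


t1 = threading.Thread(target=job, args=("job 1", object_a, object_b))
t2 = threading.Thread(target=job, args=("job 2", object_b, object_a))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join(timeout=10)
print(len(log))   # 2000 if both jobs finished
```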
computers vs cpus vs cores
● 'computer' : probably a single motherboard, I/O via ethernet
● 'cpu' : probably a single chunk of silicon, I/O via the motherboard
● 'core' : subdivision of a cpu, possible core-to-core I/O
● multi-core cpus common since ~2005
ixbtlabs.com
parallel vs multi-threaded
● parallel : spread work across machines
  – data exchange via network (eg, ethernet)
● multi-threaded : spread work across cores
  – dual- or multi-processor computers share memory / disk
ixbtlabs.com
parallel processing strategies
● fine-grained : lots of coordination and synchronization needed
  – use multi-threaded programming
● coarse-grained : parallel operations on larger chunks, easier to code
  – multi-thread or cluster computing?
● embarrassingly parallel : many equivalent, large-scale tasks
  – use a cluster...
some parallel processing technologies
● running parallel tasks:
  – PVM (Parallel Virtual Machine)
  – Condor
  – pcontrol (IPP integrated tool)
  – eg, ssh machine1 job1 & ssh machine2 job2 &
● parallel programs with message passing:
  – MPI (Message Passing Interface)
  – a standard for message-passing libraries
  – communication between cluster nodes
● multi-threaded programming : pthreads
  – standard UNIX / Linux library
  – provides locks (mutexes) and signaling between threads (condition variables)
Parallel Processing in the IPP
● pantasks + pcontrol : high-level (embarrassing) parallelism
  – pantasks : task management (beyond scope)
  – data and jobs are pre-assigned and co-located
    – eg, chip XY03 -> ipp021
    – all processing on XY03 -> ipp021
    – or, skycell.1204.02 -> ipp053
    – all processing on skycell.1204.02 -> ipp053
  – targeted machines are 'desired' but not usually 'required'
[Figure: cluster nodes 004-010 connected to one big switch.]
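The "desired but not required" rule can be illustrated with a toy host picker. This is not how pantasks or pcontrol are implemented (their internals are beyond the scope of this talk): the table, the function name and the first-available fallback rule are all hypothetical, with only the XY03 -> ipp021 and skycell.1204.02 -> ipp053 examples taken from the slide.

```python
# Hypothetical assignment table, using the examples from the slide.
ASSIGNED_HOST = {"XY03": "ipp021", "skycell.1204.02": "ipp053"}


def pick_host(item, available_hosts, required=False):
    """Choose the host that should run the jobs for `item` (chip or skycell).

    The pre-assigned host is 'desired': use it when it is available,
    otherwise fall back to another host, unless the assignment is
    'required', in which case wait (return None).
    """
    target = ASSIGNED_HOST.get(item)
    if target in available_hosts:
        return target
    if required or not available_hosts:
        return None
    return sorted(available_hosts)[0]   # hypothetical fallback rule


print(pick_host("XY03", {"ipp021", "ipp022"}))   # ipp021
print(pick_host("XY03", {"ipp022", "ipp030"}))   # ipp022 (ipp021 is down)
```

Keeping data and jobs on the same host avoids moving pixels over the network; the fallback trades that locality for progress when a node is unavailable.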
Parallel Processing in the IPP
● most programs use pthreads for multithreading
  – eg, parallel analysis of object moments
  – eg, parallel fitting of star and/or galaxy models
  – some programming care is needed to avoid collisions
Multithreaded programs : avoiding collisions
● threaded analysis of models : cannot process the same pixels in 2 threads (because models are added to and subtracted from the image pixels)
● how to lock?
Multithreaded programs : avoiding collisions
● lay down a virtual chessboard (does not need to be 8x8)
● do the analysis in 4 passes:
  – 1: red cells to threads
  – 2: blue cells to threads
  – 3: yellow cells to threads
  – 4: green cells to threads
● limited by slowest thread
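A minimal sketch of the four-pass chessboard scheme, with a Python `ThreadPoolExecutor` standing in for pthreads. Colors 0-3 play the roles of red, blue, yellow and green (which is which is arbitrary), and `fit_cell` is a hypothetical per-cell model-fitting callback. Cells of one color differ by at least 2 in x or in y, so no two cells in the same pass touch, even diagonally.

```python
from concurrent.futures import ThreadPoolExecutor


def cell_color(ix, iy):
    """Color 0-3 of cell (ix, iy) on a chessboard that repeats every 2x2 cells."""
    return (ix % 2) + 2 * (iy % 2)


def process_in_passes(nx, ny, fit_cell, n_threads=4):
    """Apply fit_cell(ix, iy) to every cell of an nx-by-ny grid in 4 passes.

    A model that spills less than one cell width outside its own cell cannot
    collide with pixels being handled by another thread in the same pass.
    """
    passes = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for color in range(4):
            cells = [(ix, iy) for iy in range(ny) for ix in range(nx)
                     if cell_color(ix, iy) == color]
            # list() waits for every cell of this color before the next pass
            # starts, which is why each pass is limited by its slowest thread.
            passes.append(list(pool.map(lambda c: fit_cell(*c), cells)))
    return passes


result = process_in_passes(5, 3, lambda ix, iy: (ix, iy))
print([len(p) for p in result])   # cells handled in each pass
```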
GPU programming
● GPUs are like multiple cores taken to an extreme...
● Advantages
  – many simultaneous operations (1000s)
  – massive aggregate floating-point operations per second
  – relatively cheap
● Disadvantages
  – limited language support
  – more rigid programming model
  – heterogeneous hardware / incompatibilities
● FFTs have easy GPU library support (eg, NVIDIA cuFFT, which includes an FFTW-compatible interface)
Parallel data
● The Pan-STARRS data volume is huge
  – already > 800TB of raw data (compressed, 2 copies)
  – output data volume potentially huge (10x - 20x raw volume)
● Storage mandates a distributed solution
  – Current largest single machines ~120TB
  – PS1 currently has 3.5 PB of total storage
● Data management strategies are critical
  – Keep track of ~1 billion files on the cluster
  – RAIDs are fallible: keep duplicate copies for safety
  – name abstraction is needed (easy to move real files)
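Name abstraction plus duplicate copies can be sketched as a small catalog that maps a logical name to the physical paths holding it. This is only loosely in the spirit of the IPP image server (Nebulous); the class, its methods, the logical name and the paths are hypothetical.

```python
class FileCatalog:
    """Toy name-abstraction layer: logical names map to physical copies."""

    def __init__(self):
        self._instances = {}   # logical name -> list of physical paths

    def add_copy(self, name, path):
        self._instances.setdefault(name, []).append(path)

    def move(self, name, old_path, new_path):
        """Relocate one copy; users of the logical name are unaffected."""
        copies = self._instances[name]
        copies[copies.index(old_path)] = new_path

    def resolve(self, name, is_readable=lambda path: True):
        """Return a readable copy, skipping any that fail (e.g. a dead RAID)."""
        for path in self._instances.get(name, []):
            if is_readable(path):
                return path
        raise FileNotFoundError(name)


catalog = FileCatalog()
name = "raw/exp0001/XY03.fits"                           # hypothetical logical name
catalog.add_copy(name, "/data/node1/raid0/XY03.fits")
catalog.add_copy(name, "/data/node2/raid1/XY03.fits")   # duplicate copy
dead = {"/data/node1/raid0/XY03.fits"}                   # pretend this RAID failed
print(catalog.resolve(name, is_readable=lambda p: p not in dead))
```

Programs only ever see the logical name, so files can be moved between machines or restored from the second copy without touching the processing code.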
PS1 / IPP Data Products
● FITS Images
  – Chip vs Warp vs Diff vs Stack
  – Access via Postage Stamp Server
  – User interface is still being improved...
● FITS Tables
  – Chip vs Camera vs Diff vs Stack photometry
  – Detections (properties of things in an image)
● DVO Database(s)
  – simple (simplistic), organized access to detections & objects
  – feeds to PSPS
  – requires some hefty hardware (currently, ~10TB)
http://svn.pan-starrs.ifa.hawaii.edu/trac/ipp/browser/trunk/psModules/src/objects/pmSourceIO_CMF.txt
IPP Output Source Tables ('CMF' or 'SMF' File)
● FITS Table format
  – multiple tables per chip
  – PHU carries global data for the exposure
  – One table group for each image chip
● Header for each chip
  – derived from original chip header
  – NAXIS = 0 : no pixel data
  – carries metadata, astrometric & photometric transformations
● PSF table : PSF fits for all objects
● XSRC table : aperture-like measurements (Petrosian, etc)
● XFIT table : extended source fit measurements
● Segments are identified by EXTNAME = CHIP.psf, CHIP.xsrc, etc
● Multiple data schemas available:
  – PS1_V3 (most complete single image)
  – PS1_DV2 (most complete diff image)
  – PS1_SV1 (most complete 'static sky' images)
Detections in IPP and PSPS
[Figure: smf files from exposures 1-4 and the cmf file from the stack feed DVO (detection to object association) and PSPS.]
● note: PSPS has complete smf data (DVO only has PSF & Kron data)
IPP Output Source Tables : PSF Table Contents
● Format version defined by header keyword EXTNAME
● Parameters based on PSF fits:
  – detection ID
  – X, Y, error
  – Instrumental Mag, flux, error, aperture mag (+ raw ap mag)
  – Kron parameters
  – Peak flux as Mag
  – sky, error
  – fit chi-square
  – CR, EXT Nsigma deviation
  – PSF shape (major, minor, theta) & moments (+ high order)
  – psf weight factor (Sum(psf * (1 - mask))) (+ 'suspect' version)
  – nFrames (for stack & diff)
  – 32 bit analysis flags (+ flags2)
  – additional special fields for diff and static sky
● RA, DEC, Calibrated Mags also available
● Table of matched reference stars (photometry & astrometry cal.)
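The psf weight factor Sum(psf * (1 - mask)) measures how much of a source's PSF falls on good pixels. The sketch below assumes the PSF model is normalized to unit sum (done explicitly here) and that mask is 1 for a masked pixel; the IPP's exact normalization and its 'suspect' variant are not reproduced. The 3x3 PSF is a toy example.

```python
def psf_weight_factor(psf, mask):
    """Sum(psf * (1 - mask)) for a PSF model normalized to unit sum.

    psf:  2-D list of model values around the source
    mask: 2-D list of the same shape, 1 where a pixel is masked, else 0
    Returns 1.0 for a source with no masked pixels under its PSF.
    """
    total = sum(sum(row) for row in psf)
    unmasked = sum(p * (1 - m)
                   for prow, mrow in zip(psf, mask)
                   for p, m in zip(prow, mrow))
    return unmasked / total


psf = [[1, 1, 1],
       [1, 4, 1],
       [1, 1, 1]]
clean = [[0] * 3 for _ in range(3)]
core_masked = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(psf_weight_factor(psf, clean))        # 1.0
print(psf_weight_factor(psf, core_masked))  # 8/12, about 0.667
```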
IPP Output Source Tables : XFIT Table Contents
● Format version defined by header keyword EXTNAME
● One row for each object and model
● Only a subset of objects in PSF table
● Parameters based on extended model fits:
  – detection ID (matched to PSF table)
  – X, Y, error
  – Instrumental Mag, error
  – Model and Nparams
  – ellipse shape (major, minor, theta)
  – additional parameters (model-dependent)
  – full covariance matrix
  – fit chi-square
● Export version of these files with RA, DEC, Calibrated Mags
IPP Output Source Tables : XSRC Table Contents
● Format version defined by header keyword EXTNAME
● one row for each object, subset of PSF sources
● Parameters:
  – detection ID
  – X, Y from PSF fit
  – Petrosian Radius, Flux, errors
  – Elliptical Surface Brightness profile
  – note: not all parameters are measured for all objects
● Export version of these files with RA, DEC, Calibrated Mags
Descriptions of SMF / CMF fields
● IPP_IDET : IPP detection identifier index
● X_PSF : PSF x coordinate
● Y_PSF : PSF y coordinate
● X_PSF_SIG : Sigma in PSF x coordinate
● Y_PSF_SIG : Sigma in PSF y coordinate
● RA_PSF : PSF RA coordinate (degrees)
● DEC_PSF : PSF DEC coordinate (degrees)
● POSANGLE : position angle at source (degrees)
● PLTSCALE : plate scale at source (arcsec/pixel)
● FLAGS : psphot analysis flags
● FLAGS2 : psphot analysis flags
● N_FRAMES : Number of frames overlapping source center
● PADDING : padding
Descriptions of SMF / CMF fields
● PSF_INST_MAG : PSF fit instrumental magnitude
● PSF_INST_MAG_SIG : Sigma of PSF instrumental magnitude
● PSF_INST_FLUX : PSF fit instrumental flux (counts)
● PSF_INST_FLUX_SIG : Sigma of PSF instrumental flux
● AP_MAG : magnitude in standard aperture
● AP_MAG_RAW : magnitude in reported aperture
● AP_MAG_RADIUS : radius used for aperture mags
● PEAK_FLUX_AS_MAG : Peak flux expressed as magnitude
● CAL_PSF_MAG : PSF Magnitude using supplied calibration
● CAL_PSF_MAG_SIG : measured scatter of zero point calibration
● SKY : Sky level
● SKY_SIGMA : Sigma of sky level
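The magnitude and flux fields are related by the usual conventions m = -2.5 log10(F) + ZP and, to first order, sigma_m = (2.5 / ln 10) sigma_F / F. Whether the IPP normalizes the flux by exposure time before taking the logarithm is not stated here, so the counts-based form below is an assumption; the function name is illustrative.

```python
import math


def flux_to_mag(flux, flux_err, zero_point=0.0):
    """Magnitude and 1-sigma error from a flux and its error.

    With zero_point=0 this is an instrumental magnitude; adding a zero point
    gives a calibrated one. The error uses the first-order relation
    sigma_m = (2.5 / ln 10) * sigma_F / F  (about 1.0857 * sigma_F / F).
    """
    if flux <= 0:
        return math.nan, math.nan   # no magnitude for a non-positive flux
    mag = -2.5 * math.log10(flux) + zero_point
    mag_err = 2.5 / math.log(10) * flux_err / flux
    return mag, mag_err


print(flux_to_mag(1000.0, 10.0))   # about (-7.5, 0.0109)
```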
Descriptions of SMF / CMF fields
● PSF_CHISQ : Chisq of PSF-fit
● CR_NSIGMA : Nsigma deviations from PSF to CR
● EXT_NSIGMA : Nsigma deviations from PSF to EXT
● PSF_MAJOR : PSF width (major axis)
● PSF_MINOR : PSF width (minor axis)
● PSF_THETA : PSF orientation angle
● PSF_QF : PSF coverage/quality factor (bad)
● PSF_QF_PERFECT : PSF coverage/quality factor (poor)
● PSF_NDOF : degrees of freedom
● PSF_NPIX : number of pixels in fit
Descriptions of SMF / CMF fields
● MOMENTS_XX : second moments (X^2)
● MOMENTS_XY : second moments (X*Y)
● MOMENTS_YY : second moments (Y^2)
● MOMENTS_M3C : third moment cos theta
● MOMENTS_M3S : third moment sin theta
● MOMENTS_M4C : fourth moment cos theta
● MOMENTS_M4S : fourth moment sin theta
● MOMENTS_R1 : first radial moment
● MOMENTS_RH : half radial moment
● KRON_FLUX : Kron Flux (in 2.5 R1)
● KRON_FLUX_ERR : Kron Flux Error
● KRON_FLUX_INNER : Kron Flux (in 2.5 R1)
● KRON_FLUX_OUTER : Kron Flux (in 2.5 R1)
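For orientation, the textbook unweighted definitions behind the second moments, the first radial moment R1 and a Kron flux within 2.5 R1 are sketched below. The IPP's actual calculation (any windowing, sky subtraction or masking) may differ, so treat this as an illustration of the quantities, not a reimplementation of psphot.

```python
import math


def moments_and_kron(pixels, x0, y0, kron_factor=2.5):
    """Second moments, first radial moment R1 and a Kron flux.

    pixels: iterable of (x, y, flux) tuples around the source center (x0, y0).
    """
    pixels = list(pixels)
    f_sum = sum(f for _, _, f in pixels)
    xx = sum(f * (x - x0) ** 2 for x, y, f in pixels) / f_sum
    xy = sum(f * (x - x0) * (y - y0) for x, y, f in pixels) / f_sum
    yy = sum(f * (y - y0) ** 2 for x, y, f in pixels) / f_sum
    r1 = sum(f * math.hypot(x - x0, y - y0) for x, y, f in pixels) / f_sum
    kron_radius = kron_factor * r1
    kron_flux = sum(f for x, y, f in pixels
                    if math.hypot(x - x0, y - y0) <= kron_radius)
    return {"XX": xx, "XY": xy, "YY": yy, "R1": r1, "KRON_FLUX": kron_flux}


# A tiny symmetric source: bright center pixel plus four neighbors.
source = [(0, 0, 4.0), (1, 0, 1.0), (-1, 0, 1.0), (0, 1, 1.0), (0, -1, 1.0)]
print(moments_and_kron(source, 0.0, 0.0))
```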
Descriptions of SMF / CMF fields (diff image version)
● DIFF_NPOS : nPos (n pix > 3 sigma)
● DIFF_FRATIO : fPos / (fPos + fNeg)
● DIFF_NRATIO_BAD : nPos / (nPos + nNeg)
● DIFF_NRATIO_MASK : nPos / (nPos + nMask)
● DIFF_NRATIO_ALL : nPos / (nPos + nMask + nNeg)
● DIFF_R_P : distance to positive match source
● DIFF_SN_P : signal-to-noise of pos match src
● DIFF_R_M : distance to negative match source
● DIFF_SN_M : signal-to-noise of neg match src
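The diff-image ratios can be computed directly from the definitions above. The exact meanings of nNeg, fNeg and nMask are assumptions here (pixels below -3 sigma, their summed absolute flux, and masked pixels), as are the example counts. By construction, a source that is mostly positive gives DIFF_FRATIO and DIFF_NRATIO_BAD near 1, while a dipole-like subtraction residual gives values near 0.5.

```python
import math


def _ratio(num, den):
    """num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def diff_ratios(n_pos, n_neg, n_mask, f_pos, f_neg):
    """Diff-image ratios following the field definitions above.

    n_pos, n_neg: counts of significantly positive / negative pixels
    n_mask:       count of masked pixels
    f_pos, f_neg: summed positive / (absolute) negative flux
    """
    return {
        "DIFF_FRATIO": _ratio(f_pos, f_pos + f_neg),
        "DIFF_NRATIO_BAD": _ratio(n_pos, n_pos + n_neg),
        "DIFF_NRATIO_MASK": _ratio(n_pos, n_pos + n_mask),
        "DIFF_NRATIO_ALL": _ratio(n_pos, n_pos + n_mask + n_neg),
    }


# A mostly positive detection vs. a dipole-like residual (hypothetical counts).
transient = diff_ratios(n_pos=30, n_neg=0, n_mask=10, f_pos=300.0, f_neg=0.0)
dipole = diff_ratios(n_pos=20, n_neg=20, n_mask=0, f_pos=200.0, f_neg=200.0)
print(transient)
print(dipole)
```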
Descriptions of SMF / CMF fields (Static Sky version)
● APER_FLUX : flux within annuli
● APER_FLUX_ERR : flux error in annuli
● APER_FILL : fill factor of annuli
DVO Database
● mini dvo databases are updated nightly
● rsync of DVO databases is possible
  – nightly: download mini db and run dvomerge yourself
  – monthly (or longer): download master dvo db
● total DVO data volume at end of survey ~30 TB? (probably less)
● DVO dbs split by survey
http://svn.pan-starrs.ifa.hawaii.edu/trac/ipp/wiki/DVO_TopLevel
DVO : What is it?
● DVO = Desktop Virtual Observatory
● Database to track astronomy outputs
  – detections + objects
  – image parameters (astrometry & photometry)
  – photometry zero points
● Used by IPP for quality assurance & calibration (astrom + photom)
● High-throughput detection + object correlations
● Note: DVO is not a fully-featured relational database
● But: may be interesting to end users, complements PSPS
● DVO databases (or subsets) can be copied and used locally
● DVO has a built-in visualization language
DVO : Data Tables
● Data stored as FITS Tables
● Tables are autocode-defined
  – Versioning is simple
  – Current Schema = PS1_V4
[Figure: DVO tables, grouped as main observational data, object data distributed on the sky, and other data: Images, SkyRegions, Photcodes, Transparency, Cameras, Filters, Average (static objects), SecFilt (static objects), Measure (detections), Missing (non-detections).]
DVO : data organization
● some tables are partitioned by sky region
● sky regions are RA,DEC bounded
● sky regions completely defined by definition table
● hierarchical table grouping (eg: fullsky, dec bands, ra segments...)
● table associated with a host (using an abstracted name)
[Figure: a skyregions table pointing to partitioned images, objects and detections tables (eg, objects 1-4, images 1-2).]
DVO : Sky Partitioning
● Sky regions bounded by lines of constant RA, DEC
● partitioning increases input / output speed for most queries
● region size scales with stellar density, size is adjustable
● any subset of the region files may be copied elsewhere (local DVO)
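Because region edges are lines of constant RA and Dec, finding the region that holds a given position is a cheap lookup. The two-level table below (Dec bands, each split into RA segments) is entirely hypothetical: real DVO regions come from the sky-region definition table and are sized by stellar density, whereas these numbers only illustrate adjustable region sizes.

```python
import bisect

# Hypothetical two-level partition: declination bands, each split into RA
# segments. The numbers are made up for illustration.
DEC_EDGES = [-90.0, -30.0, 0.0, 30.0, 90.0]   # band boundaries (degrees)
RA_SEGMENTS = [4, 12, 12, 4]                  # number of RA segments per band


def region_of(ra, dec):
    """Return (band, segment) for a position, using constant-RA / constant-Dec edges."""
    ra = ra % 360.0
    # bisect_right puts dec == +90 past the last edge; clamp it into the top band.
    band = min(bisect.bisect_right(DEC_EDGES, dec) - 1, len(RA_SEGMENTS) - 1)
    width = 360.0 / RA_SEGMENTS[band]
    return band, int(ra // width)


print(region_of(130.0, 19.3))   # Praesepe -> (2, 4)
```

Copying the files for a handful of (band, segment) pairs is then all that is needed for a local DVO covering one field.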
DVO Sky Partitioning vs Projection Cells / Sky Cells
● Projection Cells:
  – Overlapping tangent planes
  – Define images of the sky
● DVO Sky Partition
  – bounded by RA,DEC
  – Define db catalogs
[Figure: The RINGS.V3 Tessellation]
DVO Sky Partitioning vs the Three Pi Survey Tessellations
● Three Pi Tessellation defines telescope pointings for 3pi Survey
● 6 related tessellations for successive epochs : 0.5-1 deg rotations
● 5466 fields for 3pi Survey
[Figure legend: blue = 3π survey tessellation, green = ecliptic, black = galactic.]
DVO shell : visualization
● rich data language for data visualization
● SQL-like queries
  – avextract ra, dec, i:ave, 2MASS_J where (i:rel - 2MASS_J < 2.0)
  – mextract ra, dec, time, i:ave, i:rel
[Figure made with: region 0.0 25.0 90.0 sin; style -c red; cgrid; style -c black; images; plot-landolt; plot-sdss]
[Figure made with: region -25.5 -12.8 6.0; style -c blue; pcat -all; style -c black; images]
DVO shell : Praesepe Example
[Figure made with: region 130.0 19.3 1.6; images; pcat -c red -lw 2; pmeasure -all -m 8 12 -pt 7 -c blue -photcode 2MASS_J]
DVO shell : Praesepe Example
[Figure: proper motions of Praesepe members, uRA vs uDEC (milli-arcsec / year), axes spanning -50.0 to +50.0.]
Personal DVO
● 3pi Survey at end of mission:
  – 5x10^9 objects = 800 GB
  – 10^11 detections = 12 TB
  – 90% of detections / objects at |b| < 10 degrees
  – outside of Galactic Plane : 50 MB / square degree
  – inside Galactic Plane : 1.5 GB / square degree
● End users may have a local working copy of their region of interest
  – carry your fields on your laptop!
● Transition to PSPS
  – PSPS will carry IPP DVO object IDs
  – DVO shell will be able to query PSPS if desired
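The end-of-mission figures imply average storage per row and let you size a personal DVO before downloading it. A small calculation using only the numbers listed above; the function name is illustrative, and the per-row sizes are averages that include whatever overhead the totals contain.

```python
# End-of-survey figures from the slide.
N_OBJECTS, OBJECTS_BYTES = 5e9, 800e9           # 5x10^9 objects = 800 GB
N_DETECTIONS, DETECTIONS_BYTES = 1e11, 12e12    # 10^11 detections = 12 TB
GB_PER_SQDEG_OUTSIDE_PLANE = 0.050              # 50 MB / square degree
GB_PER_SQDEG_INSIDE_PLANE = 1.5                 # 1.5 GB / square degree

bytes_per_object = OBJECTS_BYTES / N_OBJECTS    # implied average per object
bytes_per_detection = DETECTIONS_BYTES / N_DETECTIONS


def local_copy_gb(sqdeg_outside_plane, sqdeg_inside_plane=0.0):
    """Rough size in GB of a personal DVO covering the given sky area."""
    return (sqdeg_outside_plane * GB_PER_SQDEG_OUTSIDE_PLANE
            + sqdeg_inside_plane * GB_PER_SQDEG_INSIDE_PLANE)


print(bytes_per_object, bytes_per_detection)   # 160.0 120.0
print(local_copy_gb(100))                      # ~5 GB for 100 sq. deg. off the plane
```

A 100 square degree field away from the Galactic plane fits comfortably on a laptop, while the same area inside the plane would need about 150 GB.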
Photcodes
● Defines the photometric system of a magnitude
● Three classes
  – 'average' photcodes (SEC)
  – 'measure' photcodes (DEP)
  – 'reference' photcodes (REF)
● a photcode defines:
  – numerical code (for db)
  – name (eg, g_PS1, GPC1.r.XY33, or 2MASS_J)
  – type (SEC, DEP, REF)
  – equivalent photcode (transformation target)
  – photometry transformation coefficients: $M_{\mathrm{target}} = M_{\mathrm{source}} + ZP + K_z(\sec z - 1.0) + \sum_i A_{c,i}\,\mathrm{color}^i$
  – systematic error, flags
● 'average' photcodes have ZP ~ 0.0
● 'measure' photcodes have ZP of telescope + camera + filter
● 'reference' photcodes have ZP == 0.0
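The photcode transformation M_target = M_source + ZP + K_z (sec z - 1.0) + Sum_i A_{c,i} color^i can be applied as below. Assumptions: z is the zenith angle in degrees, the color polynomial starts at i = 1 (an i = 0 term would duplicate ZP), and the example coefficients are made up; this is not DVO's own code.

```python
import math


def apply_photcode(m_source, zp, k_z, zenith_angle_deg, color=0.0, a_coeffs=()):
    """M_target = M_source + ZP + K_z (sec z - 1) + sum_i A_{c,i} color^i.

    a_coeffs[0] is A_{c,1}, a_coeffs[1] is A_{c,2}, ... (assumed to start at i = 1).
    """
    sec_z = 1.0 / math.cos(math.radians(zenith_angle_deg))
    color_term = sum(a * color ** i for i, a in enumerate(a_coeffs, start=1))
    return m_source + zp + k_z * (sec_z - 1.0) + color_term


# Hypothetical coefficients: ZP = 25.0, K_z = -0.1, one linear color term.
print(apply_photcode(-7.5, 25.0, -0.1, 60.0, color=0.5, a_coeffs=(0.02,)))   # ~17.41
```

At the zenith (sec z = 1) the airmass term vanishes, so an 'average' photcode with ZP ~ 0.0 and no color terms leaves the magnitude unchanged.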
DVO queries and photcodes
● mextract, avextract can return magnitudes:
● interpretation is somewhat context dependent:
  – mextract mag -- all 'measure' magnitudes
  – mextract g -- all g-equivalent magnitudes or NaN
  – mextract GPC1.g.XY11 -- all g-equivalent magnitudes or NaN
  – mextract mag:inst -- 'measure' mags as instrumental
  – mextract g:inst -- g-equiv as instrumental
  – mextract g:sys, g:cat, g:rel -- other magnitude versions
  – mextract g:err -- error on 'measure' mags
  – mextract g:ave, g:inst -- join 'average' to 'measure'
  – avextract g, r, i -- 'average' magnitudes
  – avextract g, 2MASS_J -- limited join to 'measure' (first match)
  – avextract g, g:err, g:chisq -- magnitude, error on ave, chi-square
  – avextract g:ncode -- number of measurements in photcode
  – avextract g:nphot -- number in photcode used for photom.