BigSkyEarth II: Feature extraction from time-series data
Ashish Mahabal, Center for Data-Driven Discovery, Caltech
5 April 2016
Ashish Mahabal - BSE II
Transformations in astronomy
Shallow to deep; Small to large; Sporadic to repeated; more wavelengths
Move towards digital movies!
Finding transients
Doing new science
What is a transient?
Something that has a large brightness change (delta-magnitude) within a short timespan (small delta-time)
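The working definition above can be turned into a simple check on a light curve. The following is a minimal sketch, not the CRTS detection pipeline: the function name and both thresholds (1 magnitude within 1 day) are illustrative assumptions.

```python
def find_rapid_changes(times, mags, min_delta_mag=1.0, max_delta_time=1.0):
    """Return index pairs (i, j) whose magnitudes differ by at least
    min_delta_mag while their epochs differ by at most max_delta_time.
    Both thresholds are illustrative assumptions, not survey values."""
    order = sorted(range(len(times)), key=lambda k: times[k])
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if times[j] - times[i] > max_delta_time:
                break  # later epochs are even further away in time
            if abs(mags[j] - mags[i]) >= min_delta_mag:
                pairs.append((i, j))
    return pairs


# Hypothetical light curve: quiescent near magnitude 19 with one brightening.
times = [0.0, 0.5, 10.0, 10.4, 30.0]   # days
mags = [19.0, 19.1, 19.0, 17.2, 19.0]
flagged = find_rapid_changes(times, mags)
print(flagged)
```

Only the pair of epochs around the brightening is flagged here; raising `min_delta_mag` to 2.0 flags nothing.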
Phenomenological variety
Flare Star, Dwarf Nova, Blazar
Catalina Real-time Transient Survey (CRTS)
irregular light curves
[Figure: irregularly sampled light curve; x-axis Mean Julian Date (55250 to 56000), y-axis Magnitude (16 to 20)]
GPR fitted to light curves
[Figure: four light-curve panels with GPR fits, labelled AGN, SN, Flare and Non Transient; x-axis Day (0 to 2000), y-axis Magnitude]
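To make the idea of fitting a Gaussian process regression (GPR) to an irregularly sampled light curve concrete, here is a minimal sketch using only the Python standard library. The squared-exponential kernel, the fixed hyperparameters (`amplitude`, `length_scale`) and the sample data are assumptions made for illustration; the fits shown in the talk may use a different kernel and optimized hyperparameters.

```python
import math


def sq_exp_kernel(t1, t2, amplitude, length_scale):
    """Squared-exponential covariance between two epochs (assumed kernel)."""
    return amplitude ** 2 * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b given the Cholesky factor L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_mean(times, mags, errs, query_times, amplitude=1.0, length_scale=100.0):
    """Posterior mean of a GP with constant mean, evaluated at query_times."""
    n = len(times)
    mean_mag = sum(mags) / n
    resid = [m - mean_mag for m in mags]
    # Covariance of the observations: kernel plus measurement variance.
    cov = [[sq_exp_kernel(times[i], times[j], amplitude, length_scale)
            + (errs[i] ** 2 if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    alpha = cho_solve(cholesky(cov), resid)
    return [mean_mag + sum(sq_exp_kernel(tq, times[i], amplitude, length_scale)
                           * alpha[i] for i in range(n))
            for tq in query_times]


# Hypothetical, irregularly sampled light curve (days, magnitudes).
times = [0.0, 35.0, 60.0, 410.0, 440.0, 800.0]
mags = [19.1, 19.0, 19.3, 18.6, 18.7, 19.2]
errs = [0.1] * len(times)
grid = [0.0, 200.0, 400.0, 600.0, 800.0]
fitted = gp_mean(times, mags, errs, grid)
```

The fitted curve can be evaluated on any grid regardless of the original sampling, which is what makes curve-based and residual-based features comparable across sources. Far from any observation the prediction falls back to the mean magnitude.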
A separate Kepler AGN
CRTS AGN
Juxtaposition of surveys to get training sets for newer ones
Regular versus irregular
• Financial time series and predictions
• Variety of tools
• Statistical features
Challenge: A Variety of Parameters
• Discovery: magnitudes, delta-magnitudes
• Contextual:
– Distance to nearest star
– Magnitude of the star
– Color of that star
– Normalized distance to nearest galaxy
– Distance to nearest radio source
– Flux of nearest radio source
– Galactic latitude
• Follow-up
– Colors (g-r, r-i, i-z etc.)
• Prior classifications (event type)
• Characteristics from light-curve
– Amplitude
– Median buffer range percentage
– Standard deviation
– Stetson K
– Flux percentile ratio mid80
– Prior outburst statistic
Not all parameters are always present, leading to Swiss-cheese-like data sets
http://ki-media.blogspot.com/
Measures from Feigelson and Babu (Graham)
New lightcurve-based parameters: (Faraway)
• Whole curve measures
• Fitted curve measures
• Residual from fit measures
• Cluster measures
• Other
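Several of the light-curve characteristics listed above can be computed in a few lines. The sketch below follows one common convention for each quantity; the exact definitions behind the slide are not given, so the formulas, the function names and the sample data are assumptions. A missing input (here, the photometric errors) yields `None` for the dependent feature, which is how the Swiss-cheese gaps arise in practice.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Linearly interpolated percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def light_curve_features(mags, errs=None):
    """Compute a few statistical features; definitions are assumed conventions."""
    n = len(mags)
    median = statistics.median(mags)
    mag_range = max(mags) - min(mags)
    feats = {
        # Half of the peak-to-peak magnitude range.
        "amplitude": 0.5 * mag_range,
        "std": statistics.stdev(mags),
        # Fraction of points within one tenth of the range of the median.
        "median_buffer_range_percentage":
            sum(abs(m - median) < mag_range / 10.0 for m in mags) / n,
    }
    # Ratio of the central 80% flux range to the central 90% flux range.
    fluxes = sorted(10.0 ** (-0.4 * m) for m in mags)
    wide = percentile(fluxes, 95) - percentile(fluxes, 5)
    mid80 = percentile(fluxes, 90) - percentile(fluxes, 10)
    feats["flux_percentile_ratio_mid80"] = mid80 / wide if wide > 0 else None
    if errs is None:
        feats["stetson_k"] = None  # cannot be computed without errors
    else:
        weights = [1.0 / e ** 2 for e in errs]
        wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
        # The usual sqrt(N / (N - 1)) factor cancels in the ratio below.
        delta = [(m - wmean) / e for m, e in zip(mags, errs)]
        norm = math.sqrt(sum(d * d for d in delta))
        feats["stetson_k"] = (sum(abs(d) for d in delta) / (math.sqrt(n) * norm)
                              if norm > 0 else None)
    return feats


# Hypothetical light curve with a single bright outlier.
mags = [19.0, 19.0, 19.0, 19.0, 17.0]
errs = [0.1] * 5
feats = light_curve_features(mags, errs)
feats_no_errs = light_curve_features(mags)
```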
beyond1std skew
Amplitude
freq_signif
freq_varrat
freq_y_offset
freq_model_max_delta_mag freq_model_min_delta_mag
freq_model_phi1_phi2
freq_rrd
freq_n_alias
flux_%_mid20 flux_%_mid35 flux_%_mid50 flux_%_mid65 flux_%_mid80
linear_trend
max_slope
MAD
median_buffer_range_percentage
pair_slope_trend
percent_amplitude
percent_difference_flux_percentile QSO non_QSO
std
small_kurtosis
stetson_j stetson_k
scatter_res_raw
p2p_scatter_2praw
p2p_scatter_over_mad
p2p_scatter_pfold_over_mad medperc90_p2_p
fold_2p_slope_10% fold_2p_slope_90%
p2p_ssqr_diff_over_var
Many features - not all are independent
Adam Miller
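One simple way to see that features are not all independent is to look at pairwise correlations across a set of sources. This is a minimal sketch under stated assumptions: the feature values are made up, the 0.95 threshold is arbitrary, and linear (Pearson) correlation is only one of several possible dependence measures.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(features, threshold=0.95):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = sorted(features)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                found.append((a, b, r))
    return found


# Hypothetical feature table: one list of values per feature, one entry per source.
features = {
    "amplitude": [0.1, 0.5, 1.0, 2.0],
    "std": [0.04, 0.2, 0.4, 0.8],      # proportional to amplitude here
    "skew": [0.3, -1.2, 0.8, -0.1],
}
pairs = redundant_pairs(features)
```

In this toy table only amplitude and std are flagged, so one of the two could be dropped before classification.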
Importance of cadence
• Cadence wars with the LSST
• Asteroids versus fast transients versus periodic objects versus cosmology
• Constraints due to promises made
• Mechanical constraints
continuing cadence meetings
Sensitivity (visit? coadd?) by filter (especially u and g), needed for several (many? all?) variable types
Phased uniformity (periodic variables): for a given period how uniformly would the lightcurve be sampled?*
Window function (per filter/all filters) FWHM, ...statistics of revisit time histogram (per filter/all filters) e.g. min/max/median/5th & 95th percentiles
Hour angle distribution (to check aliasing), at a given sky position, maximum difference, rms ...
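Two of the cadence diagnostics above, the revisit-time statistics and the phased uniformity for a given period, can be sketched as follows. The function names are hypothetical, and using the largest gap in phase as the uniformity measure is an assumption; it is one possible metric, not an agreed LSST definition.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Linearly interpolated percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def revisit_stats(times):
    """Summary statistics of the gaps between consecutive visits."""
    t = sorted(times)
    gaps = sorted(b - a for a, b in zip(t, t[1:]))
    return {
        "min": gaps[0],
        "max": gaps[-1],
        "median": statistics.median(gaps),
        "p5": percentile(gaps, 5),
        "p95": percentile(gaps, 95),
    }


def max_phase_gap(times, period):
    """Largest gap in phase coverage (0 to 1) when folding at the period.
    Small values mean uniform sampling; 1.0 means all visits share one phase."""
    phases = sorted((t % period) / period for t in times)
    gaps = [b - a for a, b in zip(phases, phases[1:])]
    gaps.append(1.0 - phases[-1] + phases[0])  # wrap-around gap
    return max(gaps)


# Hypothetical strictly nightly cadence (days).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
stats = revisit_stats(times)
aliased = max_phase_gap(times, 1.0)    # period equal to the cadence
covered = max_phase_gap(times, 2.5)    # period incommensurate with the cadence
```

A 1-day period observed nightly is always caught at the same phase, while a 2.5-day period is sampled evenly in phase by the same five visits.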
Follow-up a huge issue
• Depth is a big difference: faint sources
• Characterization/ML needed: part of it only from LSST
• A lot of LSST science may get done before LSST
• Synergy with GMT/TMT
• LSST as a follow-up device in the days of aLIGO
Optimization more than in Tzolk’in
Rohit Gawande
Victory points == science
Temples Technologies Currency Buildings Resources
Large number of variables and each player wants to win.
Optimizing is (generally) a zero-sum game
Easy to make the survey “greatest” in one science
Optimization means compromise
BUT, the sum of parts is GREATER than the whole i.e. compromise does NOT mean sacrifice
In other words, the players are NOT playing AGAINST each other
It’s the best middle ground we are seeking
LSST is its own follow-up machine in a proactive way. By coming up with a good cadence we can minimize the follow-up needed. And you can help. And get the science you love done in the process.
CRTS light-curves
• 500M, most with hundreds of points over 10 years
• Most are non-variable (as a rule)
• Processing “done” for all SSS (100M+)
• Great training set for many procedures
Caltech service
• Not for mass-computation: http://nirgun.caltech.edu:8000/scripts/description.html
• Mass processing: XSEDE, IUCAA, other …
Faraway, Mahabal et al. methods
Recursive Partitioning Param n
Type n
Numbers/names not for reading
J Faraway
domain knowledge
• Peakiness - SNe separator
• Period finding variations
• Detailed features for a subset
[Figure: classes Non-SNe (1) and SNe (2); labels 1, 2, 2, 2, 1, 1]
Using 900 non-SNe and 600 SNe
80-90% completeness using just these parameters
Mahabal, Ball
dimensionality reduction
Feature selection strategies
Donalek et al. arXiv:1310.1976
• Fast Relief Algorithm (wt and threshold)
• Fisher Discriminant Ratio
• Correlation based Feature Selection
• Fast Correlation Based Filter
• Multi Class Feature Selection
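As an illustration of one of the strategies listed above, the Fisher Discriminant Ratio scores a single feature by how far apart the two class means are relative to the within-class scatter, (mean_a - mean_b)^2 / (var_a + var_b). The sketch below ranks features by that score; the function names and the feature values are hypothetical and do not reproduce the setup of the cited paper.

```python
import statistics


def fisher_ratio(class_a, class_b):
    """Fisher Discriminant Ratio of one feature for two classes."""
    num = (statistics.mean(class_a) - statistics.mean(class_b)) ** 2
    den = statistics.pvariance(class_a) + statistics.pvariance(class_b)
    return num / den if den > 0 else float("inf")


def rank_features(table_a, table_b):
    """Return (feature, score) pairs sorted from most to least discriminating."""
    scores = {name: fisher_ratio(table_a[name], table_b[name]) for name in table_a}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical feature values for two classes of sources.
non_sne = {"amplitude": [0.1, 0.2, 0.3], "skew": [0.0, 0.5, 1.0]}
sne = {"amplitude": [1.0, 1.2, 1.4], "skew": [0.2, 0.6, 1.0]}
ranked = rank_features(non_sne, sne)
```

Here amplitude separates the two toy classes far better than skew, so it ranks first. Because each feature is scored on its own, this filter does not detect redundancy between features.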
Using features for clustering
• Using similarity of features to seed larger unknown sets
• Exercise based on that …
• Deep learning?
Open questions
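The idea of using feature similarity to seed larger unknown sets can be sketched as a nearest-neighbour label transfer. This is a minimal sketch under stated assumptions: the function names, the Euclidean distance, the distance cut and the feature vectors are all illustrative, and in practice the features would first be put on comparable scales.

```python
import math


def nearest_seed(vector, seeds):
    """Return the label and distance of the closest labelled seed."""
    label, seed_vec = min(seeds, key=lambda s: math.dist(vector, s[1]))
    return label, math.dist(vector, seed_vec)


def propagate_labels(unknown, seeds, max_distance):
    """Give each unknown source the label of its nearest seed,
    or None when no seed lies within max_distance."""
    labelled = {}
    for name, vector in unknown.items():
        label, distance = nearest_seed(vector, seeds)
        labelled[name] = label if distance <= max_distance else None
    return labelled


# Hypothetical (amplitude, skew) vectors for labelled seeds and unknown sources.
seeds = [("SN", (1.5, 0.9)), ("AGN", (0.4, 0.1))]
unknown = {"src_a": (1.4, 1.0), "src_b": (0.5, 0.2), "src_c": (5.0, 5.0)}
labels = propagate_labels(unknown, seeds, max_distance=0.5)
```

Sources that resemble no seed stay unlabelled, which keeps the seeded set clean and leaves the outliers for closer inspection.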
Augmenting light-curves with newer points
• Millions of sources observed each night
• Appending those points to all those light-curves
• Recomputing stats and follow-up characterizations
Open questions
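For the simplest statistics, appending a new point does not require revisiting the whole light curve. The sketch below uses Welford's online algorithm to update the mean and standard deviation one point at a time; the class name and the sample magnitudes are illustrative, and features that depend on the full ordered curve (periods, fitted models) would still need other strategies.

```python
import math
import statistics


class RunningStats:
    """Mean and sample standard deviation, updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


# Hypothetical magnitudes arriving night by night for a single source.
nightly_mags = [19.0, 19.2, 18.8, 19.4]
running = RunningStats()
for mag in nightly_mags:
    running.add(mag)

# The incremental result matches a full recomputation over the stored curve.
batch_mean = statistics.mean(nightly_mags)
batch_std = statistics.stdev(nightly_mags)
```

Only three numbers per source need to be stored between nights, which matters when millions of light curves gain a point each night.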