BigSkyEarth II: Feature extraction from time-series data Ashish Mahabal Center for Data-Driven Discovery, Caltech 5 April 2016

Apr 14, 2017


Marco Quartulli
Transcript
Page 1: 06 ashish mahabal bse2

BigSkyEarth II: Feature extraction from time-series data

Ashish Mahabal Center for Data-Driven Discovery, Caltech

5 April 2016

Page 2: 06 ashish mahabal bse2

Ashish Mahabal - BSE II

Transformations in astronomy

Shallow to deep; Small to large; Sporadic to repeated; more wavelengths

Move towards digital movies!

Finding transients

Doing new science

2014-06-16 Ashish Mahabal

Page 3: 06 ashish mahabal bse2

What is a transient?

Something that has a large brightness change (delta-magnitude) within a short timespan (small delta-time)
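The definition above can be turned into a simple check on a light curve. This is a minimal sketch, not the CRTS detection pipeline: the function name and both thresholds (1 magnitude within 1 day) are illustrative assumptions.

```python
def find_transient_jumps(times, mags, min_delta_mag=1.0, max_delta_time=1.0):
    """Return (i, j) index pairs where the brightness changed by at least
    min_delta_mag magnitudes within max_delta_time days.

    Both thresholds are illustrative assumptions, not survey values.
    """
    order = sorted(range(len(times)), key=lambda k: times[k])
    jumps = []
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            if times[j] - times[i] > max_delta_time:
                break  # later epochs are even farther away in time
            if abs(mags[j] - mags[i]) >= min_delta_mag:
                jumps.append((i, j))
    return jumps


# Toy light curve (made-up numbers): one sudden brightening at day 30.4.
times = [0.0, 0.5, 30.0, 30.4, 60.0]
mags = [18.0, 18.1, 18.0, 16.2, 18.0]
print(find_transient_jumps(times, mags))
```

Only the pair of epochs around day 30 qualifies: the change is large (1.8 mag) and the time baseline is short (0.4 days).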

Page 4: 06 ashish mahabal bse2

Phenomenological variety

Flare Star, Dwarf Nova, Blazar

Catalina Real-time Transient Survey (CRTS)

Page 5: 06 ashish mahabal bse2

Supernova from SN Hunt


Page 6: 06 ashish mahabal bse2

Binary Black-holes

PG 1302-102 Graham et al.


Page 7: 06 ashish mahabal bse2

Variability Tree


Page 8: 06 ashish mahabal bse2

irregular light curves

[Figure: light curve; x-axis: Mean Julian Date (55250 to 56000); y-axis: Magnitude (16 to 20)]

Page 9: 06 ashish mahabal bse2

GPR fitted to light curves

[Figure: four panels (AGN, SN, Flare, Non Transient); x-axis: Day (0 to 2000); y-axis: Magnitude]
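Gaussian process regression (GPR) gives a smooth model of an irregularly sampled light curve while taking the per-point errors into account. The following is a minimal pure-Python sketch under stated assumptions: a squared-exponential kernel, a constant mean set to the sample mean, and fixed hyperparameters (`amp`, `scale`) that a real fit would optimize. None of these choices is taken from the slides.

```python
import math


def rbf(t1, t2, amp, scale):
    """Squared-exponential kernel (an assumed choice of kernel)."""
    return amp ** 2 * math.exp(-0.5 * ((t1 - t2) / scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (low @ low.T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(times, mags, errs, query_times, amp=1.0, scale=100.0):
    """Posterior mean of a GP with heteroskedastic noise on the diagonal."""
    mean = sum(mags) / len(mags)
    resid = [m - mean for m in mags]
    cov = [[rbf(a, b, amp, scale) + (errs[i] ** 2 if i == j else 0.0)
            for j, b in enumerate(times)]
           for i, a in enumerate(times)]
    alpha = solve_cholesky(cholesky(cov), resid)
    return [mean + sum(rbf(tq, t, amp, scale) * w for t, w in zip(times, alpha))
            for tq in query_times]


# Toy, made-up light curve with gaps; predict on a regular grid.
times = [0.0, 50.0, 120.0, 400.0]
mags = [19.0, 19.2, 19.1, 19.5]
errs = [0.05, 0.05, 0.10, 0.05]
print(gp_predict(times, mags, errs, [0.0, 100.0, 200.0, 300.0, 400.0]))
```

Far from any observation the prediction relaxes to the mean magnitude, which is one reason features derived from the fitted curve depend on the kernel length scale.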

Page 10: 06 ashish mahabal bse2

A separate Kepler AGN

CRTS AGN

Juxtaposition of surveys to get training sets for newer ones


Page 11: 06 ashish mahabal bse2

Regular versus irregular

• Financial time series and predictions

• Variety of tools

• Statistical features


Page 12: 06 ashish mahabal bse2

Challenge: A Variety of Parameters

• Discovery: magnitudes, delta-magnitudes
• Contextual:
  – Distance to nearest star
  – Magnitude of the star
  – Color of that star
  – Normalized distance to nearest galaxy
  – Distance to nearest radio source
  – Flux of nearest radio source
  – Galactic latitude
• Follow-up
  – Colors (g-r, r-i, i-z, etc.)
• Prior classifications (event type)
• Characteristics from light-curve
  – Amplitude
  – Median buffer range percentage
  – Standard deviation
  – Stetson K
  – Flux percentile ratio mid80
  – Prior outburst statistic

Not all parameters are always present, leading to swiss-cheese-like data sets

http://ki-media.blogspot.com/

Measures from Feigelson and Babu (Graham)

New lightcurve-based parameters (Faraway):
• Whole curve measures
• Fitted curve measures
• Residual from fit measures
• Cluster measures
• Other
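Several of the light-curve characteristics listed above are short formulas. The sketch below uses commonly quoted definitions, which are assumptions here rather than something the slides specify: amplitude as half the magnitude range, the median buffer as a tenth of the amplitude (some papers use a different width), Stetson K computed about the error-weighted mean, and the mid80 ratio as (90th minus 10th) over (95th minus 5th) flux percentiles.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def light_curve_features(mags, errs, buffer_fraction=0.1):
    """Compute a few statistical features; needs a non-constant light curve."""
    n = len(mags)
    amplitude = (max(mags) - min(mags)) / 2.0

    # Fraction of points close to the median (buffer width is an assumption).
    median = statistics.median(mags)
    near = sum(1 for m in mags if abs(m - median) < buffer_fraction * amplitude)

    # Stetson K: robust measure of the kurtosis of the residual distribution.
    weights = [1.0 / e ** 2 for e in errs]
    wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    resid = [(m - wmean) / e for m, e in zip(mags, errs)]
    stetson_k = sum(abs(r) for r in resid) / math.sqrt(n * sum(r * r for r in resid))

    # Flux percentile ratio, computed in flux rather than magnitude.
    flux = sorted(10 ** (-0.4 * m) for m in mags)
    mid80 = ((percentile(flux, 90) - percentile(flux, 10))
             / (percentile(flux, 95) - percentile(flux, 5)))

    return {
        "amplitude": amplitude,
        "median_buffer_range_percentage": near / n,
        "std": statistics.stdev(mags),
        "stetson_k": stetson_k,
        "flux_percentile_ratio_mid80": mid80,
    }


# Made-up light curve with one outburst.
mags = [17.9, 18.0, 18.1, 18.0, 16.5, 17.2, 18.0, 17.95]
errs = [0.05] * len(mags)
for name, value in light_curve_features(mags, errs).items():
    print(f"{name}: {value:.3f}")
```

A light curve that alternates between two levels has Stetson K equal to 1, while Gaussian noise gives about 0.8, so the statistic helps separate pulsation-like from noise-like behaviour.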

Page 13: 06 ashish mahabal bse2

Many features - not all are independent

beyond1std, skew, Amplitude, freq_signif, freq_varrat, freq_y_offset, freq_model_max_delta_mag, freq_model_min_delta_mag, freq_model_phi1_phi2, freq_rrd, freq_n_alias, flux_%_mid20, flux_%_mid35, flux_%_mid50, flux_%_mid65, flux_%_mid80, linear_trend, max_slope, MAD, median_buffer_range_percentage, pair_slope_trend, percent_amplitude, percent_difference_flux_percentile, QSO, non_QSO, std, small_kurtosis, stetson_j, stetson_k, scatter_res_raw, p2p_scatter_2praw, p2p_scatter_over_mad, p2p_scatter_pfold_over_mad, medperc90_p2_p, fold_2p_slope_10%, fold_2p_slope_90%, p2p_ssqr_diff_over_var

Adam Miller

15 Jan 2015 Ashish Mahabal
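One way to see that features are not independent is to look for strongly correlated pairs. This is a minimal sketch with made-up feature values; the 0.95 threshold and the function names are assumptions, and Pearson correlation only catches linear dependence.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(features, threshold=0.95):
    """List feature pairs whose |correlation| reaches the threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up values for five objects; "std" is exactly half of "amplitude".
features = {
    "amplitude": [0.2, 0.5, 1.1, 0.3, 0.8],
    "std": [0.1, 0.25, 0.55, 0.15, 0.4],
    "skew": [0.3, -1.2, 0.4, 0.9, -0.1],
}
print(redundant_pairs(features))
```

In this toy table only the amplitude and std columns are flagged, so one of the two could be dropped without losing information.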

Page 14: 06 ashish mahabal bse2

Importance of cadence

• Cadence wars with the LSST

• Asteroids versus fast transients versus periodic objects versus cosmology

• Constraints due to promises made

• Mechanical constraints


Page 15: 06 ashish mahabal bse2


Page 16: 06 ashish mahabal bse2

continuing cadence meetings

Sensitivity (visit? coadd?) by filter (especially u and g), needed for several (many? all?) variable types

Phased uniformity (periodic variables): for a given period how uniformly would the lightcurve be sampled?*

Window function (per filter/all filters): FWHM, ...

Statistics of revisit time histogram (per filter/all filters), e.g. min/max/median/5th & 95th percentiles

Hour angle distribution (to check aliasing), at a given sky position, maximum difference, rms ...
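Two of the cadence measures above are easy to prototype. The sketch below computes revisit-time statistics and, as one possible stand-in for phased uniformity, the largest gap in phase after folding on a trial period. The function names and the choice of the largest phase gap as the uniformity measure are assumptions, not metrics defined in the slides.

```python
import statistics


def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def revisit_stats(times):
    """Summary statistics of the time between consecutive visits."""
    t = sorted(times)
    gaps = sorted(b - a for a, b in zip(t, t[1:]))
    return {
        "min": gaps[0],
        "max": gaps[-1],
        "median": statistics.median(gaps),
        "p5": percentile(gaps, 5),
        "p95": percentile(gaps, 95),
    }


def largest_phase_gap(times, period):
    """Largest unsampled stretch of phase (0 to 1) for a trial period.

    Small values mean the folded light curve is sampled uniformly.
    """
    phases = sorted((t % period) / period for t in times)
    gaps = [b - a for a, b in zip(phases, phases[1:])]
    gaps.append(1.0 - phases[-1] + phases[0])  # wrap-around gap
    return max(gaps)


# Made-up visit times in days: nightly visits in two observing seasons.
times = [0.0, 1.0, 2.0, 3.1, 180.0, 181.0, 182.2, 183.0]
print(revisit_stats(times))
print(largest_phase_gap(times, period=0.7))
```

Visits spaced at exact multiples of the trial period all land on the same phase, giving the worst possible value of 1.0, which is the aliasing problem the hour-angle check is meant to expose.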

Page 17: 06 ashish mahabal bse2

Follow-up a huge issue

• Depth is a big difference: faint sources

• Characterization/ML needed: part of it only from LSST

• A lot of LSST science may get done before LSST

• Synergy with GMT/TMT

• LSST as a follow-up device in the days of aLIGO


Page 18: 06 ashish mahabal bse2

Optimization more than in Tzolk’in

Rohit Gawande

Victory points == science

Temples Technologies Currency Buildings Resources

Large number of variables and each player wants to win.


Page 19: 06 ashish mahabal bse2

Optimizing is (generally) a zero-sum game

Easy to make the survey “greatest” in one science

Optimization means compromise

BUT, the sum of parts is GREATER than the whole, i.e. compromise does NOT mean sacrifice

In other words, the players are NOT playing AGAINST each other

It’s the best middle ground we are seeking

LSST is its own follow-up machine in a proactive way. By coming up with a good cadence we can minimize the follow-up needed. And you can help. And get the science you love done in the process.


Page 20: 06 ashish mahabal bse2

CRTS light-curves

• 500M, most with hundreds of points over 10 years

• Most are non-variable (as a rule)

• Processing “done” for all SSS (100M+)

• Great training set for many procedures


Page 21: 06 ashish mahabal bse2

Caltech service

• Not for mass-computation: http://nirgun.caltech.edu:8000/scripts/description.html

• Mass processing: XSEDE, IUCAA, other …


Page 22: 06 ashish mahabal bse2

Faraway, Mahabal et al. methods

Recursive Partitioning

[Figure: partitioning tree with nodes labeled Param n and Type n]

Numbers/names not for reading

J Faraway

10/17/2014 Ashish Mahabal, IACS
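Recursive partitioning builds a tree by repeatedly choosing the parameter threshold that best separates the types. The sketch below shows a single such step on one feature using Gini impurity; it is a generic illustration, not the procedure used by Faraway, Mahabal et al., and the feature values and labels are made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single split.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2.0, impurity)
    return best


# Made-up values of one parameter for six objects of two types.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["type A", "type A", "type A", "type B", "type B", "type B"]
print(best_split(values, labels))
```

A full tree applies the same search to every parameter, keeps the best one, and then recurses on the two resulting subsets until the nodes are pure or too small.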

Page 23: 06 ashish mahabal bse2

domain knowledge

• Peakiness - SNe separator

• Period finding variations

• Detailed features for a subset

[Figure: classification tree with leaves labeled Non-SNe (1) or SNe (2)]

Using 900 non-SNe and 600 SNe

80-90% completeness using just these parameters

Mahabal, Ball


Page 24: 06 ashish mahabal bse2

dimensionality reduction

Feature selection strategies

Donalek et al. arXiv:1310.1976

• Fast Relief Algorithm (wt and threshold)

• Fisher Discriminant Ratio

• Correlation based Feature Selection

• Fast Correlation Based Filter

• Multi Class Feature Selection
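Of the strategies listed above, the Fisher Discriminant Ratio is the simplest to write down: for two classes it is the squared difference of the class means divided by the sum of the class variances. The sketch below ranks features by that score; the two-class form, the helper names, and the numbers are assumptions made for illustration.

```python
import statistics


def fisher_ratio(class_a, class_b):
    """Two-class Fisher discriminant ratio for a single feature."""
    mean_gap = statistics.mean(class_a) - statistics.mean(class_b)
    spread = statistics.variance(class_a) + statistics.variance(class_b)
    return mean_gap ** 2 / spread


def rank_features(table_a, table_b):
    """Sort feature names by decreasing Fisher ratio.

    table_a and table_b map feature name to the values in each class.
    """
    scores = {name: fisher_ratio(table_a[name], table_b[name]) for name in table_a}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up feature values for two classes of objects.
class_1 = {"amplitude": [0.2, 0.3, 0.25, 0.35], "skew": [0.1, -0.4, 0.6, 0.0]}
class_2 = {"amplitude": [1.1, 0.9, 1.3, 1.0], "skew": [0.2, -0.3, 0.5, 0.1]}
for name, score in rank_features(class_1, class_2):
    print(f"{name}: {score:.2f}")
```

Features with a high ratio separate the classes well on their own; a low ratio does not rule a feature out, since it may still be useful in combination with others.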

Page 25: 06 ashish mahabal bse2

Using features for clustering

• Using similarity of features to seed larger unknown sets

• Exercise based on that …

• Deep learning?

Open questions
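Seeding an unknown set from feature similarity can be prototyped as a nearest-neighbour lookup. This is a minimal sketch under stated assumptions: features are already scaled to comparable ranges, Euclidean distance is an adequate similarity measure, and the `max_distance` cut and class names are hypothetical.

```python
import math


def seed_labels(labeled, unlabeled, max_distance):
    """Assign each unlabeled feature vector the label of its nearest
    labeled neighbour, or None when nothing is close enough.

    labeled: list of (feature_vector, label); unlabeled: list of vectors.
    """
    result = []
    for vec in unlabeled:
        dist, label = min((math.dist(vec, ref), lab) for ref, lab in labeled)
        result.append(label if dist <= max_distance else None)
    return result


# Made-up, pre-scaled two-feature vectors.
labeled = [((0.0, 0.0), "class A"), ((5.0, 5.0), "class B")]
unlabeled = [(0.1, 0.2), (4.8, 5.1), (20.0, 20.0)]
print(seed_labels(labeled, unlabeled, max_distance=1.0))
```

Objects left as None are the interesting remainder: they resemble none of the known examples and are candidates for clustering or follow-up.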

Page 26: 06 ashish mahabal bse2

Augmenting light-curves with newer points

• Millions of sources observed each night

• Appending those points to all those light-curves

• Recomputing stats and follow-up characterizations

Open questions
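Appending nightly points does not have to mean recomputing every statistic from the full light curve. For the mean and standard deviation, Welford's online update does the job in constant time per new point. The class below is a minimal sketch; the name `RunningStats` is hypothetical, and features such as percentiles or periods have no equally simple update.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, mag):
        """Fold one new magnitude into the running statistics."""
        self.n += 1
        delta = mag - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (mag - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


# Made-up magnitudes arriving one night at a time.
stats = RunningStats()
for mag in [18.0, 18.2, 18.4]:
    stats.add(mag)
    print(stats.n, round(stats.mean, 3), round(stats.std, 3))
```

Only three numbers per light curve need to be stored, which matters when millions of light curves are updated each night.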


Page 27: 06 ashish mahabal bse2

Summary

• Astronomical time-series tend to be more gappy, heteroskedastic, diverse

• A great variety of statistical and domain-based features extractable

• Many challenges and interesting problems remain
