ASTRONOMY 6523
Spring 2013
Signal Modeling, Statistical Inference and Data Mining in Astrophysics
Professor: Jim Cordes
Place and Time: 622 Space Sciences Building, TTh 2:55-4:10 p.m.
Text: Bayesian Logical Data Analysis for the Physical Sciences, P. C. Gregory
Additional References: Unpublished notes & selected articles
Probability, Random Variables & Stochastic Processes, A. Papoulis
Bayesian Inference in Statistical Analysis, G.E.P. Box & G. C. Tiao
Probability Theory, E. Jaynes
Aims of the Course: The emphasis is on statistical descriptions, analysis, detection, inference;
model building and model fitting to empirical data.
Techniques will be demonstrated through case studies encountered in
astronomy and elsewhere and also with data challenges.
Responsibilities: Attending lectures and asking questions
Problem sets (analytical & computational)
Short projects
Term project
Final oral exam
Office, etc: 520 SSB, [email protected], 607 255-0608
Web Page: http://www.astro.cornell.edu/~cordes/A6523
Written Materials: Instructor’s notes
Articles from astrophysical, geophysical and engineering literature
Assignments: Grading criteria include legibility, grammar, correctness, and completeness
Project:
Topic and Abstract: Due 12 March in written form and presented to class (5 min)
In class presentation: Week 12 or 13 into the semester (∼ 15 minutes)
Written report: Due during finals week;
Text edited, In journal article style, Bibliography, Plots: labeled axes,
Grading: legibility, grammar, correctness, completeness
Computations: You can use any language or package you like
(MATLAB, IDL, Python, Mathematica; C, C++, Fortran, etc.)
Main Topic Blocks:
1. Linear Systems and Basis Vectors
2. Probability and Stochastic Processes
3. Spectral Analysis
4. Statistical Inference (Frequentist and Bayesian)
5. Model Fitting
6. Localization Methods
7. Detection Applications
8. Classification Applications
9. Tests and Tools:
(a) Detection methods (false alarms, ROC curves)
(b) Tests: whiteness, Gaussianity, stationarity, Markovianity, chaos vs stochastic
processes . . .
(c) Bayesian priors, marginalization, and odds ratio
(d) Extreme value and order statistics
(e) Correlation functions, structure functions, and bispectra
(f) Principal component analysis (PCA)
(g) Phase retrieval methods (deconvolution)
(h) Simulation methods
(i) Optimization and sampling (simulated annealing, genetic algorithms, Markov
Chain Monte Carlo)
10. Case studies:
(a) Modeling state changes in astrophysical objects with Markov processes
(b) Detecting gravitational waves (stochastic, CW/Chirped, bursts)
(c) Characterizing processes on the sphere (e.g. Cosmic Microwave Background)
(d) Wave propagation through random media
(e) Optimal model fitting against arbitrary kinds of additive noise (especially “red”
noise)
(f) Image formation and processing
(g) Classifiers
Course Approach
The philosophy of the course reflects that of the instructor, who takes a
dualistic view about information, data, science and engineering. It recognizes
the rich complexity of signals and phenomena we wish to identify and analyze
while taking a minimalist reductionist view when choosing and applying
analysis techniques.
My goal is to present material that allows you to understand and derive
algorithms to a sufficient level that you could write the necessary code for
their implementation. This does not mean that you always should write your
own code. After all, many fine packages implement arsenals of tools that can
be used (IDL, MATLAB, Mathematica, etc.). I suggest these rules of thumb:
• Avoid using a “canned” program in one of these packages unless you
can write down the underlying mathematics and could derive and
program the algorithm.
• “Experiment” with programs after you understand what they should do
mathematically.
• Avoid experimenting with a program to try to infer or reverse engineer
what it is claimed to do; this is very inefficient.
• Programs may not do what they claim to do or they may have built-in,
restrictive assumptions.
• Always test code with toy examples before using it in an important
application.
A Minimal List of Signal Analysis Themes:
1. Frequentist vs. Bayesian methods
2. Detection and Discovery
3. Matched Filtering and Optimization
4. The Central Limit Theorem and Non-Gaussianity
5. Basis Vectors and Compact Support
6. Aliasing and the pros and cons of uniform sampling
7. Finding phase
8. Deconvolution Tricks (inverse problems)
9. Doing the forward problem to solve an inverse problem
10. Defeating the Uncertainty Principle
11. Deterministic vs. Chaotic vs. Stochastic signals
12. Comparing models and hypotheses (statistical inference)
13. Space Exploration:
searching parameter spaces of high dimensionality
14. Analog vs. Digital: the effects of quantization in both x and y
Signal Analysis Themes:
1. Frequentist vs. Bayesian methods:
Two approaches to probability translate into two broad approaches
to data analysis and inference. One considers the outcomes of
experiments in terms of frequency of occurrence and a hypothetical
ensemble, while the other ties probability to the state of knowledge
before and after acquiring and analyzing data sets. The
approach of the course is dualistic.
2. Detection and Discovery:
Discovering new phenomena and objects is central to observational
science. In astronomy, many challenges boil down to
finding weak signals buried in noise. Finding signals or patterns
amid clutter is another data mining problem that we will address.
3. Matched Filtering and Optimization:
Matched filtering typically means fitting a noisy data set with
a signal template that is identical to the ‘true’ signal, usually
through a convolution method. Matched filtering optimizes the
signal-to-noise ratio of a ‘test statistic.’ The notion of matched
filtering can be extended to many procedures, including:
— testing the existence of a signal in a data set (detection)
— least-squares fitting of functions to data
— estimating the time of arrival of a pulse (testing relativity
with pulsar timing)
— centroid frequency or velocity of a spectral line (red shifts,
exoplanets)
— template matching of predicted gravitational waveforms to
gravity detector data
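As an illustration of the detection and time-of-arrival items above, here is a minimal sketch of a matched filter implemented as a direct cross-correlation. The pulse shape, noise level, injected amplitude and the names `matched_filter`, `template` and `true_lag` are all assumptions made for this toy example, not part of the course material.

```python
import random

def matched_filter(data, template):
    """Cross-correlate data with the template at every allowed lag."""
    n, m = len(data), len(template)
    return [sum(data[lag + k] * template[k] for k in range(m))
            for lag in range(n - m + 1)]

random.seed(1)
template = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]   # assumed known pulse shape
true_lag = 40                                     # assumed arrival time (samples)

# Unit-variance white Gaussian noise plus a scaled copy of the template.
data = [random.gauss(0.0, 1.0) for _ in range(100)]
for k, t in enumerate(template):
    data[true_lag + k] += 5.0 * t

output = matched_filter(data, template)
# The lag that maximizes the filter output is the arrival-time estimate.
best_lag = max(range(len(output)), key=lambda i: output[i])
print("estimated lag:", best_lag, "true lag:", true_lag)
```

The same cross-correlation sum is what a linear least-squares fit of the template amplitude maximizes at each trial lag, which is one way to see why the list above groups these procedures together.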
4. The Central Limit Theorem and Non-Gaussianity:
The factor √N is ubiquitous in statistical modeling and analysis,
as we all know from error analysis in laboratory contexts.
However, it appears in many other instances, including errors on
least-squares estimated parameters, power spectra, etc., and is
directly related to the CLT, which describes convergence of the
underlying PDF to a Gaussian ≡ normal form.
— incoherent summing procedures
— coherent summing procedures
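The √N behavior can be checked numerically. The sketch below averages N independent uniform random numbers (an arbitrary non-Gaussian choice) and measures the scatter of that average over many trials; the function name `scatter_of_mean` and the trial counts are assumptions for illustration.

```python
import random

def scatter_of_mean(n_samples, n_trials=2000, seed=0):
    """Standard deviation, over many trials, of the mean of n_samples uniform deviates."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        means.append(sum(rng.random() for _ in range(n_samples)) / n_samples)
    grand_mean = sum(means) / n_trials
    return (sum((m - grand_mean) ** 2 for m in means) / n_trials) ** 0.5

s4 = scatter_of_mean(4)
s64 = scatter_of_mean(64)
# Averaging 16 times more samples should shrink the scatter by about sqrt(16) = 4.
ratio = s4 / s64
print("scatter ratio:", ratio)
```

This is the incoherent-averaging case: independent fluctuations add in variance, so the error on the mean falls as 1/√N.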
5. Basis Vectors and Compact Support:
Spectral analysis often means “analyze the power spectrum that
is based on the Fourier Transform.” More generally, the goal
is often to characterize measurements with the smallest number
of underlying basis vectors. Fourier basis vectors (sinusoids) are
appropriate in some contexts but not others.
(a) When is Fourier analysis appropriate? When not?
(b) Other bases: wavelets, spherical harmonics, etc.
(c) Principal Component Analysis: let the data determine the
best basis vectors.
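For two-dimensional data, letting the data determine the basis reduces to finding the principal axis of the 2×2 sample covariance matrix, which has a closed form. The sketch below is a toy case under that assumption; `principal_axis_angle` and the synthetic point cloud are hypothetical illustrations.

```python
import math
import random

def principal_axis_angle(points):
    """Angle (radians) of the largest-variance direction of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Major-axis orientation of the covariance ellipse: tan(2*angle) = 2*sxy / (sxx - syy).
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

rng = random.Random(2)
tilt = math.radians(30.0)            # assumed true orientation of the elongated cloud
points = []
for _ in range(500):
    u = rng.gauss(0.0, 3.0)          # large scatter along the principal axis
    v = rng.gauss(0.0, 0.3)          # small scatter across it
    points.append((u * math.cos(tilt) - v * math.sin(tilt),
                   u * math.sin(tilt) + v * math.cos(tilt)))

angle_deg = math.degrees(principal_axis_angle(points))
print("recovered principal axis (deg):", angle_deg)
```

In higher dimensions the same idea requires a full eigen-decomposition of the covariance matrix rather than a closed-form angle.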
6. Aliasing and the pros and cons of uniform sampling:
Aliasing is the appearance (in Fourier analysis) of signal components
at the wrong apparent frequencies. Counteracting it
involves understanding the sampling theorem and the role of
uniform sampling. In some instances, nonuniform sampling helps
suppress aliasing, but it can make the analysis more difficult.
Techniques for spectral analysis will be discussed for the case of
nonuniform sampling.
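A small numerical check of aliasing under uniform sampling: a sinusoid above the Nyquist frequency produces exactly the same samples as one at a lower, folded frequency. The helper `apparent_frequency` and the chosen rates are assumptions for illustration.

```python
import math

def apparent_frequency(f_true, f_sample):
    """Frequency in [0, f_sample/2] at which a real sinusoid at f_true appears."""
    f = f_true % f_sample
    return min(f, f_sample - f)

f_sample = 10.0                       # assumed sampling rate (Hz)
f_true = 7.0                          # above the Nyquist frequency of 5 Hz
f_alias = apparent_frequency(f_true, f_sample)

# The uniformly spaced samples of the two cosines are indistinguishable.
samples_true = [math.cos(2 * math.pi * f_true * n / f_sample) for n in range(20)]
samples_alias = [math.cos(2 * math.pi * f_alias * n / f_sample) for n in range(20)]
max_diff = max(abs(a - b) for a, b in zip(samples_true, samples_alias))
print("alias frequency:", f_alias, "max sample difference:", max_diff)
```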
7. Finding Phase:
An often encountered problem consists of inferring a function
when only the magnitude of its Fourier transform is known.
Bootstrapping the inference can be done by using additional
information or by imposing conditions on the function, such
as causality and positivity (phase retrieval). In some contexts,
phase may be more important than amplitude.
8. Deconvolution Tricks (inverse problems):
Often a measurement x(t) is the convolution of a quantity of
interest y(t) and a filter or smearing function h(t). (Many natural
phenomena can be characterized by such linear systems.)
Typically the integral x(t) = ∫ dt′ y(t′) h(t − t′) partially destroys
information about y(t). Deconvolution means estimating y(t)
from the measurements x(t). This can be done in approximate
ways that are limited by the information-destroying aspects of
h(t) and also by the finite S/N of the measurements.
9. Doing the forward problem to solve an inverse problem:
Rather than attempting deconvolution, one can simulate the
measurement process by using trial functions or processes y(t),
passing them through the filter h(t) to obtain the trial x(t), which
is compared to the actual measurements x(t). Thus we test models
in “measurement space.” By iterating, the procedure may
converge to a consistent (but usually not unique) answer. This
approach can be far more robust than deconvolution. There are
also instances where even h(t) is not known, so one can use trial
functions for h(t) as part of the iteration process.
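The forward approach can be sketched with a deliberately simple model: the trial y(t) is a single spike with unknown position and amplitude, h(t) is assumed known, and the trial x(t) is compared to the measurements by a sum of squared residuals over a brute-force grid. All names, the exponential h(t) and the noise level are assumptions for this toy case.

```python
import random

def convolve(y, h):
    """Discrete convolution x[i] = sum_k y[k] h[i-k], truncated to len(y) samples."""
    return [sum(y[k] * h[i - k] for k in range(max(0, i - len(h) + 1), i + 1))
            for i in range(len(y))]

def trial_signal(n, position, amplitude):
    """Trial y(t): one spike of the given amplitude at the given sample."""
    y = [0.0] * n
    y[position] = amplitude
    return y

random.seed(3)
n = 64
h = [0.5 ** k for k in range(8)]      # assumed known smearing function
x_meas = [v + random.gauss(0.0, 0.1)  # simulated noisy measurement
          for v in convolve(trial_signal(n, 20, 4.0), h)]

best = None
for position in range(n):
    for amplitude in [0.5 * j for j in range(1, 17)]:      # 0.5, 1.0, ..., 8.0
        x_trial = convolve(trial_signal(n, position, amplitude), h)
        misfit = sum((a - b) ** 2 for a, b in zip(x_meas, x_trial))
        if best is None or misfit < best[0]:
            best = (misfit, position, amplitude)

_, best_position, best_amplitude = best
print("best-fit position and amplitude:", best_position, best_amplitude)
```

No division by the transform of h(t) is ever performed, which is why this route avoids the noise amplification that direct deconvolution can suffer.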
10. Defeating the Uncertainty Principle:
For frequency-time variables, the uncertainty principle is ∆ν ∆t ≳ 1.
This means that you can’t localize a signal in both time and
frequency to arbitrary precision. In some instances, one can do
better than what naive application of the uncertainty principle
would suggest. This is called superresolution in spectral analysis
and imaging applications.
11. Deterministic vs. Chaotic vs. Stochastic signals:
Death and taxes are deterministic events in that they are bound
to happen. But they are also stochastic in that we don’t know
by how much or when taxes may be reduced/increased or when
one will die. Random number generators appear to produce
stochastic output but they actually produce numbers comprising
a chaotic process, which is a particular kind of deterministic
process. How can we tell the difference for a measured data set?
Procedures exist for testing the properties of a data set in this
regard.
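A standard toy example of a deterministic process that looks random is the logistic map x → r x (1 − x) with r = 4. It is used here as an assumed stand-in, not as one of the course's test procedures. The sketch shows that identical starting values reproduce the sequence exactly, while a perturbation of 1e-10 grows until the two orbits are unrelated.

```python
def logistic_orbit(x0, n_steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) starting from x0."""
    orbit = [x0]
    for _ in range(n_steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2, 100)            # same start: deterministic, identical output
c = logistic_orbit(0.2 + 1e-10, 100)    # tiny perturbation of the starting value

max_separation = max(abs(u - v) for u, v in zip(a, c))
print("identical restart:", a == b, "max separation:", max_separation)
```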
12. Comparing models and hypotheses (statistical inference):
If we don’t know the best model for a data set or phenomenon
a priori, then somehow we need to determine it from measurements.
Statistical inference involves determining the best parameters
given a model, implying that we have some goodness-of-fit
metric that we apply to determine the best values for the
parameters. This notion can be extended to alternative models
or even hypotheses.
- Frequentist inference
- Bayesian inference
- incorporating prior knowledge and mathematical constraints
- Ensembles and realizations: estimation errors when only one
realization of a process can be measured (e.g. extinction
record over geologic time; cosmic evolution and cosmic variance).
13. Space Exploration: searching parameter spaces of high dimensionality:
Statistical inference often involves finding a best-fit, nonlinear
solution in a parameter space whose dimensionality is too large to
explore by “brute force.” Methods exist for exploring such spaces that
adopt methods found in nature in thermodynamic or biological contexts.
These include:
- Downhill simplex
- Simulated annealing
- Markov Chain Monte Carlo methods
- Genetic algorithms
- Neural networks
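As one example from the list above, here is a minimal sketch of simulated annealing on a two-parameter cost function with several local minima. The cost function, the geometric cooling schedule, the step size and all names are assumptions chosen for illustration; practical problems need tuning of each.

```python
import math
import random

def cost(x, y):
    """Toy misfit surface: global minimum 0 at (1, -2), with ripples along x."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.5 * (1.0 - math.cos(6.0 * (x - 1.0)))

def simulated_annealing(cost_fn, start, n_steps=20000, t_start=5.0,
                        t_end=1e-3, step=0.5, seed=0):
    rng = random.Random(seed)
    x, y = start
    current = cost_fn(x, y)
    best_point, best_value = (x, y), current
    for i in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        x_new = x + rng.gauss(0.0, step)
        y_new = y + rng.gauss(0.0, step)
        candidate = cost_fn(x_new, y_new)
        # Metropolis rule: always accept downhill moves, sometimes accept uphill ones.
        if candidate <= current or rng.random() < math.exp(-(candidate - current) / temperature):
            x, y, current = x_new, y_new, candidate
            if current < best_value:
                best_point, best_value = (x, y), current
    return best_point, best_value

best_point, best_value = simulated_annealing(cost, start=(-4.0, 4.0))
print("best point:", best_point, "cost:", best_value)
```

Holding the temperature fixed instead of cooling turns the same accept/reject rule into a basic Metropolis sampler, which is the link to the Markov Chain Monte Carlo entry in the list.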
14. Analog vs. Digital:
Often we think about physics etc. in continuous terms while doing
computer analysis necessarily with digital quantities. What
are the consequences? Sometimes we exploit extreme types of
quantization to develop a fast algorithm or hardware processor.
— Examples where sampling (digitization) and Fourier analysis
do not commute.
— Correlation spectrometers.
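An example of extreme quantization is one-bit sampling, where only the sign of each sample is kept. For jointly Gaussian signals with correlation coefficient ρ, the correlation of the signs is (2/π) arcsin ρ (the Van Vleck relation, quoted here as a known result rather than taken from these notes), so ρ can still be recovered. The simulation below is a sketch with assumed sample counts and names.

```python
import math
import random

def one_bit_correlation(rho, n_samples=50000, seed=0):
    """Average product of the signs of two Gaussian signals with correlation rho."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        x = g1
        y = rho * g1 + math.sqrt(1.0 - rho * rho) * g2   # correlated with x by rho
        total += (1 if x >= 0 else -1) * (1 if y >= 0 else -1)
    return total / n_samples

rho_true = 0.6
r_one_bit = one_bit_correlation(rho_true)
r_predicted = (2.0 / math.pi) * math.asin(rho_true)
rho_recovered = math.sin(0.5 * math.pi * r_one_bit)    # invert the relation
print("one-bit correlation:", r_one_bit, "predicted:", r_predicted)
print("recovered rho:", rho_recovered)
```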
A523 Signal Modeling, Statistical Inference
and Data Mining in Astrophysics Spring 2013
Lecture 1
• Organization:
» Syllabus (text, requirements, topics)
» Course approach (goals, themes)
• Book: Gregory, “Bayesian Logical Data Analysis for the Physical Sciences”
• We will cover all the topics in the book plus much more material.
• Heavy use of unpublished notes and articles from the literature
• Numerical assignments: you can use your favorite programming language or software package (note no direct use of Mathematica in this course)
• Grading: legibility and clear explanations in complete sentences are needed for all submitted homework and papers.
Instructor’s focus:
• Optimal signal detection at low S/N
» Pulsars, transient signals, low surface brightness objects
• Characterizing astrophysical processes seen in time series
» Deterministic? Chaotic? Stochastic?
» Markov processes and random walks
• Population analyses and modeling
» Stellar populations in the Milky Way
» Statistical inference of spatial, velocity distributions of neutron stars
» Galactic model of electron-density turbulence
• Data mining in large data sets
» Arecibo pulsar/transient survey (103 Terabytes)
» RFI mitigation algorithms
» Finding astrophysical signals of both known and unknown types
• Detection of gravitational waves using pulsars
» 5+ year data sets
» Exercises in many topics of this course
Basic Course Sections
• Linear systems & Fourier methods
• Probability & Random Processes
• Statistical inference
» Frequentist
» Bayesian
• Spectral analysis
» Fourier
» generalized (wavelets, PCA, etc.)
• Matched filtering & localization
• Exploration of large parameter spaces
Current Assignment
1. Reading: “Discrete Fourier Transforms,” Appendix B of Gregory, pages 392 – 416 (continuous FTs, DFTs, FFTs)
2. Problem Set 1: Fourier transforms, due Thurs Jan 31
Course Emphasis
Principles Math and statistical methods
Algorithms
Applications and implementation
Design vs. Inference
Engineering applications
Astrophysics and Space Science
Physics + engineering
Devices, machines, software
Operations, signals
Measurements of photons, non-photonic messengers (GWs, cosmic rays, neutrinos)
Signal processing, statistical inference, hypothesis testing, classification
Physical models, testing of fundamental physics, understanding cosmic evolution
Broad Classes of Problems
• Detection, analysis and modeling:
Signal (natural or artificial), detection, analysis
Is it there?
Optimal detection schemes
Maximize S/N of a test statistic
Population of signals:
• maximize detections of real signals
• minimize false positives and false negatives
• null hypothesis: no signal there
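The trade-off between real detections and false positives can be made concrete for the simplest case: a test statistic that is Gaussian with unit variance, with zero mean under the null hypothesis and mean equal to the S/N when a signal is present. That model and the function names are assumptions for this sketch. Sweeping the threshold traces out an ROC curve.

```python
import math

def false_alarm_probability(threshold_sigma):
    """P(statistic > threshold) under the null hypothesis (zero-mean, unit-variance Gaussian)."""
    return 0.5 * math.erfc(threshold_sigma / math.sqrt(2.0))

def detection_probability(threshold_sigma, snr):
    """P(statistic > threshold) when a signal shifts the mean to snr."""
    return 0.5 * math.erfc((threshold_sigma - snr) / math.sqrt(2.0))

# One ROC curve: (false-alarm probability, detection probability) as the threshold varies.
snr = 3.0
roc_curve = [(false_alarm_probability(t), detection_probability(t, snr))
             for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
for p_fa, p_det in roc_curve:
    print("P_fa = %.2e   P_det = %.3f" % (p_fa, p_det))
```

Raising the threshold lowers both probabilities, so suppressing false positives necessarily costs missed detections (false negatives) at fixed S/N.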
What are its properties?
Parametric approaches:
(e.g. least squares fitting of a model with parameters)
Non-parametric approaches:
(e.g. relative comparison of distributions [KS test])
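For the non-parametric case, the two-sample Kolmogorov-Smirnov statistic is the largest vertical distance between the two empirical cumulative distribution functions. The sketch below computes only the statistic, not its significance level; `ks_statistic` is a name chosen for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Step both empirical CDFs past every value <= v, then compare them.
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```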
Broad Classes of Problems
• Many measured quantities (“raw data”) are the outputs of linear systems
» Wave propagation (EM, gravitational, seismic, acoustic, ...)
• Many signals are the result of nonlinear operations in natural systems or in apparatus
• Many analyses of data are linear operations acting on the data to produce some desired result (detection, modeling)
» E.g. Fourier transform based spectral analysis
• Many analyses are nonlinear
» E.g. Maximum entropy and Bayesian spectral analysis
Basic Points
• Signal types are defined with respect to quantization
• Continuous signals are easier to work with analytically; digital signals are what we actually use
• The relationship between digital and analog signals is sometimes trivial, sometimes not
• LSI systems obey the convolution theorem and thus have an impulse response (= Green’s function)
• LSI systems obey superposition
• Examples can be found in nature as well as in devices
• The natural basis functions for LSI systems are exponentials
» Causal systems: Laplace transforms
» Acausal systems: Fourier transforms
• While LSI systems are important, nonlinear systems and alternative basis functions are highly important in science and engineering
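The statement that exponentials are the natural basis functions can be checked directly: passing a complex exponential through a linear shift-invariant system returns the same exponential multiplied by a complex constant. The sketch assumes a periodic (circular) convolution so that the result is exact for a finite sequence; the impulse response and names are arbitrary illustrations.

```python
import cmath
import math

def circular_convolve(x, h):
    """Periodic convolution y[i] = sum_k h[k] x[(i-k) mod n]."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]

n = 32
m = 5                                      # integer number of cycles across the sequence
x = [cmath.exp(2j * math.pi * m * i / n) for i in range(n)]
h = [0.5, 0.3, 0.2]                        # assumed impulse response

y = circular_convolve(x, h)
# The multiplying constant is the discrete Fourier transform of h at frequency m/n.
transfer = sum(h[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(len(h)))
max_error = max(abs(y[i] - transfer * x[i]) for i in range(n))
print("transfer value:", transfer, "max deviation:", max_error)
```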
Pulsar Periodicity Search
[Figure: search pipeline. Panels: frequency vs. time; DM vs. time; |FFT(f)| with harmonics at 1/P, 2/P, 3/P, ... Step: FFT each DM’s time series]
Example Time Series and Power Spectrum for a recent PALFA discovery
(follow-up data set shown)
DM = 0 pc cm^-3
DM = 217 pc cm^-3
Time Series
Where is the pulsar?
Here is the pulsar
Spectral analysis as a unifying thread
Signals and Statistics
Spectral analysis:
1. Analysis of variance in a conjugate space
t ↔ f (time and frequency domains)
u,v ↔ θ (interferometric images)
• Statistical questions about the nature of the signal in frequency space:
a. Is there a signal?
b. What is its frequency?
c. What is the shape of the spectrum?
2. Basis functions:
Sinusoids: t ↔ f
Spherical harmonics: θ, φ ↔ l, m
Wavelets: time-frequency atoms
Principal components: the data determine the basis
The appropriate basis (often) is the one that most compactifies the signal in the conjugate domain
Color coded temperature variations of the cosmic microwave background (CMB)
T_CMB = 2.7 K
ΔT/T_CMB ~ 10^-5
Wilkinson Microwave Anisotropy Probe
Basis functions: spherical harmonics
Detection: the CMB
J. Dunkley, et al., 2009, ApJS, 180, 306-329
Data Inference
Evidence, Confirmation
So we understand the big bang and that there is dark energy
Or maybe not:
“After scrutinizing over seven years’ worth of WMAP data, as well as data from the BOOMERanG balloon experiment in Antarctica, Penrose and Gurzadyan say they have identified a series of concentric circles within the data. These circles show regions in the microwave sky in which the range of the radiation’s temperature is markedly smaller than elsewhere. According to the researchers, the patterns correspond to gravitational waves formed by the collision of black holes in the aeon that preceded our own, and they published these claims in a paper submitted to arXiv” (Physics World).
Galaxy clustering Data from the Sloan Digital Sky Survey
SDSS galaxy distribution (Those with spectra)
Gamma-ray burst locations on the sky
Is there any clustering?
How would you test this?
“Flights within the US were grounded because of the attacks, and incoming international flights were diverted to Canada. Services resumed within a few days but it took years for the market to recover.”
From the BBC web page 04 Sept 2006
Example of a “change point”
Example of a transient event identifiable through data mining of article content:
Is there a periodicity in this time series?
Basics of Pulsars as Clocks
• Signal average M pulses
• Time-tag using template fitting
• Repeat for L epochs spanning N = T/P spin periods
• N ~ 10^8 – 10^10 cycles in one year
• % P determined to
[Figure: pulse train schematic; labels involve P, M and W]
• J1909-3744: eccentricity < 0.00000013 (Jacoby et al.)
• B1937+21: P = 0.0015578064924327±0.0000000000000004 s
Phase residuals from isolated pulsars after subtracting a quadratic polynomial:
If these pulsars were simply spinning down in a smooth way, we would expect residuals that look like white noise:
For these pulsars, the residuals are mostly caused by spin noise in the pulsar
Are any of these time series periodic? How can we test for periodicity?
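One common first test for periodicity is to compute a periodogram and ask whether its largest peak stands well above the noise level. The sketch below uses a direct (slow) discrete Fourier transform on a simulated sinusoid in white noise; the amplitude, record length and names are assumptions, and the residuals discussed here are red rather than white, so a real test would need a noise model for the spectrum.

```python
import cmath
import math
import random

def periodogram(series):
    """Power |X_k|^2 / n at frequencies k/n cycles per sample, k = 1 .. n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    return power

random.seed(5)
n = 256
true_bin = 20                         # assumed: 20 cycles across the record
series = [math.sin(2 * math.pi * true_bin * t / n) + random.gauss(0.0, 1.0)
          for t in range(n)]

power = periodogram(series)
peak_bin = 1 + max(range(len(power)), key=lambda i: power[i])
peak_to_mean = max(power) / (sum(power) / len(power))
print("peak at bin", peak_bin, "peak/mean power:", peak_to_mean)
```

For white Gaussian noise alone, each normalized power value is approximately exponentially distributed, so the chance that any of M independent bins exceeds p times the mean is roughly M·exp(−p); that gives a false-alarm threshold for the peak.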
Noise in Timing Residuals from G. Hobbs
Long period pulsars
MSPs
How Good are Pulsars as Clocks?
Clock processes are similar to random walks or Brownian motion. What are the best ways to characterize such processes?
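One basic characterization of a random walk is how the variance across an ensemble grows with time: for independent steps it grows linearly, so the process is nonstationary. The simulation below is a sketch with unit-variance Gaussian steps; the ensemble size and names are assumptions for illustration.

```python
import random

def ensemble_variance(n_walks, n_steps, seed=0):
    """Variance across an ensemble of random walks after each step."""
    rng = random.Random(seed)
    positions = [0.0] * n_walks
    variance = []
    for _ in range(n_steps):
        positions = [p + rng.gauss(0.0, 1.0) for p in positions]
        mean = sum(positions) / n_walks
        variance.append(sum((p - mean) ** 2 for p in positions) / n_walks)
    return variance

variance = ensemble_variance(n_walks=1000, n_steps=400)
# Linear growth: the variance after 400 steps should be about 4x that after 100 steps.
growth = variance[399] / variance[99]
print("variance growth from step 100 to step 400:", growth)
```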
Pulsars as Gravitational Wave Detectors
Earth
pulsar
pulses
Gravitational wave background
The largest contribution to arrival times is on the time scale of the total data span length (~20 years for best cases)
The best pulsar timing so far:
MSP J1909-3744, P = 3 ms + WD
Jacoby et al. (2005)
Weighted σ_TOA = 74 ns
Shapiro delay
Correlation Function Between Pulsars
Correlation function of residuals vs angle between pulsars
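For an isotropic gravitational-wave background, the expected correlation between the timing residuals of two distinct pulsars separated by angle θ is usually written as the Hellings and Downs curve. The slide does not name it, so treat the formula and its normalization (value 1/2 as θ → 0) as assumptions of this sketch; other normalizations are in use.

```python
import math

def hellings_downs(theta_rad):
    """Expected residual correlation for two distinct pulsars separated by theta."""
    x = (1.0 - math.cos(theta_rad)) / 2.0
    if x == 0.0:
        return 0.5                     # limit as the separation goes to zero
    return 1.5 * x * math.log(x) - x / 4.0 + 0.5

for degrees in (0, 30, 60, 90, 120, 150, 180):
    print(degrees, round(hellings_downs(math.radians(degrees)), 4))
```

The curve goes negative at intermediate separations and returns to a positive value at 180 degrees, and that angular signature is what distinguishes a gravitational-wave background from the error sources listed on this slide.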
Example power-law spectrum from merging supermassive black holes (Jaffe & Backer)
Estimation errors from:
• dipole term from solar system ephemeris errors
• red noise in the pulsar clock
• red interstellar noise
Potential PTA Sensitivity NANOGrav+EPTA+PPTA = IPTA