Constraining Astronomical Populations with Truncated Data Sets


Brandon C. Kelly (CfA, Hubble Fellow, bckelly@cfa.harvard.edu)

04/20/23 Brandon C. Kelly, bckelly@cfa.harvard.edu

Goal of Many Surveys: Understand the distribution and evolution of astronomical populations


But all we can observe (measure) is the light (flux density) and location of sources on the sky!

• How does the growth of supermassive black holes change over time?

• How was the stellar mass of galaxies assembled?

• What is the distribution of black hole spin for supermassive black holes? How does this evolve?

A motivating example

• Recent advances in modeling stellar evolution have made it possible to relate a galaxy’s physical parameters (e.g., mass, star formation history) to its measured fluxes

• This opens up the possibility of studying the evolution of the galaxy population, in particular the evolution in the distribution of galaxies' physical quantities, and not just of their measurable ones.

Simple vs. Advanced Approach

Simple but not Self-consistent

• Derive ‘best-fit’ estimates for quantities of interest (e.g., mass, age, BH spin)

• Do this individually for each source

• Infer the distribution and its evolution directly from these estimates

• Provides a biased estimate of the distribution and its evolution
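To see why the simple approach is biased, here is a minimal sketch under an assumed Gaussian toy model (not from the slides): true quantities x ~ Normal(0, 1), each measured once with Gaussian noise of standard deviation 1, so each source's "best fit" is just its measurement. The spread of the best fits is inflated by the measurement noise. The moment correction in the last step is only the simplest special case of modeling the measurement process, not the full self-consistent approach.

```python
import numpy as np

# Assumed toy model (not from the slides): true quantities x ~ Normal(0, 1),
# each measured once with Gaussian noise of standard deviation 1. The
# per-source "best fit" is then simply the measurement itself.
rng = np.random.default_rng(42)
n_sources = 100_000
true_sd, noise_sd = 1.0, 1.0

x_true = rng.normal(0.0, true_sd, size=n_sources)
x_best_fit = x_true + rng.normal(0.0, noise_sd, size=n_sources)

# Simple approach: the spread of the best-fit estimates is taken as the
# spread of the population. Measurement noise inflates it.
naive_var = np.var(x_best_fit)

# Using the measurement model instead: Var(y) = Var(x) + noise_sd**2, so the
# known noise variance can be subtracted to recover Var(x).
corrected_var = naive_var - noise_sd**2

print(f"true variance of x:        {true_sd**2:.2f}")
print(f"variance of best fits:     {naive_var:.2f}")
print(f"noise-corrected variance:  {corrected_var:.2f}")
```

With equal intrinsic and noise variances, the histogram of best fits comes out roughly twice as wide (in variance) as the true population.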


Advanced and Self-Consistent

• Derive the distribution and evolution of the quantities of interest directly from the observed distribution of measurable quantities

• Avoids fitting each source independently

• Self-consistently accounts for uncertainty in the derived quantities and for selection effects (e.g., flux limit)


The Posterior Distribution: How to quantitatively relate the distribution of physical quantities to measurable ones

• Define p(y|x) as the measurement model; it relates the physical quantities, x, to the measured ones, y

• Define p(x|θ) as the model distribution for the physical quantities, parameterized by θ

• The posterior probability distribution of the physical quantities x and of the parameters θ of their distribution, given the measured quantities y for the n data points, is:

$$p(x, \theta \mid y) \propto p(\theta) \prod_{i=1}^{n} p(y_i \mid x_i)\, p(x_i \mid \theta)$$
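As a concrete illustration, the unnormalized log of this posterior can be written as a function that a sampler such as MCMC would explore. This is a minimal sketch assuming a toy model that the slides do not specify: a Gaussian population p(x|θ) with θ = (mu, tau), Gaussian measurement errors with known standard deviations s, and an improper flat prior on mu and tau > 0. All function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(v, mean, sd):
    """Elementwise log density of Normal(mean, sd) at v."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((v - mean) / sd) ** 2

def log_prior(theta):
    """Assumed prior p(theta): flat in mu and in tau > 0 (improper)."""
    _, tau = theta
    return 0.0 if tau > 0 else -np.inf

def log_posterior(x, theta, y, s):
    """Unnormalized log p(x, theta | y)
       = log p(theta) + sum_i [log p(y_i | x_i) + log p(x_i | theta)].

    x     : latent physical quantities, one per source
    theta : (mu, tau), parameters of the population model p(x | theta)
    y     : measured quantities, one per source
    s     : known measurement-error standard deviations
    """
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    mu, tau = theta
    log_meas = log_normal_pdf(y, x, s)     # measurement model p(y_i | x_i)
    log_pop = log_normal_pdf(x, mu, tau)   # population model p(x_i | theta)
    return lp + float(np.sum(log_meas + log_pop))

# Usage on simulated data.
rng = np.random.default_rng(0)
n = 50
x_true = rng.normal(2.0, 0.5, size=n)
s = np.full(n, 0.3)
y = x_true + rng.normal(0.0, s)
print(log_posterior(x_true, (2.0, 0.5), y, s))
```

Working in log space turns the product over the n sources into a sum, which avoids numerical underflow for large samples.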

Incorporating the flux limit (truncation)

1. If there is a flux limit (data truncation), let Det(y) denote the selection function (the probability of detection as a function of y). We then need to normalize the posterior by the detection probability as a function of θ, Det(θ):

$$\mathrm{Det}(\theta) = \int \mathrm{Det}(y)\, p(y \mid \theta)\, dy = \int \mathrm{Det}(y) \left[ \int p(y \mid x)\, p(x \mid \theta)\, dx \right] dy$$

2. The probability distribution of the physical (missing) quantities for each source, x, and of the parameters of the distribution of x, θ, given the n observed values of y, is then

$$p(x, \theta \mid y) \propto p(\theta) \left[ \mathrm{Det}(\theta) \right]^{-n} \prod_{i=1}^{n} p(y_i \mid x_i)\, p(x_i \mid \theta)$$
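The double integral for Det(θ) and the extra [Det(θ)]^(−n) factor can be made concrete with a minimal sketch. It assumes a toy model that is not from the slides: x ~ Normal(mu, tau), y|x ~ Normal(x, s) with a single known s, a hard flux limit Det(y) = 1 for y > y_lim, and a flat prior on (mu, tau > 0). The integrals are done by brute-force trapezoid quadrature; for this Gaussian case a closed form exists and is used only as a check. Function names are hypothetical.

```python
import math
import numpy as np

def trapezoid(f, dx):
    """Trapezoid rule along the last axis for uniformly spaced samples."""
    return dx * (f.sum(axis=-1) - 0.5 * (f[..., 0] + f[..., -1]))

def normal_pdf(v, mean, sd):
    return np.exp(-0.5 * ((v - mean) / sd) ** 2) / (math.sqrt(2.0 * math.pi) * sd)

def det_theta(mu, tau, s, y_lim, n_grid=801):
    """Det(theta) = ∫ dy Det(y) [∫ dx p(y|x) p(x|theta)] by direct quadrature.

    Assumed toy model: x ~ Normal(mu, tau), y | x ~ Normal(x, s), and a hard
    flux limit Det(y) = 1 for y > y_lim, else 0. Since Det(y) vanishes below
    y_lim, the outer integral only needs to cover y >= y_lim.
    """
    width = 10.0 * math.hypot(tau, s)
    x = np.linspace(mu - width, mu + width, n_grid)
    y = np.linspace(y_lim, max(y_lim, mu) + width, n_grid)
    # Inner integral: p(y | theta) = ∫ p(y | x) p(x | theta) dx, for each y.
    integrand = normal_pdf(y[:, None], x[None, :], s) * normal_pdf(x, mu, tau)
    p_y = trapezoid(integrand, x[1] - x[0])
    # Outer integral of Det(y) p(y | theta) over the detectable range.
    return float(trapezoid(p_y, y[1] - y[0]))

def log_posterior_truncated(x, theta, y_obs, s, y_lim):
    """Unnormalized log posterior including the -n log Det(theta) term,
    assuming a flat prior on (mu, tau > 0) and a scalar error s."""
    mu, tau = theta
    if tau <= 0:
        return -np.inf
    n = len(y_obs)
    log_meas = -0.5 * ((y_obs - x) / s) ** 2 - math.log(math.sqrt(2 * math.pi) * s)
    log_pop = -0.5 * ((x - mu) / tau) ** 2 - math.log(math.sqrt(2 * math.pi) * tau)
    return float(np.sum(log_meas + log_pop)) - n * math.log(det_theta(mu, tau, s, y_lim))

# For this Gaussian toy model Det(theta) also has a closed form to check against.
mu, tau, s, y_lim = 0.0, 1.0, 0.5, 1.0
exact = 0.5 * math.erfc((y_lim - mu) / (math.sqrt(2.0) * math.hypot(tau, s)))
print(det_theta(mu, tau, s, y_lim), exact)
```

In realistic problems p(y|x) comes from a stellar-evolution model and x is multi-dimensional, so no closed form is available and grid quadrature quickly becomes expensive.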

But there are some computational complications:

• Expected fluxes are a highly non-linear, non-monotonic function of the physical parameters

– This leads to multiple modes in p(y|x), and thus in the posterior

• Calculating the expected flux for a given set of physical parameters is computationally intensive, because it requires running a complex computer model of stellar evolution

– It is typical to run the model on a grid first and then use a look-up table
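The grid-then-look-up strategy, and the multimodality it has to cope with, can be sketched as follows. `expensive_flux_model` is a hypothetical stand-in for a stellar-evolution code, chosen only because it is smooth and non-monotonic, and linear interpolation with `numpy.interp` is an assumed, simplest-possible look-up scheme; the slides do not specify one.

```python
import numpy as np

def expensive_flux_model(x):
    """Hypothetical stand-in for a costly stellar-evolution code: a smooth but
    non-monotonic map from a physical parameter x to an expected flux."""
    return 1.0 + 0.5 * np.sin(x)

# 1. Run the "expensive" model once on a coarse grid and store the table.
x_table = np.linspace(0.0, np.pi, 41)
flux_table = expensive_flux_model(x_table)

def flux_lookup(x):
    """Cheap approximation: linear interpolation in the precomputed table."""
    return np.interp(x, x_table, flux_table)

# 2. Gaussian log-likelihood log p(y | x) for one observed flux.
y_obs, sigma = 1.25, 0.02
x_fine = np.linspace(0.0, np.pi, 2001)
log_like = -0.5 * ((y_obs - flux_lookup(x_fine)) / sigma) ** 2

# 3. Find interior local maxima: two distinct x values give the same flux.
interior = log_like[1:-1]
is_peak = (interior > log_like[:-2]) & (interior > log_like[2:])
peaks = x_fine[1:-1][is_peak]
print(peaks)  # near pi/6 and 5*pi/6
```

Because the flux rises and then falls with x, the single measurement y_obs is matched equally well at two separated parameter values, so p(y|x) has two modes; a sampler started near one of them can easily miss the other.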

Example

[Figure: Posterior Probability Distribution]

Additional problems when there is truncation (e.g., a flux limit)

• There is no simple way to calculate Det(θ):

$$\mathrm{Det}(\theta) = \int \mathrm{Det}(y) \left[ \int p(y \mid x)\, p(x \mid \theta)\, dx \right] dy$$

• Naïve method: simulate a sample given the model, θ, and count the fraction of sources that are detected

• Unfortunately, this stochastic integral introduces error into Det(θ), and because Det(θ) enters the posterior raised to the power −n, the posterior is unstable to even small errors in Det(θ)
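The naïve estimator and the reason it is fragile can be sketched in a few lines. This assumes the same kind of toy model as before (x ~ Normal(mu, tau), y|x ~ Normal(x, s), hard limit y > y_lim), a hypothetical sample of n_sources = 10,000 detected objects, and the usual binomial standard error for a simulated fraction. Since the posterior contains [Det(θ)]^(−n), a relative error δ in Det(θ) shifts the log posterior by roughly n·δ.

```python
import math
import numpy as np

def det_theta_monte_carlo(mu, tau, s, y_lim, n_sim, rng):
    """Naive stochastic estimate of Det(theta) and its binomial standard error.

    Assumed toy model: x ~ Normal(mu, tau), y | x ~ Normal(x, s), with a hard
    flux limit Det(y) = 1 if y > y_lim else 0.
    """
    x = rng.normal(mu, tau, size=n_sim)    # draw physical quantities from p(x | theta)
    y = rng.normal(x, s)                   # draw measurements from p(y | x)
    frac = float(np.mean(y > y_lim))       # fraction of simulated sources detected
    std_err = math.sqrt(frac * (1.0 - frac) / n_sim)
    return frac, std_err

rng = np.random.default_rng(1)
n_sources = 10_000    # hypothetical size of the observed sample
results = {}
for n_sim in (1_000, 10_000, 100_000):
    frac, err = det_theta_monte_carlo(0.0, 1.0, 0.5, 1.0, n_sim, rng)
    # The posterior contains [Det(theta)]**(-n), so a relative error err/frac
    # in Det(theta) shifts the log posterior by roughly n_sources * err / frac.
    results[n_sim] = (frac, err, n_sources * err / frac)
    print(n_sim, round(frac, 4), round(n_sources * err / frac, 1))
```

Even with 100,000 simulated sources per θ, the Monte Carlo noise in log Det(θ) is multiplied by n, which can leave the log posterior jagged by many units between neighbouring values of θ.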

Example: Estimating a Luminosity Function (Distribution)

• Simulate galaxy luminosities from a Schechter function (i.e., a gamma distribution)

• Keep only L > L_LIM = L* (≈ 37% detection fraction, since P(L > L*) = exp(−1) for α = 0)

• Estimate Det(α, L*) stochastically:

– For each (α, L*), simulate samples of 1,000 and of 10,000 luminosities

– Keep those for which L > L_LIM; the fraction kept is the estimate of Det(α, L*)

$$p(L \mid \alpha, L^*) \propto L^{\alpha} e^{-L/L^*}, \qquad \alpha = 0,\; L^* = 1$$
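The experiment on this slide can be approximated as follows. This is a sketch under simplifying assumptions: α is held fixed at 0 and only L* is scanned (the slide varies both), the Schechter form is normalized as a gamma density with shape α + 1, and the selection is a hard cut at L_LIM = 1. For α = 0 the exact detection probability is Det(L*) = exp(−L_LIM / L*), which gives a noise-free reference curve. Function names are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(2010)

# Simulated survey using the slide's values: alpha = 0, L* = 1, L_LIM = L*.
alpha_true, lstar_true, l_lim = 0.0, 1.0, 1.0
lum_all = rng.gamma(shape=alpha_true + 1.0, scale=lstar_true, size=10_000)
lum = lum_all[lum_all > l_lim]            # the flux-limited (truncated) sample
n = lum.size

def log_schechter(L, alpha, lstar):
    """Normalized log p(L | alpha, L*): a gamma density with shape alpha + 1."""
    k = alpha + 1.0
    return (k - 1.0) * np.log(L) - L / lstar - k * math.log(lstar) - math.lgamma(k)

def det_exact_alpha0(lstar):
    """Exact Det(alpha = 0, L*) = P(L > L_LIM) = exp(-L_LIM / L*)."""
    return math.exp(-l_lim / lstar)

def det_stochastic(alpha, lstar, n_sim):
    """Naive estimate: simulate n_sim luminosities, keep those with L > L_LIM."""
    sim = rng.gamma(shape=alpha + 1.0, scale=lstar, size=n_sim)
    return float(np.mean(sim > l_lim))

def log_like(lstar, det):
    """Truncated log-likelihood in L* (alpha held fixed at 0 for simplicity)."""
    return float(np.sum(log_schechter(lum, 0.0, lstar))) - n * math.log(det)

grid = np.linspace(0.7, 1.5, 161)
ll_exact = np.array([log_like(g, det_exact_alpha0(g)) for g in grid])
ll_mc_1k = np.array([log_like(g, det_stochastic(0.0, g, 1_000)) for g in grid])
ll_mc_10k = np.array([log_like(g, det_stochastic(0.0, g, 10_000)) for g in grid])

for name, ll in [("exact", ll_exact), ("MC 1000", ll_mc_1k), ("MC 10000", ll_mc_10k)]:
    # Roughness: typical jump in log-likelihood between neighbouring grid points.
    print(name, grid[np.argmax(ll)], np.median(np.abs(np.diff(ll))))
```

The exact curve is smooth and peaks near L* = 1, while the stochastic curves jump by far more between neighbouring grid points; increasing the simulation size from 1,000 to 10,000 reduces, but does not remove, this roughness.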

Statistical and computational problems, and directions for future work

• Need more efficient algorithms

– Modern and future surveys will produce tens to hundreds of thousands of data points, each with several measured quantities (e.g., flux densities); how can statistical inference (e.g., MCMC) be done efficiently?

– Potential algorithms need to handle multimodality in the posterior/likelihood function

• Need to compute the multi-dimensional integral for the detection probability efficiently and accurately

– Alternatively, need to account for the error introduced by a faster but less accurate integration method, e.g., stochastic integration

• Need to have an accurate and efficient method for interpolating the output from computationally intensive computer models (e.g., stellar evolution)

– Statistical emulators should help here

Example: The Quasar Black Hole Mass Function (Distribution)

[Figure panels: Intrinsic Distribution of Derived Quantities; Intrinsic Distribution of Measurables; Selection Effects; Observed Distribution of Measurables. Axes include black hole mass, Eddington ratio, luminosity, emission line width, and the flux limit.]

Example on Real Data: The Quasar Black Hole Mass Function (Distribution)

From Kelly et al. (2010, ApJ, 719, 1315)
