Dictionary Learning for Photometric Redshift Estimation · 2018. 11. 5. · the photometric signal y by imposing sparsity on the learned representation ↵. In addition to the sparsity

Dictionary Learning for Photometric Redshift Estimation

Joana Frontera-Pons*, Florent Sureau*, Bruno Moraes†, Jérôme Bobin*, Filipe B. Abdalla†‡, Jean-Luc Starck*

* IRFU, CEA, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France. Email: [email protected]
† Department of Physics & Astronomy, University College London, Gower Street, London WC1E 6BT, UK
‡ Department of Physics and Electronics, Rhodes University, PO Box 94, Grahamstown, 6140, South Africa

Abstract—Photometric redshift estimation, and thus the assessment of the distance to an astronomical object, plays a key role in modern cosmology. We present in this article a new method for photometric redshift estimation that relies on sparse linear representations. The proposed algorithm is based on a sparse decomposition of rest-frame spectra in a learned dictionary. It provides both an estimate of the redshift and the full-resolution spectrum recovered from the observed photometry of a given galaxy. This technique has been evaluated on realistic simulated photometric measurements.

I. INTRODUCTION

Measuring the angular positions of galaxies to the required cosmological precision is easily achievable with an optical galaxy survey; measuring their radial positions, on the other hand, is one of the most challenging problems in modern observational cosmology. The way we infer those radial distances is based on their spectral energy distribution (SED): due to the expansion of the Universe, galaxies are receding from us and their light is consequently redshifted, similar to a Doppler effect. These redshifts are directly related to the galaxies’ distances, and by measuring them from the spectral characteristics of the received light, we can reconstruct their positions.

Here two different approaches need to be distinguished, each with its own characteristics, advantages and challenges. Measuring spectroscopic redshifts consists in observing the full SED of a galaxy and identifying features that allow a secure redshift determination. Galaxy spectra are a consequence of a series of relatively well-understood physical phenomena, mostly concerning the nuclear and chemical reactions inside stars and the types and ages of stellar populations within the galaxy in question (see [1] for a review). Atomic emission and absorption lines give rise to very distinct peaks and troughs in a galaxy SED, and the secure identification of the wavelength of such a feature can easily be translated into a shift compared to the known wavelength of the same transition observed in laboratories on Earth.

Photometric redshift measurements, on the other hand, try to reconstruct the redshift value from only a handful of numbers representing the integrated flux in broadband filters. This is a severely underdetermined, ill-posed inverse problem in which both the redshift and the spectrum need to be estimated from a few photometric measurements. Degeneracies abound, making photometric redshifts less precise and possibly biased, but they circumvent the need for a spectrograph and can reach fainter magnitudes, since light is integrated over broad wavelength ranges. While spectroscopic redshifts are more accurate than photometric redshifts, their acquisition is time-consuming and limited to the brightest objects.

Most photometric redshift estimation techniques are based either on empirical machine learning approaches or on template fitting [2]. Some of the most popular machine learning codes rely on neural networks [3], [4] or regression trees [5], among others. Information other than flux, such as galaxy morphology or colors, can also be included to improve the accuracy of the redshift estimate. However, the major drawback of these methods is that they must be trained on a large amount of representative labelled data for which the true redshift is precisely known.

The other family of methods is based on template fitting. These methods match physically meaningful rest-frame templates (i.e. without redshift effects), redshifted to a set of trial values, to the observed data in order to obtain both the redshift and the best-fit template. These template spectra are constructed from theoretical libraries. The most widespread template-fitting code for photometric redshift estimation is LePHARE [6]. These techniques strongly rely on good template modelling and a deep understanding of realistic galaxy SEDs.

The main contributions of this article are:

• A new algorithm for photometric redshift estimation based on rest-frame templates learned from data using sparse dictionary learning; the complete spectrum of the galaxies is also recovered;

• The evaluation of the proposed scheme on realistic galaxy photometric simulations.

II. METHODOLOGY

Let us first consider the problem of recovering the full spectrum of a galaxy, $x \in \mathbb{R}^{w_s}$, from photometric measurements $y \in \mathbb{R}^{w_p}$, where the vectors’ dimensions satisfy $w_p \ll w_s$. The measurements are modelled as

$$y = Hx + n, \tag{1}$$

where $H$ is the linear operator that integrates the spectrum through the photometric filter throughputs and $n$ represents the noise. Hence, we seek to retrieve the original signal $x$ by solving this super-resolution task. This severely underdetermined, ill-posed inverse problem can only be solved by imposing constraints on the spectra $x$. We propose to model the spectra as a sparse linear combination of a few learned templates, which are then redshifted to a tested redshift value; the tested value giving the best approximation of the photometric data is the estimated redshift. In the following, we first present how we build our learned rest-frame representation of galaxy spectra using sparse dictionary learning, then the sparse coding algorithm used to recover the spectra, and finally how we estimate the redshift.
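To make the forward model (1) concrete, the following Python sketch builds a toy operator $H$ and simulates $y = Hx + n$. It is an illustration under stated assumptions, not the pipeline used in this work: the top-hat bands, the wavelength grid and the helper names `tophat_filter_matrix` and `simulate_photometry` are hypothetical, whereas the actual $H$ is built from the LSST throughputs shown in Fig. 1.

```python
import numpy as np


def tophat_filter_matrix(wavelengths, band_edges):
    """Hypothetical operator H: row b averages the spectrum over top-hat band b.

    Real survey throughputs (e.g. the LSST ugrizY curves) are smooth; top-hat
    bands are an illustrative assumption. Each row sums to one, so a flat
    spectrum maps to its own value in every band.
    """
    H = np.zeros((len(band_edges) - 1, wavelengths.size))
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        inside = (wavelengths >= lo) & (wavelengths < hi)
        H[b, inside] = 1.0 / inside.sum()
    return H


def simulate_photometry(H, x, noise_sigma, rng):
    """Forward model y = H x + n with white Gaussian noise n."""
    return H @ x + noise_sigma * rng.standard_normal(H.shape[0])


rng = np.random.default_rng(0)
wavelengths = np.linspace(3000.0, 11000.0, 4000)  # w_s = 4000 (hypothetical grid)
band_edges = np.linspace(3000.0, 11000.0, 7)      # 6 hypothetical bands, w_p = 6
H = tophat_filter_matrix(wavelengths, band_edges)
x = 1.0 + 0.5 * np.sin(wavelengths / 500.0)       # toy spectrum
y = simulate_photometry(H, x, noise_sigma=0.01, rng=rng)
print(H.shape, y.shape)  # (6, 4000) (6,)
```

The shapes make the super-resolution nature of the problem explicit: 4000 unknowns are to be recovered from 6 numbers.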

A. Dictionary learning for rest-frame galaxy spectra

The proposed method relies on learning linear representations from rest-frame training data, and the spectra are approximated by a sparse decomposition, $x = D\alpha$. In this context, the dictionary $\hat{D} \in \mathbb{R}^{w_s \times n_a}$ with $n_a$ atoms is constructed from a training set $X \in \mathbb{R}^{w_s \times n_t}$. This training set is composed of $n_t$ examples arranged in columns, and the dictionary is obtained by solving the joint minimization problem:

$$\hat{D}, \hat{A} = \operatorname*{argmin}_{D \in \mathcal{D},\, A} \|X - DA\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|\alpha_i\|_0 \le \tau \tag{2}$$

where $\hat{A} \in \mathbb{R}^{n_a \times n_t}$ is the matrix of codes, whose columns $\{\alpha_i\}$ are the representations of the training examples. $\|\cdot\|_F$ denotes the Frobenius norm, $\|\cdot\|_0$ counts the number of non-zero entries of a vector, $\tau$ is the targeted sparsity degree, and $\mathcal{D}$ designates the set of dictionaries with atoms in the unit $\ell_2$ ball. Among the different approaches to solve (2), we use a technique based on the method of optimal directions detailed in [7]. This procedure alternates sparse coding by orthogonal matching pursuit and dictionary updates. The sparsity degree specified in the sparse coding stage and the number of atoms in the dictionary are free parameters.
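A minimal NumPy sketch of the alternating scheme for (2): orthogonal matching pursuit for the sparse coding stage and the method-of-optimal-directions update $D = XA^T(AA^T)^{+}$ for the dictionary stage. It is not the authors' C++ code; the initialisation from random training columns, the renormalisation of atoms to unit $\ell_2$ norm (a common practical way to handle the ball constraint) and the function names are assumptions.

```python
import numpy as np


def omp(D, x, tau):
    """Orthogonal matching pursuit: greedily pick at most tau atoms (columns
    of D) and least-squares fit x on the selected support."""
    coef = np.zeros(D.shape[1])
    support, sol = [], np.zeros(0)
    residual = x.astype(float).copy()
    for _ in range(tau):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:  # no new atom can improve the fit
            break
        support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def sparse_code(D, X, tau):
    """Code every column of X with OMP; returns A of shape (n_atoms, n_examples)."""
    return np.column_stack([omp(D, X[:, i], tau) for i in range(X.shape[1])])


def mod_dictionary_learning(X, n_atoms, tau, n_iter, rng):
    """Method of optimal directions: alternate OMP sparse coding with the
    closed-form dictionary update D = X A^T (A A^T)^+, then renormalise atoms."""
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = sparse_code(D, X, tau)
        D = X @ A.T @ np.linalg.pinv(A @ A.T)
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.where(norms > 0, norms, 1.0)  # unused atoms stay at zero
    return D, sparse_code(D, X, tau)  # codes consistent with the final D


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))  # toy training set: w_s = 50, n_t = 200
D, A = mod_dictionary_learning(X, n_atoms=20, tau=3, n_iter=10, rng=rng)
```

The closed-form update is what distinguishes MOD from atom-by-atom schemes such as K-SVD: the whole dictionary is refitted by least squares once the codes are fixed.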

B. Sparse coding for rest-frame galaxy spectra

The original spectroscopic signal $x$ is then retrieved from the photometric signal $y$ by imposing sparsity on the learned representation $\alpha$. In addition to the sparsity constraint, positivity of the reconstructed spectra can also be enforced for a more constrained recovery: although negative spectral values may lead to a better reconstruction of the photometry, such solutions are physically impossible. Therefore, we need to minimize:

$$\hat{\alpha} = \operatorname*{argmin}_{\alpha} \frac{1}{2}\|y - HD\alpha\|_2^2 + \lambda\|\alpha\|_1 + \mathrm{I}_{\mathcal{C}}(D\alpha) \tag{3}$$

where $\mathrm{I}_{\mathcal{C}}$ denotes the indicator function of the set $\mathcal{C}$ of non-negative spectra, which enforces non-negativity of the light emitted by the galaxy. The regularization parameter $\lambda$ controls the trade-off between the reconstruction error and the sparsity-promoting term. The value of $\lambda$ has been set automatically, proportional to the estimated noise level $\hat{\sigma}$, as detailed in [8].

To handle the two non-smooth terms together with the differentiable data-fidelity term of the cost function, the optimisation in (3) is performed with the Generalized Forward-Backward Splitting algorithm introduced in [9] and recalled in Algorithm 1, where $L = \|HD\|_2^2$ is the Lipschitz constant of the gradient of the data-fidelity term. The proximal operator associated with the $\ell_1$ norm is the soft-thresholding operator; the one associated with the indicator function has no closed-form expression and is computed with an inner FISTA algorithm on the dual problem, as detailed in [10].

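Algorithm 1 Generalized Forward-Backward Splitting
Initialization: $k = 0$, $t_1 = 0$, $t_2 = 0$, $\hat{\alpha}_0 = 0$ and $\lambda = 3\hat{\sigma}^2$
while not converged do
    $k = k + 1$
    $r = -\frac{1}{L} D^T H^T (y - HD\hat{\alpha}_{k-1})$
    $t_1 = t_1 + \mathrm{prox}_{\frac{\lambda}{L}\|\cdot\|_1}(2\hat{\alpha}_{k-1} - t_1 - r) - \hat{\alpha}_{k-1}$
    $t_2 = t_2 + \mathrm{prox}_{\mathrm{I}_{\mathcal{C}}(D\,\cdot)}(2\hat{\alpha}_{k-1} - t_2 - r) - \hat{\alpha}_{k-1}$
    $\hat{\alpha}_k = (t_1 + t_2)/2$
end while
return $\hat{\alpha}_k$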
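The Python sketch below mirrors the structure of Algorithm 1 on a toy problem. Assumptions: the function names are hypothetical; the inner projection onto $\{\alpha : D\alpha \ge 0\}$ is solved with projected FISTA on the dual problem (one way to follow the approach of [10], not necessarily its exact implementation); and the $\ell_1$ threshold is $2\lambda/L$, which is what generalized forward-backward prescribes for weights $1/2$ on the two non-smooth terms. Algorithm 1 writes $\lambda/L$, which only amounts to a rescaling of $\lambda$.

```python
import numpy as np


def soft_threshold(v, thr):
    """Proximal operator of thr * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)


def prox_positive_spectrum(v, D, n_inner=100):
    """Euclidean projection of v onto {alpha : D alpha >= 0} via FISTA on the dual
        min_{mu >= 0} 0.5 * ||D^T mu||^2 + mu^T D v,   then alpha = v + D^T mu.
    Only approximately feasible after n_inner iterations."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant of the dual gradient
    mu = np.zeros(D.shape[0])
    z, t = mu.copy(), 1.0
    for _ in range(n_inner):
        grad = D @ (v + D.T @ z)                    # gradient of the dual objective
        mu_next = np.maximum(z - step * grad, 0.0)  # projected gradient step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = mu_next + ((t - 1.0) / t_next) * (mu_next - mu)
        mu, t = mu_next, t_next
    return v + D.T @ mu


def gfb_sparse_coding(y, H, D, lam, n_iter=200):
    """GFB for min_alpha 0.5 ||y - H D alpha||^2 + lam ||alpha||_1 + I_C(D alpha),
    with weights 1/2 on the two non-smooth terms and step size 1/L."""
    M = H @ D
    L = np.linalg.norm(M, 2) ** 2                   # Lipschitz constant of the gradient
    alpha = np.zeros(D.shape[1])
    t1, t2 = alpha.copy(), alpha.copy()
    for _ in range(n_iter):
        r = -(M.T @ (y - M @ alpha)) / L            # gradient step scaled by 1/L
        t1 = t1 + soft_threshold(2 * alpha - t1 - r, 2 * lam / L) - alpha
        t2 = t2 + prox_positive_spectrum(2 * alpha - t2 - r, D) - alpha
        alpha = 0.5 * (t1 + t2)
    return alpha


rng = np.random.default_rng(1)
D = rng.uniform(size=(30, 10))   # toy non-negative dictionary (w_s = 30, n_a = 10)
H = rng.uniform(size=(6, 30))    # toy photometric operator (w_p = 6)
alpha_true = np.zeros(10)
alpha_true[[1, 4]] = [1.0, 0.5]
y = H @ D @ alpha_true
alpha_hat = gfb_sparse_coding(y, H, D, lam=1e-3)
```

The random non-negative matrices are chosen only so that the example runs quickly; on real data $\lambda$ would be tied to $\hat{\sigma}$ as described above.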

C. Photometric redshift algorithm

Similarly, we can decompose an observed spectrum $x_z$ at a given redshift $z$ as $x_z = D(z)\alpha(z)$. The redshift is computed as the value of $z$ that provides the closest approximation of the observed photometric signal $y$. More specifically, for every tested value of $z$, the dictionary $D$, originally built for rest-frame representations, is redshifted to $D(z)$, and we solve an inverse problem similar to (3). Accordingly, for every value of $z$ we can write:

$$\hat{\alpha}(z) = \operatorname*{argmin}_{\alpha} \frac{1}{2}\|y - HD(z)\alpha\|_2^2 + \lambda\|\alpha\|_1 + \mathrm{I}_{\mathcal{C}}(D(z)\alpha) \tag{4}$$

and solve (4) with Algorithm 1 described above. Ultimately, the redshift estimate is obtained as the solution of the following problem:

$$\hat{z} = \operatorname*{argmin}_{z} \frac{\|y - HD(z)\hat{\alpha}(z)\|_2^2}{\|y\|_2^2} \tag{5}$$
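One way to build $D(z)$ from the rest-frame dictionary is to stretch each atom by $(1+z)$ and resample it on the observed-frame wavelength grid. The Python sketch below does this with linear interpolation; the helper name, the zero padding outside the rest-frame coverage and the omission of any $(1+z)$ flux-normalisation factor are assumptions rather than details given in the text.

```python
import numpy as np


def redshift_dictionary(D_rest, lam_rest, lam_obs, z):
    """Hypothetical D(z): observed-frame atoms sampled on lam_obs.

    An observed wavelength lam_obs corresponds to the rest-frame wavelength
    lam_obs / (1 + z); each rest-frame atom (column of D_rest, sampled on the
    increasing grid lam_rest) is linearly interpolated there. Values outside
    the rest-frame coverage are set to zero, and no (1 + z) flux normalisation
    is applied (both are assumptions).
    """
    lam_emit = np.asarray(lam_obs, dtype=float) / (1.0 + z)
    return np.column_stack(
        [np.interp(lam_emit, lam_rest, atom, left=0.0, right=0.0) for atom in D_rest.T]
    )


lam_rest = np.linspace(1000.0, 2000.0, 11)
D_rest = np.column_stack([lam_rest, np.ones_like(lam_rest)])  # two toy atoms
lam_obs = np.array([1500.0, 2000.0, 2500.0, 3000.0])
D_z = redshift_dictionary(D_rest, lam_rest, lam_obs, z=1.0)
# D_z[:, 0] is [0, 1000, 1250, 1500]: the linear atom is stretched by a factor 2
```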

Solving problem (5) requires a fine sampling of the range of tested redshifts, which would require solving many instances of problem (4) and would be computationally extremely costly. To avoid this, we propose a coarse-to-fine strategy for redshift testing: we evaluate the approximation error on a hierarchical grid of $z$ values. In other words, the whole interval that encompasses all possible values $z \in [z_{\min}, z_{\max}]$ is uniformly sampled at ten points, and the minimum among these points, $\hat{z}_1$, is retained. Then, the explored interval is reduced around this minimum. The new interval is evenly re-sampled at ten points, yielding a new minimum. This process is repeated five times, allowing us to build a hierarchical grid for $z$. This method reduces the computational time while keeping a good resolution in $z$, as illustrated in the following experimental section.
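The coarse-to-fine search can be sketched in Python as follows, together with a helper for the criterion of (5). The rule for shrinking the interval (keeping the two grid neighbours of the current minimum) is an assumption, since the text only states that the interval is reduced around the minimum; the function names are hypothetical.

```python
import numpy as np


def normalised_residual(y, y_model):
    """Criterion of Eq. (5): ||y - y_model||^2 / ||y||^2."""
    return float(np.sum((y - y_model) ** 2) / np.sum(y ** 2))


def coarse_to_fine_argmin(error_fn, z_min, z_max, n_points=10, n_levels=5):
    """Hierarchical grid search for the z minimising error_fn(z).

    Each level samples the current interval at n_points evenly spaced values.
    Assumption: the next interval spans the two grid neighbours of the current
    minimum, clipped to [z_min, z_max].
    """
    lo, hi = z_min, z_max
    for _ in range(n_levels):
        grid = np.linspace(lo, hi, n_points)
        z_best = grid[int(np.argmin([error_fn(z) for z in grid]))]
        step = grid[1] - grid[0]
        lo, hi = max(z_min, z_best - step), min(z_max, z_best + step)
    return float(z_best)


# Toy stand-in for the error curve of Eq. (5), with its minimum at z = 0.37
z_hat = coarse_to_fine_argmin(lambda z: (z - 0.37) ** 2, 0.0, 1.0)
```

In the full method, `error_fn(z)` would redshift the dictionary, solve (4) and return `normalised_residual(y, H @ D_z @ alpha_z)`. Ten points over five levels cost 50 evaluations, whereas a uniform grid with the same final spacing would need about 3700 under the assumed shrinking rule; the approach relies on the error curve being smooth enough that the global minimum is bracketed at the coarse level.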

D. Comparison with LePHARE

In order to assess the performance of the proposed redshift estimation algorithm, we compare it to the LePHARE code [6]. LePHARE is a template-based redshift estimation method. It starts from a library of spectroscopic templates built from a wide range of theoretical spectra. It then applies observational corrections to the spectra and integrates them through the defined filter set. For each galaxy, LePHARE integrates all spectra in the library for several tested redshift values and finds the combination of a spectrum and a redshift value that provides the best possible fit to the observed photometric data. In this way, each galaxy is assigned a best-fit template and a redshift value.
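For comparison, a generic single-template $\chi^2$ fit of the kind described above can be sketched in Python. This is not LePHARE's implementation (which, among other things, applies observational corrections to the templates); the array layout, the noise weighting and the function name are assumptions.

```python
import numpy as np


def template_fit_chi2(y, sigma, model_fluxes):
    """Generic single-template chi^2 fit (illustration only, not LePHARE's code).

    model_fluxes[t, j, :] is the photometry of template t redshifted to the
    j-th trial redshift. The best amplitude of each (t, j) model is found in
    closed form by weighted least squares; the (t, j) pair with the lowest
    chi^2 is returned together with its amplitude and the full chi^2 table.
    """
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    amp = np.sum(w * y * model_fluxes, axis=-1) / np.sum(w * model_fluxes**2, axis=-1)
    chi2 = np.sum(w * (y - amp[..., None] * model_fluxes) ** 2, axis=-1)
    t_best, j_best = np.unravel_index(np.argmin(chi2), chi2.shape)
    return int(t_best), int(j_best), float(amp[t_best, j_best]), chi2


rng = np.random.default_rng(2)
models = rng.uniform(0.5, 2.0, size=(3, 20, 6))  # 3 templates x 20 trial z x 6 bands (toy)
y_obs = 2.0 * models[1, 7]                        # noiseless, scaled copy of one model
t, j, a, chi2 = template_fit_chi2(y_obs, np.full(6, 0.05), models)
```

The key structural difference with the proposed method is visible here: each candidate is a single scaled template, whereas (4) allows a sparse combination of several learned atoms.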

III. EXPERIMENTAL RESULTS

We present in this section the results obtained using simulated galaxy spectra for the training stage and simulated photometry for testing the algorithms.

A. Simulations

In this section we present the data used in our studies. The first step is to define a master catalog for the analyses. We work with the COSMOSSNAP simulation pipeline [11] to generate a data set of simulated galaxy SEDs and corresponding photometric properties. The idea is to take real data as a basis, thereby ensuring that realistic relationships between galaxy type, color, size, redshift and SED are preserved. COSMOSSNAP uses the COSMOS photometric redshift catalog [12], generated from a combination of 30 bands from diverse astronomical surveys covering the full spectral range from the UV (GALEX), through the optical (Subaru), all the way to infrared bands (CFHT, UKIRT, Spitzer). This data set is matched to Hubble ACS imaging data to provide realistic size-magnitude distributions, employing weak-lensing-quality shape measurements [13]. Based on these properties, COSMOSSNAP chooses a spectral template from a predefined library such that the integrated fluxes through the 30 broadband filters above provide the best fit to the observations. Each galaxy therefore has a “true” redshift and its associated SED, and the distribution of types and redshifts follows the measured distribution in the COSMOS field. This catalog is the basis for all COSMOSSNAP simulations.

To generate realistic photometric properties, the first step is to integrate the best-fit spectral template through the set of broadband filters that will be used for a given galaxy survey. In practice, the full transmission curve includes not only filter effects but also atmospheric transmission (for ground-based observations), telescope optical effects and more. The full transmission curve is commonly referred to as the filter throughput (even though it is not only due to the filter itself). COSMOSSNAP takes a defined set of filter throughputs and calculates magnitudes and their corresponding errors for each galaxy in the catalogue. For the purposes of our analysis, we choose to closely reproduce the expected properties of the Large Synoptic Survey Telescope (LSST) [14]. Fig. 1 shows the modelled throughputs [15] for our current band selection, represented by $H$ in the problem formulation. Therefore, the redshift value will need to be inferred only from these 6 available broadbands (commonly referred to as “ugrizY”). At the end of the generation procedure, we have a realistic master galaxy catalogue with magnitudes, colors, shapes and redshifts for 538 000 galaxies on an effective 1.24 deg² region of the sky, down to an i-band magnitude of 26.5. To further match the expected properties of the LSST Science sample, we limit our catalog to galaxies brighter than 25.3 and with signal-to-noise ratio (S/N) > 10 in the i-band. With these restrictions, we obtain a galaxy catalog with a realistic set of photometric properties, and best-fit spectral templates with realistic continuum and emission line properties. We now need to forward-model the observational process in the spectroscopic case in a manner consistent with expected observational conditions.
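The i-band selection above can be written as a simple mask. The Python sketch below also uses the first-order relation $\sigma_m \approx (2.5/\ln 10)/(S/N) \approx 1.086/(S/N)$, so that $S/N > 10$ corresponds to magnitude errors below about 0.11 mag. The function names are hypothetical, and the inputs are assumed to be per-galaxy i-band magnitudes and S/N values.

```python
import numpy as np


def select_lsst_like(i_mag, i_snr, mag_limit=25.3, snr_min=10.0):
    """Boolean mask keeping galaxies with i < mag_limit and i-band S/N > snr_min."""
    return (np.asarray(i_mag) < mag_limit) & (np.asarray(i_snr) > snr_min)


def mag_error_from_snr(snr):
    """First-order magnitude error: sigma_m = 2.5 / ln(10) / (S/N)."""
    return 2.5 / np.log(10.0) / np.asarray(snr, dtype=float)


i_mag = np.array([24.0, 25.0, 25.5, 23.0])
i_snr = np.array([50.0, 8.0, 20.0, 12.0])
mask = select_lsst_like(i_mag, i_snr)  # [True, False, False, True]
```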

Fig. 1: LSST filter throughputs for the considered photometric scenario.

To obtain realistic spectral templates, we need to resample and integrate the best-fit SEDs. As given by the simulations, these SEDs are pure functional forms. At the end of the observational process, what we obtain is an integrated flux in logarithmic wavelength bins at a resolution $R$. From the simulation run described above, we select two random subsets, one for training and one for testing.
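Resampling onto logarithmic wavelength bins at resolution $R$ can be sketched in Python as follows: the bin edges are spaced by a constant $\Delta \ln\lambda = 1/R$ (so that $\lambda/\Delta\lambda \approx R$ for narrow bins), and the flux in each bin is the average of the finely sampled SED over that bin. The value of $R$, the use of bin averages and the helper names are assumptions for illustration.

```python
import numpy as np


def log_wavelength_edges(lam_min, lam_max, R):
    """Bin edges with constant d(ln lambda) = 1/R, starting at lam_min."""
    n_bins = int(np.floor(np.log(lam_max / lam_min) * R))
    return lam_min * np.exp(np.arange(n_bins + 1) / R)


def bin_average_flux(lam_fine, flux_fine, edges):
    """Average a finely sampled SED over each bin, using the cumulative
    trapezoidal integral interpolated at the bin edges."""
    steps = 0.5 * (flux_fine[1:] + flux_fine[:-1]) * np.diff(lam_fine)
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    cum_at_edges = np.interp(edges, lam_fine, cum)
    return np.diff(cum_at_edges) / np.diff(edges)


edges = log_wavelength_edges(1250.0, 10499.0, R=500.0)  # hypothetical R
lam_fine = np.linspace(1200.0, 10600.0, 20001)
binned = bin_average_flux(lam_fine, np.full(lam_fine.size, 2.0), edges)
```

Averaging (rather than point-sampling) the SED in each bin is what makes the result an integrated flux, consistent with the description above.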

B. Dictionary Learning

Fig. 2: Example of the subtraction of high-frequency features from rest-frame spectra. The original spectrum is represented by a blue solid line and the information retained after emission-line subtraction is displayed with black circles.

Fig. 3: Example of five atoms learned using dictionary learning and imposing a sparsity degree of 3 on rest-frame spectra.

First, we chose a subset of noiseless low-redshift galaxies that have been blueshifted to $z = 0$ to form the training set. Hence, $X$ is composed of $n_t = 10000$ clean rest-frame example spectra covering the range [1250 Å, 10499 Å], with $w_s = 4258$. Moreover, high-frequency information has been removed from these rest-frame spectra through wavelet filtering, retaining four scales and keeping the baseline, as illustrated in Fig. 2. Finally, the dictionary $D$ is learned by specifying the desired sparsity degree $\tau = 3$ and the number of atoms of the dictionary $n_a = 40$. The code, developed in C++, was run for 100 iterations, which allowed the dictionary estimate to converge; convergence was measured through the variations of the averaged approximation error across iterations. Fig. 3 displays five atoms from the adapted dictionary used from now on.
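The text does not name the wavelet used. As one plausible reading of “retaining four scales and keeping the baseline”, the Python sketch below computes a 1D starlet (à trous, B3-spline) transform with four scales, discards the finest detail scales and keeps the remaining scales plus the coarse baseline. The choice of the starlet, the reflective boundary handling, the number of discarded scales and the function names are all assumptions.

```python
import numpy as np

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # B3-spline scaling filter


def starlet_1d(signal, n_scales):
    """Undecimated 'a trous' (starlet) transform with the B3-spline kernel.

    Returns the detail scales w_1..w_J (finest first) and the coarse baseline
    c_J, so that signal == sum(details) + baseline. Reflective padding needs
    len(signal) > 2 ** n_scales.
    """
    c = np.asarray(signal, dtype=float)
    details = []
    for j in range(n_scales):
        step = 2 ** j                       # 'holes' between the kernel taps
        kernel = np.zeros(4 * step + 1)
        kernel[::step] = B3
        padded = np.pad(c, 2 * step, mode="reflect")
        c_next = np.convolve(padded, kernel, mode="valid")
        details.append(c - c_next)
        c = c_next
    return details, c


def remove_fine_scales(signal, n_scales=4, n_drop=2):
    """Drop the n_drop finest detail scales; keep the others and the baseline."""
    details, baseline = starlet_1d(signal, n_scales)
    return baseline + sum(details[n_drop:], np.zeros_like(baseline))


grid = np.linspace(0.0, 20.0, 256)
spectrum = np.sin(grid) + 0.1 * np.cos(20.0 * grid)  # toy smooth signal + fine ripple
smoothed = remove_fine_scales(spectrum, n_scales=4, n_drop=2)
```

Setting `n_drop=0` returns the input exactly, since the starlet scales add back up to the signal; increasing `n_drop` removes progressively broader features.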

C. Redshift estimation

Second, testing is performed on a different randomly selected subset. We have evaluated the algorithm on $n = 1000$ galaxies lying in the redshift range $z \in [0, 1]$, with only $w_p = 6$ photometric measurements for each galaxy.

Let us now discuss the results obtained for redshift estimation in the simulated catalogue. The strategy of building a hierarchical grid for testing the different $z$ values is illustrated in Fig. 4. The grid search starts by exploring the whole interval $z \in [0, 1]$, and the approximation error as a function of the tested redshift is depicted in Fig. 4 (a). Then, the minimum is selected and the considered interval is reduced in Fig. 4 (b). We repeat the process five times to achieve the desired resolution in $z$. Because the approximation error varies smoothly with redshift, this hierarchical approach attains the same minima as a one-level grid with a much finer resolution, as shown in Fig. 5, while the computational time is significantly lower, which justifies our choice.

Fig. 6 displays the estimated redshift for all the galaxies in the test set with respect to their true redshift value. The performance of the method is quantified through the bias over the entire test set, $\langle \Delta z \rangle = \langle z_{\mathrm{est}} - z_{\mathrm{true}} \rangle = -0.004$, and the 68th percentile scatter, $\sigma_{68} = 0.0475$. One can then define the number of catastrophic failures $\nu$ as the number of galaxies falling outside $3\sigma_{68}$, which yields $\nu = 53$.
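The statistics above can be computed as in the following Python sketch. The text does not fully specify two definitions, so they are assumptions here: $\sigma_{68}$ is taken as half the width of the interval between the 16th and 84th percentiles of $\Delta z$, and a catastrophic failure is a galaxy with $|\Delta z - \langle\Delta z\rangle| > 3\sigma_{68}$. The function name is hypothetical.

```python
import numpy as np


def photoz_metrics(z_est, z_true):
    """Return (bias, sigma_68, n_catastrophic) for redshift estimates.

    Assumed definitions: sigma_68 is half the 16th-84th percentile range of
    dz = z_est - z_true, and catastrophic failures satisfy
    |dz - <dz>| > 3 * sigma_68.
    """
    dz = np.asarray(z_est, dtype=float) - np.asarray(z_true, dtype=float)
    bias = float(dz.mean())
    p16, p84 = np.percentile(dz, [16.0, 84.0])
    sigma68 = 0.5 * float(p84 - p16)
    n_catastrophic = int(np.sum(np.abs(dz - bias) > 3.0 * sigma68))
    return bias, sigma68, n_catastrophic


# Toy check: symmetric errors plus one gross outlier
bias, sigma68, n_cat = photoz_metrics(np.append(np.linspace(-1, 1, 101), 5.0), np.zeros(102))
```

A percentile-based scatter is used because, unlike the standard deviation, it is barely affected by the few catastrophic failures it is meant to help identify.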

Finally, Fig. 7 shows the results of LePHARE photometric redshift estimation on the simulated catalogue. The corresponding bias is $\langle \Delta z \rangle = 0.0421$, the 68th percentile scatter is $\sigma_{68} = 0.0708$, and the number of catastrophic failures is $\nu = 22$.

It is important to point out two main differences between LePHARE and our algorithm. On the one hand, the templates used in the LePHARE code are theoretical, while ours are derived directly from the data. On the other hand, while LePHARE fits a single template at a time, the proposed method allows for a linear combination of several templates, leading to greater flexibility and representational capacity.

IV. CONCLUSION

We have introduced a new method to compute redshifts from photometric data. The proposed algorithm recovers the full spectra of galaxies from broadband photometry by solving a super-resolution problem. This estimation scheme has been analyzed on simulated galaxy spectra and compared to the classical LePHARE code.

Further developments will explore other representation approaches in which emission lines are included. The performance will be compared to that of other photometric redshift estimators based on machine learning, such as ANNz2 [4]. Finally, we aim to investigate the performance of this algorithm on real photometric data.

ACKNOWLEDGMENT

This work is funded by the DEDALE project (contract no. 665044) and LENA (ERC StG no. 678282) within the H2020 Framework Program of the European Commission.

REFERENCES

[1] H. Mo, F. Van den Bosch, and S. White, Galaxy formation and evolution. Cambridge University Press, 2010.

[2] H. Hildebrandt, S. Arnouts, P. Capak, L. Moustakas, C. Wolf, F. B. Abdalla, R. Assef, M. Banerji, N. Benítez, G. Brammer et al., “PHAT: Photo-z accuracy testing,” Astronomy & Astrophysics, vol. 523, p. A31, 2010.

[3] R. Tagliaferri, G. Longo, S. Andreon, S. Capozziello, C. Donalek, and G. Giordano, “Neural networks for photometric redshifts evaluation,” in Italian Workshop on Neural Nets. Springer, 2003, pp. 226–234.

[4] I. Sadeh, F. B. Abdalla, and O. Lahav, “ANNz2: photometric redshift and probability distribution function estimation using machine learning,” Publications of the Astronomical Society of the Pacific, vol. 128, no. 968, p. 104502, 2016.

[5] A. Boselli, A panchromatic view of galaxies. John Wiley & Sons, 2012.

[6] S. Arnouts and O. Ilbert, “LePHARE: Photometric analysis for redshift estimate,” Astrophysics Source Code Library, 2011.

[7] K. Engan, S. O. Aase, and J. H. Husoy, “Method of optimal directions for frame design,” in Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on, vol. 5. IEEE, 1999, pp. 2443–2446.

[8] D. L. Donoho and J. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.

[9] H. Raguet, J. Fadili, and G. Peyré, “A generalized forward-backward splitting,” SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1199–1226, 2013.

Fig. 4: Different levels of the hierarchical grid for testing the values of z, panels (a) to (e). The whole z range is explored in (a), and the minimum is computed at each level, reducing the considered interval down to the finest resolution in (e).

Fig. 5: One-level grid uniformly sampled at 100 steps between z = 0 and z = 1.

[10] J. Rapin, J. Bobin, A. Larue, and J.-L. Starck, “NMF with sparse regularizations in transformed domains,” SIAM Journal on Imaging Sciences, vol. 7, no. 4, pp. 2020–2047, 2014.

[11] S. Jouvel, J.-P. Kneib, O. Ilbert, G. Bernstein, S. Arnouts, T. Dahlen, A. Ealet, B. Milliard, H. Aussel, P. Capak et al., “Designing future dark energy space missions - I. Building realistic galaxy spectro-photometric catalogs and their first applications,” Astronomy & Astrophysics, vol. 504, no. 2, pp. 359–371, 2009.

[12] O. Ilbert, P. Capak, M. Salvato, H. Aussel, H. McCracken, D. Sanders, N. Scoville, J. Kartaltepe, S. Arnouts, E. Le Floc’h et al., “COSMOS photometric redshifts with 30-bands for 2-deg²,” The Astrophysical Journal, vol. 690, no. 2, p. 1236, 2008.

[13] A. Leauthaud, R. Massey, J.-P. Kneib, J. Rhodes, D. E. Johnston et al., “Weak gravitational lensing with COSMOS: galaxy selection and shape measurements,” The Astrophysical Journal Supplement Series, vol. 172, no. 1, p. 219, 2007.

[14] https://www.lsst.org/.

[15] https://github.com/lsst/throughputs.

Fig. 6: True vs. estimated redshifts for the proposed dictionary learning photometric redshift estimation algorithm.

Fig. 7: True vs. estimated redshifts from the benchmark LePHARE code.
