SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Aug 02, 2020

Website: https://press3.mcs.anl.gov/cpac/projects/scidac
Software portal: http://www.hep.anl.gov/cosmology/CosmicEmu/emu.html
Workshop: Argonne, Sep 24-25, 2018, “Advanced Statistics Meets Machine Learning” (https://indico.fnal.gov/event/18318/overview)

SciDAC: Accelerating HEP Science — Inference and Machine Learning at Extreme Scales

Team: P. Balaprakash, M. Binois, S. Habib (PI), K. Heitmann (Argonne PI), E. Kovacs, N. Ramachandra, S. Wild (Argonne); A. Fadikar, R. Gramacy, D. Higdon (Va Tech PI) (Va Tech); E. Lawrence (LANL, Dep. PI); Y. Lin, A. Slosar (BNL PI), S. Yoo (BNL); Z. Lukic (LBNL PI), D. Morozov (LBNL)

Focus Areas:
• Cosmology: Unique arena for advanced stats/ML applications: big data, big compute, large-scale inverse problems
• ‘Stats/ML at Scale’: Need to speed up methods by many orders of magnitude to handle datasets and science requirements in the multi-PB to EB era
• Accuracy: Many problems are in a regime where statistical errors are subdominant; need to understand how to model and mitigate systematics

Science with Surveys as an Inverse Problem: Extreme-Scale Computing meets Statistics and Machine Learning

[Figure: inference-loop diagram; stages labeled “Stats” and “ML+Stats”]

▪ Use of HPC resources as high-fidelity, large data-volume sources for state-of-the-art data-intensive statistical and machine learning (ML) methods

▪ Need to speed up the forward modeling process, deal with ‘curse of dimensionality’ in the inverse problem

▪ How to control errors if the modeling and measurement error PDFs are uncertain?

Cosmological scientific inference process showing forward modeling and systematic error exploration/control loop

New Techniques for Photometric Redshift Estimation

Scientific Achievement
Estimation of galaxy redshift distribution using photometric information, morphology, and spatial correlations; application to LSST

Significance and Impact
Characterization and reduction of photometric redshift estimation errors are essential for the success of imaging surveys

Research Details
• Large synthetic dataset based on realistic templates for spectral energy distributions (SEDs) of different galaxy types
• Machine learning techniques for classification (hidden space variables), use of mixture models; Bayesian learning for posterior PDFs
• Early results show great promise for photometric redshift estimation applications and error mitigation
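The bullets above mention mixture models and posterior PDFs for redshift. As a rough illustration only, the following sketch represents a redshift posterior as a one-dimensional Gaussian mixture and extracts a mean, a mode, and a central 68% interval. The mixture weights, means, and widths are made-up values, and the helper names `mixture_pdf` and `summarize` are hypothetical; this is not the project's multi-Gaussian Process method.

```python
import numpy as np

def mixture_pdf(z, weights, means, sigmas):
    """Evaluate a 1D Gaussian-mixture density p(z) on the grid z."""
    z = np.asarray(z, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the mixture weights
    comps = np.exp(-0.5 * ((z - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return comps @ w

def summarize(z, pdf, level=0.68):
    """Return mean, mode, and a central credible interval from a gridded PDF."""
    dz = z[1] - z[0]
    pdf = pdf / (pdf.sum() * dz)  # renormalize on the finite grid
    cdf = np.cumsum(pdf) * dz
    lo = z[np.searchsorted(cdf, (1 - level) / 2)]
    hi = z[np.searchsorted(cdf, (1 + level) / 2)]
    return (z * pdf).sum() * dz, z[np.argmax(pdf)], (lo, hi)

# Hypothetical bimodal photo-z posterior (illustrative numbers, not from the slides)
z_grid = np.linspace(0.0, 3.0, 3001)
pdf = mixture_pdf(z_grid, weights=[0.7, 0.3], means=[0.5, 1.8], sigmas=[0.05, 0.10])
z_mean, z_mode, (z_lo, z_hi) = summarize(z_grid, pdf)
print(f"mean={z_mean:.3f} mode={z_mode:.3f} 68% interval=({z_lo:.3f}, {z_hi:.3f})")
```

A bimodal case like this one shows why a full PDF is more informative than a point estimate: the mean falls between the two peaks, where the density is low.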

Multi-Gaussian Process approach to obtain estimated redshift PDFs and comparisons to training set values (red vertical lines)

Axis label: True redshift

Image credit: A. Fadikar

Precision Emulation of CMB Power Spectra

Scientific Achievement
Fast, accurate prediction of cosmic microwave background (CMB) variables (~2000X speedup) with 0.2% errors over the desired dynamic range

Significance and Impact
Predictions/forecasts for next-generation CMB surveys (CMB-S4), analysis of current-generation data (ACTPol, Planck, SPT-3G, and others)

Research Details
• Large training/validation dataset generated using the CAMB code with symmetric Latin hypercube sampling
• Dimensional reduction via unsupervised learning (comparison of variational autoencoders and PCA)
• Non-parametric, Gaussian Process-based interpolation; error estimates via MCMC
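The three steps above (space-filling design, dimensional reduction, GP interpolation) can be sketched end to end in a few lines. This is a minimal illustration under several assumptions, not the project's emulator: `toy_spectrum` is a hypothetical cheap function standing in for a CAMB run, PCA is used rather than a variational autoencoder, the GP uses a fixed RBF kernel with hand-picked length scale and nugget, and no MCMC error estimation is attempted. All function names are illustrative.

```python
import numpy as np

def symmetric_lhs(n, d, rng):
    """Symmetric Latin hypercube in [0, 1]^d: rows i and n-1-i mirror each other."""
    assert n % 2 == 0, "this simple construction needs an even number of points"
    half = n // 2
    levels = np.empty((n, d), dtype=int)
    for j in range(d):
        perm = rng.permutation(half)                      # level pair assigned to each row pair
        flip = rng.integers(0, 2, size=half).astype(bool)
        low = np.where(flip, n - 1 - perm, perm)          # level used by row i
        levels[:half, j] = low
        levels[n - 1 - np.arange(half), j] = n - 1 - low  # mirrored level for row n-1-i
    return (levels + 0.5) / n                             # cell centers

def toy_spectrum(theta, x):
    """Hypothetical stand-in for an expensive solver run (not CAMB)."""
    a, b = theta
    return a * np.sin(2 * np.pi * x) + b * np.cos(2 * np.pi * x) + a * b * x

def rbf(A, B, length=0.3):
    """Squared-exponential kernel between two sets of parameter vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def fit_emulator(theta, Y, n_pc=3, nugget=1e-6):
    """PCA-compress the training outputs, then fit one GP per PCA weight."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:n_pc]                          # principal components, shape (n_pc, n_bins)
    W = (Y - mean) @ basis.T                   # PCA weights, shape (n_train, n_pc)
    K = rbf(theta, theta) + nugget * np.eye(len(theta))
    alpha = np.linalg.solve(K, W)              # GP coefficients for every weight at once
    return {"theta": theta, "mean": mean, "basis": basis, "alpha": alpha}

def predict(emu, theta_new):
    """GP posterior mean of the PCA weights, mapped back to the full output."""
    Ks = rbf(np.atleast_2d(theta_new), emu["theta"])
    return emu["mean"] + (Ks @ emu["alpha"]) @ emu["basis"]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)                  # output bins (stand-in for multipoles)
theta_train = symmetric_lhs(40, 2, rng)
Y_train = np.array([toy_spectrum(t, x) for t in theta_train])
emu = fit_emulator(theta_train, Y_train)
theta_test = np.array([0.4, 0.6])
y_pred = predict(emu, theta_test)[0]
print("max abs error at test point:", np.abs(y_pred - toy_spectrum(theta_test, x)).max())
```

Emulating a handful of PCA weights instead of every output bin keeps the number of GPs small, which is the main reason for the dimensional-reduction step.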

Planck vs WMAP cosmological parameters using the emulator

CMB TT power spectrum with error estimates (lower panel)

Image credit: M. Binois

Image credit: N. Ramachandra

Neural Network Prediction of CMB Dust Foreground

Scientific Achievement
Significant improvement in prediction for dust foreground using convolutional neural networks and Galactic neutral hydrogen data

Significance and Impact
Current work uses intensity data; the next generation will focus on polarization to help with optimal field selection and data analysis for a small-aperture CMB-S4 experiment

Research Details
• Input data are 50 velocity slices in Galactic neutral hydrogen as traced by the 21 cm line
• Output data are the difference between the Planck 353 GHz and 143 GHz data, which is dominated by the dust signal
• Trained on the southern Galactic hemisphere, validated/tested on the northern Galactic hemisphere
• Optimal linear model gives negligible improvement: the neural net is picking up nontrivial information
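To make the linear baseline and the evaluation metric concrete, here is a small sketch that fits least-squares weights over 50 input slices on one "hemisphere" and scores the prediction on the other with a Pearson cross-correlation coefficient. Everything here is an assumption for illustration: the data are synthetic random numbers rather than HI or Planck maps, the function names are hypothetical, and the slides do not state how the optimal linear model was fitted. Because the synthetic target is linear by construction, the linear fit does well here; that illustrates only the mechanics and says nothing about the result reported above for real data.

```python
import numpy as np

def fit_linear(slices, target):
    """Least-squares weights plus offset mapping slices (n_pix, n_slices) to target (n_pix,)."""
    A = np.column_stack([slices, np.ones(len(slices))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

def apply_linear(coef, slices):
    """Apply fitted weights and offset to a new set of slices."""
    return slices @ coef[:-1] + coef[-1]

def cross_corr(a, b):
    """Pearson cross-correlation coefficient between two flattened maps."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Synthetic stand-in data (not real survey maps)
rng = np.random.default_rng(1)
n_pix, n_slices = 4000, 50
true_w = rng.uniform(0.0, 1.0, n_slices)

def make_hemisphere(n):
    s = rng.gamma(2.0, 1.0, size=(n, n_slices))         # positive "intensity" slices
    t = s @ true_w + 0.5 * rng.standard_normal(n)       # linear signal plus noise
    return s, t

s_south, t_south = make_hemisphere(n_pix)               # training half
s_north, t_north = make_hemisphere(n_pix)               # held-out half

coef = fit_linear(s_south, t_south)
r_naive = cross_corr(s_north.sum(axis=1), t_north)      # naive total-intensity map
r_linear = cross_corr(apply_linear(coef, s_north), t_north)
print(f"naive r = {r_naive:.3f}, linear r = {r_linear:.3f}")
```

Scoring on a held-out half, as the slide describes for the two hemispheres, keeps the comparison from rewarding overfitting.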

Improvement in cross-correlation coefficient with target map compared to naive total intensity map; above the red line indicates improvement, below indicates deterioration (green is for ~1 deg scales, blue is for ~10 deg scales)

Example prediction: Planck difference map (left) and model prediction (right); black circles are the point-source mask

Images credit: G. Zhang

Basic Emulation for Ly-alpha Forest Statistics

Scientific Achievement
HPC framework to infer cosmological and thermal parameters using the Ly-alpha power spectrum and selected computational model runs

Significance and Impact
Ly-alpha forest observations are the main window into structure formation at high redshifts (2<z<5) and a sensitive probe of non-CDM cosmologies. P(k) emulation is necessary for recovery of cosmological parameters from observations

Research Details
• Automated system for iteratively running cosmological simulations and analysis tasks on HPC systems
• New iterative method to determine the most informative points in parameter space for running the next batch of simulations
• Multiple ways to do GP emulation of vector summary statistics, i.e., exploring ways of combining k- and cosmological parameter dependence of emulated P(k)
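The slides do not specify how the "most informative points" are chosen. One common heuristic, shown here purely as an assumption, is to score each candidate parameter value by the current posterior probability times the emulator's predictive variance, so new runs go where the parameters are plausible and the emulator is still uncertain. The sketch uses a one-parameter toy problem; `simulator` is a hypothetical cheap stand-in for a simulation plus analysis run, and the kernel length scale, noise level, and grid are illustrative choices.

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel for 1D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_predict(x_train, y_train, x_new, nugget=1e-6):
    """Zero-mean, unit-variance GP: posterior mean and variance at x_new."""
    K = rbf(x_train, x_train) + nugget * np.eye(len(x_train))
    Ks = rbf(x_new, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.clip(var, 0.0, None)

def simulator(theta):
    """Hypothetical cheap stand-in for an expensive simulation + analysis run."""
    return np.sin(1.5 * theta)

def next_point(x_train, y_train, grid, observed, sigma_obs):
    """Pick the grid point maximizing posterior probability times emulator variance."""
    mean, var = gp_predict(x_train, y_train, grid)
    tot = sigma_obs**2 + var                   # measurement plus emulator variance
    post = np.exp(-0.5 * (observed - mean) ** 2 / tot) / np.sqrt(tot)
    post /= post.sum()                         # flat prior on the grid
    score = post * var                         # plausible and still uncertain
    return grid[np.argmax(score)], post

grid = np.linspace(0.0, 1.0, 201)
theta_true, sigma_obs = 0.3, 0.05              # illustrative values
observed = simulator(theta_true)
x_train = np.array([0.0, 0.5, 1.0])            # initial design
y_train = simulator(x_train)

for _ in range(5):                             # five single-point "batches"
    theta_next, post = next_point(x_train, y_train, grid, observed, sigma_obs)
    x_train = np.append(x_train, theta_next)
    y_train = np.append(y_train, simulator(theta_next))

_, post = next_point(x_train, y_train, grid, observed, sigma_obs)
theta_map = grid[np.argmax(post)]
print("runs used:", len(x_train), "posterior peak at theta =", theta_map)
```

In a real setting each call to `simulator` would be an HPC job, so the acquisition rule matters because it decides where the limited simulation budget is spent.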

[Figure panels: space of cosmological parameters (θ); expensive 3D simulation; summary statistic; posterior probability]

Inferring cosmological parameters in a 3-parameter test case: Simulations produce matter configurations that depend on cosmological parameters; Ly-alpha analysis produces outputs comparable to sky survey measurements. Combining predictions and measurements we infer “current best” parameter probabilities as well as the “promising” points for the next set of simulations in our iterative procedure.

Image credit: Z. Lukic

Image Classification/Regression for Strong Lensing

Scientific Achievement
Fast (10 microseconds/image), robust (80-90% accuracy) classification of strongly lensed background galaxies

Significance and Impact
LSST will have tens of billions of objects with ~100K strongly lensed sources; automated source detection/filtering is essential

Research Details
• Large synthetic dataset based on a full ray-tracing algorithm with 1) model halo mass distributions as lenses and 2) halos from cosmological simulations, realistic telescope properties; single and stacked images
• Deep CNN classification/regression
• GANs for fast generation of new images

Single and stacked noisy lensed training images for LSST; a companion set of non-lensed images is not shown

Performance: Modern GPUs are significantly faster than many-core architectures

Image credit: N. Li

Image credit: N. Ramachandra

Future Work

▪ Emulation Landscape:
  ▪ Extend work on summary statistics to problems with significantly higher dimensionality, O(10) to O(100)
  ▪ Multi-fidelity emulation
  ▪ Develop new methods for applications to likelihood-free scenarios (e.g., semi-analytic galaxy modeling)
  ▪ Fast generation of multiple realizations of ‘raw’ sky data; develop techniques for ensuring dynamic consistency (causality vs. correlations)
▪ Image Applications: Image cross-validation, source de-blending algorithms, application to calibration studies
▪ ML/DL Methods on HPC Platforms: Work on scaling up ML and statistical methods on HPC platforms with GPU acceleration (e.g., Cooley@ALCF, Summit@OLCF)
▪ Stats meets ML: Improve methods by incorporating model information into ‘black box’ techniques; incorporate optimization methods into Bayesian calibration, and many other topics