CS267 - April 24th, 2012
Big Bang, Big Data, Big Iron: High Performance Computing and the Cosmic Microwave Background
Julian Borrill, Computational Cosmology Center, LBL
Space Sciences Laboratory, UCB
and the BOOMERanG, MAXIMA, Planck, EBEX & PolarBear collaborations
The Cosmic Microwave Background
About 400,000 years after the Big Bang, the expanding Universe cools through the ionization temperature of hydrogen: p⁺ + e⁻ → H.
Without free electrons to scatter off, the photons free-stream to us today.
• COSMIC - filling all of space.
• MICROWAVE - redshifted by the expansion of the Universe from 3000 K to 3 K.
• BACKGROUND - primordial photons coming from “behind” all astrophysical sources.
CMB Science
• Primordial photons give the earliest possible image of the Universe.
• The existence of the CMB supports a Big Bang over a Steady State cosmology (NP1).
• Tiny fluctuations in the CMB temperature (NP2) and polarization encode the fundamentals of
 – Cosmology: geometry, topology, composition, history, …
 – Highest energy physics: grand unified theories, the dark sector, inflation, …
• Current goals:
 – A definitive T measurement provides complementary constraints for all dark energy experiments.
 – Detection of the cosmological B-mode gives the energy scale of inflation from primordial gravity waves (NP3).
The Concordance Cosmology
Supernova Cosmology Project (1998): Cosmic Dynamics (ΩΛ - Ωm)
BOOMERanG & MAXIMA (2000): Cosmic Geometry (ΩΛ + Ωm)
70% Dark Energy + 25% Dark Matter + 5% Baryons
95% Ignorance
What (and why) is the Dark Universe?
Observing the CMB
• With very sensitive, very cold detectors.
• Scanning all of the sky from space, or just some of it from the stratosphere or high, dry ground.
 – Calculate individual detector pointing on the fly.
• Remove redundant write/read of time-streams between simulation & mapping
 – Generate simulations on the fly only when the map-maker requests data.
• Put the MC loop inside the map-maker
 – Amortize common data reads over all realizations.
IO – After
Read telescope pointing
For each detector
 Calculate detector pointing
For each MC realization
 SimMap:
  For all detectors
   Simulate time-stream
 Write map

Read: sparse observations. Write: realizations × pixels.
E.g. for Planck, read 2 GB & write 70 TB => 10⁸ read & 10³ write compression.
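To make the restructuring concrete, here is a toy serial sketch in C of the pattern above (all names, sizes and the noise simulation are hypothetical, not the collaborations' actual code): the telescope pointing is read once, per-detector pointing is derived on the fly, and the MC loop sits inside the map-maker so simulated time-streams never touch disk.

    /* Toy sketch (hypothetical names & sizes) of the on-the-fly
     * pattern: read the common telescope pointing once, derive each
     * detector's pointing on demand, and keep the Monte Carlo loop
     * inside the map-maker so simulated time-streams are never
     * written to or read back from disk. */
    #include <stdio.h>

    #define NSAMP 1000   /* samples per detector (toy size) */
    #define NDET  4      /* detectors (toy size)            */
    #define NPIX  64     /* map pixels (toy size)           */
    #define NMC   8      /* Monte Carlo realizations        */

    /* Portable little generator so each (realization, detector)
     * pair gets an independent, reproducible noise stream. */
    static double lcg_uniform(unsigned int *state) {
        *state = *state * 1664525u + 1013904223u;
        return *state / 4294967296.0 - 0.5;
    }

    /* Derive a detector's pointing from the telescope pointing. */
    static void detector_pointing(const int *telescope, int det, int *pix) {
        for (long i = 0; i < NSAMP; i++)
            pix[i] = (telescope[i] + det) % NPIX;   /* stand-in offset */
    }

    int main(void) {
        int    telescope[NSAMP], pix[NSAMP];
        double tod[NSAMP], map[NPIX], hits[NPIX];

        /* "Read telescope pointing" - once, for all realizations. */
        for (long i = 0; i < NSAMP; i++) telescope[i] = (int)(i % NPIX);

        for (int mc = 0; mc < NMC; mc++) {            /* MC loop inside */
            for (int p = 0; p < NPIX; p++) map[p] = hits[p] = 0.0;
            for (int det = 0; det < NDET; det++) {
                detector_pointing(telescope, det, pix);  /* on the fly */
                unsigned int seed = 1234u + 17u * (unsigned)mc + (unsigned)det;
                for (long i = 0; i < NSAMP; i++) {
                    tod[i] = lcg_uniform(&seed);         /* on demand  */
                    map[pix[i]] += tod[i];               /* naive binning */
                    hits[pix[i]] += 1.0;
                }
            }
            for (int p = 0; p < NPIX; p++)
                if (hits[p] > 0.0) map[p] /= hits[p];
            printf("realization %d: map[0] = % .4f\n", mc, map[0]); /* "write map" */
        }
        return 0;
    }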
Communication Details
• The time-ordered data from all the detectors are distributed over the processes subject to:
 – load-balance
 – common telescope pointing
• Each process therefore holds
 – some of the observations
 – for some of the pixels.
• In each PCG iteration, each process solves with its observations.
• At the end of each iteration, each process needs to gather the total result for all of the pixels in its subset of the observations.
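For reference, the PCG iteration structure being described looks like the following minimal serial sketch (a toy SPD tridiagonal system with a Jacobi preconditioner standing in for the pointing-weighted normal equations; everything here is illustrative). In the parallel map-maker, the matrix-vector product runs over each process's own observations and is followed by the gather discussed above.

    /* Minimal serial PCG sketch on a toy SPD system. */
    #include <stdio.h>
    #include <math.h>

    #define N 8

    /* y = A x for a toy tridiagonal SPD matrix (diag 4, off-diag -1). */
    static void apply_A(const double *x, double *y) {
        for (int i = 0; i < N; i++) {
            y[i] = 4.0 * x[i];
            if (i > 0)     y[i] -= x[i - 1];
            if (i < N - 1) y[i] -= x[i + 1];
        }
    }

    int main(void) {
        double b[N], x[N] = {0}, r[N], z[N], p[N], q[N];
        for (int i = 0; i < N; i++) b[i] = 1.0;

        /* r = b - A x (x = 0, so r = b); z = M^-1 r with M = diag(A). */
        double rz = 0.0;
        for (int i = 0; i < N; i++) {
            r[i] = b[i];
            z[i] = r[i] / 4.0;
            p[i] = z[i];
            rz += r[i] * z[i];
        }

        for (int it = 0; it < 100; it++) {
            apply_A(p, q);                     /* the expensive step */
            double pq = 0.0;
            for (int i = 0; i < N; i++) pq += p[i] * q[i];
            double alpha = rz / pq;
            double rz_new = 0.0, rnorm = 0.0;
            for (int i = 0; i < N; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * q[i];
                z[i] = r[i] / 4.0;             /* preconditioner solve */
                rz_new += r[i] * z[i];
                rnorm  += r[i] * r[i];
            }
            if (sqrt(rnorm) < 1e-10) {
                printf("converged in %d iterations\n", it + 1);
                break;
            }
            double beta = rz_new / rz;
            rz = rz_new;
            for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
        }
        printf("x[0] = %g\n", x[0]);
        return 0;
    }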
Communication - Before
• Initialize a process & MPI task on every core
• Distribute time-stream data & hence pixels
• After each PCG iteration
 – Each process creates a full map vector by zero-padding
 – Call MPI_Allreduce(map, world)
 – Each process extracts the pixels of interest to it & discards the rest
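A minimal MPI sketch in C of this "before" pattern (toy size; the contiguous pixel slices are a simplification, since the real subsets are scattered and overlapping): every process zero-pads to the full map, allreduces the whole vector, and then discards most of what it received.

    /* "Before" pattern: one allreduce of the full map per iteration. */
    #include <mpi.h>
    #include <stdio.h>

    #define NPIX_FULL 1024   /* total pixels in the map (toy size) */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Here each process observes a contiguous slice of pixels. */
        int lo = rank * NPIX_FULL / size;
        int hi = (rank + 1) * NPIX_FULL / size;

        double full[NPIX_FULL] = {0.0};              /* zero-padded map   */
        for (int p = lo; p < hi; p++) full[p] = 1.0; /* local contribution */

        /* Every process sends & receives the *full* map, even though
         * it only needs its own subset - the waste this slide fixes. */
        MPI_Allreduce(MPI_IN_PLACE, full, NPIX_FULL, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);

        double local_sum = 0.0;                      /* extract own pixels, */
        for (int p = lo; p < hi; p++)                /* discard the rest    */
            local_sum += full[p];
        if (rank == 0) printf("rank 0 local sum = %g\n", local_sum);

        MPI_Finalize();
        return 0;
    }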
Communication – Optimizations
• Reduce the number of MPI tasks
 – Only use MPI for off-node communication
 – Use threads on-node
• Minimize the total volume of the messages
 – Determine processes’ pair-wise pixel overlap
 – If the data volume is smaller, use scatter/gather in place of reduce
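A minimal hybrid sketch of the first optimization, assuming one MPI task per node with OpenMP threads inside it (toy size, stand-in workload): threads co-add the node-local map in shared memory, and MPI carries only the off-node reduction.

    /* Hybrid layout sketch: threads on-node, MPI off-node only. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define NPIX 1024   /* toy map size */

    int main(int argc, char **argv) {
        int provided;
        /* FUNNELED: only the master thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        static double node_map[NPIX];   /* shared, zero-initialized */

        /* On-node: threads co-add contributions in shared memory,
         * replacing what used to be per-core MPI tasks. */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            #pragma omp for
            for (int p = 0; p < NPIX; p++)
                node_map[p] += (double)(tid + 1); /* stand-in for real work */
        }

        /* Off-node: one message per node instead of one per core. */
        MPI_Allreduce(MPI_IN_PLACE, node_map, NPIX, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("node_map[0] = %g\n", node_map[0]);
        MPI_Finalize();
        return 0;
    }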
Communication – After
• Initialize a process & MPI task on every node
• Distribute time-stream data & hence pixels
• Calculate common pixels for every pair of processes
• After each PCG iteration
 – If most pixels are common to most processes
  • use MPI_Allreduce(map, world) as before
 – Else
  • Each process prepares its send buffer
  • Call MPI_Alltoallv(sbuffer, rbuffer, world)
  • Each process only receives/accumulates data for pixels it sees.
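A toy sketch of the scatter/gather path (the pairwise overlap counts are fabricated purely for illustration): each process exchanges only the pixels it shares with each other process via MPI_Alltoallv, instead of allreducing the full map.

    /* "After" pattern: exchange only pairwise-shared pixels. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Pairwise overlap counts, computed once up front.  Here we
         * pretend process r shares (r XOR s) + 1 pixels with s. */
        int *scounts = malloc(size * sizeof *scounts);
        int *rcounts = malloc(size * sizeof *rcounts);
        int *sdispls = malloc(size * sizeof *sdispls);
        int *rdispls = malloc(size * sizeof *rdispls);
        int stot = 0, rtot = 0;
        for (int s = 0; s < size; s++) {
            scounts[s] = (rank ^ s) + 1;   /* overlap is symmetric,  */
            rcounts[s] = (rank ^ s) + 1;   /* so the counts match    */
            sdispls[s] = stot; stot += scounts[s];
            rdispls[s] = rtot; rtot += rcounts[s];
        }

        double *sbuf = malloc(stot * sizeof *sbuf);
        double *rbuf = malloc(rtot * sizeof *rbuf);
        for (int i = 0; i < stot; i++) sbuf[i] = (double)rank;

        /* Only shared pixels travel; every process receives and
         * accumulates just the pixels it actually sees. */
        MPI_Alltoallv(sbuf, scounts, sdispls, MPI_DOUBLE,
                      rbuf, rcounts, rdispls, MPI_DOUBLE, MPI_COMM_WORLD);

        if (rank == 0) printf("rank 0 received %d shared-pixel values\n", rtot);
        free(scounts); free(rcounts); free(sdispls); free(rdispls);
        free(sbuf); free(rbuf);
        MPI_Finalize();
        return 0;
    }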
Communication - Impact
Fewer communicators & smaller message volume:
[Plot: run times before & after the communication optimizations on Franklin and Hopper]
[Chart: scaling roadmap toward the target of 10⁴ maps, 9 freqs, 2.5 years of data on 10⁵ cores in 10 hours, across the CTP3, FFP1 & 12x217 datasets, via successive optimizations: OnTheFly (I/O), Hybridize (COMM), Abstract (I/O)]
Current Status
• Calculations scale with #observations.
• IO & communication scale with #pixels.
• Observations/pixel ~ S/N: science goals will help scaling!
 – Planck: O(10³) observations per pixel
 – PolarBear: O(10⁶) observations per pixel
• For each experiment, fixed data volume => strong scaling.
• Between experiments, growing data volume => weak scaling.
HPC System Evaluation
• Well-characterized & well-instrumented science application codes can be a powerful tool for whole-system performance evaluation.
• Compare
 – unthreaded vs. threaded
 – allreduce vs. allgather
 on Cray XT4, XT5 & XE6, on 200 – 16,000 cores.
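A sketch of the kind of micro-benchmark such a comparison implies (an assumed methodology, not the instrumented application code itself): time repeated MPI_Allreduce and MPI_Allgather calls over the same payload and report per-call averages.

    /* Toy collective-timing harness for system comparison. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NELEM  4096   /* doubles per rank (toy payload) */
    #define NTRIAL 100    /* trials to average over         */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *buf  = calloc(NELEM, sizeof *buf);
        double *gath = calloc((size_t)NELEM * size, sizeof *gath);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int t = 0; t < NTRIAL; t++)
            MPI_Allreduce(MPI_IN_PLACE, buf, NELEM, MPI_DOUBLE,
                          MPI_SUM, MPI_COMM_WORLD);
        double t_reduce = (MPI_Wtime() - t0) / NTRIAL;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int t = 0; t < NTRIAL; t++)
            MPI_Allgather(buf, NELEM, MPI_DOUBLE,
                          gath, NELEM, MPI_DOUBLE, MPI_COMM_WORLD);
        double t_gather = (MPI_Wtime() - t0) / NTRIAL;

        if (rank == 0)
            printf("allreduce %.3e s, allgather %.3e s per call\n",
                   t_reduce, t_gather);
        free(buf); free(gath);
        MPI_Finalize();
        return 0;
    }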
HPC System Evolution
• Clock speed is no longer able to maintain Moore’s Law.
• Multi-core CPUs and GPGPUs are two major approaches.
• Both of these will require