Deep‐Learning for Climate
25‐05‐2018 – SAMA Machine Learning
J. Brajard*, A. Charantonis**, P. Gallinari*
*Sorbonne Universite, Paris, France, ** ENSIEE
Outline
• Context
• Deep‐Learning models used for problems in climate modeling
  • CNNs, RNNs, STNs, generative models
• Examples of Deep Learning for Climate Applications
  • Event detection
  • Spatio‐temporal modeling
• NN as dynamic models
  • Links between NNs and ODEs
2018‐05‐25 Deep Learning for Climate 2
Context
• Brief review of the literature on Deep Learning applications to climate modeling
• Literature
  • Increasing number of application papers from both the « climate » and CS communities
    • e.g. 4 papers + 2 invited talks at « Climate Informatics 2017 »
  • Most are still preliminary work:
    • basic applications of Deep Learning methods
    • or « toy » problems
  • Several platforms available (Google TensorFlow, Facebook PyTorch, etc.) make Deep Learning experimentation easy
  • Some innovative papers from the Machine Learning community
    • Application validity?
Context
• Two main application topics were found
  • Event detection
    • Eddy detection and tracking, extreme weather detection
    • Models: CNNs, convolution‐deconvolution CNNs
  • Spatio‐temporal modeling for different phenomena
    • SST, precipitation nowcasting, etc.
    • Models: RNN extensions, generative models, physically inspired models
• Type of data used in these papers
  • Satellite data
  • Reanalysis data
  • Simulations, e.g. from atmosphere models
Deep‐Learning models used for problems in climate modeling
Convolutional nets
• ConvNet architecture (Y. LeCun, since 1988)
  • Deployed e.g. at Bell Labs in 1989‐90
  • Character recognition
• Convolution: non linear embedding in high dimension
• Pooling: average, max
Fig. LeCun
Convolutions and Pooling
• Convolution, stride 1, from 3x3 image to 2x2 image, 2x2 filter
• Pooling
  • Max pooling, stride 2
[Figure: a 2x2 filter slid over a 3x3 image, and max pooling with stride 2; recoverable output values: 5 7 / 2 4]
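A minimal NumPy sketch of the two operations on this slide. The image and filter values are made up for illustration (they are not the values of the slide figure), and `conv2d_valid` and `max_pool` are hypothetical helper names. As in most deep learning libraries, the "convolution" is computed as a cross‐correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(image, size=2, stride=2):
    """Keep the maximum of each `size` x `size` window."""
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = image[r:r + size, c:c + size].max()
    return out

# Illustrative values (assumption): 3x3 image, 2x2 filter, stride 1 -> 2x2 output
image = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
feature_map = conv2d_valid(image, kernel)

# Max pooling with stride 2 on a 4x4 map -> 2x2 output
pooled = max_pool(np.arange(16, dtype=float).reshape(4, 4))
```

With an input of side n and a filter of side k, the unpadded stride‐1 output has side n − k + 1, which gives the 3x3 to 2x2 reduction stated above.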
Transpose convolution
• This is the reverse operation
  • From 2x2 image to 3x3 image, 2x2 filter, stride 1 with padding
• Unpooling
  • Reverse pooling operation
  • Different solutions
[Figure: the 2x2 image is surrounded by a border of zeros (4x4 padded image) and the 2x2 filter is slid over it]
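A hedged NumPy sketch of the stride‐1 transposed convolution described on this slide, written in two equivalent ways: as a "stamping" operation, and as an ordinary convolution over the zero‐padded image. The helper names and the numeric values are assumptions chosen for illustration.

```python
import numpy as np

def conv_transpose2d(image, kernel):
    """Each input pixel stamps a scaled copy of the kernel on the output."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for i in range(ih):
        for j in range(iw):
            out[i:i + kh, j:j + kw] += image[i, j] * kernel
    return out

def conv_transpose2d_padded(image, kernel):
    """Same result: zero-pad the image, then slide the flipped kernel."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    flipped = kernel[::-1, ::-1]
    out_h = padded.shape[0] - kh + 1
    out_w = padded.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

# Illustrative values (assumption): 2x2 image, 2x2 filter -> 3x3 output
image = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
kernel = np.array([[1.0, 0.0],
                   [2.0, 1.0]])
upsampled = conv_transpose2d(image, kernel)
upsampled_padded = conv_transpose2d_padded(image, kernel)
```

The padded form matches the figure: the 2x2 image becomes a 4x4 image with a border of zeros, and sliding a 2x2 filter over it yields a 3x3 output.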
Convolutional Nets: ResNet (He et al. 2016)
• 152‐layer ResNet: 1st place in the ILSVRC classification competition
• Other ResNets: 1st place in ImageNet detection, ImageNet localization, MS‐COCO detection and segmentation
• Building block
  • $y = F(x) + x$: the output is the sum of the identity and a learned mapping
  • The identity connection probably helps propagate gradients
  • $F(x)$ is called the residual
• General architecture
  • Mainly 3x3 convolutional filters
Fig. from (He 2016)
Spatial transformer networks (Jaderberg 2015)
• Proposed initially as a module for learning image transformations
  • Such as: cropping, rotations, etc.
• Differentiable module that allows image warping
  • This is the interesting mechanism for us
  • Adaptations are used e.g. in de Bezenac 2018: implements an advection mechanism
• Illustration (Fig. from (Jaderberg 2015))
Spatial transformer networks (Jaderberg 2015)
• STN implements a pointwise image transformation
• All the parameters are learned
• 2 main components
  • Sampling mechanism
    • For each target point $(x^t, y^t)$, sample a source point $(x^s, y^s) = T_\theta(x^t, y^t)$
    • $T_\theta$ is a learned transformation, with parameters $\theta = f(U)$, where $f$ is a NN and $U$ is the source image
[Figure: source image and target image; each target point is mapped to a source point]
Spatial transformer networks (Jaderberg 2015)
• 2 main components
  • Transformation (warping mechanism)
    • For each sampled source point $(x^s, y^s)$, compute the value of the corresponding target point $(x^t, y^t)$
    • Apply a kernel transformation centered on the source point $(x^s, y^s)$
[Figure: source image and target image]
Kernel transformation:
$V(x^t, y^t) = \sum_{(n, m) \in U} U(n, m)\, k(x^s - n,\, y^s - m)$
$U(n, m)$: pixel intensity at $(n, m)$
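The two STN components can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the transformation is a fixed affine matrix `theta` (in an STN it would be predicted by a NN), coordinates are in pixels rather than normalized, the kernel is bilinear, and `affine_grid` and `bilinear_sample` are hypothetical helper names.

```python
import numpy as np

def affine_grid(theta, height, width):
    """Sampling: map each target pixel (x_t, y_t) to a source point (x_s, y_s)."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])  # 3 x N
    source = theta @ target                                               # 2 x N
    return source[0].reshape(height, width), source[1].reshape(height, width)

def bilinear_sample(U, x_s, y_s):
    """Warping: weighted sum of source pixels with a bilinear kernel.

    V = sum over source pixels of U[row, col] * k(x_s - col) * k(y_s - row),
    with k(d) = max(0, 1 - |d|).
    """
    V = np.zeros(x_s.shape)
    for row in range(U.shape[0]):
        for col in range(U.shape[1]):
            weight = (np.maximum(0.0, 1.0 - np.abs(x_s - col))
                      * np.maximum(0.0, 1.0 - np.abs(y_s - row)))
            V += U[row, col] * weight
    return V

U = np.arange(16, dtype=float).reshape(4, 4)      # illustrative source image
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
half_shift = np.array([[1.0, 0.0, 0.5],           # x_s = x_t + 0.5
                       [0.0, 1.0, 0.0]])          # y_s = y_t

V_identity = bilinear_sample(U, *affine_grid(identity, 4, 4))
V_shifted = bilinear_sample(U, *affine_grid(half_shift, 4, 4))
```

Because the kernel is piecewise linear in the source coordinates, the output is differentiable almost everywhere with respect to `theta`, which is what lets the transformation be learned by gradient descent.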
Recurrent neural networks ‐ RNNs
• Basic architecture: state space model
[Figure: RNN and its unfolding in time, with input x, state (memory) s_t and weight matrices U, W, V]
• Up to the 90s, RNNs were of no practical use: too difficult to train
• Mid 2000s: successful attempts to implement RNNs
  • e.g. A. Graves for speech and handwriting recognition
• Today
  • RNNs are SOTA for a variety of applications, e.g. speech decoding, translation, language generation, etc.
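A minimal sketch of the state space recursion behind the basic RNN architecture. It assumes the usual reading of the figure's weight names (U: input to state, W: state to state, V: state to output) and a tanh nonlinearity; both are assumptions, since the slide only shows the diagram.

```python
import numpy as np

def rnn_forward(x_seq, U, W, V, s0):
    """s_t = tanh(U x_t + W s_{t-1}),  y_t = V s_t."""
    s = s0
    states, outputs = [], []
    for x_t in x_seq:
        s = np.tanh(U @ x_t + W @ s)   # memory update
        states.append(s)
        outputs.append(V @ s)          # read-out
    return np.array(states), np.array(outputs)

# Illustrative sizes and random weights (assumptions)
rng = np.random.default_rng(0)
input_dim, state_dim, output_dim, n_steps = 3, 5, 2, 4
U = rng.normal(scale=0.5, size=(state_dim, input_dim))
W = rng.normal(scale=0.5, size=(state_dim, state_dim))
V = rng.normal(scale=0.5, size=(output_dim, state_dim))
x_seq = rng.normal(size=(n_steps, input_dim))

states, outputs = rnn_forward(x_seq, U, W, V, np.zeros(state_dim))
```

The same three matrices are reused at every time step, which is what the unfolded diagram expresses.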
Google Neural Machine Translation System (Wu et al. 2016)
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
• General Architecture
Figure from Wu et al. 2016
Encoder: 8 stacked LSTM RNN + residual connections
Decoder: 8 stacked LSTM RNN + residual connections + Softmax output layer
Attention mechanism
• NMT seminal papers: Cho et al. 2014, Sutskever et al. 2014
• Comparison and evaluation of NMT RNN options (Britz et al. 2017)
  • 250 k GPU hours ‐> a 250 k$ paper!
Generative Adversarial Networks (Goodfellow 2014): generative models intuition
• Provided a sufficiently powerful model $G$
• It should be possible to learn complex mappings from latent space to real world spaces such as:
[Figure: points $z$ of the latent space mapped to samples in the real space]
Generative Adversarial Networks (Goodfellow 2014): generative models intuition
• Given a probability distribution $p(z)$ on the latent space, $G$ defines a probability distribution on the observation space
[Figure: $G$ maps points $z$ of the latent space to the real space]
Generative Adversarial Networks (Goodfellow 2014): generative models intuition
• Generative latent variable model
• Given a simple distribution $p(z)$, e.g. $z \sim \mathcal{N}(0, I)$, use a NN to learn a possibly complex mapping $G_\theta: z \mapsto x$
[Figure: latent variable $z$ fed to the NN $G$ with parameters $\theta$, producing $x$]
GANs (Goodfellow 2014)
• Principle
  • A generative network generates data after sampling from a latent distribution
  • A discriminant network tells if the data comes from the generative network or from real samples
  • The two networks are trained together
    • The generative network tries to fool the discriminator, while the discriminator tries to distinguish between true and artificially generated data
  • Formulated as a MinMax game
    • Hope: the Discriminator will force the Generator to be clever
• Applications
  • Data generation, semi‐supervised learning, super resolution, …
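For reference, the MinMax game is usually written as the following value function (standard formulation of Goodfellow et al. 2014; the notation $p_{\text{data}}$, $p_z$, $D$, $G$ is the usual one and does not appear on the slide):

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator $D$ maximizes $V$ by outputting values close to 1 on real data and close to 0 on generated data, while the generator $G$ minimizes it by making $D(G(z))$ close to 1.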
GANs
• Discriminator is presented alternately with true ($x$) and fake ($G(z)$) data
[Figure: latent variable sampling feeds the generator network $G$, which produces generated data; real data sampling provides true data; the discriminator network $D$ outputs 1 if its input is real, 0 if it is generated]
$G$ and $D$ are typically MLPs
GAN Training
• Algorithm alternates between optimizing $D$ and $G$
[Figure: alternating training steps for the two networks]
GANs examples: Deep Convolutional GANs (Radford 2015), image generation
• LSUN bedrooms dataset: over 3 million training examples
Fig. Radford 2015
GAN example: Multi‐View Data Generation Without View Supervision (Chen 2018)
• Objective
  • Generate images by disentangling content and view
    • E.g. content: 1 person; view: position, illumination, etc.
• 2 latent spaces: view and content
• Generate image pairs: same item with 2 different views
• Learn to discriminate between generated and real pairs
[Figure: generated samples; 1 row = 1 content, 1 column = 1 view]
Adversarial training: video sequence prediction
• Video prediction (Mathieu et al. 2016)
• Predicting video future segmentations (Luc et al. 2017, LJK Grenoble)
Examples of Deep Learning applications in the Climate Domain
Event Detection
• Eddy detection
• Extreme weather event detection
Eddy Identification and Tracking (Lguensat 2017)
• Objective: pixelwise eddy classification
  • 3 classes: anticyclonic, cyclonic, no eddy
• Data
  • SSH maps of the southwest Atlantic (AVISO‐SSH)
  • Labeled by the PET14 algorithm (Mason 2014)
    • Provides eddy center + speed and contour
    • Transformed into segmentation maps using the speed contour coordinates
      • Speed contour: the contour with the highest mean geostrophic rotational current
      • Pixels inside each contour are labeled A‐eddy or C‐eddy; all other pixels are labeled No‐eddy
  • 15 years, 1 map/day; the first 14 years are used for training, the last year for testing
• Input = 128x128 patch randomly sampled from the SSH map
  • 5 k training examples
Eddy Identification and Tracking (Lguensat 2017)
• Patch sampling
Fig. from (Lguensat 2017)
Eddy Identification and Tracking (Lguensat 2017)
• Model
  • Convolution‐Deconvolution architecture
    • Inspired from CNNs for biomedical image segmentation
  • Task: classification
• Training criterion
  • Cross Entropy
  • Dice‐Loss = 1 – mean‐softDiceCoef (better reflects segmentation…)
    • $\text{softDiceCoef}(P, T) = \dfrac{2 \sum_{i,j} P_{ij} T_{ij}}{\sum_{i,j} P_{ij} + \sum_{i,j} T_{ij}}$, computed for each class
    • $P$: predicted output (matrix), $T$: target output (matrix)
    • $T$: one hot encoding (3 D) for each position; $P$: also 3 D for each position, with $P_{ij} \in [0, 1]$ a predicted probability and $T_{ij} \in \{0, 1\}$
    • mean‐softDiceCoef: mean of the 3 per‐class coefficients
    • $\text{softDiceCoef}(P, T)$ should be 1 for a perfect segmentation, 0 for a completely mistaken segmentation
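A small NumPy sketch of the Dice loss defined on this slide, for a 3‐class one hot target. The `eps` smoothing term and the helper names are assumptions added to avoid a division by zero when a class is absent; they are not taken from the slide.

```python
import numpy as np

def soft_dice_coef(P, T, eps=1e-7):
    """2 * sum(P * T) / (sum(P) + sum(T)) for one class map."""
    return (2.0 * np.sum(P * T) + eps) / (np.sum(P) + np.sum(T) + eps)

def dice_loss(P, T):
    """P, T: arrays of shape (height, width, n_classes)."""
    n_classes = T.shape[-1]
    coefs = [soft_dice_coef(P[..., c], T[..., c]) for c in range(n_classes)]
    return 1.0 - np.mean(coefs)

# Illustrative 2x3 label map with 3 classes (assumption)
labels = np.array([[0, 1, 2],
                   [2, 1, 0]])
T = np.eye(3)[labels]                    # one hot target, shape (2, 3, 3)
perfect = T.copy()                       # exact prediction
wrong = np.eye(3)[(labels + 1) % 3]      # every pixel assigned a wrong class
uniform = np.full(T.shape, 1.0 / 3.0)    # uninformative prediction

loss_perfect = dice_loss(perfect, T)
loss_wrong = dice_loss(wrong, T)
loss_uniform = dice_loss(uniform, T)
```

The perfect prediction gives a loss close to 0 and the fully wrong one a loss close to 1, matching the last bullet above; the uniform prediction falls in between.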
Eddy Identification and Tracking (Lguensat 2017)
Fig. from (Lguensat 2017)
Eddy Identification and Tracking (Lguensat 2017)
• Experiments
  • 2 variants of the network
• Code available, data available
• Mentioned extensions
  • 3D altimetry with 3D CNNs
  • SST as additional inputs
Extreme Weather Event Detection (Racah 2017)
• Objective: detection of local events from earth observation
  • 4 classes: tropical depressions, tropical cyclones, extra‐tropical cyclones, atmospheric rivers
• Data
  • Simulated data from CAM5, a 3 D physical model of the atmosphere
    • Generates 8 images of 768x1152 pixels per day, each with 16 channels (surface temperature, surface pressure, etc.), for 27 years
  • Labeled with TECA (Toolkit for Extreme Climate Analysis)
    • Produces: event center coordinates in the image, bounding box for the event, event class
    • Prone to errors + imbalanced event classes
• Method
  • Convolution‐Deconvolution NN + supervision for predicting event localization, size and class
Extreme Weather Event Detection (Racah 2017)
• Model: 3D Conv – Deconv NN
[Figure: network outputs: reconstruction error; object present in the grid cell (Y/N); object class; location/size of object]
Input image is split into a 12x18 grid of 64x64 pixels
Fig. from (Racah 2017)
Extreme Weather Event Detection (Racah 2017)
• Example
Fig. from (Racah 2017)
Spatio‐temporal modeling
• Nowcasting
• Integration of NN in numerical models
• Incorporating prior physical knowledge in Deep Learning models
• Solving inverse problems with NNs
Precipitation Nowcasting (Shi 2015, Shi 2017, Zhang 2017)
• Precipitation Nowcasting
  • Very short term (a few hours) prediction of rainfall intensity in a local region
• Classical methods
  • Numerical Weather Prediction (NWP) methods: based on physical equations of an atmosphere model
  • Extrapolation based methods using radar data
    • Optical flow based methods inspired from vision
  • Do not fully exploit available data (Shi 2015)
• Objective
  • Learning from spatio‐temporal series of radar measurements
    • k‐step prediction
    • End to end learning
• Data
  • Local radar maps
Precipitation Nowcasting (Shi 2015)
• Model
  • Extension of LSTM by incorporating explicit spatial dependencies
    • ConvLSTMs
  • Inspired from early video prediction models
    • Analogy with the video prediction tasks but on dense images
    • Note: several recent papers for video prediction with NN (without optical flow)
  • Convolutions both for input‐to‐state and state‐to‐state connections
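A sketch of the ConvLSTM update, obtained from the standard LSTM equations by replacing matrix products with convolutions ($*$) in both the input‐to‐state and state‐to‐state terms. The notation ($X_t$ input, $H_t$ hidden state, $C_t$ cell state, $\circ$ elementwise product) is an assumption, and the peephole terms of the original paper are left out for brevity:

```latex
\begin{aligned}
i_t &= \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
```

Here $X_t$, $H_t$ and $C_t$ are all 3 D tensors (channels x rows x columns), so the state keeps the spatial layout of the radar map.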
Precipitation Nowcasting (Shi 2015)
• Data
  • Radar reflectivity maps from 97 rainy days in Hong Kong
  • 1 radar map every 6 min, 240 frames per day
  • Small dataset
  • Radar maps preprocessed into 100x100 grayscale « images » + smoothing
  • Sequences = 20 successive frames, 5 as input, 15 as prediction
• Model
  • 2‐layer ConvLSTM
  • Training criterion: Cross‐Entropy (rain/no rain?) or MSE + thresholding?
• Evaluation
  • Several measures
    • MSE is measured on the predicted values (regression)
    • The other measures require binary decisions (rain vs no rain): the predicted values are converted to 0/1 using a threshold of 0.5 mm/h rainfall rate
  • ROVER is an optical flow based method
• Lessons
  • State‐to‐state convolutions are essential for handling spatio‐temporal dependencies
  • Better than ROVER (SOTA optical flow based method) and fully connected LSTM
Precipitation Nowcasting (Shi 2017)
• Extension of the ConvLSTM work
  • Based on GRUs
• Main ideas
  • Use convolutional GRUs instead of fully connected GRUs: ConvGRU
  • The spatial dependency structure between states should be context dependent and not fixed like in ConvLSTMs
    • They consider a spatial context
    • Basic unit is called TrajGRU
• New and larger dataset
• New evaluation metrics (weighted MSE)
Precipitation Nowcasting (Shi 2017)
• Selection of neighborhood at time t (warping mechanism)
  • For cell $(i, j)$ in $H_t$, select neighborhood cells at $H_{t-1}$
  • Function $\gamma(X_t, H_{t-1})$ generates a bilinear mapping which is then used to select points in $H_{t-1}$
Precipitation Nowcasting (Shi 2017)
• Dataset: HKO‐7
  • Echo radar data from 2009 to 2015 in Hong Kong
  • 1 radar map every 6 min, 240 frames per day
  • Resolution 480x480 pixels, altitude 2 km, covers 512x512 km centered on Hong Kong
  • Radar images are transformed to [0, 255] pixel values + filtering
  • Rainy days: 812 days for training, 50 for validation, 131 for test
  • Prediction: radar reflectivity values are converted to rainfall intensity values
• Model
  • 3‐layer Encoding – Forecasting model
  • Training criterion: weighted MSE (higher weights for heavy rainfall, which compensates for data imbalance; see next slide)
• Evaluation
  • MSE and weighted MSE (regression)
  • Different measures requiring a binary decision: rain or no rain
    • Evaluation is performed at different threshold values: 0.5, 5, 10, 30
    • Predicted pixel values are converted to 0/1 values for each threshold
    • Scores are computed for each threshold
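A hedged NumPy sketch of the two kinds of evaluation listed above. The thresholds are those of the slide; the per‐bin `WEIGHTS` are illustrative values chosen here, and the critical success index (CSI) is used only as one example of a score that requires a binary rain / no rain decision.

```python
import numpy as np

THRESHOLDS = [0.5, 5.0, 10.0, 30.0]      # rainfall thresholds from the slide
WEIGHTS = [1.0, 2.0, 5.0, 10.0, 30.0]    # hypothetical weight per rainfall bin

def weighted_mse(pred, target):
    """MSE where each pixel is weighted by the rainfall bin of its target value."""
    bins = np.digitize(target, THRESHOLDS)
    weights = np.asarray(WEIGHTS)[bins]
    return np.mean(weights * (pred - target) ** 2)

def csi(pred, target, threshold):
    """Critical success index: hits / (hits + misses + false alarms)."""
    p = pred >= threshold
    t = target >= threshold
    hits = np.sum(p & t)
    misses = np.sum(~p & t)
    false_alarms = np.sum(p & ~t)
    denom = hits + misses + false_alarms
    return hits / denom if denom > 0 else float("nan")

# Illustrative rainfall intensities (assumption)
target = np.array([[0.0, 1.0],
                   [6.0, 40.0]])
pred = np.array([[0.0, 2.0],
                 [4.0, 20.0]])

plain_mse = np.mean((pred - target) ** 2)
heavy_rain_mse = weighted_mse(pred, target)
scores = {thr: csi(pred, target, thr) for thr in THRESHOLDS}
```

In this toy case the largest error falls on the heaviest rainfall pixel, so the weighted MSE is much larger than the plain MSE, and the CSI drops as the threshold increases.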
Precipitation Nowcasting (Shi 2017)
• Rain statistics (dataset)
• Performance comparison
Precipitation Nowcasting (Zhang 2017)
• A number of preliminary analyses, e.g. (Zhang 2017)
• Nowcasting based on
  • 3 D radar maps (multiple altitudes)
  • Reanalysis data from VDRAS (NCAR, US)
  • Classification: rain/no rain
  • Vertical velocity and buoyancy of an air parcel (also 3 D data)
• Objective: nowcasting, storm initiation and growth (*)
  • Argument: radar data not sufficient for (*)
Integration of NN in numerical models (Brajard 2018)
• Can Machine Learning (ML) techniques be used in weather and climate models to replace physical forcings?
• Example
• Question: can the forcing terms be represented by a neural network?
More generally, the forcing terms mimic unresolved processes like turbulence, precipitation, radiation, clouds, friction, etc. They are typically computed via complicated physical parameterizations with empirical parameters.
Integration of NN in numerical models (Brajard 2018)
• Proof of concept
  • Data generated by a fully specified shallow water model
    • i.e. the forcing terms are modeled by a physical model
  • Train an MLP to learn the forcing terms (supervised learning)
[Legend: speed; $h$: height of the mixed layer; surface wind]
Integration of NN in numerical models (Brajard 2018)
• The neural network simulation diverges after a few hundred days (kinetic and potential energy explode)
• Solution: add a mass conservation constraint (hmean = constant) to the neural network training algorithm (physics‐informed machine learning)
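The slide does not say how the constraint is imposed. One possible way, shown here purely as an assumption, is a soft penalty added to the training loss that punishes any change in the spatial mean of h; `constrained_loss` and `lam` are hypothetical names.

```python
import numpy as np

def constrained_loss(h_pred, h_true, lam=1.0):
    """MSE plus a penalty on the drift of the spatial mean of h (mass)."""
    mse = np.mean((h_pred - h_true) ** 2)
    mass_error = np.mean(h_pred) - np.mean(h_true)
    return mse + lam * mass_error ** 2

# Illustrative height fields (assumption)
h_true = np.array([[1.0, 2.0],
                   [3.0, 2.0]])
h_same_mass = np.array([[2.0, 1.0],     # wrong field, but same mean height
                        [2.0, 3.0]])
h_drifting = h_true + 0.5               # mean height drifts by 0.5

loss_same_mass = constrained_loss(h_same_mass, h_true, lam=10.0)
loss_drift_weak = constrained_loss(h_drifting, h_true, lam=1.0)
loss_drift_strong = constrained_loss(h_drifting, h_true, lam=10.0)
```

The penalty leaves the loss unchanged for predictions that conserve the mean height and grows with `lam` for predictions whose mean drifts, which is the behavior one would want to discourage a slow divergence.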
Incorporating prior knowledge
Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge (de Bezenac 2018)
• Motivations
  • DL is SOTA for perception problems
  • Natural physical phenomena are much more complex than problems handled by Deep Learning today
  • Can we incorporate prior knowledge from physics in statistical models?
• Challenge
  • Interaction between the physical and the statistical paradigms
• Illustration: Sea Surface Temperature prediction
Incorporating prior knowledge (de Bezenac 2018): physical model for fluid transport
Advection – Diffusion equation
• Describes transport of $I$ through advection and diffusion
  $\dfrac{\partial I}{\partial t} + (w \cdot \nabla) I = D \nabla^2 I$
  • $I$: quantity of interest (temperature image)
  • $w$: motion vector, $D$: diffusion coefficient
• There exists a closed form solution
  • $I_{t+1}(x) = (k \ast I_t)(x - w)$, where $k$ is a Gaussian kernel whose width is set by $D$
• If we knew the motion vector $w$ and the diffusion coefficient $D$ we could calculate $I_{t+1}$ from $I_t$
  • $w$ and $D$ unknown
  • ‐> Learn $w$ and $D$
Incorporating prior knowledge (de Bezenac 2018): prediction model
Objective: predict $I_{t+1}$ from past $I_t, I_{t-1}, \dots$
• 2 components:
  • Convolution‐Deconvolution NN for estimating the motion vector
  • Warping scheme: implements the discretized advection‐diffusion solution
[Figure: past images feed the Convolution‐Deconvolution NN; the warping scheme produces the prediction, which is compared with the target image]
• End to end learning using only supervision from the target image
• Stochastic gradient optimization
• Performance on par with SOTA assimilation models
Solving inverse problems with NNs (de Bezenac et al., ongoing work)
• Objective
  • Given noisy observed data, and possibly some priors, how to generate an approximation of the underlying true data?
    • Priors may come from a physical model
• Applications
  • Improve physical model predictions using observed data
  • Inpainting for physical data
• Method
  • Based on an extension of AmbientGANs (Bora et al. 2018)
Solving inverse problems with NNs (de Bezenac et al., ongoing work)
• AmbientGANs (Bora et al. 2018)
  • Train generative models from incomplete or noisy samples
  • Hyp: the noise/measurement process is known
    • Works for some classes of measurements (theoretical results for kernels + noise distributions, empirical results for a large class of processes)
  • The NN is trained to distinguish a real measurement from a simulated measurement of a generated image
Fig. from Bora et al. 2018
Solving inverse problems with NNs (de Bezenac et al., ongoing work)
• AmbientGAN example
Fig. from Bora et al. 2018
Solving inverse problems with NNs (de Bezenac et al., ongoing work)
• Conditional AmbientGANs
• Objective
  • Given a model of the stochastic measurement process, learn the generator so that a simulated measurement of a generated sample is indistinguishable from a real observation
[Figure: generator network, measurement process and discriminator]
Solving inverse problems with NNs (de Bezenac et al., ongoing work)
• Preliminary illustrations
  • Data from Shallow Water model
    • Left: 90% of pixels eliminated (set to 0) + noise $\mathcal{N}(0, 1)$ on remaining pixels
    • Right: « clouds »
[Figure, left panel: True State, Observation, GAN model, BLUE. Right panel: True State, Observation, GAN model]
NN as Dynamical Systems
NN as Dynamical Systems
• Recent papers on the interpretation of NNs as discretization schemes for differential equations
  • Links between data driven approaches (NNs) and physical models used in climate modeling
  • Allows learning efficient discretization schemes for unknown ODEs
  • Motivates the alternative design of NN modules/architectures
• Not yet a clear application to climate problems
Resnet as a discretization scheme for ODEs
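• ODE
  • $\dfrac{dX}{dt} = F(X, \theta), \quad X(0) = X_0$ (1)
• Resnet module
  • $X_{t+1} = X_t + G(X_t, \theta_t)$ (2)
  • With $G(X_t, \theta_t) = h\, F(X_t, \theta_t)$, $h \in [0, 1]$:
  • $X_{t+1} = X_t + h\, F(X_t, \theta_t)$
    • Forward Euler scheme for the ODE
    • $h$: time step
  • Note: this type of additive structure (2) is also present in LSTM and GRU units
• Resnet
  • Input $X_0$, output $X_T$
  • Multiple Resnet modules implement a multi‐step discretization scheme for the ODE
    • $X_1 = X_0 + h\, F(X_0, \theta_0)$
    • $X_2 = X_1 + h\, F(X_1, \theta_1)$, …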
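A minimal sketch of this correspondence: a stack of residual modules with a shared, hand‐written function `F` is exactly a forward Euler integrator. The test ODE dX/dt = −X and the helper name `resnet_forward` are assumptions chosen because the exact solution X(t) = X(0) exp(−t) is known; in a real Resnet, `F` would be a learned layer with its own parameters at each depth.

```python
import numpy as np

def resnet_forward(x0, F, n_blocks, h):
    """Stack of residual modules X_{t+1} = X_t + h * F(X_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_blocks):
        x = x + h * F(x)        # identity shortcut + scaled residual
    return x

def F(x):
    """Right-hand side of the test ODE dX/dt = -X (assumption)."""
    return -x

x0 = np.array([1.0, 2.0])
exact = x0 * np.exp(-1.0)                              # solution at time 1

coarse = resnet_forward(x0, F, n_blocks=10, h=0.1)     # 10 modules
fine = resnet_forward(x0, F, n_blocks=1000, h=0.001)   # 1000 modules

coarse_error = np.max(np.abs(coarse - exact))
fine_error = np.max(np.abs(fine - exact))
```

Both networks integrate the ODE up to time n_blocks × h = 1; the deeper one, with the smaller time step, lands closer to the exact solution, as expected from a first order scheme.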
Resnet as a discretization scheme for ODEs
• This suggests that alternative discretization schemes will correspond to alternative Resnet‐like NN models
  • Backward Euler, Runge‐Kutta, linear multi‐step, …
• Example (Lu 2018): linear multi‐step discretization scheme
  • $X_{t+1} = (1 - k_t) X_t + k_t X_{t-1} + F(X_t, \theta_t)$
• Applications
  • Classification (à la ResNet)
  • Modeling dynamical systems
    • (Fablet 2017) Runge‐Kutta for dynamical systems, toy problems
Fig. (Lu 2018)
References
• Brajard, J., Charantonis, A., & Sirven, J. (2018). Can a neural network learn a numerical model? Geophysical Research Abstracts, Vol. 20, EGU2018‐13973.
• de Bezenac, E., Pajot, A., & Gallinari, P. (2018). Deep Learning For Physical Processes: Incorporating Prior Scientific Knowledge. In ICLR.
• Fablet, R., Ouala, S., & Herzet, C. (2017). Bilinear residual Neural Network for the identification and forecasting of dynamical systems, 2(1). Retrieved from http://arxiv.org/abs/1712.07003
• Franz, K., Roscher, R., Milioto, A., & Wenzel, S. (n.d.). Ocean Eddy Identification and Tracking using Neural Networks. ArXiv Computer Science. https://doi.org/arXiv:1803.07436v1
• Jaderberg, M., Simonyan, K., Zisserman, A., & Kavukcuoglu, K. (2015). Spatial Transformer Networks. In NIPS (pp. 2017–2025).
• Kim, S., Hong, S., Joh, M., & Song, S. (2017). DeepRain: ConvLSTM Network for Precipitation Prediction using Multichannel Radar Data, 3–6. Retrieved from http://arxiv.org/abs/1711.02316
• Lguensat, R., Sun, M., Fablet, R., Mason, E., Tandeo, P., & Chen, G. (2017). EddyNet: A Deep Neural Network For Pixel‐Wise Classification of Oceanic Eddies (pp. 1–5). Retrieved from http://arxiv.org/abs/1711.03954
• Liu, Y., Racah, E., Prabhat, Correa, J., Khosrowshahi, A., Lavers, D., … Collins, W. (2016). Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets. ArXiv Preprint ArXiv:1605.01156, 81–88.
• Lu, Y., Zhong, A., Li, Q., & Dong, B. (2017). Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations (pp. 1–15). Retrieved from http://arxiv.org/abs/1710.10121
• Racah, E., Beckham, C., Maharaj, T., Kahou, S. E., Prabhat, & Pal, C. (2017). ExtremeWeather: A large‐scale climate dataset for semi‐supervised detection, localization, and understanding of extreme weather events. In NIPS (pp. 1–12). Retrieved from http://arxiv.org/abs/1612.02095
• Shi, X., Chen, Z., Wang, H., Yeung, D.‐Y., Wong, W., & Woo, W. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Advances in Neural Information Processing Systems 28, 802–810.
• Shi, X., Gao, Z., Lausen, L., Wang, H., Yeung, D.‐Y., Wong, W., & Woo, W. (2017). Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model (pp. 1–11). Retrieved from http://arxiv.org/abs/1706.03458
• Zhang, W., Han, L., Sun, J., Guo, H., & Dai, J. (2017). Application of Multi‐channel 3D‐cube Successive Convolution Network for Convective Storm Nowcasting. ArXiv Preprint ArXiv:1702.04517, 1–9.