
Deep Learning for Climate (25-05-2018, SAMA Machine Learning)

J. Brajard*, A. Charantonis**, P. Gallinari*
* Sorbonne Université, Paris, France; ** ENSIIE


Outline

• Context
• Deep-Learning models used for problems in climate modeling
  • CNNs, RNNs, STNs, generative models
• Examples of Deep Learning for Climate Applications
  • Event detection
  • Spatio-temporal modeling
  • NN as dynamical systems
    • Links between NNs and ODEs


Context

• Brief review of the literature on Deep learning applications to climate modeling

• Literature
  • Increasing number of application papers from both the « climate » and CS communities
    • e.g. 4 papers + 2 invited talks at « Climate Informatics 2017 »
  • Most are still preliminary work:
    • basic applications of Deep Learning methods
    • or « toy » problems
  • Several available platforms (Google TensorFlow, Facebook PyTorch, etc.) make Deep Learning experimentation easy
  • Some innovative papers from the Machine Learning community
    • Validity for real applications?


Context

• Two main application topics found
  • Event detection
    • Eddy detection and tracking, extreme weather detection
    • Models: CNNs, convolution-deconvolution CNNs
  • Spatio-temporal modeling for different phenomena
    • SST, precipitation nowcasting, etc.
    • Models: RNN extensions, generative models, physically inspired models
• Types of data used in these papers
  • Satellite data
  • Reanalysis data
  • Simulations, e.g. from atmosphere models


Deep‐Learning models used for problems in climate modeling


Convolutional nets


• ConvNet architecture (Y. LeCun, since 1988)
  • Deployed e.g. at Bell Labs in 1989-90
  • Character recognition
  • Convolution: non-linear embedding in high dimension
  • Pooling: average, max

Fig. from (LeCun)

Convolutions and Pooling

• Convolution, stride 1, from 3x3 image to 2x2 image, 2x2 filter

• Pooling
  • Max pooling, stride 2

[Figure: 2x2-filter convolution example and max-pooling example]
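A minimal NumPy sketch of both operations, assuming that "convolution" means the cross-correlation used by deep learning frameworks (no kernel flip) and no padding. The input image, the filter and the 4x4 matrix to pool are hypothetical values chosen for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel, stride=1):
    """'Valid' 2D cross-correlation (what DL frameworks call a convolution)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def max_pool2d(image, size=2, stride=2):
    """Max pooling: keep the largest value of each size x size window."""
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = image[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out


image = np.arange(9, dtype=float).reshape(3, 3)   # 3x3 input
kernel = np.array([[1.0, 0.0], [0.0, 1.0]])        # 2x2 filter
print(conv2d_valid(image, kernel))                 # 2x2 output, values [[4, 6], [10, 12]]

x = np.array([[1, 3, 2, 1],
              [5, 4, 0, 7],
              [2, 0, 1, 2],
              [1, 2, 4, 3]], dtype=float)
print(max_pool2d(x))                               # 2x2 output, values [[5, 7], [2, 4]]
```

The explicit loops mirror the definitions; real frameworks vectorise them, but the output sizes follow the same rule, (input − filter) // stride + 1.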

Transpose convolution

• This is the reverse operation
  • From a 2x2 image to a 3x3 image, 2x2 filter, stride 1 with zero padding
• Unpooling
  • Reverse pooling operation
  • Different solutions

[Figure: the 2x2 input is zero-padded to 4x4, then convolved with the 2x2 filter to give the 3x3 output]
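A stride-1 transposed convolution can be written in two equivalent ways, which this NumPy sketch checks against each other: each input pixel "stamps" a scaled copy of the filter onto the larger output (the transpose of the convolution's linear map), or the input is zero-padded as in the figure and convolved with the filter flipped in both directions. The input and filter values are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, stride 1."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def transpose_conv2d(x, kernel):
    """Stride-1 transposed convolution: each input pixel adds x[i, j] * kernel
    to the output window starting at (i, j) (2x2 input, 2x2 kernel -> 3x3)."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] + kh - 1, x.shape[1] + kw - 1))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i:i + kh, j:j + kw] += x[i, j] * kernel
    return out


def transpose_conv2d_via_padding(x, kernel):
    """Same result: zero-pad by (k - 1) on each side, then an ordinary
    convolution with the kernel flipped in both directions."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return conv2d_valid(padded, kernel[::-1, ::-1])


x = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.array([[1.0, 0.0], [0.0, 1.0]])
print(transpose_conv2d(x, kernel))  # values [[1, 2, 0], [3, 5, 2], [0, 3, 4]]
```

Note that a transposed convolution restores the shape of the original image, not its values: it is not the inverse of the convolution.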

Convolutional Nets: ResNet (He et al. 2016)
• A 152-layer ResNet won 1st place in the ILSVRC classification competition
• Other ResNets won 1st place in ImageNet detection, ImageNet localization, and MS-COCO detection and segmentation
• Building block: $y = F(x) + x$
  • The identity shortcut probably helps propagate gradients
  • $F(x)$ is called the residual
• General architecture
  • Mainly 3x3 convolutional filters

Fig. from (He 2016)

Spatial transformer networks (Jaderberg 2015)

• Proposed initially as a module for learning image transformations
  • Such as cropping, rotations, etc.
  • Differentiable module that allows image warping
• This is the mechanism of interest for us
  • Adaptations are used e.g. in (de Bezenac 2018) to implement an advection mechanism

• Illustration (Fig. from (Jaderberg 2015))



• An STN implements a pointwise image transformation
  • All the parameters are learned
  • 2 main components
• Sampling mechanism
  • For each target point $(x_i^t, y_i^t)$, sample a source point $(x_i^s, y_i^s) = T_\theta(x_i^t, y_i^t)$
  • $T_\theta$ is a learned transformation with parameters $\theta = f_{loc}(U)$, where $f_{loc}$ is a NN and $U$ is the source image

[Figure: source image and target image]


• 2 main components (continued)
• Transformation (warping mechanism)
  • For each sampled source point $(x_i^s, y_i^s)$, compute the value of the corresponding target point $(x_i^t, y_i^t)$
  • Apply a kernel transformation centered on the source point $(x_i^s, y_i^s)$

[Figure: source image and target image]

Kernel transformation

$V(x_i^t, y_i^t) = \sum_{(n, m) \in U} k(x_i^s - m, y_i^s - n)\, U(n, m)$

$U(n, m)$: pixel intensity at $(n, m)$
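A minimal NumPy sketch of the two STN components, assuming an affine $T_\theta$ (one of the transformations considered by Jaderberg 2015), the bilinear kernel k(a, b) = max(0, 1 − |a|) · max(0, 1 − |b|), and raw pixel coordinates instead of the normalised coordinates used in the paper. θ is passed in directly; in an STN it would be produced by the localization network from the source image U. The sum over every source pixel follows the formula literally and is only practical for small images.

```python
import numpy as np


def affine_grid(theta, height, width):
    """Sampling step: source coords (x_s, y_s) = T_theta(x_t, y_t) for every target pixel.
    theta is a 2x3 affine matrix acting on homogeneous pixel coordinates (x, y, 1)."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # 3 x N
    source = theta @ target                                          # 2 x N
    return source[0], source[1]


def bilinear_sample(U, x_s, y_s):
    """Warping step: V_i = sum_{n,m} U[n, m] * max(0, 1 - |x_s_i - m|) * max(0, 1 - |y_s_i - n|)."""
    H, W = U.shape
    kx = np.maximum(0.0, 1.0 - np.abs(x_s[:, None] - np.arange(W)[None, :]))  # N x W
    ky = np.maximum(0.0, 1.0 - np.abs(y_s[:, None] - np.arange(H)[None, :]))  # N x H
    return np.einsum("in,nm,im->i", ky, U, kx)


def spatial_transform(U, theta):
    """Target image V obtained by sampling the source image U through T_theta."""
    x_s, y_s = affine_grid(theta, *U.shape)
    return bilinear_sample(U, x_s, y_s).reshape(U.shape)


U = np.arange(16.0).reshape(4, 4)
shift = np.array([[1.0, 0.0, 1.0],      # x_s = x_t + 1: the content moves one pixel left
                  [0.0, 1.0, 0.0]])
print(spatial_transform(U, shift))      # last column is 0: it samples outside U
```

Because every step is built from differentiable operations, gradients flow back to θ (and to the network producing it), which is what makes the module trainable end to end.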

Recurrent neural networks ‐ RNNs

• Basic architecture: state space model


[Figure: RNN unrolled in time: input x, state (memory) s_t with recurrent weights W, input weights U, output weights V]

• Until the 90s, RNNs were of no practical use: too difficult to train
• Mid-2000s: successful attempts to implement RNNs
  • e.g. A. Graves for speech and handwriting recognition
• Today
  • RNNs are SOTA for a variety of applications, e.g. speech decoding, translation, language generation, etc.
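The state space model in the figure can be sketched in a few lines of NumPy. The names U, W and V follow the figure's labels; the tanh activation, the absence of biases and the random weights are assumptions made for illustration.

```python
import numpy as np


def rnn_forward(xs, U, W, V, s0=None):
    """Run a basic RNN over a sequence xs of shape (T, n_in).

    s_t = tanh(U x_t + W s_{t-1})   (W carries the memory of past inputs)
    o_t = V s_t
    """
    s = np.zeros(W.shape[0]) if s0 is None else s0
    states, outputs = [], []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
        outputs.append(V @ s)
    return np.array(states), np.array(outputs)


rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))          # input -> state
W = 0.5 * rng.normal(size=(4, 4))    # state -> state (memory)
V = rng.normal(size=(2, 4))          # state -> output
xs = rng.normal(size=(5, 3))         # a sequence of 5 inputs
states, outputs = rnn_forward(xs, U, W, V)
```

Training unrolls this loop and backpropagates through all time steps; the repeated multiplication by W in that backward pass is one source of the vanishing/exploding gradients that long made RNNs hard to train.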

Google Neural Machine Translation System (Wu et al. 2016), https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

• General architecture


Fig. from (Wu et al. 2016)

Encoder: 8 stacked LSTM RNN + residual connections

Decoder: 8 stacked LSTM RNN + residual connections + Softmax output layer

Attention mechanism

• NMT seminal papers: Cho et al. 2014, Sutskever et al. 2014
• Comparison and evaluation of NMT RNN options (Britz et al. 2017)
  • 250k GPU hours → a $250k paper!

Generative Adversarial Networks (Goodfellow 2014)
Generative models: intuition

• Provided a sufficiently powerful model G(z)
  • It should be possible to learn complex mappings from a latent space to real-world spaces, such as:


[Figure: mappings from the latent space (z) to the real space]


• Given a probability distribution $p(z)$ on the latent space, $G$ defines a probability distribution on the observation space


[Figure: $G$ maps the latent space to the real space]



• Generative latent variable model

• Given a simple distribution $p(z)$, e.g. $z \sim \mathcal{N}(0, I)$, use a NN to learn a possibly complex mapping $G_\theta: z \mapsto x$

[Figure: $z$ → NN $G_\theta$ → $x$]

GANs (Goodfellow, 2014)

• Principle
  • A generative network generates data after sampling from a latent distribution
  • A discriminator network tells whether the data come from the generative network or from real samples
• The two networks are trained together
  • The generative network tries to fool the discriminator, while the discriminator tries to distinguish between true and artificially generated data
• Formulated as a minimax game
  • Hope: the discriminator will force the generator to be clever
• Applications
  • Data generation, semi-supervised learning, super resolution, …

GANs


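• The discriminator is presented alternately with true ($x$) and fake ($G(z)$) data

[Figure: latent variable sampling $z \sim p(z)$ → generator network $G$ → generated data $G(z)$; real data sampling $x \sim p_{data}(x)$; the discriminator network $D$ outputs 1 if its input is real, 0 if it is generated]

• $G$ and $D$ are typically MLPs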

GAN Training


• The algorithm alternates between optimizing $D$ and $G$

[Figure: alternating training steps of the two networks]
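The alternating procedure can be sketched on a deliberately tiny 1D problem with hand-derived gradients in NumPy. The discriminator step performs gradient ascent on the minimax value $V(D, G) = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))]$; the generator step uses the "non-saturating" loss $-\log D(G(z))$ suggested in Goodfellow 2014. Everything else (data ~ N(4, 1), linear generator, logistic discriminator, learning rate, number of steps) is an illustrative assumption, not the setting of any paper cited here.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def train_toy_gan(steps=2000, batch=64, lr=0.05, seed=0):
    """Alternating GAN training on a toy problem (illustrative choices only):
    real data ~ N(4, 1); generator G(z) = a*z + b with z ~ N(0, 1);
    discriminator D(x) = sigmoid(w*x + c). Gradients are written by hand."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.1, 0.0   # discriminator parameters
    history = []
    for _ in range(steps):
        # Step 1: train D, i.e. minimise -[log D(x) + log(1 - D(G(z)))]
        x_real = rng.normal(4.0, 1.0, batch)
        x_fake = a * rng.normal(size=batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        d_loss = -np.mean(np.log(d_real + 1e-12)) - np.mean(np.log(1.0 - d_fake + 1e-12))
        grad_w = np.mean(-(1.0 - d_real) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(-(1.0 - d_real)) + np.mean(d_fake)
        w, c = w - lr * grad_w, c - lr * grad_c
        # Step 2: train G with the non-saturating loss -log D(G(z))
        z = rng.normal(size=batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        g_loss = -np.mean(np.log(d_fake + 1e-12))
        dloss_dx = -(1.0 - d_fake) * w            # d(-log D(x)) / dx at x = G(z)
        a, b = a - lr * np.mean(dloss_dx * z), b - lr * np.mean(dloss_dx)
        history.append((d_loss, g_loss))
    return (a, b), (w, c), history


(a, b), (w, c), history = train_toy_gan()
print("generator offset b:", b)
```

A discriminator this weak is monotone in x, so it mostly detects a shift in location rather than a mismatch in spread; the sketch is meant to show the alternating updates, and GAN dynamics of this kind may oscillate instead of converging.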

GAN examples: Deep Convolutional GANs (Radford 2015), image generation

• LSUN bedrooms dataset ‐ over 3 million training examples


Fig. Radford 2015

GAN example: Multi-View Data Generation Without View Supervision (Chen 2018)
• Objective
  • Generate images by disentangling content and view
  • e.g. content: a person; view: position, illumination, etc.
• 2 latent spaces: view and content
  • Generate image pairs: the same item with 2 different views
  • Learn to discriminate between generated and real pairs

[Figure: 1 row = 1 content, 1 column = 1 view]

Adversarial training: video sequence prediction

Video prediction (Mathieu et al. 2016)

Predicting future video segmentations (Luc et al. 2017, LJK Grenoble)

Examples of Deep Learning applications in the Climate Domain


Event Detection
• Eddy detection
• Extreme weather event detection


Eddy Identification and Tracking (Lguensat 2017)

• Objective: pixel-wise eddy classification
  • 3 classes: anticyclonic, cyclonic, no eddy
• Data
  • SSH maps, southwest Atlantic (AVISO-SSH)
  • Labeled by the PET14 algorithm (Mason 2014)
    • Provides the eddy center + speed and contour
    • Transformed into segmentation maps using the speed contour coordinates
    • Speed contour: the contour with the highest mean geostrophic rotational current
    • Pixels inside each contour are labeled A-eddy or C-eddy; all other pixels are labeled No-eddy
  • 15 years, 1 map/day: the first 14 years are used for training, the last year for testing
  • Input = 128x128 patch randomly sampled from the SSH map
  • 5k training examples



[Figure: patch sampling]

Fig. from (Lguensat 2017)

Eddy Identification and Tracking (Lguensat 2017)
• Model
  • Convolution-deconvolution architecture
  • Inspired from CNNs for biomedical image segmentation
• Task: classification
• Training criterion
  • Cross entropy
  • Dice loss = 1 − mean softDiceCoef (better reflects segmentation…)
    • $\mathrm{softDiceCoef}(P, T) = \frac{2 \sum P \circ T}{\sum P + \sum T}$
    • $P$: predicted output (matrix), $T$: target output (matrix)
    • $T$: one-hot encoding (3-D) at each position; $P$: also 3-D at each position, where $P_c \in [0, 1]$ is the predicted probability of class $c$ and $\sum_c P_c = 1$
    • mean softDiceCoef: mean of the 3 per-class coefficients
    • softDiceCoef should be 1 for a perfect segmentation and 0 for a completely mistaken segmentation





Fig. from (Lguensat 2017)
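A NumPy sketch of the soft Dice coefficient $2 \sum P \circ T / (\sum P + \sum T)$, computed per class and averaged into the Dice loss. Assumptions: arrays of shape (H, W, 3) with the classes on the last axis, a small `eps` to avoid 0/0 when a class is absent, and a hypothetical class ordering (0 = no eddy, 1 = anticyclonic, 2 = cyclonic).

```python
import numpy as np


def one_hot(labels, n_classes=3):
    """(H, W) integer labels -> (H, W, n_classes) one-hot target T."""
    return np.eye(n_classes)[labels]


def soft_dice_coef(P, T, eps=1e-7):
    """Per-class soft Dice 2 * sum(P_c * T_c) / (sum(P_c) + sum(T_c)), shape (n_classes,)."""
    intersection = np.sum(P * T, axis=(0, 1))
    denominator = np.sum(P, axis=(0, 1)) + np.sum(T, axis=(0, 1))
    return (2.0 * intersection + eps) / (denominator + eps)


def dice_loss(P, T):
    """Dice loss = 1 - mean of the per-class soft Dice coefficients."""
    return 1.0 - np.mean(soft_dice_coef(P, T))


labels = np.array([[0, 1, 2],
                   [2, 1, 0]])                     # hypothetical 2x3 label map
T = one_hot(labels)
perfect = dice_loss(T, T)                          # 0 for a perfect segmentation
wrong = dice_loss(one_hot((labels + 1) % 3), T)    # ~1 when every pixel is misclassified
uniform = dice_loss(np.full(T.shape, 1.0 / 3.0), T)  # 2/3 for an uninformative prediction
```

Averaging over classes gives the small eddy classes the same weight as the dominant no-eddy class, which is one reason this loss can better reflect segmentation quality than pixel-wise cross entropy.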


• Experiments
  • 2 variants of the network
• Code available, data available
• Mentioned extensions
  • 3D altimetry with 3D CNNs
  • SST as additional inputs


Extreme Weather Event Detection (Racah 2017)
• Objective: detection of local events from earth observation
  • 4 classes: tropical depressions, tropical cyclones, extratropical cyclones, atmospheric rivers
• Data
  • Simulated data from CAM5, a 3-D physical model of the atmosphere
    • Generates 8 images of 768x1152 pixels per day (one every 3 hours), each with 16 channels (surface temperature, surface pressure, etc.), for 27 years
  • Labeled with TECA (Toolkit for Extreme Climate Analysis)
    • Produces: event center coordinates in the image, bounding box for the event, event class
    • Labels are prone to errors, and event classes are imbalanced
• Method
  • Convolution-deconvolution NN + supervision for predicting event localization, size and class


Extreme Weather Event Detection (Racah 2017)

• Model: 3D conv-deconv NN

[Figure: model outputs: reconstruction error; object present in the grid cell (Y/N); object class; location/size of the object]

Input image is split into a 12x18 grid of 64x64 pixels

Fig. from (Racah 2017)
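The grid arithmetic can be checked directly: 768 / 64 = 12 and 1152 / 64 = 18. This NumPy sketch splits one multichannel snapshot into that grid of 64x64 cells. The helper assigning an event to the cell that contains its box center is a hypothetical, YOLO-style convention, not necessarily the exact rule used by Racah 2017.

```python
import numpy as np


def split_into_grid(fields, cell=64):
    """Split a (channels, H, W) snapshot into non-overlapping cell x cell patches.
    Returns an array of shape (H // cell, W // cell, channels, cell, cell)."""
    C, H, W = fields.shape
    if H % cell or W % cell:
        raise ValueError("image size must be a multiple of the cell size")
    gh, gw = H // cell, W // cell
    return fields.reshape(C, gh, cell, gw, cell).transpose(1, 3, 0, 2, 4)


def cell_of(center_row, center_col, cell=64):
    """Hypothetical assignment rule: the grid cell containing an event's box center."""
    return int(center_row) // cell, int(center_col) // cell


# 2 channels instead of 16 to keep the example light; the grid does not depend on it
snapshot = np.zeros((2, 768, 1152), dtype=np.float32)
snapshot[1, 130, 700] = 5.0                  # a hypothetical event center
grid = split_into_grid(snapshot)
print(grid.shape, cell_of(130, 700))         # (12, 18, 2, 64, 64) (2, 10)
```

Each of the 12 x 18 cells then gets its own prediction heads (object present, class, box), which is what the figure's output branches correspond to.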

Extreme Weather Event Detection (Racah 2017)

• Example


Fig. from (Racah 2017)

Spatio-temporal modeling
• Nowcasting
• Integration of NN in numerical models
• Incorporating prior physical knowledge in Deep Learning models
• Solving inverse problems with NNs


Precipitation Nowcasting (Shi 2015, Shi 2017, Zhang 2017)

• Precipitation nowcasting
  • Very short-term (a few hours) prediction of rainfall intensity in a local region
• Classical methods
  • Numerical Weather Prediction (NWP) methods: based on the physical equations of an atmosphere model
  • Extrapolation-based methods using radar data
    • Optical-flow-based methods inspired from computer vision
    • Do not fully exploit the available data (Shi 2015)
• Objective
  • Learning from spatio-temporal series of radar measurements
  • k-step prediction
  • End-to-end learning
• Data
  • Local radar maps


Precipitation Nowcasting (Shi 2015)

• Model
  • Extension of the LSTM incorporating explicit spatial dependencies
• ConvLSTMs
  • Inspired from early video prediction models
  • Analogy with video prediction tasks, but on dense images
  • Note: several recent papers address video prediction with NNs (without optical flow)
  • Convolutions for both input-to-state and state-to-state connections
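A single-channel ConvLSTM cell can be sketched in NumPy by taking the LSTM gate equations (with peephole terms $W_c \circ C$, as in Shi 2015) and replacing the matrix products by convolutions. Assumptions: one channel, 3x3 kernels, "same" zero padding, random weights and zero biases. The example checks the property highlighted on the slide: after one step, information from a single pixel has only reached its 3x3 neighbourhood; the state-to-state convolution spreads it to 5x5 at the next step.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def conv_same(x, k):
    """'Same'-size 2D cross-correlation with zero padding (odd kernel size)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape)
    for di in range(k.shape[0]):
        for dj in range(k.shape[1]):
            out += k[di, dj] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out


def init_params(shape, rng, ksize=3):
    """Random single-channel ConvLSTM parameters (illustrative initialisation)."""
    p = {}
    for g in "ifco":                        # input, forget, cell candidate, output
        p["W_x" + g] = rng.normal(0.0, 0.5, (ksize, ksize))   # input-to-state kernel
        p["W_h" + g] = rng.normal(0.0, 0.5, (ksize, ksize))   # state-to-state kernel
        p["b_" + g] = 0.0
    for g in "ifo":
        p["W_c" + g] = rng.normal(0.0, 0.1, shape)            # peephole (Hadamard) weights
    return p


def convlstm_step(X, H, C, p):
    """One ConvLSTM step: LSTM gates where input/state products are convolutions."""
    i = sigmoid(conv_same(X, p["W_xi"]) + conv_same(H, p["W_hi"]) + p["W_ci"] * C + p["b_i"])
    f = sigmoid(conv_same(X, p["W_xf"]) + conv_same(H, p["W_hf"]) + p["W_cf"] * C + p["b_f"])
    C_new = f * C + i * np.tanh(conv_same(X, p["W_xc"]) + conv_same(H, p["W_hc"]) + p["b_c"])
    o = sigmoid(conv_same(X, p["W_xo"]) + conv_same(H, p["W_ho"]) + p["W_co"] * C_new + p["b_o"])
    return o * np.tanh(C_new), C_new


rng = np.random.default_rng(0)
shape = (9, 9)
params = init_params(shape, rng)
X = np.zeros(shape)
X[4, 4] = 1.0                                                    # a single "rain" pixel
H1, C1 = convlstm_step(X, np.zeros(shape), np.zeros(shape), params)  # reaches 3x3
H2, C2 = convlstm_step(np.zeros(shape), H1, C1, params)           # then 5x5, via state-to-state conv
```

Stacking layers and steps widens this receptive field further, which is how the model can follow moving rain cells without an explicit optical flow.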


Precipitation Nowcasting (Shi 2015)
• Data
  • Radar reflectivity maps from 97 rainy days in Hong Kong
  • 1 radar map every 6 min, 240 frames per day
  • Small dataset
  • Radar maps preprocessed into 100x100 grayscale « images » + smoothing
  • Sequences = 20 successive frames: 5 as input, 15 to predict
• Model
  • 2-layer ConvLSTM
  • Training criterion: cross-entropy (rain / no rain?) or MSE + thresholding?
• Evaluation
  • Several measures
  • MSE is measured on the predicted values (regression)
  • The other measures require binary decisions (rain vs no rain): the predicted values are converted to 0/1 using a rainfall-rate threshold of 0.5 mm/h
  • ROVER is an optical-flow-based method
• Lessons
  • State-to-state convolutions are essential for handling spatio-temporal dependencies
  • Better than ROVER (SOTA optical-flow-based method) and the fully connected LSTM (FC-LSTM)
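The binary evaluation step can be sketched as follows. The slide does not name the measures; POD, FAR and CSI are assumed here as typical contingency-table scores in nowcasting, computed after thresholding predicted and observed rainfall rates. The rainfall values are hypothetical, and the "≥ threshold means rain" convention is an assumption.

```python
import numpy as np


def _ratio(num, den):
    """num / den, or NaN when the denominator is 0 (e.g. no rain at this threshold)."""
    return num / den if den > 0 else float("nan")


def skill_scores(pred, obs, threshold):
    """Binary scores after thresholding predicted and observed rainfall rates (mm/h)."""
    p = np.asarray(pred) >= threshold     # predicted rain
    o = np.asarray(obs) >= threshold      # observed rain
    hits = int(np.sum(p & o))
    misses = int(np.sum(~p & o))
    false_alarms = int(np.sum(p & ~o))
    return {
        "POD": _ratio(hits, hits + misses),                 # probability of detection
        "FAR": _ratio(false_alarms, hits + false_alarms),   # false alarm ratio
        "CSI": _ratio(hits, hits + misses + false_alarms),  # critical success index
    }


pred = [0.2, 0.6, 1.0, 3.0]   # hypothetical predicted rainfall rates
obs = [0.0, 0.7, 0.1, 2.0]    # hypothetical observed rainfall rates
for thr in (0.5, 5.0, 10.0, 30.0):
    print(thr, skill_scores(pred, obs, thr))
```

At 0.5 mm/h this gives 2 hits, 0 misses and 1 false alarm, so POD = 1, FAR = 1/3 and CSI = 2/3; at higher thresholds nothing rains and the ratios are undefined (NaN).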



• Extension of the ConvLSTM work
  • Based on GRUs
• Main ideas
  • Use convolutional GRUs instead of fully connected GRUs: ConvGRU
  • The spatial dependency structure between states should be context-dependent, not fixed as in ConvLSTMs
    • They consider a spatial context
    • The basic unit is called TrajGRU
  • New and larger dataset
  • New evaluation metrics (weighted MSE)



• Selection of the neighborhood at time t (warping mechanism)
  • For each cell $(i, j)$ of the state $H_t$, select neighborhood cells in $H_{t-1}$
  • A function $\gamma(X_t, H_{t-1})$ generates a bilinear mapping, which is then used to select points in $H_{t-1}$


Precipitation Nowcasting (Shi 2017)
• Dataset: HKO-7
  • Radar echo data from 2009 to 2015 in Hong Kong
  • 1 radar map every 6 min, 240 frames per day
  • Resolution 480x480 pixels, altitude 2 km, covering 512x512 km around Hong Kong
  • Radar images are transformed to pixel values in [0, 255] + filtering
  • Rainy days: 812 days for training, 50 for validation, 131 for test
  • Prediction: radar reflectivity values are converted to rainfall intensity values
• Model
  • 3-layer encoding-forecasting model
  • Training criterion: weighted MSE (higher weights for heavy rainfall, to compensate for data imbalance; see next slide)
• Evaluation
  • MSE and weighted MSE (regression)
  • Different measures requiring a binary decision: rain or no rain
    • Evaluation is performed at different rainfall-rate thresholds: 0.5, 5, 10, 30 mm/h
    • Predicted pixel values are converted to 0/1 values for each threshold
    • Scores are computed for each threshold
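The weighted MSE can be sketched as an ordinary MSE whose per-pixel weight grows with the observed rainfall rate, so that rare heavy-rain pixels are not swamped by the many dry ones. The breakpoints and weight values below are illustrative assumptions (Shi 2017 uses a step function of this kind; check the paper for its exact values).

```python
import numpy as np

# Hypothetical step function: rain < 2 mm/h -> 1, [2, 5) -> 2, [5, 10) -> 5, [10, 30) -> 10, >= 30 -> 30
BREAKPOINTS = (2.0, 5.0, 10.0, 30.0)
WEIGHTS = (1.0, 2.0, 5.0, 10.0, 30.0)


def rain_weights(rain):
    """Per-pixel weight looked up from the observed rainfall rate (mm/h)."""
    idx = np.searchsorted(BREAKPOINTS, rain, side="right")
    return np.asarray(WEIGHTS)[idx]


def weighted_mse(pred, target):
    """MSE where each pixel's squared error is multiplied by its rain weight."""
    return np.mean(rain_weights(target) * (pred - target) ** 2)


target = np.array([1.0, 40.0])   # a light-rain and a heavy-rain pixel
pred = np.array([2.0, 41.0])     # same absolute error (1 mm/h) on both
print(weighted_mse(pred, target))  # 15.5, dominated by the heavy-rain pixel
```

With a plain MSE both errors would count equally (loss 1.0); here the heavy-rain error counts 30 times more.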



• Rain statistics (dataset)

• Performance comparison


Precipitation Nowcasting (Zhang 2017)

• A number of preliminary analyses, e.g. (Zhang 2017)
• Nowcasting based on
  • 3-D radar maps (multiple altitudes)
  • Reanalysis data from VDRAS (NCAR, US)
    • Vertical velocity and buoyancy of an air parcel (also 3-D data)
• Classification: rain / no rain
• Objective: nowcasting, storm initiation and growth (*)
  • Argument: radar data alone are not sufficient for (*)


Integration of NN in numerical models (Brajard 2018)

• Can Machine Learning (ML) techniques be used in weather and climate models to replace physical forcings?

• Example

• Question: can the forcing terms be represented by a neural network?


More generally, the forcing terms mimic unresolved processes like turbulence, precipitation, radiation, clouds, friction, etc. They are typically computed via complicated physical parameterizations with empirical parameters.


• Proof of concept
  • Data generated by a fully specified shallow water model
    • i.e. the forcing terms are modeled by a physical model
  • Train an MLP to learn the forcing terms (supervised learning)


Variables: speed, height of the mixed layer, surface wind


• The neural network simulation diverges after a few hundred days (kinetic and potential energy explode)

• Solution: add a mass conservation constraint (hmean = constant) to the neural network training algorithm (physics‐informed machine learning)
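One way to add the constraint $\bar{h}$ = constant to training is a penalty term, sketched below in NumPy; a hard projection is shown as an alternative. Both are assumptions for illustration (the slide does not say how the constraint is enforced), as are the names `h_pred`, `h_true`, `h_mean_ref` and the weight `lam`. Here `h_pred` stands for the layer-height field obtained after a model step that uses the learned forcing.

```python
import numpy as np


def constrained_loss(h_pred, h_true, h_mean_ref, lam=10.0):
    """Supervised MSE + soft mass-conservation penalty (hypothetical formulation).
    h_pred, h_true: (batch, ny, nx) height fields; h_mean_ref: conserved mean height."""
    mse = np.mean((h_pred - h_true) ** 2)
    mass_violation = h_pred.mean(axis=(1, 2)) - h_mean_ref   # one value per sample
    return mse + lam * np.mean(mass_violation ** 2)


def project_mass(h_pred, h_mean_ref):
    """Hard alternative: shift each field so that its spatial mean is exactly h_mean_ref."""
    return h_pred - h_pred.mean(axis=(1, 2), keepdims=True) + h_mean_ref


rng = np.random.default_rng(0)
h_true = 1.0 + 0.1 * rng.normal(size=(4, 8, 8))
h_true -= h_true.mean(axis=(1, 2), keepdims=True) - 1.0      # every field has mean 1
h_drift = h_true + 0.5                                       # a prediction that gained mass
print(constrained_loss(h_drift, h_true, 1.0))                # 0.25 (MSE) + 10 * 0.25 (penalty)
```

The penalty only discourages mass drift, so long free runs can still accumulate small errors; the projection enforces the constraint exactly at each step, at the cost of modifying the network's output.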


Incorporating prior knowledge: Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge (de Bezenac 2018)

• Motivations
  • DL is SOTA for perception problems
  • Natural physical phenomena are much more complex than the problems handled by Deep Learning today
• Can we incorporate prior knowledge from physics in statistical models?
• Challenge
  • Interaction between the physical and the statistical paradigms
• Illustration: Sea Surface Temperature prediction

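Incorporating prior knowledge (de Bezenac 2018)
Physical model for fluid transport: the advection-diffusion equation
• Describes the transport of $I$ through advection and diffusion
  • $\frac{\partial I}{\partial t} + (w \cdot \nabla) I = D \nabla^2 I$
  • $I$: quantity of interest (temperature image)
  • $w$: motion vector, $D$: diffusion coefficient
• There exists a closed-form solution (for constant $w$ and $D$)
  • $I_{t+\Delta t}(x) = (k * I_t)(x - w\,\Delta t)$, with $k(x) = \frac{1}{4\pi D \Delta t} \exp\left(-\frac{\|x\|^2}{4 D \Delta t}\right)$
• If we knew the motion vector $w$ and the diffusion coefficient $D$, we could calculate $I_{t+\Delta t}$ from $I_t$
  • $w$ and $D$ are unknown
  • → Learn $w$ and $D$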

Incorporating prior knowledge (de Bezenac 2018)
Prediction model
• Objective: predict $I_{t+1}$ from the past images $I_t, I_{t-1}, \dots$
• 2 components:
  • Convolution-deconvolution NN for estimating the motion vector $w$
  • Warping scheme: implements the discretized advection-diffusion solution
• End-to-end learning, supervised only by the target image
  • Stochastic gradient optimization
  • Performance on par with SOTA assimilation models

[Figure: past images and target image]
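A NumPy sketch of such a warping scheme: each target pixel x takes a Gaussian-weighted average of $I_t$ around its departure point $x - w\,\Delta t$, a discretised form of the closed-form advection-diffusion solution. Assumptions: $w$ is allowed to vary per pixel (as a motion-estimation network would output it), so this is only an approximation of the constant-$w$ solution; the kernel weights are renormalised per pixel, which also handles the image borders; the (row, column) convention for $w$ is chosen here. The direct sum over all pixels is only practical for small images.

```python
import numpy as np


def warp_advection_diffusion(I, w, D=0.05, dt=1.0):
    """One prediction step I_{t+dt}(x) = sum_y k(x - w(x) dt - y) I_t(y).

    I: (H, W) image at time t.
    w: (H, W, 2) motion field in pixels per time step; w[..., 0] is the row
       component and w[..., 1] the column component (convention chosen here).
    k: Gaussian kernel exp(-|z|^2 / (4 D dt)), renormalised to sum to 1 per pixel.
    """
    H, W = I.shape
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # departure point x - w(x) dt of every target pixel (flattened)
    src_r = (rows - w[..., 0] * dt).ravel()
    src_c = (cols - w[..., 1] * dt).ravel()
    # squared distance between each departure point and each source pixel y
    d2 = ((src_r[:, None] - rows.ravel()[None, :]) ** 2
          + (src_c[:, None] - cols.ravel()[None, :]) ** 2)
    k = np.exp(-d2 / (4.0 * D * dt))
    k /= k.sum(axis=1, keepdims=True)
    return (k @ I.ravel()).reshape(H, W)


rng = np.random.default_rng(0)
I = rng.random((6, 6))
w = np.zeros((6, 6, 2))
w[..., 1] = 1.0                        # uniform motion: one pixel to the right per step
I_next = warp_advection_diffusion(I, w, D=0.01)
print(np.round(I_next[:, 1:] - I[:, :-1], 6))  # ~0: the image has moved one column right
```

Every operation is differentiable in $w$, so the error on the predicted image can be backpropagated into the network that estimates the motion field, without any supervision on the motion itself.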

Solving inverse problems with NNs (de Bezenac et al., ongoing work)

• Objective
  • Given noisy observed data, and possibly some priors, how can we generate an approximation of the underlying true data?
  • Priors may come from a physical model
• Applications
  • Improve physical model predictions using observed data
  • Inpainting for physical data
• Method
  • Based on an extension of AmbientGANs (Bora et al. 2018)



• AmbientGANs (Bora et al. 2018)
  • Train generative models from incomplete or noisy samples
  • Assumption: the noise/measurement process is known
  • Works for some classes of measurements (theoretical results for some measurement kernels and noise distributions; empirical results for a large class of processes)
  • The discriminator is trained to distinguish a real measurement from a simulated measurement of a generated image


Fig. from Bora et al. 2018


• AmbientGAN example


Fig. from Bora et al. 2018


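• Conditional AmbientGANs
  • Objective
    • Given a model of the stochastic measurement process, learn the generator so that simulated measurements of its outputs are indistinguishable from the real observations

[Figure: generator network → measurement process → discriminator]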


• Preliminary illustrations
  • Data from a shallow water model
  • Left: 90% of the pixels removed (set to 0) + noise $\mathcal{N}(0, 1)$ on the remaining pixels
  • Right: « clouds »

[Figure, left: true state, observation, GAN model, BLUE; right: true state, observation, GAN model]
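The AmbientGAN principle can be sketched around the "90% missing pixels + noise" measurement process, reading "noise 0,1" as Gaussian noise N(0, 1) (an assumption). The function names are hypothetical; the point illustrated is that the discriminator never sees clean states, only real observations and simulated measurements of generated states.

```python
import numpy as np


def measure(x, rng, drop_frac=0.9, noise_std=1.0):
    """Hypothetical measurement process: add N(0, noise_std^2) noise, then keep a
    random (1 - drop_frac) fraction of the pixels; the others are set to 0."""
    keep = rng.random(x.shape) >= drop_frac
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    return np.where(keep, noisy, 0.0), keep


def discriminator_batch(real_obs, generated_states, rng):
    """AmbientGAN idea: real observations get label 1, simulated measurements of
    generated states get label 0; clean states are never shown to the discriminator."""
    fake_obs, _ = measure(generated_states, rng)
    inputs = np.concatenate([real_obs, fake_obs])
    labels = np.concatenate([np.ones(len(real_obs)), np.zeros(len(fake_obs))])
    return inputs, labels


rng = np.random.default_rng(0)
true_states = rng.normal(size=(4, 16, 16))        # stand-in for shallow-water states
real_obs, _ = measure(true_states, rng)           # what we actually observe
generated = rng.normal(size=(3, 16, 16))          # stand-in for generator outputs
inputs, labels = discriminator_batch(real_obs, generated, rng)
```

Because the measurement is applied inside the training loop, it must be simulable (and, for gradient-based training, differentiable with respect to the generator's output, which holds for masking and additive noise).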

NN as Dynamical Systems


NN as Dynamical Systems

• Recent papers interpret NNs as discretization schemes for differential equations
• Links between data-driven approaches (NNs) and the physical models used in climate modeling
  • Allows learning efficient discretization schemes for unknown ODEs
  • Motivates alternative designs of NN modules/architectures
  • No clear application to climate problems yet


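ResNet as a discretization scheme for ODEs

• ODE
  • $\frac{dx}{dt} = f(x(t), \theta(t))$, $x(0) = x_0$ (1)
• ResNet module
  • $x_{n+1} = x_n + f(x_n, \theta_n)$ (2)
  • with $x_n \approx x(t_n)$, $t_n = n\,h$, $t \in [0, 1]$
• Forward Euler scheme for the ODE: $x_{n+1} = x_n + h\, f(x_n, \theta_n)$, $h$: time step
  • (2) is this scheme with the time step $h$ absorbed into $f$
• Note: this type of additive structure (2) is also present in LSTM and GRU units
• ResNet
  • Input $x_0$, output $x_N$
  • Multiple ResNet modules implement a multi-step discretization scheme for the ODE
  • $x_1 = x_0 + f(x_0, \theta_0)$, $x_2 = x_1 + f(x_1, \theta_1)$, …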
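A small numerical check of the ResNet / forward-Euler correspondence: a stack of N residual blocks with step h = 1/N integrates the toy ODE dx/dt = −x, whose exact value at t = 1 is e^{-1}. The scalar "layer" f(x, θ) = θx is a hypothetical choice made so that the result can be checked by hand.

```python
import numpy as np


def residual_block(x, f, theta, h):
    """One ResNet module x_{n+1} = x_n + h f(x_n, theta_n), i.e. one forward Euler step."""
    return x + h * f(x, theta)


def resnet_forward(x0, f, thetas, T=1.0):
    """Stack len(thetas) residual blocks: forward Euler on [0, T] with step h = T / N."""
    h = T / len(thetas)
    x = x0
    for theta in thetas:
        x = residual_block(x, f, theta, h)
    return x


def f(x, theta):
    """Hypothetical 'layer': linear vector field dx/dt = theta * x."""
    return theta * x


x0 = np.array([1.0])
for n_layers in (10, 100, 1000):
    xN = resnet_forward(x0, f, [-1.0] * n_layers)   # dx/dt = -x, exact x(1) = e^{-1}
    print(n_layers, abs(xN[0] - np.exp(-1.0)))      # error shrinks roughly like 1/N
```

With 10 layers each block multiplies x by 0.9, so the output is 0.9^10; deeper stacks approach e^{-1} at the first-order rate expected from Euler's method.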


ResNet as a discretization scheme for ODEs

• This suggests that alternative discretization schemes correspond to alternative ResNet-like NN models
  • Backward Euler, Runge-Kutta, linear multi-step, …
  • Example (Lu 2018): linear multi-step discretization scheme
    • $x_{n+1} = (1 - k_n)\, x_n + k_n\, x_{n-1} + f(x_n)$, with a learned scalar $k_n$
• Applications
  • Classification (à la ResNet)
  • Modeling dynamical systems
    • (Fablet 2017): Runge-Kutta scheme for dynamical systems, on toy problems


Fig. (Lu 2018)
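A sketch of the linear multi-step architecture $x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n)$. In Lu et al. the $k_n$ are trainable and $f$ is a residual block; here they are fixed by hand (an illustrative choice) to show two schemes contained in the architecture: $k_n = 0$ gives back the plain ResNet / forward Euler, and $k_n = 1$ with $f = 2h\,g$ gives the explicit midpoint ("leapfrog") scheme, which is second order. The toy field g(x) = −x is hypothetical.

```python
import numpy as np


def lm_forward(x0, x1, f, ks):
    """Linear multi-step architecture: x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n).
    Needs two starting states x0, x1; ks holds one scalar k_n per layer."""
    x_prev, x = x0, x1
    for k in ks:
        x_prev, x = x, (1.0 - k) * x + k * x_prev + f(x)
    return x


h, n_steps = 0.01, 100                     # integrate dx/dt = -x on [0, 1]
g = lambda x: -x                           # hypothetical vector field
x0 = 1.0
x1 = x0 + h * g(x0)                        # one Euler step to get x_1

# k_n = 0: plain ResNet / forward Euler with f = h g
euler = lm_forward(x0, x1, lambda x: h * g(x), [0.0] * (n_steps - 1))
# k_n = 1 with f = 2 h g: explicit midpoint ("leapfrog") scheme
leapfrog = lm_forward(x0, x1, lambda x: 2 * h * g(x), [1.0] * (n_steps - 1))
print(abs(euler - np.exp(-1.0)), abs(leapfrog - np.exp(-1.0)))
```

With the same number of layers, the leapfrog variant ends much closer to e^{-1} than the Euler one, which illustrates why the choice of scheme can matter as an architectural prior.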

References

• Brajard, J., Charantonis, A., & Sirven, J. (2018). Can a neural network learn a numerical model? Geophysical Research Abstracts, Vol. 20, EGU2018-13973.

• de Bezenac, E., Pajot, A., & Gallinari, P. (2018). Deep Learning For Physical Processes: Incorporating Prior Scientific Knowledge. In ICLR.

• Fablet, R., Ouala, S., & Herzet, C. (2017). Bilinear residual Neural Network for the identification and forecasting of dynamical systems. Retrieved from http://arxiv.org/abs/1712.07003

• Franz, K., Roscher, R., Milioto, A., & Wenzel, S. (2018). Ocean Eddy Identification and Tracking using Neural Networks. arXiv:1803.07436.

• Jaderberg, M., Simonyan, K., Zisserman, A., & Kavukcuoglu, K. (2015). Spatial Transformer Networks. In NIPS (pp. 2017–2025).

• Kim, S., Hong, S., Joh, M., & Song, S. (2017). DeepRain: ConvLSTM Network for Precipitation Prediction using Multichannel Radar Data, 3–6. Retrieved from http://arxiv.org/abs/1711.02316

• Lguensat, R., Sun, M., Fablet, R., Mason, E., Tandeo, P., & Chen, G. (2017). EddyNet: A Deep Neural Network For Pixel‐Wise Classification of Oceanic Eddies (pp. 1–5). Retrieved from http://arxiv.org/abs/1711.03954

• Liu, Y., Racah, E., Prabhat, Correa, J., Khosrowshahi, A., Lavers, D., … Collins, W. (2016). Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets (pp. 81–88). arXiv:1605.01156.

• Lu, Y., Zhong, A., Li, Q., & Dong, B. (2017). Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations (pp. 1–15). Retrieved from http://arxiv.org/abs/1710.10121

• Racah, E., Beckham, C., Maharaj, T., Kahou, S. E., Prabhat, & Pal, C. (2017). ExtremeWeather: A large‐scale climate dataset for semi‐supervised detection, localization, and understanding of extreme weather events. In NIPS (pp. 1–12). Retrieved from http://arxiv.org/abs/1612.02095

• Shi, X., Chen, Z., Wang, H., Yeung, D.‐Y., Wong, W., & Woo, W. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Advances in Neural Information Processing Systems 28, 802–810.

• Shi, X., Gao, Z., Lausen, L., Wang, H., Yeung, D.‐Y., Wong, W., & Woo, W. (2017). Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model (pp. 1–11). Retrieved from http://arxiv.org/abs/1706.03458

• Zhang, W., Han, L., Sun, J., Guo, H., & Dai, J. (2017). Application of Multi‐channel 3D‐cube Successive Convolution Network for Convective Storm Nowcasting. ArXiv Preprint ArXiv:1702.04517, 1–9.

