Page 1: Neural Ordinary Differential Equations

Neural Ordinary Differential Equations

Ricky T. Q. Chen*, Yulia Rubanova*, Jesse Bettencourt*, David Duvenaud

University of Toronto

Page 2: Neural Ordinary Differential Equations

Background: Ordinary Differential Equations (ODEs)

- Model the instantaneous change of a state (explicit form):

  dz(t)/dt = f(z(t), t)

- Solving an initial value problem (IVP) corresponds to integration (the solution is a trajectory):

  z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t) dt

- The Euler method approximates the solution with small steps:

  z(t + h) = z(t) + h f(z(t), t)
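A minimal sketch of this Euler scheme in Python; the dynamics dz/dt = -z used below is an illustrative assumption, not from the slides:

```python
import numpy as np

def euler_solve(f, z0, t0, t1, h):
    """Approximate z(t1) for dz/dt = f(z, t) by taking small steps of size h."""
    z, t = z0, t0
    while t < t1:
        z = z + h * f(z, t)  # z(t + h) ~= z(t) + h * f(z(t), t)
        t += h
    return z

# dz/dt = -z has the exact solution z(t) = z(0) * exp(-t).
z1 = euler_solve(lambda z, t: -z, np.array([1.0]), 0.0, 1.0, 1e-3)
print(z1, np.exp(-1.0))  # both close to 0.3679
```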

Page 3: Neural Ordinary Differential Equations

Residual Networks interpreted as an ODE Solver

- Hidden units look like:

  h_{t+1} = h_t + f(h_t, θ_t)

- The final output is the composition:

  h_T = (F_T ∘ ... ∘ F_1)(h_0), where F_t(h) = h + f(h, θ_t)

Haber & Ruthotto (2017); Weinan E (2017).

Page 4: Neural Ordinary Differential Equations

Residual Networks interpreted as an ODE Solver

- Hidden units look like:

  h_{t+1} = h_t + f(h_t, θ_t)

- The final output is the composition:

  h_T = (F_T ∘ ... ∘ F_1)(h_0), where F_t(h) = h + f(h, θ_t)

- This can be interpreted as an Euler discretization of an ODE.

Haber & Ruthotto (2017); Weinan E (2017).

- In the limit of smaller steps:

  dh(t)/dt = f(h(t), t, θ)
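A quick sketch of this correspondence, using a small toy residual branch f (hypothetical, for illustration): a ResNet update is exactly a forward Euler step with step size 1.

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(4, 4), nn.Tanh())  # hypothetical residual branch f(h)

h_t = torch.randn(1, 4)
h_resnet = h_t + f(h_t)        # ResNet block: h_{t+1} = h_t + f(h_t)
step = 1.0
h_euler = h_t + step * f(h_t)  # Euler step for dh/dt = f(h) with step size 1
assert torch.allclose(h_resnet, h_euler)
```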

Page 5: Neural Ordinary Differential Equations

Deep Learning as Discretized Differential Equations

Many deep learning networks can be interpreted as ODE solvers.

Network: fixed-step numerical scheme
- ResNet, RevNet, ResNeXt, etc.: Forward Euler
- PolyNet: approximation to Backward Euler
- FractalNet: Runge-Kutta
- DenseNet: Runge-Kutta

Lu et al. (2017); Chang et al. (2018); Zhu et al. (2018).

Page 6: Neural Ordinary Differential Equations

Deep Learning as Discretized Differential Equations

Many deep learning networks can be interpreted as ODE solvers.

Network: fixed-step numerical scheme
- ResNet, RevNet, ResNeXt, etc.: Forward Euler
- PolyNet: approximation to Backward Euler
- FractalNet: Runge-Kutta
- DenseNet: Runge-Kutta

Lu et al. (2017); Chang et al. (2018); Zhu et al. (2018).

But:
(1) What are the underlying dynamics?
(2) Adaptive step-size solvers provide better error handling.

Page 7: Neural Ordinary Differential Equations

“Neural” Ordinary Differential Equations

Instead of y = F(x), ...

Page 8: Neural Ordinary Differential Equations

Parameterize the dynamics with a neural network: dz(t)/dt = f(z(t), t; θ).

“Neural” Ordinary Differential Equations

Instead of y = F(x), solve y = z(T) given the initial condition z(0) = x.

Page 9: Neural Ordinary Differential Equations

Parameterize the dynamics with a neural network: dz(t)/dt = f(z(t), t; θ).

“Neural” Ordinary Differential Equations

Solve the dynamics using any black-box ODE solver.

- Adaptive step size.
- Error estimate.
- O(1)-memory learning.

Instead of y = F(x), solve y = z(T) given the initial condition z(0) = x.
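A minimal sketch of this using the authors' torchdiffeq library (linked later in the talk); the two-layer MLP dynamics and the interval [0, 1] are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class Dynamics(nn.Module):
    """The learned vector field f(z(t), t; theta)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, z):
        return self.net(z)

f = Dynamics(dim=2)
x = torch.randn(16, 2)        # initial condition z(0) = x
t = torch.tensor([0.0, 1.0])  # integrate from t = 0 to t = T = 1
y = odeint(f, x, t)[-1]       # y = z(T); adaptive dopri5 solver by default
```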

Page 10: Neural Ordinary Differential Equations

Backprop without knowledge of the ODE Solver

Ultimately we want to optimize some loss:

  L(z(t1)) = L( z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt ) = L(ODESolve(z(t0), f, t0, t1, θ))

Page 11: Neural Ordinary Differential Equations

Backprop without knowledge of the ODE Solver

Ultimately we want to optimize some loss:

  L(z(t1)) = L( z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt ) = L(ODESolve(z(t0), f, t0, t1, θ))

Naive approach: know the solver and backprop through the solver.
- Memory-intensive.
- The family of "implicit" solvers performs an inner optimization.

Page 12: Neural Ordinary Differential Equations

Backprop without knowledge of the ODE Solver

Ultimately we want to optimize some loss:

  L(z(t1)) = L( z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt ) = L(ODESolve(z(t0), f, t0, t1, θ))

Naive approach: know the solver and backprop through the solver.
- Memory-intensive.
- The family of "implicit" solvers performs an inner optimization.

Our approach: Adjoint sensitivity analysis (reverse-mode autodiff).
- Pontryagin (1962).
+ Automatic differentiation.
+ O(1) memory in the backward pass.
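torchdiffeq exposes this adjoint-based backward pass as a drop-in replacement for its solver call; a sketch, reusing the Dynamics module f and the tensors x, t from the earlier example:

```python
from torchdiffeq import odeint_adjoint

# Same call signature as odeint, but gradients are obtained by solving the
# adjoint ODE backwards in time rather than by backpropagating through the
# solver's internal steps, so memory does not grow with the number of steps.
y = odeint_adjoint(f, x, t)[-1]  # f must be an nn.Module so that its
loss = y.pow(2).mean()           # parameters can be collected
loss.backward()                  # O(1)-memory backward pass
```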

Page 13: Neural Ordinary Differential Equations

Continuous-time Backpropagation

Residual network (write a_t = ∂L/∂h_t):

Forward: h_{t+1} = h_t + f(h_t, θ)

Backward: a_t = a_{t+1} + a_{t+1} ∂f(h_t, θ)/∂h_t

Params: dL/dθ = Σ_t a_{t+1} ∂f(h_t, θ)/∂θ

Adjoint method:

Define the adjoint state: a(t) = ∂L/∂z(t)

Page 14: Neural Ordinary Differential Equations

Continuous-time Backpropagation

Residual network (write a_t = ∂L/∂h_t):

Forward: h_{t+1} = h_t + f(h_t, θ)

Backward: a_t = a_{t+1} + a_{t+1} ∂f(h_t, θ)/∂h_t

Params: dL/dθ = Σ_t a_{t+1} ∂f(h_t, θ)/∂θ

Adjoint method:

Define: a(t) = ∂L/∂z(t)

Forward: z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt

Page 15: Neural Ordinary Differential Equations

Continuous-time Backpropagation

Residual network (write a_t = ∂L/∂h_t):

Forward: h_{t+1} = h_t + f(h_t, θ)

Backward: a_t = a_{t+1} + a_{t+1} ∂f(h_t, θ)/∂h_t

Params: dL/dθ = Σ_t a_{t+1} ∂f(h_t, θ)/∂θ

Adjoint method:

Define the adjoint state: a(t) = ∂L/∂z(t)

Forward: z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt

Backward (the adjoint DiffEq): da(t)/dt = -a(t)^T ∂f(z(t), t, θ)/∂z

Page 16: Neural Ordinary Differential Equations

Continuous-time Backpropagation

Residual network (write a_t = ∂L/∂h_t):

Forward: h_{t+1} = h_t + f(h_t, θ)

Backward: a_t = a_{t+1} + a_{t+1} ∂f(h_t, θ)/∂h_t

Params: dL/dθ = Σ_t a_{t+1} ∂f(h_t, θ)/∂θ

Adjoint method:

Define the adjoint state: a(t) = ∂L/∂z(t)

Forward: z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt

Backward (the adjoint DiffEq): da(t)/dt = -a(t)^T ∂f(z(t), t, θ)/∂z

Params: dL/dθ = -∫_{t1}^{t0} a(t)^T ∂f(z(t), t, θ)/∂θ dt
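Written out in one place, the quantities above are (as in the paper) the adjoint state, its dynamics, and the parameter gradient, all obtained from a single backward-in-time solve:

```latex
a(t) = \frac{\partial L}{\partial z(t)}, \qquad
\frac{d a(t)}{dt} = -\, a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial z}, \qquad
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^\top
  \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt
```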

Page 17: Neural Ordinary Differential Equations

A Differentiable Primitive for AutoDiff

Forward: z(t1) = ODESolve(z(t0), f, t0, t1, θ)

Backward: solve the adjoint ODE for a(t) = ∂L/∂z(t) backwards in time, yielding ∂L/∂z(t0) and dL/dθ.


Page 19: Neural Ordinary Differential Equations

A Differentiable Primitive for AutoDiff

Reversible networks (Gomez et al., 2018) also only require O(1) memory, but need very specific neural network architectures with partitioned dimensions.

No need to store layer activations for the reverse pass: just follow the dynamics in reverse!
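A sketch of "following the dynamics in reverse", reusing f, x, y, and t from the earlier torchdiffeq example: integrating the same vector field over a reversed time interval recovers the input up to solver tolerance.

```python
import torch
from torchdiffeq import odeint

t_rev = torch.tensor([1.0, 0.0])  # reversed time interval
x_rec = odeint(f, y, t_rev)[-1]   # run the dynamics backwards from z(T)
print((x - x_rec).abs().max())    # small; bounded by solver tolerance
```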

Page 20: Neural Ordinary Differential Equations

Reverse versus Forward Cost

- Empirically, the reverse pass is roughly half as expensive as the forward pass.

- Adapts to instance difficulty.

- The number of function evaluations can be viewed as the number of layers in a neural net.

NFE = Number of Function Evaluations.

Page 21: Neural Ordinary Differential Equations

Dynamics Become Increasingly Complex

- Dynamics become more demanding to compute during training.

- Adapts computation time according to complexity of diffeq.

In contrast, Chang et al. (ICLR 2018) explicitly add layers during training.

Page 22: Neural Ordinary Differential Equations

Continuous-time RNNs for Time Series Modeling

- We often want arbitrary measurement times, i.e. irregular time intervals.
- Can do VAE-style inference with a latent ODE.

Page 23: Neural Ordinary Differential Equations

ODEs vs Recurrent Neural Networks (RNNs)

- RNNs learn very stiff dynamics and have exploding gradients.

- Whereas ODEs are guaranteed to be smooth.

Page 24: Neural Ordinary Differential Equations

Continuous Normalizing Flows

Instantaneous Change of Variables (iCOV):

- For a Lipschitz continuous function f with dz(t)/dt = f(z(t), t):

  d log p(z(t))/dt = -tr( ∂f/∂z(t) )

Page 25: Neural Ordinary Differential Equations

Continuous Normalizing Flows

Instantaneous Change of Variables (iCOV):

- For a Lipschitz continuous function f with dz(t)/dt = f(z(t), t):

  d log p(z(t))/dt = -tr( ∂f/∂z(t) )

- In other words:

  log p(z(t1)) = log p(z(t0)) - ∫_{t0}^{t1} tr( ∂f/∂z(t) ) dt

Page 26: Neural Ordinary Differential Equations

Continuous Normalizing Flows

Instantaneous Change of Variables (iCOV):

- For a Lipschitz continuous function f with dz(t)/dt = f(z(t), t):

  d log p(z(t))/dt = -tr( ∂f/∂z(t) )

- In other words:

  log p(z(t1)) = log p(z(t0)) - ∫_{t0}^{t1} tr( ∂f/∂z(t) ) dt

- With an invertible F (the discrete change of variables, for comparison):

  log p(F(z)) = log p(z) - log |det ∂F/∂z|
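The continuous log-density update needs tr(∂f/∂z). A minimal sketch of computing it exactly with autograd, one vector-Jacobian product per dimension (function names are assumptions for illustration); this O(D) cost is what the stochastic estimator on the next slides avoids:

```python
import torch

def exact_trace_df_dz(f, z):
    """tr(df/dz), computed exactly with one backward pass per dimension."""
    z = z.detach().requires_grad_(True)
    out = f(z)                     # shape (batch, D)
    trace = 0.0
    for i in range(z.shape[1]):    # D backward passes in total
        grad_i = torch.autograd.grad(out[:, i].sum(), z, create_graph=True)[0]
        trace = trace + grad_i[:, i]
    return trace                   # shape (batch,)
```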

Page 27: Neural Ordinary Differential Equations

Continuous Normalizing Flows

[Figure: 1D and 2D density estimation, comparing Data, Discrete-NF, and CNF.]

Page 28: Neural Ordinary Differential Equations

Is the ODE being correctly solved?

Page 29: Neural Ordinary Differential Equations

Stochastic Unbiased Log Density

Page 30: Neural Ordinary Differential Equations

Stochastic Unbiased Log Density

Can further reduce time complexity using stochastic estimators.

Grathwohl et al. (2019)
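FFJORD uses Hutchinson's trace estimator, tr(A) = E[eps^T A eps] for noise with E[eps eps^T] = I, replacing the D backward passes of the exact trace with a single vector-Jacobian product. A minimal sketch (names assumed, for illustration):

```python
import torch

def hutchinson_trace_df_dz(f, z):
    """Unbiased estimate of tr(df/dz) from one vector-Jacobian product."""
    z = z.detach().requires_grad_(True)
    out = f(z)
    eps = torch.randn_like(z)      # E[eps eps^T] = I
    vjp = torch.autograd.grad(out, z, grad_outputs=eps, create_graph=True)[0]
    return (vjp * eps).sum(dim=1)  # eps^T (df/dz) eps, per batch element
```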

Page 31: Neural Ordinary Differential Equations

FFJORD - Stochastic Continuous Flows

Grathwohl et al. (2019)

[Figure: model samples on MNIST and CIFAR10.]

Page 32: Neural Ordinary Differential Equations

Variational Autoencoders with FFJORD

Page 33: Neural Ordinary Differential Equations

ODE Solving as a Modeling Primitive

Adaptive-step solvers with O(1)-memory backprop.

github.com/rtqichen/torchdiffeq

Future directions we’re currently working on:

- Latent Stochastic Differential Equations.
- Network architectures suited for ODEs.
- Regularization of dynamics to require fewer evaluations.

Page 34: Neural Ordinary Differential Equations

Thanks!

Co-authors: Yulia Rubanova, Jesse Bettencourt, David Duvenaud

Page 35: Neural Ordinary Differential Equations

Extra Slides

Page 36: Neural Ordinary Differential Equations

Latent Space Visualizations

Page 37: Neural Ordinary Differential Equations

• Released an implementation of reverse-mode autodiff through black-box ODE solvers.

• Solves a system of size 2D + K + 1 (state D, adjoint D, parameter gradient K, plus 1 for the gradient with respect to time).

• In contrast, forward-mode implementation solves a system of size D^2 + KD.

• TensorFlow has the Dormand-Prince-Shampine Runge-Kutta 5(4) solver implemented, but uses naive autodiff for backpropagation.

Page 39: Neural Ordinary Differential Equations

Explicit Error Control

- More fine-grained control than low-precision floats.

- Cost scales with instance difficulty.

NFE = Number of Function Evaluations.

Page 40: Neural Ordinary Differential Equations

Computation Depends on Complexity of Dynamics

- Time cost is dominated by evaluation of dynamics f.

NFE = Number of Function Evaluations.

Page 41: Neural Ordinary Differential Equations

Why not use an ODE solver as a modeling primitive?

- Solving an ODE is expensive.

Page 42: Neural Ordinary Differential Equations

Future Directions

- Stochastic differential equations and random ODEs (approximates stochastic gradient descent).
- Scaling up ODE solvers with machine learning.
- Partial differential equations.
- Graphics, physics, simulations.