Precipitation Nowcasting Leveraging Deep Learning and HPC Systems to Optimize the Data Pipeline
AMS 2018 Copyright 2018 Cray Inc. 2
Agenda
● Introduction
● Motivation
● Dataset
● Prediction Modeling
● Data Pipelines
● Results
● Q&A
Introduction
● Nowcasting
  ● Predict precipitation locations and rates at a regional level over a short timeframe
● Traditional Approach: Numerical Weather Prediction
  ● Requires an extended lead time between newly acquired data and release of forecasts
● Deep Learning
  ● Branch of Machine Learning based on Neural Networks
  ● Deep implies multiple layers of computation between inputs and output
  ● Pattern Matching
Motivation
● Why is short-term nowcasting important?
  ● Provide reliable ride-to-work forecasts
  ● Predict rapid formation of severe precipitation events: Flash Flood Warning!
● Why Deep Learning?
  ● Traditional Nowcasting relies on NWP, which is slow to respond to new data
  ● Deep Learning learns from past rainfall patterns
  ● Trained models are computationally cheap to utilize
● Why HPC?
  ● A lot of data! Training can utilize decades of observations
  ● Models can be trained and used for inference on small regions in parallel
Prototype Nowcasting System
● Small: 4 stations
  ● KATX (Seattle), KTLX (Oklahoma City), KTLH (Tallahassee), KBUF (Buffalo)
● Variable-sized dataset, as large as 7 years of historical rainfall data
  ● Total size (raw data): 4TB
  ● Total size (processed): 684GB
● Examine Pipeline Performance and Bottlenecks
● Explore Nowcasting Performance via Deep Learning
Dataset Processing
Data Collection
• Historical Radar Data (NETCDF)
• Geographical Region
• Days with over 0.1 inches of precipitation, info from NOAA – NCDC
• Radar scans every 5-10 minutes throughout the day
Transformation
• Raw radial data structure converted to evenly spaced Cartesian grid (float32 tensors)
• Resolution scaling and clipping
• Configure dimensionality
• Sequencing
• 2 channels – Reflectivity, Velocity
• Uses Py-ART package
Sampling
• Time-series
• Inputs and Labels
• Random sampling
BigDL Framework
• Apache Spark on Urika-XC/GX
• Implemented in Jupyter notebooks and Python
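The sequencing and sampling steps listed above can be sketched as a sliding window over time-ordered radar frames. This is a minimal illustration under stated assumptions, not the pipeline from the talk: the function names, the window lengths `n_in` and `n_out`, and the use of plain Python values in place of gridded two-channel tensors are all hypothetical.

```python
import random


def make_sequences(frames, n_in, n_out):
    """Slide a window over time-ordered frames and return (inputs, labels) pairs.

    Each pair holds n_in consecutive frames as inputs and the n_out frames
    that immediately follow them as labels.
    """
    window = n_in + n_out
    pairs = []
    for start in range(len(frames) - window + 1):
        inputs = frames[start:start + n_in]
        labels = frames[start + n_in:start + window]
        pairs.append((inputs, labels))
    return pairs


def sample_sequences(pairs, k, seed=0):
    """Draw k distinct (inputs, labels) pairs at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(pairs, k)


# Toy stand-in: integers 0..9 play the role of ten consecutive radar scans.
frames = list(range(10))
pairs = make_sequences(frames, n_in=4, n_out=2)
batch = sample_sequences(pairs, k=3)
```

With real data each element of `frames` would be one gridded scan, and the random draw would typically happen after the train/test split so that test windows never overlap training windows.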
Prediction Modeling
● Convolutional Recurrent Neural Network
  ● Convolutional Neural Network – Spatial Patterns
  ● Recurrent Neural Network – Temporal Patterns
  ● ConvLSTM – Convolutional Long Short-Term Memory Network
● Sequence to Sequence
  ● Encoder-Decoder
  ● Use recent history to predict future changes
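For reference, a common formulation of the ConvLSTM cell is given below, where $*$ denotes convolution, $\circ$ the element-wise product, $X_t$ the input frame, $H_t$ the hidden state and $C_t$ the cell state. The slides do not state which variant was used, so this should be read as the general form (shown here without peephole connections), not as the exact model from the talk.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing the matrix products of a standard LSTM with convolutions is what lets the cell capture spatial patterns, while the recurrence over $t$ captures temporal ones.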
Pipeline: Data Processing
[Diagram labels: Distributed File-System; Feed-Forward; Compute Gradients; BigDL; Iterate on Dataset; Save As NumPy Arrays; Convert to RDDs]
Pipeline: Distributed Training
[Diagram labels: Distributed File-System; Hyper-Parameter Optimization; Split Data By Region (KATX, KTLH, KBUF, KTLX…); Train Unique Networks for each region; Trained Networks]
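The "Split Data By Region" and "Train Unique Networks for each region" stages can be sketched as a group-by followed by independent per-region jobs. Everything below is a hypothetical stand-in: `split_by_region` and `train_region` are illustrative names, the "model" is just a summary of the samples, and a thread pool takes the place of the separate nodes or GPUs an HPC system would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_region(samples):
    """Group (station, value) samples by radar station identifier."""
    regions = defaultdict(list)
    for station, value in samples:
        regions[station].append(value)
    return dict(regions)


def train_region(station, values):
    """Hypothetical placeholder for training one network on one region.

    Returns the station and a toy 'model' (sample count and mean) so the
    example stays runnable without a deep learning framework.
    """
    model = {"n_samples": len(values), "mean": sum(values) / len(values)}
    return station, model


samples = [
    ("KATX", 1.0), ("KTLH", 3.0), ("KATX", 2.0), ("KBUF", 0.5), ("KTLX", 4.0),
]
regions = split_by_region(samples)

# Regions share no state, so each training job can run independently.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(train_region, s, v) for s, v in regions.items()]
    trained = dict(f.result() for f in futures)
```

Because no region depends on another, the same pattern extends to more stations by adding workers, which is the property the pipeline relies on.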
Idealized Training Timeline
● Station: KATX
● Dataset size:
  ● 118,342 Sequences
  ● 101GB
● Parameters
● Systems:
  ● Data processing: Cray Urika-GX – 1024 cores
  ● Training: Cray CS-Storm – 8 Nvidia P100 GPUs
Process      Wall-Time     Proportion
Download     13 hours      32%
Spark        4 hours       10%
Training     24 hours      58%
Inference    10 seconds    0%
Scaling
● TensorFlow via Cray MPI Com. Plugin
● Nvidia Tesla P100 GPUs
● Batch size of 4 samples per device
● Throughput in Samples/Second
Device Count    Throughput (samples/s)    Scaling Efficiency
1               25.8                      1.0
2               51.6                      1.0
4               102.7                     0.995
8               205.4                     0.995
16              410.5                     0.994
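The efficiency column is consistent with the usual definition of scaling efficiency: measured throughput divided by the ideal throughput, where ideal is the single-device rate multiplied by the device count. The sketch below recomputes it from the throughput figures in the table; the function and variable names are illustrative assumptions.

```python
def scaling_efficiency(throughput, devices, base_throughput, base_devices=1):
    """Ratio of measured throughput to ideal linear scaling from the baseline."""
    ideal = base_throughput * devices / base_devices
    return throughput / ideal


# Device count -> measured throughput in samples/second (from the table).
measured = {1: 25.8, 2: 51.6, 4: 102.7, 8: 205.4, 16: 410.5}

efficiency = {
    n: round(scaling_efficiency(t, n, measured[1]), 3)
    for n, t in measured.items()
}
```

A value of 1.0 means perfectly linear scaling, so the drop to about 0.994 at 16 devices corresponds to well under 1% of throughput lost relative to the ideal.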
Model Performance
[Figure: four panels, "KTLH CSI and Average Error", "KTLX CSI and Average Error", "KBUF CSI and Average Error", "KATX CSI and Average Error"; x-axis ticks 1 to 6; series: CSI-ConvLSTM, CSI-Persistence, MAE-ConvLSTM, MAE-Persistence]
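The metrics named in the figure can be sketched as follows. CSI (Critical Success Index) is hits divided by the sum of hits, misses and false alarms for a chosen event threshold, MAE is the mean absolute error, and a persistence forecast repeats the last observed frame for every future step. The threshold of 0.5, the flattened toy grids, and the function names are assumptions for illustration; the slides do not give the threshold or units used.

```python
def csi(observed, forecast, threshold):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    hits = misses = false_alarms = 0
    for obs, fc in zip(observed, forecast):
        obs_event = obs >= threshold
        fc_event = fc >= threshold
        if obs_event and fc_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif fc_event:
            false_alarms += 1
    total = hits + misses + false_alarms
    # With no observed or forecast events the index is undefined.
    return hits / total if total else float("nan")


def mae(observed, forecast):
    """Mean absolute error between two equally sized grids."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def persistence(last_frame, n_steps):
    """Baseline forecast: repeat the last observed frame for every step."""
    return [list(last_frame) for _ in range(n_steps)]


# Toy flattened grids (hypothetical values).
observed = [0.0, 1.2, 0.6, 0.0, 2.0]
model_forecast = [0.0, 0.8, 0.0, 0.7, 1.5]
baseline = persistence([0.0, 0.0, 0.9, 0.9, 0.0], n_steps=1)[0]

model_csi = csi(observed, model_forecast, threshold=0.5)
baseline_csi = csi(observed, baseline, threshold=0.5)
model_mae = mae(observed, model_forecast)
baseline_mae = mae(observed, baseline)
```

Higher CSI and lower MAE are better, so a model adds value over persistence when it wins on both at a given forecast step.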
Effect of dataset size
● More data correlates with a higher-performing prediction model
● Station: KATX (Seattle)
[Figure: Average CSI for varying sized datasets; x-axis: Years in Dataset (0 to 8); y-axis: Critical Success Index (0.44 to 0.458)]
Years    Size (GB)    Sequences
1        40           43,171
3        109          118,342
5        143          155,337
7        217          235,301
Sample Prediction + Q/A