Convolutional Neural Networks at Scale in MLlib

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Jan 21, 2018
Transcript
Page 1: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Spark Technology Center

Convolutional Neural Networks at Scale in MLlib

Jeremy Nixon

Page 2: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

1. Machine Learning Engineer at the Spark Technology Center

2. Contributor to MLlib, dedicated to scalable deep learning.

3. Previously studied Applied Mathematics, with applications to Computer Science and Economics, at Harvard

Jeremy Nixon

Page 3: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Large Scale Data Processing
● In-memory compute
● Up to 100x faster than Hadoop

Improved Usability
● Rich APIs in Scala, Java, Python
● Interactive Shell
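
A minimal sketch of what this looks like from the Scala API (an editorial illustration; the input path is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("demo").getOrCreate()

// Read a text dataset (hypothetical path) and pin it in memory.
val logs = spark.read.textFile("hdfs:///data/logs")
logs.cache()

// Both actions reuse the cached, in-memory data rather than
// re-reading from disk - the source of the speedup over Hadoop MapReduce.
val errors = logs.filter(_.contains("ERROR")).count()
val warnings = logs.filter(_.contains("WARN")).count()
```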

Page 4: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Spark’s Machine Learning Library
● Alternating Least Squares
● Lasso
● Ridge Regression
● Logistic Regression
● Decision Trees
● Naive Bayes
● SVMs
● …

MLlib
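
For concreteness, a minimal sketch of fitting one of the listed models with the spark.ml API, assuming a DataFrame `training` with “features” and “label” columns:

```scala
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01) // regularization strength

val model = lr.fit(training) // distributed fit over the cluster
```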

Page 5: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

● Part of Spark
● Integrated Data Analysis
● Scalable
● Python, Scala, Java APIs

MLlib

Page 6: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

● Deep Learning benefits from large datasets

● Spark allows for Large Scale Data Analysis

● Compute is Local to Data
● Integrated into organization’s Spark Jobs
● Leverages existing compute cluster

Deep Learning in MLlib

Page 7: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Github Link: https://github.com/JeremyNixon/sparkdl

Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl

Links

Page 8: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

1. Framing Deep Learning
2. MLlib Deep Learning API
3. Optimization
4. Performance
5. Future Work
6. Deep Learning Options on Spark
7. Deep Learning Outside of Spark

Structure

Page 9: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

1. Structural Assumptions
2. Automated Feature Engineering
3. Learning Representations
4. Applications

Framing Convolutional Neural Networks

Page 10: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Structural Assumptions:

Location Invariance

- Convolution is a restriction on the features that can be combined.

- Location invariance leads to strong accuracy in vision, audio, and language tasks.

colah.github.io

Page 11: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Structural Assumptions:

Hierarchical Abstraction

Page 12: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

- Pixels → Edges → Shapes → Parts → Objects
- Learn features that are optimized for the data
- Makes transfer learning feasible

Structural Assumptions:

Hierarchical Abstraction

Page 13: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

- Character → Word → Phrase → Sentence
- Phonemes → Words
- Pixels → Edges → Shapes → Parts → Objects

Structural Assumptions:

Composition

Page 14: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

1. CNNs - State of the Art
   a. Object Recognition
   b. Object Localization
   c. Image Segmentation
   d. Image Restoration
   e. Music Recommendation

2. RNNs (LSTM) - State of the Art
   a. Speech Recognition
   b. Question Answering
   c. Machine Translation
   d. Text Summarization
   e. Named Entity Recognition
   f. Natural Language Generation
   g. Word Sense Disambiguation
   h. Image / Video Captioning
   i. Sentiment Analysis

Applications

Page 15: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

● Computationally Efficient
● Makes Transfer Learning Easy
● Takes advantage of location invariance

Structural Assumptions:

Weight Sharing
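
A minimal plain-Scala sketch of the idea (not MLlib’s implementation): a single small kernel is slid across the whole image, so the same weights are reused at every position - that reuse is the weight sharing, and it is what buys the location invariance.

```scala
// 2D convolution with "valid" padding.
def conv2d(img: Array[Array[Double]], k: Array[Array[Double]]): Array[Array[Double]] = {
  val (kh, kw) = (k.length, k(0).length)
  Array.tabulate(img.length - kh + 1, img(0).length - kw + 1) { (i, j) =>
    var s = 0.0
    for (a <- 0 until kh; b <- 0 until kw)
      s += img(i + a)(j + b) * k(a)(b) // identical weights at every (i, j)
    s
  }
}
```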

Page 16: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

- Network depth creates an extraordinary range of possible models.

- That flexibility makes large datasets valuable for reducing variance.

Structural Assumptions:

Combinatorial Flexibility

Page 17: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Automated Feature Engineering

- Feature hierarchy is too complex to engineer manually
- Works well for compositional structure, overfits elsewhere

Page 18: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Learning Representations

Hidden Layer + Nonlinearity

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
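
In standard notation, each such layer computes h = σ(Wx + b): a learned affine map followed by an elementwise nonlinearity σ (e.g., ReLU or sigmoid). Stacking these layers composes the learned representations and lets the network warp the input space, as visualized in the linked post.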

Page 19: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Flexibility. High level enough to be efficient. Low level enough to be expressive.

MLlib Flexible Deep Learning API

Page 20: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Modularity enables Logistic Regression and Feedforward Networks.

MLlib Flexible Deep Learning API
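
As a concrete example of the modular feedforward API, a minimal sketch using MLlib’s MultilayerPerceptronClassifier; the DataFrames `train` and `test` with “features”/“label” columns are assumed:

```scala
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// Layer sizes: input features, two hidden layers, output classes.
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(Array(784, 128, 64, 10))
  .setMaxIter(100)
  .setSeed(42L)

val model = mlp.fit(train)          // distributed training
val predictions = model.transform(test)
```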

Page 21: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Optimization

Modern optimizers allow for more efficient, stable training.

Momentum cancels noise in the gradient.
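
For reference, the classical momentum update, in one common formulation (the exact variant in a given optimizer may differ):

v_t = \gamma v_{t-1} + \nabla_\theta L(\theta_{t-1}), \qquad \theta_t = \theta_{t-1} - \eta v_t

The running average v damps gradient components whose sign keeps flipping, which is the noise-cancelling effect described above.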

Page 22: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Optimization

Modern optimizers allow for more efficient, stable training.

RMSProp automatically adapts the learning rate.
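
For reference, the standard RMSProp update, again in one common formulation:

E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon} g_t

Each parameter is rescaled by the root-mean-square of its own recent gradients, so the effective learning rate adapts per parameter.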

Page 23: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Parallel implementation of backpropagation:

1. Each worker gets the weights from the master node.
2. Each worker computes a gradient on its data.
3. Each worker sends its gradient to the master.
4. The master averages the gradients and updates the weights.

Distributed Optimization
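
A minimal sketch of one such synchronous step on an RDD (an editorial illustration, not MLlib’s internal code; `grad` is a caller-supplied per-example gradient function):

```scala
import breeze.linalg.DenseVector
import org.apache.spark.rdd.RDD

// One synchronous gradient-averaging step, mirroring steps 1-4 above.
def step[T](
    data: RDD[T],
    w: DenseVector[Double], // 1. shipped to workers via the task closure
    grad: (DenseVector[Double], T) => DenseVector[Double],
    lr: Double): DenseVector[Double] = {
  val n = data.count()
  // 2-3. each partition sums its per-example gradients; the partial
  // sums are combined back on the driver (the "master").
  val gradSum = data.treeAggregate(DenseVector.zeros[Double](w.length))(
    (acc, x) => acc + grad(w, x),
    (a, b) => a + b
  )
  // 4. the driver averages the gradients and updates the weights.
  w - gradSum * (lr / n.toDouble)
}
```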

Page 24: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

● Parallel MLP on Spark with 7 nodes ~= Caffe w/GPU (single node).

● Advantages to parallelism diminish with additional nodes due to communication costs.

● Additional workers are valuable up to ~20 workers.

● See https://github.com/avulanov/ann-benchmark for more details

Performance

Page 25: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Github: https://github.com/JeremyNixon/sparkdl

Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl

Access

Page 26: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

1. GPU Acceleration (External)
2. Python API
3. Keras Integration
4. Residual Layers
5. Hardening
6. Regularization
7. Batch Normalization
8. Tensor Support

Future Work

Page 27: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Deep Learning on Spark

1. Major Projects
   a. DL4J
   b. BigDL
   c. Spark-deep-learning
   d. Tensorflow-on-Spark
   e. SystemML
2. Important Comparisons
3. Minor & Abandoned Projects
   a. H2O.ai DeepWater
   b. TensorFrames
   c. Caffe-on-Spark
   d. Scalable-deep-learning
   e. MLlib Deep Learning
   f. SparkNet
   g. DeepDist

Page 28: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

● Distributed GPU support for all major deep learning architectures
   ○ CPU / Distributed CPU / Single GPU options exist
   ○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec
● Actively Supported and Improved
● APIs in Java, Scala, Python
   ○ Fairly inelegant API; there’s an option through ScalNet (Keras-like front end)
   ○ Working towards becoming a Keras backend
● Backed by Skymind (Committed)
   ○ ~15-person startup, Adam Gibson + Chris Nicholson
● Modular front end in DL4J
● Backed by linear algebra library ND4J
   ○ Numerical computing wrapper over BLAS for various backends
● Python API has Keras import / export
● Production with proprietary ‘Skymind Intelligence Layer’

DL4J

Page 29: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

BigDL

● Distributed CPU-based library
   ○ Backed by Intel MKL / multithreading
   ○ No benchmark out as yet
● Support for most major deep learning architectures
   ○ Convolutional Networks, RNNs, LSTMs; no Word2Vec / GloVe
● Backed by Intel (Committed)
   ○ Actively Supported / Improved
   ○ Intel has already acquired Nervana and partnered with Chainer - strategy here is unclear.
   ○ Intel doesn’t look to be supporting their own Xeon Phi with BigDL
● Scala and Python API Support
   ○ API modeled after Torch
● Support for numeric computing via tensors

Page 30: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Spark-deep-learning

● Databricks’ library focused on model serving, to allow scaled-out inference
● ‘Transfer Learning’ (allows the logistic regression layer to be retrained)
● Python API
   ○ One-liner for integrating a Keras model into a pipeline
● Supports Tensorflow models
   ○ Keras import for Tensorflow-backed Keras models
● Support for image processing only
● Weakly supported by Databricks
   ○ Last commit was a month ago
   ○ Qualifying lines - “We will implement text processing, audio processing if there is interest”

Page 31: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

1. Goal is to scale out Caffe / Tensorflow on heterogeneous GPU / CPU setups
   a. Each executor launches a Caffe / TF instance
   b. RDMA / Infiniband for distributing compute in TF on Spark, an improvement over TF’s ethernet model
2. Goal is to minimize changes to Tensorflow / Caffe code during scale-out
3. Allows for Model / Data parallelism
4. Weakly supported by Yahoo
   a. Caffe-on-Spark hasn’t seen a commit in 6 months
   b. Tensorflow-on-Spark gets about 2 minor commits / month
5. Yahoo demonstrated capability on a large-scale Flickr dataset
6. Visualization with TensorBoard

Caffe / Tensorflow-on-Spark

Page 32: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

SystemML

● Deep Learning library with single-node GPU support, moving towards distributed GPU support
   ○ Supports CNNs for Classification, Localization, Segmentation
   ○ Supports RNNs / LSTMs
● Attached to a linear-algebra-focused ML library with a linear algebra compiler
● Backed by IBM
   ○ Actively being improved
● Provides CPU-based support for most computer vision tasks
   ○ Convolutional Networks
● Caffe2DML for Caffe integration
● DML API
   ○ SystemML has a Python API for a handful of algorithms; may come out with a Python DL API

Page 33: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Important Comparisons

Framework | Hardware | Supported Models | API
DL4J | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Java, Scala, Python
BigDL | CPU / Distributed CPU | CNNs, RNNs, Feedforward Nets | Scala, Python
Spark-Deep-Learning | CPU / Distributed CPU | Vision - CNNs, Feedforward Nets | Python
Caffe / Tensorflow on Spark | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Python
SystemML Deep Learning | CPU, towards GPU / Distributed GPU | CNNs, RNNs, Feedforward Nets | DML, potentially Python

Page 34: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Important Comparisons

Framework | Support Strength | Goal | Distinguishing Value
DL4J | Skymind. Fully focused on the package, but still a startup. | Fully fledged Deep Learning solution from training to production | Comprehensive; Distributed GPU
BigDL | Intel. Fairly strong AI/DL commitment; has Nervana, Chainer. | Spark / Hadoop solution, bring DL to the data | Comprehensive
Spark-Deep-Learning | Databricks, ambiguous level of commitment | Scale-out solution for TF users | Scaling out with Spark at inference time
Caffe / Tensorflow on Spark | Yahoo. Caffe-on-Spark looks abandoned, TF-on-Spark better. | Scaling out training on heterogeneous hardware | Scaling out training with distributed CPU / GPU
SystemML Deep Learning | IBM team. | Deep Learning training solution | GPU support, moving towards Distributed GPU support

Page 35: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Minor & Abandoned Projects

1. H2O.ai DeepWater
   a. Integrates other frameworks (TF, MXNet, Caffe) into the H2O platform
   b. Only native support is for feedforward networks
2. MXNet Integration
   a. Nascent; few commits from a Microsoft engineer
3. TensorFrames
   a. Focused on hyperparameter tuning, running TF instances in parallel. ~2 commits / month
4. Caffe-on-Spark
   a. No commits for ~6 months
5. Scalable-deep-learning
   a. Only supports feedforward networks / autoencoders, CPU-based
6. MLlib Deep Learning
   a. Only supports feedforward networks, CPU-based
7. SparkNet
   a. Abandoned; no commits for 18 months

Page 36: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Deep Learning Outside of Spark

Page 38: Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Thank you for your attention!

Questions?