LEVERAGING DATA DRIVEN RESEARCH THROUGH MICROSOFT AZURE
Dr. Miguel Fierro
Data Scientist at Microsoft
@miguelgfierro | https://miguelgfierro.com
Plymouth University | Jan 27, 2017 | Plymouth, UK
AZURE FOR RESEARCH AWARD
Plymouth University January 2017 - Dr. Miguel Fierro @miguelgfierro
Free Azure resources if awarded
Areas: data science, climate, health…
Example: the Alan Turing Institute received $5M
OUTLINE
Spark and Hadoop with Azure
Azure ML Studio
Data Science Virtual Machine
SPARK & HADOOP WITH AZURE
WHAT IS HDINSIGHT
HDInsight: managed service for running Hadoop, Spark and related open-source frameworks on Azure
MANAGER GUI: AMBARI
APACHE HADOOP
Software for storing and analysing massive amounts (terabytes and beyond) of structured and unstructured data
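Hadoop's classic processing model is MapReduce. It has three steps:
- a map step emits key/value pairs
- the framework groups those pairs by key (the shuffle)
- a reduce step aggregates each group

The sketch below runs the three phases in plain Python on one machine, to show the data flow only. It does not use Hadoop's actual Java API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the partial counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases; on a cluster, each would run on many nodes in parallel."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

On a real cluster, the input lines would be blocks of files in HDFS rather than a Python list.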
APACHE SPARK
Framework that runs large-scale data analytics applications
PySpark, Spark (Scala), SparkR
Up to 100x faster than Hadoop MapReduce for in-memory workloads
APACHE KAFKA
Stream processing for real-time apps
Publisher & subscriber messaging system
Millions of messages per second
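The core idea behind Kafka's publish/subscribe model is simple. Each topic is an append-only log. Every subscriber tracks its own read offset, so fast and slow consumers can read the same messages independently.

The sketch below is a toy in-process model of that idea and is not the Kafka client API. The class names ToyBroker and ToyConsumer are hypothetical. Real Kafka also partitions and replicates each log across brokers.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1


class ToyConsumer:
    """Hypothetical subscriber that keeps its own offset per topic."""

    def __init__(self, broker):
        self.broker = broker
        self.offsets = defaultdict(int)

    def poll(self, topic):
        """Return every message published since this consumer's last poll."""
        log = self.broker.topics[topic]
        start = self.offsets[topic]
        self.offsets[topic] = len(log)
        return log[start:]


if __name__ == "__main__":
    broker = ToyBroker()
    broker.publish("sensors", "t=21.5")
    consumer = ToyConsumer(broker)
    print(consumer.poll("sensors"))
```

Because messages stay in the log after being read, a consumer that joins late can still replay the topic from offset 0.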
APACHE STORM
Distributed framework for real-time applications
ETL, continuous computation, online machine learning
Over a million tuples processed per second per node
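A Storm application is a topology:
- "spouts" emit a stream of tuples
- "bolts" transform them, possibly keeping state, which is what makes continuous computation possible

The sketch below imitates a spout-to-bolt-to-bolt topology with Python generators on one machine. It illustrates the dataflow only and is not Storm's API. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: the stream source, emitting one tuple (here a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Stateless bolt: split each incoming sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.split():
            yield word.lower()


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stateful bolt: keep a running count and emit the updated total for each word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


if __name__ == "__main__":
    topology = count_bolt(split_bolt(sentence_spout(["a b a", "B"])))
    for word, total in topology:
        print(word, total)
```

Each result is emitted as soon as its input arrives, rather than after the whole dataset has been read. That is the difference from the batch word count in the Hadoop section.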
APACHE HBASE
Non-relational database (NoSQL) for Big Data applications
Distributed, fault tolerant and scalable
Built on top of HDFS (Hadoop Distributed File System)
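HBase's data model differs from a relational table:
- rows are kept sorted by row key, so range scans are cheap
- cells are addressed by "column family:qualifier"
- each cell can keep several timestamped versions

The sketch below is a toy in-memory model of that data model only. It has none of HBase's distribution or HDFS storage. The class and method names are hypothetical and do not match the HBase client API.

```python
import bisect
import time


class ToyWideColumnTable:
    """Hypothetical model: sorted row keys, 'family:qualifier' columns, versioned cells."""

    def __init__(self, max_versions=3):
        self.row_keys = []               # kept sorted, like HBase's lexicographic row order
        self.rows = {}                   # row key -> {column -> [(timestamp, value), ...]}
        self.max_versions = max_versions

    def put(self, row_key, column, value, timestamp=None):
        """Store a new version of a cell, keeping at most max_versions, newest first."""
        if row_key not in self.rows:
            bisect.insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        ts = time.time_ns() if timestamp is None else timestamp
        versions = self.rows[row_key].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda version: version[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Yield (row_key, row) for start_row <= row_key < stop_row, in key order."""
        lo = bisect.bisect_left(self.row_keys, start_row)
        hi = bisect.bisect_left(self.row_keys, stop_row)
        for key in self.row_keys[lo:hi]:
            yield key, self.rows[key]


if __name__ == "__main__":
    table = ToyWideColumnTable()
    table.put("user#001", "info:name", "Ada")
    print(table.get("user#001", "info:name"))
```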
APACHE HIVE
SQL-like language to query data in Hadoop systems
Example: word count program
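In Hive, a word count is usually written as a single SQL-like query. For example, for a hypothetical table docs with a string column line:

SELECT word, COUNT(*) FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w GROUP BY word

The sketch below runs the equivalent GROUP BY on Python's built-in sqlite3, which stands in for Hive here. Tokenising is done in Python because SQLite has no split/explode functions. The table and function names are illustrative.

```python
import sqlite3


def word_count_sql(lines):
    """Count words with a SQL GROUP BY, using an in-memory SQLite database as a stand-in for Hive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    # One row per word: the step Hive does with explode(split(...)).
    conn.executemany(
        "INSERT INTO words (word) VALUES (?)",
        ((w.lower(),) for line in lines for w in line.split()),
    )
    rows = conn.execute(
        "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY n DESC, word"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(word_count_sql(["the cat", "the dog"]))
```

The point is the same as on the slide: the analyst writes a declarative query. Hive then compiles that query into distributed jobs over data stored in Hadoop.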
EXAMPLE OF ARCHITECTURE
DEMO: PYSPARK APPLICATION
Log analysis with PySpark
source: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-custom-library-website-log-analysis
Predictive analysis on food inspection with PySpark
source: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-machine-learning-mllib-ipython
AZURE ML STUDIO
WHAT IS AZURE ML STUDIO
GUI for Machine Learning
DATA INPUT/OUTPUT
DATA TRANSFORMATION
DATA MANIPULATION
FEATURE SELECTION
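Filter-based feature selection scores each column against the label and keeps the best-scoring ones. Pearson correlation is one common scoring choice, and one the Azure ML Studio filter module offers among several methods.

The sketch below implements that filter idea in plain Python. It assumes numeric columns and a numeric label. The function names are hypothetical, not the module's internals.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_top_k(features, target, k):
    """Rank columns by |Pearson correlation| with the target and keep the k best names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    cols = {"a": [1, 2, 3, 4], "b": [4, 1, 3, 2], "c": [2, 4, 6, 8.5]}
    print(select_top_k(cols, [1, 2, 3, 4], 2))
```

Correlation only captures linear relationships. This is why tools usually offer other scores, such as mutual information, as alternatives.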
CLASSIFICATION & REGRESSION
TRAINING & SCORING
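In ML Studio, training and scoring are separate steps. One step fits a model on labelled data; the other applies the fitted model to new rows and outputs a label plus a probability.

The sketch below shows that split with a tiny logistic regression trained by batch gradient descent in plain Python. The algorithm, learning rate and epoch count are illustrative choices, not what any particular ML Studio module uses.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Training step: fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def score(model, X):
    """Scoring step: return (predicted label, probability of class 1) for each row."""
    w, b = model
    results = []
    for xi in X:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        results.append((int(p >= 0.5), p))
    return results


if __name__ == "__main__":
    model = train_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
    print(score(model, [[0.0], [3.0]]))
```

Keeping the two functions separate mirrors the workflow: the trained model can be saved once and scored many times, for instance behind a web service.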
PYTHON & R SCRIPTS
AUTOMATIC API
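A published ML Studio experiment is exposed as a REST web service. Clients POST JSON rows and authenticate with an API key.

The sketch below only builds such a request with Python's standard library and does not send it. Everything specific to your service is an assumption to check against the sample code on the service's API help page:
- the URL and the API key are hypothetical placeholders
- the payload shape (Inputs, input1, ColumnNames, Values) is assumed to follow the classic request-response format

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real Request URI and key from the API help page.
SCORING_URL = ("https://<region>.services.azureml.net/workspaces/<workspace-id>"
               "/services/<service-id>/execute?api-version=2.0&details=true")
API_KEY = "<your-api-key>"


def build_scoring_request(column_names, rows, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request asking the web service to score rows."""
    payload = {
        "Inputs": {
            # "input1" is assumed to be the name of the experiment's web service input.
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    req = build_scoring_request(["age", "income"], [[35, 52000]])
    print(req.get_method(), req.full_url)
    # To actually call the service: urllib.request.urlopen(req).read()
```

The column names are hypothetical. They must match the input schema of the published experiment.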
DEMO: CREDIT RISK ANOMALY DETECTION
source: https://gallery.cortanaintelligence.com/Experiment/1219e87f8fb84e88a2e1b54256808bb3
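Anomaly detection flags records that look unlike the bulk of the data. The sketch below swaps in the simplest such rule, a z-score threshold on one numeric feature. It is not necessarily the method used in the linked experiment. The amounts in the example are made up.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


if __name__ == "__main__":
    amounts = [1000, 1100, 950, 1050, 1020, 980, 9000]  # hypothetical credit amounts
    print(zscore_anomalies(amounts, threshold=2.0))
```

One limitation shows up on small samples. With n points, no population z-score can exceed sqrt(n - 1), so with 7 values the default threshold of 3 can never fire. Multivariate methods such as PCA-based detectors avoid relying on a single column.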
DATA SCIENCE VIRTUAL MACHINE
WHAT IS THE DSVM
Windows:
- Anaconda with Python and Jupyter notebooks
- Microsoft R Server
- Visual Studio
- SQL Server
- Azure SDK
- Deep learning: CNTK & MXNet
- Machine learning: XGBoost
Linux:
- Anaconda with Python and Jupyter notebooks
- Microsoft R Server
- PyCharm
- Azure SDK
- Deep learning: CNTK & MXNet
- Machine learning: XGBoost, Weka
DEEP LEARNING DSVM
Libs:
- CNTK
- MXNet
- TensorFlow
- Keras
Examples: digit recognition, image recognition
NVIDIA TESLA K80
AI LANDSCAPE: IMAGES
ImageNet (image recognition competition) top-5 error:
- AlexNet (2012): 15.4%
- VGG (2014): 7.3%
- Inception (2015): 6.7%
- ResNet (2015): 3.6%
- Inception-ResNet (2016): 3.1%
- Human: 5.1%
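Top-5 error counts a prediction as correct when the true class appears anywhere among the model's five highest-scoring classes. It is reported as the fraction of images where it does not. The sketch below computes that metric in plain Python for any k. The tiny example uses three classes and k=2 so the arithmetic is easy to check by hand.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes.

    scores: one list of per-class scores per example; labels: true class index per example.
    """
    misses = 0
    for class_scores, label in zip(scores, labels):
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.6, 0.3], [0.5, 0.4, 0.1], [0.3, 0.1, 0.6]]
    labels = [2, 2, 0]
    print(top_k_error(scores, labels, k=2))  # only the second example misses
```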
AI LANDSCAPE: SPEECH
Microsoft Research reaches human parity in conversational speech recognition
source: http://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition
CNN (VGG, ResNet, LACE)
RNN (Bi-LSTM)
Multi-GPU and multi-server training (1-bit Stochastic Gradient Descent)
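The idea behind 1-bit SGD is to cut communication between GPUs and servers. Each worker sends gradients quantized to one bit per value, and the quantization error is added back into the next minibatch's gradient (error feedback), so the precision lost in one step is not lost for good.

The sketch below is a simplified illustration of that idea in plain Python, not CNTK's implementation, which differs in details such as working per matrix column. Its assumptions:
- reconstruction uses two values, the mean of the non-negative entries and the mean of the negative entries
- the function names are hypothetical

```python
from typing import List, Tuple


def one_bit_quantize(grad: List[float], residual: List[float]) -> Tuple[List[bool], float, float, List[float]]:
    """Quantize grad + residual to one bit per entry.

    Returns the bits, the two reconstruction values and the new residual
    (quantization error) to add to the next gradient.
    """
    g = [gi + ri for gi, ri in zip(grad, residual)]
    bits = [v >= 0 for v in g]
    pos = [v for v, b in zip(g, bits) if b]
    neg = [v for v, b in zip(g, bits) if not b]
    pos_val = sum(pos) / len(pos) if pos else 0.0
    neg_val = sum(neg) / len(neg) if neg else 0.0
    decoded = dequantize(bits, pos_val, neg_val)
    new_residual = [v - d for v, d in zip(g, decoded)]
    return bits, pos_val, neg_val, new_residual


def dequantize(bits: List[bool], pos_val: float, neg_val: float) -> List[float]:
    """Rebuild an approximate gradient from the bits and the two reconstruction values."""
    return [pos_val if b else neg_val for b in bits]


if __name__ == "__main__":
    grad = [0.4, -0.2, 0.1, -0.6]
    bits, p, n, res = one_bit_quantize(grad, [0.0] * len(grad))
    print(bits, p, n, res)
```

In this scheme a worker transmits only the bits and two floats. It keeps new_residual locally and passes it as `residual` for its next minibatch.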
IMAGE CLASSIFICATION
source: https://blogs.technet.microsoft.com/machinelearning/2016/11/15/imagenet-deep-neural-network-training-using-microsoft-r-server-and-azure-gpu-vms/
IMAGE CLASSIFICATION IMAGENET
source: https://blogs.technet.microsoft.com/machinelearning/2016/11/15/imagenet-deep-neural-network-training-using-microsoft-r-server-and-azure-gpu-vms/
[Figure: example ImageNet images, real class vs. predicted class]
TEXT CLASSIFICATION
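[Figure: text classification architecture. Train: a DNN is trained on the dataset in R (.R) on an Azure NC24 VM with 4 K80 GPUs, producing model.params. Score: input text from a web app (.js, .html) is sent to an API backend (.py) on Azure Cloud Services, which scores it with the trained model.]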
DEMO: TEXT CLASSIFICATION WEB APP