ML LAId bare
Cambridge Wireless SIG Meeting
Mary-Ann & Phil Claridge
23 November 2017
www.mandrel.com @MandrelSystems [email protected]
© 2017 Mandrel Systems www.mandrel.com @MandrelSystems
Welcome To Our Toolbox
Our Opinionated Views!
• “Data” IDE
• Wrangling
• Mainline Exploration and Prototyping – For Programmers
• Supporting Cast
• Up and Coming – Hard Thinking
• Datasets & Kaggle
• Demos
Data “IDE”
H2O
Weka
Honorable mention:
BigML (free to use, not open source)
H2O
• Good
  • Good analysis of performance of generated AI/ML.
    • Some ML knowledge required to interpret terminology and results.
  • Easy install (local or cloud)
    • Download, unzip
    • java -jar h2o.jar
  • Sparkling Water provides integration with Spark for large data sets.
• Bad
  • Model generation focused on Java
• Recommended
  • Excellent results for many commercial projects from open source.
• Demo later!
Installing H2O
http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/7/index.html
Weka
• Good
  • Lots of basic data wrangling with no code.
    • E.g. string to vector.
  • High degree of control to evaluate different ML algorithms
  • Auto Weka
• Bad
  • Now a little dated
  • Focus on traditional machine learning
• Recommended
  • Learn how ML works, and ML algorithms, without coding.
  • Some quick and dirty wrangling.
  • Good set of free training videos that focus on ML, not programming.
  • https://www.cs.waikato.ac.nz/ml/index.html
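Aside for programmers: the "string to vector" wrangling step mentioned above can be sketched in a few lines of Python. This is a minimal bag-of-words illustration of the idea, not Weka's own implementation (Weka does this through its StringToWordVector filter with no code). The function names and the sample strings are made up for the example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lower-cased word to a column index, in sorted order."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def string_to_vector(document, vocabulary):
    """Turn one string into a list of word counts, one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts.get(word, 0) for word in vocabulary]


# Made-up sample strings, purely for illustration.
documents = ["free prize inside", "meeting at noon", "free free meeting"]
vocabulary = build_vocabulary(documents)
vectors = [string_to_vector(doc, vocabulary) for doc in documents]

for doc, vector in zip(documents, vectors):
    print(doc, "->", vector)
```

Each string becomes a fixed-length row of numbers, which is the form most ML algorithms expect as input.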
Aside: BigML (not open source)
• Good
  • Point and click model building
  • Download ready to use models in most languages including Python, C#, Java, JavaScript and Excel!
  • Decision tree (inc. ensemble) + neural nets
  • Fantastic graphics.
• Bad
  • Not open source (but free to use for small data sets).
• Recommended
  • Three days pay for three hours work
  • For many commercial applications.
  • Explaining basic ML to non technical audience.
    • White box vs black box.
  • Sufficient for poor quality or low volume data where more sophisticated tools bring no benefit
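To make "white box" concrete: a decision tree can be written out as plain if/else code that anyone can read and audit. The sketch below is hand-written for illustration only. The function name, the choice of features, and every threshold are hypothetical; it was not trained on any data and is not a model exported from BigML.

```python
def classify(glucose, bmi, age):
    """Hypothetical hand-written tree: returns (label, reason).

    All thresholds are illustrative assumptions, not learned values.
    The reason string shows which rule fired, which is what makes
    this kind of model a "white box".
    """
    if glucose < 130:
        if bmi < 30:
            return 0, "glucose < 130 and bmi < 30"
        if age < 40:
            return 0, "glucose < 130, bmi >= 30, age < 40"
        return 1, "glucose < 130, bmi >= 30, age >= 40"
    if bmi < 27:
        return 0, "glucose >= 130 and bmi < 27"
    return 1, "glucose >= 130 and bmi >= 27"


label, reason = classify(glucose=150, bmi=35, age=50)
print(label, "because", reason)
```

A neural net gives a prediction with no comparable rule to show, which is the "black box" side of the comparison.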
Wrangling
Small: Python + Pandas
Large: Python + Spark
Large, complex and production: Scala + Spark
Honorable supporting cast:
Anaconda: Pre-built Python environment
Parquet: Database table as a file. Fast & small
IntelliJ: Commercial IDE Python - Java, Scala, JavaScript, Python Web
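For the "Small: Python + Pandas" case, a typical wrangling step looks roughly like the sketch below. It assumes pandas is installed and that a zero in a measurement column marks a missing reading; both the column names and the four-row table are illustrative, not the demo data.

```python
import io

import pandas as pd

# Small illustrative table; in practice this would come from a file.
CSV_TEXT = """glucose,bmi,outcome
148,33.6,1
0,26.6,0
183,23.3,1
89,0,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumption: zero means "not measured" in these columns.
measurement_columns = ["glucose", "bmi"]
for column in measurement_columns:
    observed = df[column].where(df[column] != 0)  # zeros become NaN
    df[column] = observed.fillna(observed.median())  # fill gaps with the median

print(df)
```

The same steps scale up to Spark when the data no longer fits on one machine, at the cost of more setup.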
Programming Basic ML + Neural Nets
• Python and …
  • scikit-learn
  • TensorFlow (+ Keras)
• Scala …
  • Spark.ML
AI Gym
Reinforcement Learning
• Reinforcement learning
• Go play.
• A very different kind of AI
• No demo today!
• https://gym.openai.com/docs/
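What makes reinforcement learning "a very different kind of AI" is the loop: an agent acts, the environment answers with an observation and a reward, and this repeats until the episode ends. The sketch below shows that loop in plain Python without installing Gym. CorridorEnv is a hypothetical toy environment written for this example; its reset/step shape and the four-value step result are modelled on the interface in the linked docs and should be treated as an assumption. No learning happens here, only the interaction loop.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface.

    The agent starts at position 0 and must reach the end of a corridor.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Action 1 moves right, anything else moves left (never below 0)."""
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position == self.length
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


random.seed(0)
random_return = run_episode(CorridorEnv(), lambda obs: random.choice([0, 1]))
always_right_return = run_episode(CorridorEnv(), lambda obs: 1)
print("random policy:", random_return)
print("always-right policy:", always_right_return)
```

A learning algorithm's job is to discover a policy like "always right" from the rewards alone, rather than from labelled examples.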
Demo Data - Pima Indians Diabetes Data Set
https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
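The nine attributes above map directly onto named fields when the data is loaded. The sketch below uses only the Python standard library and assumes the file is comma-separated with no header row, one record per line, in the order listed. The short field names are our own labels for this example, and the two sample rows are included only to show the layout.

```python
import csv
import io
from collections import namedtuple

# Our own short names for the nine attributes, in the order listed above.
FIELD_NAMES = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]
Record = namedtuple("Record", FIELD_NAMES)


def parse_rows(text):
    """Parse headerless comma-separated text into a list of Record tuples."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(FIELD_NAMES):
            raise ValueError(f"expected {len(FIELD_NAMES)} values, got {len(row)}")
        features = [float(value) for value in row[:-1]]
        label = int(row[-1])  # class variable: 0 or 1
        records.append(Record(*features, label))
    return records


# Two sample rows in the assumed layout.
SAMPLE = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"
records = parse_rows(SAMPLE)
for record in records:
    print(record.glucose, record.bmi, record.outcome)
```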
For more data sets: Kaggle!
Model Accuracy
Tool    Model                        Accuracy
H2O     Ten level tree               97%
H2O     Neural Net                   97.5%
BigML   48 Node Decision Tree        82.7%
BigML   Neural Net (shallow)         77%
Keras   3 Layer Simple Neural Net    77.73%
Keras   Deeper neural net            To follow
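For reading the accuracy figures above: accuracy is simply the fraction of predictions that match the true class, and it is worth comparing against the score from always guessing the most common class. The sketch below shows both calculations in plain Python. The label lists are toy values for illustration and are not output from any of the tools in the table.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that equal the true label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def majority_baseline(actual):
    """Accuracy of always predicting the most common class."""
    most_common = max(set(actual), key=actual.count)
    return accuracy(actual, [most_common] * len(actual))


# Toy labels, purely for illustration.
actual = [0, 0, 0, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 1, 0, 0, 1, 0]

print("model accuracy:", accuracy(actual, predicted))
print("majority baseline:", majority_baseline(actual))
```

For the Pima data, which the UCI description lists as 500 negative and 268 positive cases, always predicting 0 would score about 65%, so a model's accuracy is best judged against that figure and on data it was not trained on.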
Next
• Demos
  • H2O
  • Keras
• Q&A
Screen Shots
Demo H2O - import PIMA Indian diabetes data set, and build decision tree
[H2O screenshots: nine slides, images not reproduced]
BigML – same process as H2O demo above
[BigML screenshots: thirteen slides, images not reproduced]