Practical Machine Learning and Rails Part1


Who are we?

Andrew Cantino
VP Engineering, Mavenlink @tectonic

Ryan Stout
Founder, Agile Productions @ryanstout

This talk will

- have examples

- introduce machine learning

- make you ML-aware

This talk will not

- cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...

- give you a PhD

- implement algorithms

What is Machine Learning?

Many different algorithms that predict data from other data using applied statistics.

"Enhance and rotate 20 degrees"

What data?

The web is data.

Logs

Databases

Streams

Clicktrails

A/B Tests

Browser versions

User decisions

Reviews

Okay. We have data.

What do we do with it?

We classify it.

Classification

:) OR :(

Classification

• Documents
  o Sort email (Gmail's importance filter)
  o Route questions to appropriate expert (Aardvark)
  o Categorize reviews (Amazon)

• Users
  o Expertise; interests; pro vs free; likelihood of paying; expected future karma

• Events
  o Abnormal vs. normal

Algorithms: Decision Tree Learning

[Figure: decision tree. The nodes test features (Email contains word "viagra"; Email contains attachment?; Email contains word "Ruby") and yes/no branches lead to the leaf labels P(Spam)=5%, P(Spam)=10%, P(Spam)=70%, P(Spam)=95%.]
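A decision tree like the one on this slide can be read as nested if/else tests on features. The sketch below is only an illustration: the three tests and the four leaf probabilities come from the slide, but the branch layout (which test leads to which leaf) and the `email` hash with `:body` and `:has_attachment` keys are assumptions.

```ruby
# Hand-written decision tree for spam.
# ASSUMPTION: the branch layout is hypothetical; only the three feature tests
# and the four leaf values (5%, 10%, 70%, 95%) appear on the slide.
def spam_probability(email)
  body = email[:body].downcase
  if body.include?("viagra")
    # Assumed branch: "viagra" plus an attachment is the most suspicious case
    email[:has_attachment] ? 0.95 : 0.70
  else
    # Assumed branch: mentioning "ruby" makes spam slightly less likely
    body.include?("ruby") ? 0.05 : 0.10
  end
end

puts spam_probability(body: "Cheap Viagra inside", has_attachment: true)
puts spam_probability(body: "Ruby meetup tonight", has_attachment: false)
```

A learning algorithm would choose the tests and leaf values from labeled training data rather than having them written by hand.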

Algorithms: Support Vector Machines (SVMs)

Graphics from Wikipedia

Algorithms: Naive Bayes

• Break documents into words and treat each word as an independent feature

• Surprisingly effective on simple text and document classification

• Works well when you have lots of data

Algorithms: Naive Bayes

You received 100 emails, 70 of which were spam.

Word      Spam with this word    Ham with this word
viagra    42 (60%)               1 (3.3%)
ruby      7 (10%)                15 (50%)
hello     35 (50%)               24 (80%)

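A new email contains "hello" and "viagra". The probability that it is spam is:

P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra)
                  ≈ 0.7 * (0.5 * 0.6) / (0.59 * 0.43)
                  ≈ 83%

Here P(hello,viagra|S) is factored as P(hello|S) * P(viagra|S), which is the naive independence assumption, and P(hello,viagra) is approximated by P(hello) * P(viagra) = 0.59 * 0.43.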
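The calculation above can be reproduced from the counts in the table. The sketch below is not from the talk; the constant and method names are made up for illustration. It computes two variants: one that divides by the product of the word frequencies, as in the calculation above, and one that normalizes over spam and ham so the two posteriors sum to 1, which is another common way to finish a Naive Bayes calculation.

```ruby
# Counts taken from the table above: 100 emails, 70 spam and 30 ham.
TOTALS = { spam: 70, ham: 30 }.freeze
WORD_COUNTS = {
  "viagra" => { spam: 42, ham: 1 },
  "ruby"   => { spam: 7,  ham: 15 },
  "hello"  => { spam: 35, ham: 24 }
}.freeze

# P(label), e.g. P(spam) = 70 / 100
def prior(label)
  TOTALS[label].to_f / TOTALS.values.sum
end

# P(word | label), e.g. P(viagra | spam) = 42 / 70
def likelihood(word, label)
  WORD_COUNTS.fetch(word)[label].to_f / TOTALS[label]
end

# P(label) * product of P(word | label): the "naive" independence assumption
def joint_score(words, label)
  words.reduce(prior(label)) { |acc, word| acc * likelihood(word, label) }
end

# Variant 1: divide by P(hello) * P(viagra), as in the calculation above
def spam_probability_marginals(words)
  evidence = words.reduce(1.0) do |acc, word|
    acc * WORD_COUNTS.fetch(word).values.sum / TOTALS.values.sum.to_f
  end
  joint_score(words, :spam) / evidence
end

# Variant 2: normalize over both labels so P(spam) + P(ham) = 1
def spam_probability_normalized(words)
  spam = joint_score(words, :spam)
  ham = joint_score(words, :ham)
  spam / (spam + ham)
end

words = %w[hello viagra]
puts spam_probability_marginals(words).round(2)   # => 0.83
puts spam_probability_normalized(words).round(2)  # => 0.96
```

The two variants differ because the words are not actually independent across all email; either way, the email is classified as spam.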

Algorithms: Neural Nets

Input layer (features)

Hidden layer

Output layer (Classification)
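To make the three layers concrete, here is a minimal forward pass through a tiny network. Everything in it is an assumption for illustration: the feature meanings, the layer sizes, the sigmoid activation, and especially the weights, which a real network would learn from training data.

```ruby
# Sigmoid squashes any real number into the range (0, 1)
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# One fully connected layer: each output is sigmoid(weights . inputs + bias)
def layer(inputs, weights, biases)
  weights.each_with_index.map do |row, i|
    sigmoid(row.zip(inputs).sum { |w, x| w * x } + biases[i])
  end
end

# Input layer (features), hypothetical: contains "viagra", contains "ruby", has attachment
features = [1.0, 0.0, 1.0]

# Hidden layer: 2 units, made-up weights
hidden_weights = [[2.0, -1.5, 0.5], [-1.0, 2.0, 0.0]]
hidden_biases = [-0.5, 0.0]

# Output layer (classification): 1 unit, read here as a spam score
output_weights = [[3.0, -2.0]]
output_biases = [-1.0]

hidden = layer(features, hidden_weights, hidden_biases)
output = layer(hidden, output_weights, output_biases)
puts output.first
```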

Curse of Dimensionality

http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg

The more features and labels you have, the more data you need.
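One way to see why: if each feature is split into a fixed number of bins, the number of distinct feature combinations grows exponentially with the number of features, so the same amount of data covers the space more and more thinly. The bin count of 10 below is an arbitrary assumption.

```ruby
BINS_PER_FEATURE = 10 # assumed discretization, for illustration only

# Number of distinct cells in a grid over `feature_count` features
def cells(feature_count)
  BINS_PER_FEATURE**feature_count
end

[1, 2, 3, 10].each do |feature_count|
  puts "#{feature_count} features -> #{cells(feature_count)} cells to cover"
end
```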

Overfitting

• With enough parameters, anything is possible.

• We want our algorithms to generalize and infer, not memorize specific training examples.

• Therefore, we test our algorithms on different data than we train them on.
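A minimal sketch of holding out test data, assuming the examples fit in an array. The method name, the 80/20 ratio, and the fixed seed are illustrative choices, not something prescribed by the talk.

```ruby
# Shuffle the examples, then hold out a fraction that the model never sees
# during training. The fixed seed makes the split reproducible.
def train_test_split(examples, test_fraction: 0.2, seed: 42)
  shuffled = examples.shuffle(random: Random.new(seed))
  test_size = (examples.size * test_fraction).round
  [shuffled.drop(test_size), shuffled.take(test_size)]
end

examples = (1..100).map { |i| { id: i } }
train, held_out = train_test_split(examples)

puts "train on #{train.size} examples, evaluate on #{held_out.size}"
```

Accuracy measured on `held_out` estimates how well the model generalizes; accuracy measured on `train` mostly shows how well it memorized.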
