
It's Not Magic - Explaining classification algorithms

Jan 14, 2017

Data & Analytics

Brian Lange
Transcript
Page 1: It's Not Magic - Explaining classification algorithms

It’s Not Magic

Brian Lange, Data Scientist + Partner at

Explaining classification algorithms

Page 2: It's Not Magic - Explaining classification algorithms

HEADS UP

Page 3: It's Not Magic - Explaining classification algorithms
Page 4: It's Not Magic - Explaining classification algorithms

I work with some really freakin’ smart people.

Page 5: It's Not Magic - Explaining classification algorithms

classification algorithms

Page 6: It's Not Magic - Explaining classification algorithms

popular examples

Page 9: It's Not Magic - Explaining classification algorithms

popular examples

- spam filters

- the Sorting Hat

Page 10: It's Not Magic - Explaining classification algorithms

things to know

Page 13: It's Not Magic - Explaining classification algorithms

things to know

- you need data labeled with the correct answers to “train” these algorithms before they work

- feature = dimension = column = attribute of the data

- class = category = label = Harry Potter house

Page 14: It's Not Magic - Explaining classification algorithms

BIG CAVEAT: Oftentimes, choosing/creating good features or gathering more data will help more than changing algorithms...

Page 15: It's Not Magic - Explaining classification algorithms

[Scatter plot: # mentions of brand names vs. % of email body that is all-caps, with points labeled spam / not spam]

Page 16: It's Not Magic - Explaining classification algorithms

Linear discriminants

Pages 17-23: It's Not Magic - Explaining classification algorithms

[Scatter plots of # mentions of brand names vs. % of email body that is all-caps, with candidate dividing lines scored by how many points they misclassify: "1 wrong", "5 wrong", "4 wrong"; the last line shown is y = .01x + 4]
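
The slides score each candidate line by how many points it gets wrong. As a minimal sketch of that idea (the data points below are made up; only the line y = .01x + 4 comes from the slide):

import numpy as np

# made-up emails: [# mentions of brand names, % of body that is all-caps], 1 = spam, 0 = not spam
points = np.array([[8, 40.0], [6, 35.0], [1, 2.0], [2, 5.0]])
labels = np.array([1, 1, 0, 0])

slope, intercept = 0.01, 4                     # the candidate line y = .01x + 4
above = points[:, 1] > slope * points[:, 0] + intercept
predicted = above.astype(int)                  # call everything above the line spam
print("wrong:", np.sum(predicted != labels))   # the line's "terribleness" score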

Pages 24-25: It's Not Magic - Explaining classification algorithms

a map of terribleness, to find the least terrible line

[Surface plot: "terribleness" as a function of slope and intercept]

Pages 26-27: It's Not Magic - Explaining classification algorithms

“gradient descent”

[Surface plot: "terribleness" as a function of slope and intercept]
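
The deck doesn't show code for this step, so here is a tiny illustrative gradient-descent sketch (the loss function, starting point, and step size are all made up) that walks downhill on a toy map of terribleness over slope and intercept:

import numpy as np

def terribleness(slope, intercept):     # made-up smooth loss, lowest at slope = 0.5, intercept = 2.0
    return (slope - 0.5) ** 2 + (intercept - 2.0) ** 2

def gradient(slope, intercept):         # its partial derivatives
    return np.array([2 * (slope - 0.5), 2 * (intercept - 2.0)])

params = np.array([0.0, 0.0])           # arbitrary starting slope and intercept
step = 0.1                              # how far to move on each update
for _ in range(100):
    params = params - step * gradient(*params)   # take a small step downhill

print(params)   # ends up near [0.5, 2.0], the least terrible line for this toy loss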

Page 28: It's Not Magic - Explaining classification algorithms

training data

Page 29: It's Not Magic - Explaining classification algorithms

training data

import numpy as np
X = np.array([[1, 0.1], [3, 0.2], [5, 0.1]])   # … one row per email, one column per feature
y = np.array([1, 2, 1])                        # the correct label ("class") for each row

Page 32: It's Not Magic - Explaining classification algorithms

training data

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
model = LinearDiscriminantAnalysis()
model.fit(X, y)

Page 33: It's Not Magic - Explaining classification algorithms

training data

trained model

Page 34: It's Not Magic - Explaining classification algorithms

new data point

trained model

Page 35: It's Not Magic - Explaining classification algorithms

trained model

Pages 36-39: It's Not Magic - Explaining classification algorithms

trained model

new_point = np.array([[1, .3]])    # one new email's features (the 2-D shape scikit-learn expects)
print(model.predict(new_point))    # prints [1]

not spam

prediction

Page 41: It's Not Magic - Explaining classification algorithms

Logistic regression

Page 42: It's Not Magic - Explaining classification algorithms

logistic regression: “divide it with a logistic function”

Page 44: It's Not Magic - Explaining classification algorithms

logistic regression: “divide it with a logistic function”

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
predicted = model.predict(z)   # z = new, unlabeled data points
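
A comparison table later in the deck notes that logistic regression can also give probability estimates; in scikit-learn that is predict_proba (a small illustrative addition, not on the slide):

probabilities = model.predict_proba(z)   # e.g. [[0.92, 0.08], ...] : P(class 1), P(class 2) for each point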

Page 45: It's Not Magic - Explaining classification algorithms

Support Vector Machines

(SVM)

Page 48: It's Not Magic - Explaining classification algorithms

SVMs (support vector machines): “*advanced* draw a line through it”

- better definition of “terrible”
- lines can turn into non-linear shapes if you transform your data

Pages 49-54: It's Not Magic - Explaining classification algorithms

[figures annotated with 💩]

Page 55: It's Not Magic - Explaining classification algorithms

“the kernel trick”

Page 56: It's Not Magic - Explaining classification algorithms

“the kernel trick”
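
The slides make this point with figures; as a rough code sketch of the same idea (all of the data below is invented), adding a squared feature by hand turns a problem no single straight cut can solve into one a line separates cleanly. Kernels do this kind of lifting implicitly:

import numpy as np
from sklearn.svm import SVC

# toy 1-D data: class 1 sits in the middle, class 0 on both sides,
# so no single cut-off point separates them
X_toy = np.array([[-2.0], [-1.5], [-0.2], [0.1], [1.4], [2.1]])
y_toy = np.array([0, 0, 1, 1, 0, 0])

X_lifted = np.hstack([X_toy, X_toy ** 2])   # add x**2 as a second feature
model = SVC(kernel='linear')
model.fit(X_lifted, y_toy)                  # a plain line works in the lifted space
print(model.predict([[0.0, 0.0]]))          # x = 0 sits in the middle band, so expect class 1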

Page 57: It's Not Magic - Explaining classification algorithms

SVMs (support vector machines): “*advanced* draw a line through it”

figure credit: scikit-learn documentation

Pages 58-63: It's Not Magic - Explaining classification algorithms

[Scatter plots: # mentions of brand names vs. % of email body that is all-caps]

Page 64: It's Not Magic - Explaining classification algorithms

SVMs (support vector machines): “*advanced* draw a line through it”

from sklearn.svm import SVC
model = SVC(kernel='poly', degree=2)
model.fit(X, y)
predicted = model.predict(z)

Page 65: It's Not Magic - Explaining classification algorithms

SVMs (support vector machines): “*advanced* draw a line through it”

from sklearn.svm import SVC
model = SVC(kernel='rbf')
model.fit(X, y)
predicted = model.predict(z)

Page 66: It's Not Magic - Explaining classification algorithms

KNN (k-nearest neighbors)

Pages 67-75: It's Not Magic - Explaining classification algorithms

KNN (k-nearest neighbors): “what do similar cases look like?”

[Figures: decision boundaries and example classifications for k=1, k=2, and k=3]

figure credits: scikit-learn documentation, Burton DeWilde

Page 76: It's Not Magic - Explaining classification algorithms

KNN (k-nearest neighbors): “what do similar cases look like?”

from sklearn.neighbors import KNeighborsClassifier   # the classifier version; NearestNeighbors alone has no predict()
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)
predicted = model.predict(z)
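
A later slide points out that with KNN you have to define what “similar” means in a good way; in scikit-learn that is the metric (and optionally weights) argument. The particular choice below is illustrative, not from the slides:

model = KNeighborsClassifier(n_neighbors=5, metric='manhattan', weights='distance')   # closer neighbors count more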

Page 77: It's Not Magic - Explaining classification algorithms

Decision tree learners

Pages 78-82: It's Not Magic - Explaining classification algorithms

decision tree learners: make a flow chart of it

[Flow chart built up one split at a time, alongside the matching splits of the scatter plot:
x < 3? yes / no
y < 4? yes / no
x < 5? yes / no]

Page 83: It's Not Magic - Explaining classification algorithms

decision tree learners: make a flow chart of it

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X, y)
predicted = model.predict(z)

Page 84: It's Not Magic - Explaining classification algorithms

decision tree learners: make a flow chart of it

sklearn.tree.export_graphviz() + pydot
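
A minimal sketch of that export step (the file names and feature names here are made up, and pydot is installed separately):

from sklearn.tree import export_graphviz
import pydot

export_graphviz(model, out_file='tree.dot',
                feature_names=['brand_mentions', 'pct_all_caps'])   # write the tree as Graphviz source
(graph,) = pydot.graph_from_dot_file('tree.dot')
graph.write_png('tree.png')                                         # a literal flow chart of the trained tree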

Page 85: It's Not Magic - Explaining classification algorithms

decision tree learners: make a flow chart of it

Page 86: It's Not Magic - Explaining classification algorithms

Ensemble models

(make a bunch of models and combine them)

Page 87: It's Not Magic - Explaining classification algorithms

bagging: split training set, train one model each, models “vote”

Pages 88-95: It's Not Magic - Explaining classification algorithms

bagging: split training set, train one model each, models “vote”

[Diagram: the training set is split, one model is trained on each piece, and a new data point goes to every model; the models vote “not spam”, “spam”, “not spam”]

Final Answer: not spam

Page 96: It's Not Magic - Explaining classification algorithms

other spins on this

Page 100: It's Not Magic - Explaining classification algorithms

other spins on this

Random Forest - like bagging, but at each split randomly constrain features to choose from

Extra Trees - for each split, make it randomly, non-optimally. Compensate by training a ton of trees

Voting - combine a bunch of different models of your design, have them “vote” on the correct answer.

Boosting - train models in order, make the later ones focus on the points the earlier ones missed

Page 101: It's Not Magic - Explaining classification algorithms

voting example

figure credit: scikit-learn documentation

Page 103: It's Not Magic - Explaining classification algorithms

from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier, VotingClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
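
As one concrete example of the “voting” idea from the previous slides (the choice of component models here is illustrative):

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# three models of different kinds "vote" on every prediction
model = VotingClassifier(estimators=[('lr', LogisticRegression()),
                                     ('svm', SVC()),
                                     ('tree', DecisionTreeClassifier())])
model.fit(X, y)
predicted = model.predict(z)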

Page 104: It's Not Magic - Explaining classification algorithms
Page 105: It's Not Magic - Explaining classification algorithms

which one do I pick?

Page 106: It's Not Magic - Explaining classification algorithms

which one do I pick?

try a few!
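
The slides stop at “try a few!”; in scikit-learn that often looks something like the sketch below (the candidate list and the 5-fold cross-validation are illustrative choices):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

candidates = [LogisticRegression(), SVC(), KNeighborsClassifier(), RandomForestClassifier()]
for model in candidates:
    scores = cross_val_score(model, X, y, cv=5)   # accuracy on 5 held-out folds
    print(type(model).__name__, scores.mean())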

Page 107: It's Not Magic - Explaining classification algorithms
Pages 108-115: It's Not Magic - Explaining classification algorithms

nonlinear decision boundary / provides probability estimates / tells how important a feature is to the model

- Logistic Regression: no / yes / yes, if you scale
- SVMs: yes, with kernel / no / no
- KNN: yes / kinda (percent of nearby points) / no
- Naïve Bayes: yes / yes / no
- Decision Tree: yes / no / yes (number of times that feature is used)
- Ensemble models: yes / kinda (% of models that agree) / yes, depending on component parts
- Boosted models: yes / kinda (% of models that agree) / yes, depending on component parts
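
Two of those columns correspond to concrete scikit-learn attributes; for example, with a fitted tree ensemble (an illustrative sketch, not from the slides):

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X, y)
print(model.predict_proba(z))       # probability estimates (here, the % of trees that agree)
print(model.feature_importances_)   # how much each feature mattered to the model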

Page 116: It's Not Magic - Explaining classification algorithms
Pages 117-124: It's Not Magic - Explaining classification algorithms

can be updated with new training data / easy to parallelize?

- Logistic Regression: kinda / kinda
- SVMs: kinda, depending on kernel / yes for some kernels, no for others
- KNN: yes / yes
- Naïve Bayes: yes / yes
- Decision Tree: no / no (but it’s very fast)
- Ensemble models: kinda, by adding new models to the ensemble / yes
- Boosted models: kinda, by adding new models to the ensemble / no
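
In scikit-learn terms, “can be updated with new training data” usually means the model has a partial_fit method, and “easy to parallelize” often just means passing n_jobs. An illustrative sketch, not from the slides:

from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

nb = GaussianNB()
nb.partial_fit(X, y, classes=[1, 2])    # Naive Bayes can absorb new labeled emails incrementally

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)   # the trees are trained in parallel
rf.fit(X, y)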

Page 125: It's Not Magic - Explaining classification algorithms

Other quirks

Pages 126-131: It's Not Magic - Explaining classification algorithms

Other quirks

- SVMs: have to pick a kernel
- KNN: you need to define what “similarity” is in a good way; fast to train, slow to classify (compared to other methods)
- Naïve Bayes: have to choose the distribution; can deal with missing data
- Decision Tree: can provide literal flow charts; very sensitive to outliers
- Ensemble models: less prone to overfitting than their component parts
- Boosted models: many parameters to tweak; more prone to overfit than normal ensembles; most popular Kaggle winners use these
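
For the “many parameters to tweak” point, a typical boosted-model setup looks something like this (the particular values are illustrative, not recommendations):

from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=200,    # how many small trees to train in sequence
                                   learning_rate=0.05,  # how much each new tree can correct the last
                                   max_depth=3)         # how big each individual tree can get
model.fit(X, y)
predicted = model.predict(z)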

Page 132: It's Not Magic - Explaining classification algorithms

if this sounds cool

datascope.co/careers

Page 133: It's Not Magic - Explaining classification algorithms

thanks!

question time…

datascope.co  @bjlange