Page 1:

Machine Learning

Ludovic Samper

Antidot

September 1st, 2015

Page 2:

Antidot

Software vendor since 1999

Paris, Lyon, Aix-en-Provence

45 employees

Founders: Fabrice Lacroix (CEO), Stephane Loesel (CTO), Jerome Mainka (Chief Scientist Officer)

Software products and solutions

Antidot Finder Suite (AFS) search engine

Antidot Information Factory (AIF) a pipe & filters framework

SaaS, Hosted License, On-site License

50% of the revenue invested in R&D

Page 3:

Antidot

Machine Learning

Automatic text document classification

Named Entity Extraction

Compound splitter (for German words)

Clustering algorithm (for news aggregation)

Open Data, Semantic Web

http://www.rechercheisidore.fr/ : Social Sciences and Humanities research platform, enriched with open resources

https://github.com/antidot/db2triples/ : open source library to export a database as RDF

Antidot is a Partner organization in WDAqua project

Page 4:

Tutorial

Study a classical task in Machine Learning: text classification

Show scikit-learn (scikit-learn.org), a Python machine learning library

Follow the “Working with text data” tutorial: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

Additional material on http://blog.antidot.net/

Page 5:

Summary of the tutorial

1 Problem definition
    Supervised classification
    Evaluation metrics

2 Extracting features from text files
    Bag of words model
    Term frequency inverse document frequency (tfidf)

3 Algorithms for classification
    Naïve Bayes
    Support Vector Machine (SVM)
    Tuning parameters
        Cross validation
        Grid search

4 Conclusion
    Methodology

Page 6:

Contents

1 Problem definition
    Supervised classification
    Evaluation metrics

2 Extracting features from text files

3 Algorithms for classification

4 Conclusion

Page 7:

20 newsgroups dataset

http://qwone.com/~jason/20Newsgroups/

20 newsgroups

Documents from 20 newsgroups, collected in the 90's

The label is the newsgroup the document belongs to

A popular collection

18846 documents: 11314 in train, 7532 in test

wiss-ml.ipynb#The-20-newsgroups-dataset
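A minimal sketch of loading this dataset with scikit-learn (this is not the notebook's exact code, just the standard fetch_20newsgroups call):

    # Load the 20 newsgroups train/test split shipped with scikit-learn.
    from sklearn.datasets import fetch_20newsgroups

    train = fetch_20newsgroups(subset='train')   # 11314 documents
    test = fetch_20newsgroups(subset='test')     # 7532 documents

    print(len(train.data), len(test.data))
    print(train.target_names[train.target[0]])   # newsgroup label of the first document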

Page 8:

Classification

Problem statement

One label per document

Given a set of documents and their labels, automatically determine the label of an unseen document

A supervised classification problem

Training

Set of documents and their labels

Build a model

Inference

Given a new document, use the model to predict its label
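A self-contained sketch of these two steps; the choice of CountVectorizer and MultinomialNB here is illustrative, not the tutorial's prescribed setup:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    train = fetch_20newsgroups(subset='train')

    # Training: build a model from documents and their labels.
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train.data)
    model = MultinomialNB().fit(X_train, train.target)

    # Inference: given a new document, predict its label.
    X_new = vectorizer.transform(["NASA launched a new probe toward Mars"])
    print(train.target_names[model.predict(X_new)[0]])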

Page 9:

Precision and Recall I

Binary classification

               e ∈ C                  e ∉ C
Labeled C      TP (True Positive)     FP (False Positive)
Not labeled C  FN (False Negative)    TN (True Negative)

Precision

Precision = TP / (TP + FP) = Proba(e ∈ C | e labeled C)

Recall

Recall = TP / (TP + FN) = Proba(e labeled C | e ∈ C)

Page 10:

Precision and Recall II

F1

F1 = 2 × (P × R) / (P + R)

Harmonic mean of Precision and Recall

Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)
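These metrics are available in sklearn.metrics; a small sketch on a made-up binary example:

    from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

    y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # toy ground-truth labels
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # toy predictions

    print(precision_score(y_true, y_pred))  # TP / (TP + FP)
    print(recall_score(y_true, y_pred))     # TP / (TP + FN)
    print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
    print(accuracy_score(y_true, y_pred))   # (TP + TN) / (TP + TN + FP + FN)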

Page 11:

Multiclass I

N_C = number of classes

Macro Average

B_macro = (1 / N_C) ∑_{k=1}^{N_C} B_binary(TP_k, FP_k, TN_k, FN_k)

Average of the measure by class: large classes count as much as small ones.

Micro Average

B_micro = B_binary(∑_{k=1}^{N_C} TP_k, ∑_{k=1}^{N_C} FP_k, ∑_{k=1}^{N_C} TN_k, ∑_{k=1}^{N_C} FN_k)

Average of the measure by instance.

Page 12:

Multiclass II

Micro average in single-label multiclass

∑_{k=1}^{N_C} FN_k = ∑_{k=1}^{N_C} FP_k  (a misclassified document is a false negative for its true class and a false positive for the predicted class)

and ∑_{k=1}^{N_C} (TP_k + FP_k) = Nbdoc  (every document is assigned exactly one predicted class)

Then,

Precision_micro = Recall_micro = Accuracy = (∑_{k=1}^{N_C} TP_k) / Nbdoc
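An illustrative sketch of macro vs. micro averaging on a toy multiclass example (labels made up for illustration); in the single-label case the micro average indeed equals the accuracy:

    from sklearn.metrics import f1_score, accuracy_score

    y_true = [0, 0, 0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 1, 1, 2, 0]

    print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean over classes
    print(f1_score(y_true, y_pred, average='micro'))  # pooled TP / FP / FN over classes
    print(accuracy_score(y_true, y_pred))             # same value as the micro average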

Page 13:

Contents

1 Problem definition

2 Extracting features from text files
    Bag of words model
    Term frequency inverse document frequency (tfidf)

3 Algorithms for classification

4 Conclusion

Page 14:

Bag of words

From text to features

Count the number of occurrences of words in text

“bag” because position isn’t taken into account

Extensions

Remove stop words

Remove too frequent words (max_df)

lowercase

N-grams (ngram_range): tokenize n-grams instead of single words. Useful to take word positions into account

wiss-ml.ipynb#Bag-of-words
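A minimal sketch of these options with CountVectorizer; the parameter values are illustrative only:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["The cat sat on the mat", "The dog sat on the log"]

    vectorizer = CountVectorizer(
        stop_words='english',   # remove stop words
        max_df=0.95,            # remove too frequent words
        lowercase=True,         # lowercase the text
        ngram_range=(1, 2),     # unigrams and bigrams
    )
    X = vectorizer.fit_transform(docs)   # sparse document-term count matrix
    print(vectorizer.vocabulary_)        # token -> column index
    print(X.toarray())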

Page 15:

Term frequency inverse document frequency (tfidf) I

Intuition

Take into account the relative importance of each word with respect to the whole dataset: if a word occurs in every document, it doesn't carry any information.

Page 16:

Term frequency inverse document frequency (tfidf) II

Definition

Term frequency × inverse document frequency

tfidf(w, d) = tf(w, d) × idf(w)

tf(w, d) = term frequency of word w in document d

idf(w) = log(Ndoc / doc_freq(w))

In scikit-learn:

tfidf(w, d) = tf(w, d) × (idf(w) + 1)

so terms that occur in all documents (idf = 0) are not ignored

Page 17:

Term frequency inverse document frequency (tfidf) III

Options

Normalisation: ||doc|| = 1. E.g., for the L2 norm, ∑_{w∈d} tfidf(w, d)² = 1

Smoothing: add one to document frequencies, as if an extra document contained every term in the collection exactly once:

idf(w) = log((Ndoc + 1) / (doc_freq(w) + 1))

Example

Show the most significant words of a document: wiss-ml.ipynb#Tfidf
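A sketch of this weighting with TfidfVectorizer; norm and smooth_idf correspond to the normalisation and smoothing options above (the values shown are the defaults):

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the cat sat on the mat",
            "the dog chased the cat",
            "the bird flew away"]

    vectorizer = TfidfVectorizer(norm='l2', smooth_idf=True)
    X = vectorizer.fit_transform(docs)

    # idf_ holds log((Ndoc + 1) / (doc_freq(w) + 1)) + 1 for each term
    for word, idx in sorted(vectorizer.vocabulary_.items()):
        print(word, vectorizer.idf_[idx])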

Page 18:

Contents

1 Problem definition

2 Extracting features from text files

3 Algorithms for classification
    Naïve Bayes
    Support Vector Machine (SVM)
    Tuning parameters
        Cross validation
        Grid search

4 Conclusion

Page 19:

Supervised classification problem I

Notations

x = (x_1, · · · , x_n) ∈ R^n, the feature vector (n is the dimension of the feature space)

{(x_d, y_d)}_{0≤d<D}, the training set: x_d is the feature vector of document d, y_d its class

∀d, y_d ∈ {1, · · · , N_C}, with N_C the number of classes

ŷ, the class prediction: for a new vector x, ŷ is the predicted class of x

Page 20:

Supervised classification problem II

Goal

Find a function F : R^n → {1, · · · , N_C}, x ↦ ŷ

Page 21:

In 20newsgroups I

Values in 20 newsgroups

n = 130107 features (number of unique terms)

D = 11314 training samples

N_C = 20 classes

Goal

Find a function F that given a new document predicts its class

Page 22:

Naïve Bayes Algorithm I

Bayes’ theorem

P(A|B) = P(B|A) P(A) / P(B)

Page 23:

Naïve Bayes Algorithm II

Posterior probability of class C

P(C|x) = P(x|C) P(C) / P(x)

P(x) does not depend on C, so

P(C|x) ∝ P(x|C) P(C)

Naïve Bayes independence assumption: each feature i is conditionally independent of every other feature j, hence

P(C|x) ∝ P(C) × ∏_{i=1}^{n} P(x_i | C)

Page 24:

Naïve Bayes Algorithm III

Classifier from the probability model

ŷ = argmax_{k ∈ {1, · · · , N_C}} P(y = k) × ∏_{i=1}^{n} P(x_i | y = k)

Page 25:

Parameter estimation in the Naïve Bayes classifier

Prior of a class

P(y = k) = (number of samples in class k) / (total number of samples)

It can also be uniform: P(y = k) = 1 / N_C

Page 26:

Multinomial Naïve Bayes I

Naïve Bayes

P(x | y = k) = ∏_{i=1}^{n} P(x_i | y = k)

Multinomial distribution

The event “word is i” follows a multinomial distribution with parameters (p_1, · · · , p_n), where p_i = P(word = i):

P(x_1, · · · , x_n) = ∏_{i=1}^{n} p_i^{x_i}

where ∑_i p_i = 1, with one such distribution for each class y.

Page 27:

Multinomial Naïve Bayes II

Multinomial Naïve Bayes

One multinomial distribution for each class

P(i | y = k) = (sum of the occurrences of word i in class k) / (total number of words in class k)
             = (∑_{d ∈ k} x_{d,i}) / (∑_{0 ≤ j < n} ∑_{d ∈ k} x_{d,j})

With smoothing,

P(i | y = k) = (∑_{d ∈ k} x_{d,i} + α) / (∑_{0 ≤ j < n} ∑_{d ∈ k} x_{d,j} + α n)

Page 28:

Multinomial Naïve Bayes III

Inference in Multinomial Naïve Bayes

ŷ = argmax_k P(y = k | x)
  = argmax_k P(y = k) ∏_{0 ≤ i < n} P(i | y = k)^{x_i}
  = argmax_k ( log P(y = k) + ∑_{0 ≤ i < n} x_i log P(i | y = k) )
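In scikit-learn this model is sklearn.naive_bayes.MultinomialNB; a hedged end-to-end sketch on 20 newsgroups word counts (alpha is the smoothing parameter α above, alpha=1.0 being Laplace smoothing):

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score

    train = fetch_20newsgroups(subset='train')
    test = fetch_20newsgroups(subset='test')

    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train.data)
    X_test = vectorizer.transform(test.data)

    clf = MultinomialNB(alpha=1.0).fit(X_train, train.target)
    print(accuracy_score(test.target, clf.predict(X_test)))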

Page 29:

Multinomial Naïve Bayes IV

A linear model

In the log space,

(log P(y = k | x))_k ∝ W_0 + W^T · x

W_0 is the vector of priors: W_0 = (log P(y = k))_k

W is the matrix of log distributions: W = (w_{ik}), i ∈ [1, n], k ∈ [1, N_C], with w_{ik} = log P(i | y = k)

Page 30:

Multinomial Naïve Bayes V

Example step-by-step

http://www.antidot.net/wiss2015/wiss-ml.html#Naive-Bayes

Page 31:

Contents

1 Problem definition

2 Extracting features from text files

3 Algorithms for classification
    Naïve Bayes
    Support Vector Machine (SVM)
    Tuning parameters
        Cross validation
        Grid search

4 Conclusion

Pages 32-36: A linear classifier (figure slides)

Page 37:

Support Vector Machine, notations

Problem

S, the training set: {(x_i, y_i)}_{i ∈ 0..D}, with x_i ∈ R^n and y_i ∈ {−1, 1}

Find a linear function 〈w, x〉 + b such that:

sign(〈w, x_i〉 + b) = y_i

Page 38:

SVM, maximum margin classifier (figure slide)

Page 39:

Margin

distance(x+, x−) = 〈 w / ||w||, x+ − x− 〉
                 = (1 / ||w||) (〈w, x+〉 − 〈w, x−〉)
                 = (1 / ||w||) ((〈w, x+〉 + b) − (〈w, x−〉 + b))
                 = (1 / ||w||) (1 − (−1))
                 = 2 / ||w||

Page 40:

SVM, maximum margin classifier (figure slide)

Page 41:

Solving an optimization problem using the Lagrangian

Primal problem

minimize_{w,b} f(w, b)

under the constraints h_i(w, b) ≥ 0

Lagrange function

L(w, b, α) = f(w, b) − ∑_i α_i h_i(w, b)

Let g(α) = inf_{(w,b)} L(w, b, α).
∀ w, b : g(α) ≤ L(w, b, α). Moreover, for feasible (w, b) and α_i ≥ 0, L(w, b, α) ≤ f(w, b).
Thus, ∀ α_i ≥ 0, g(α) ≤ min_{w,b} f(w, b).
And with the Karush-Kuhn-Tucker (KKT) optimality condition,

max_α g(α) = min_{w,b} f(w, b) ⇔ α_i h_i(w, b) = 0

Page 42:

Support Vector Machine, problem

Primal problem

minimize_{w,b} ||w||² / 2

under the constraints ∀ 0 < i ≤ D, y_i (〈w, x_i〉 + b) ≥ 1

Lagrange function

L(w, b, α) = (1/2) ||w||² − ∑_i α_i (y_i (〈w, x_i〉 + b) − 1)

Dual problem: maximize_{(w,b,α)} L(w, b, α) with α_i ≥ 0

Optimality in (w, b) is a saddle point with α

Page 43:

Support Vector Machine, problem

The derivatives in w and b must vanish:

∂_w L(w, b, α) = w − ∑_i α_i y_i x_i = 0

∂_b L(w, b, α) = ∑_i α_i y_i = 0

Dual problem

maximize_α  −(1/2) ∑_{i,j} α_i α_j y_i y_j 〈x_i, x_j〉 + ∑_i α_i

under the constraints ∑_i α_i y_i = 0 and α_i ≥ 0

Page 44:

Support Vectors

Support vectors

w = ∑_i y_i α_i x_i

Karush-Kuhn-Tucker (KKT) optimality condition

Lagrange multiplier times constraint equals zero:

α_i (y_i (〈w, x_i〉 + b) − 1) = 0

Thus, either α_i = 0, or α_i > 0 ⇒ y_i (〈w, x_i〉 + b) = 1

Page 45:

Experiments with a separable space

SVMvaryingC.ipynb

Page 46:

What happens if the space is not separable (figure slide)

Page 47:

Adding slack variables

The problem was

minimize_{w,b} ||w||² / 2

with y_i (w · x_i + b) ≥ 1

With slack

minimize_{w,b} ||w||² / 2 + C ∑_i ξ_i

with y_i (w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0

Page 48:

Support Vector Machine, without slack

Primal problem

minimize_{w,b} ||w||² / 2

with y_i (w · x_i + b) ≥ 1

Lagrange function

L(w, b, α) = (1/2) ||w||² − ∑_i α_i (y_i (〈w, x_i〉 + b) − 1)

Dual problem: maximize_{(w,b,α)} L(w, b, α)

Optimality in (w, b) is a saddle point with α

Page 49:

Support Vector Machine, with slack

Primal problem

minimize_{w,b} ||w||² / 2 + C ∑_i ξ_i

with y_i (w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0

Lagrange function

L(w, b, ξ, α, η) = (1/2) ||w||² + C ∑_i ξ_i − ∑_i α_i (y_i (〈x_i, w〉 + b) + ξ_i − 1) − ∑_i η_i ξ_i

Dual problem: maximize_{(w,b,ξ,α,η)} L(w, b, ξ, α, η)

Optimality in (w, b, ξ) is a saddle point with (α, η)

Page 50:

Support Vector Machine, problem

The derivatives in w, b and ξ must vanish:

∂_w L(w, b, ξ, α, η) = w − ∑_i α_i y_i x_i = 0

∂_b L(w, b, ξ, α, η) = ∑_i α_i y_i = 0

∂_{ξ_i} L(w, b, ξ, α, η) = C − α_i − η_i = 0  ⇒  η_i = C − α_i

Dual problem

maximize_α  −(1/2) ∑_{i,j} α_i α_j y_i y_j 〈x_i, x_j〉 + ∑_i α_i

under the constraints ∑_i α_i y_i = 0 and 0 ≤ α_i ≤ C

Page 51:

Support Vectors

Support vectors

w = ∑_i y_i α_i x_i

Karush-Kuhn-Tucker (KKT) optimality condition

Lagrange multiplier times constraint equals zero:

α_i (y_i (〈w, x_i〉 + b) + ξ_i − 1) = 0
η_i ξ_i = 0  ⇔  (C − α_i) ξ_i = 0

Thus,
α_i = 0      ⇒ y_i (〈w, x_i〉 + b) ≥ 1
0 < α_i < C  ⇒ y_i (〈w, x_i〉 + b) = 1
α_i = C      ⇒ y_i (〈w, x_i〉 + b) ≤ 1

Page 52:

Support Vector Machine, Loss functions

Primal problem

minimize_{w,b} ||w||² / 2 + C ∑_i ξ_i

with y_i (w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0

With a loss function

minimize_{w,b} ||w||² / 2 + C ∑_i max(0, 1 − y_i (w · x_i + b))

Here, loss(x_i, y_i) = max(0, 1 − y_i (w · x_i + b)) = max(0, 1 − y_i f(x_i)), with f(x) = w · x + b

Page 53:

Support Vector Machine, Common loss functions

Common loss functions

hinge loss (L1-loss): max(0, 1 − y_i (w · x_i + b))

squared hinge (L2-loss): max(0, 1 − y_i (w · x_i + b))²

logistic loss: log(1 + exp(−y_i (w · x_i + b)))
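An illustrative numpy sketch of these three losses as a function of the margin m = y_i (w · x_i + b):

    import numpy as np

    def hinge(m):            # L1 hinge loss
        return np.maximum(0.0, 1.0 - m)

    def squared_hinge(m):    # L2 (squared hinge) loss
        return np.maximum(0.0, 1.0 - m) ** 2

    def logistic(m):         # logistic loss
        return np.log(1.0 + np.exp(-m))

    margins = np.linspace(-2.0, 2.0, 5)
    print(hinge(margins), squared_hinge(margins), logistic(margins))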

Page 54:

(figure slide)

Page 55:

Experiments with different values of C

SVMvaryingC.ipynb#Varying-C-parameter

Pages 56-58: Non linearly separable data, then the same data after the transformation Φ(x) = (x, x²) (figure slides)

Page 59:

Linear case

Primal problem

minimize_{w,b} (1/2) ||w||² + C ∑_i ξ_i

subject to y_i (〈w, x_i〉 + b) ≥ 1 − ξ_i and ξ_i ≥ 0

Dual problem

maximize_α  −(1/2) ∑_{i,j} α_i α_j y_i y_j 〈x_i, x_j〉 + ∑_i α_i

subject to ∑_i α_i y_i = 0 and 0 ≤ α_i ≤ C

Support vector expansion

f(x) = ∑_i α_i y_i 〈x_i, x〉 + b

Page 60:

With a transformation Φ : x ↦ Φ(x)

Primal problem

minimize_{w,b} (1/2) ||w||² + C ∑_i ξ_i

subject to y_i (〈w, Φ(x_i)〉 + b) ≥ 1 − ξ_i and ξ_i ≥ 0

Dual problem

maximize_α  −(1/2) ∑_{i,j} α_i α_j y_i y_j 〈Φ(x_i), Φ(x_j)〉 + ∑_i α_i

subject to ∑_i α_i y_i = 0 and 0 ≤ α_i ≤ C

Support vector expansion

f(x) = ∑_i α_i y_i 〈Φ(x_i), Φ(x)〉 + b

Page 61:

The kernel trick

Kernel function

k(x, x′) = 〈Φ(x), Φ(x′)〉

We only need to be able to compute the dot product in the new space.

Dual problem

maximize_α  −(1/2) ∑_{i,j} α_i α_j y_i y_j k(x_i, x_j) + ∑_i α_i

subject to ∑_i α_i y_i = 0 and 0 ≤ α_i ≤ C

Support vector expansion

f(x) = ∑_i α_i y_i k(x_i, x) + b

Page 62:

Kernels

Kernel functions

linear: k(x, x′) = 〈x, x′〉

polynomial: k(x, x′) = (γ 〈x, x′〉 + r)^d

rbf: k(x, x′) = exp(−γ ||x − x′||²)
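A sketch of these kernels via scikit-learn's pairwise helpers; gamma, coef0 (= r) and degree (= d) take illustrative values here:

    import numpy as np
    from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

    X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])

    print(linear_kernel(X))                                      # <x, x'>
    print(polynomial_kernel(X, gamma=1.0, coef0=1.0, degree=3))  # (gamma <x, x'> + r)^d
    print(rbf_kernel(X, gamma=1.0))                              # exp(-gamma |x - x'|^2)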

Page 63:

The RBF kernel implies an infinite-dimensional space

Here we are in dimension 1, x ∈ R:

k(x, x′) = exp(−(x − x′)²)
         = exp(−x²) exp(−x′²) exp(2 x x′)

With a Taylor expansion,

k(x, x′) = exp(−x²) exp(−x′²) ∑_{k=0}^{∞} (2^k x^k x′^k) / k!
         = 〈 (· · · , √(2^k / k!) exp(−x²) x^k, · · · ), (· · · , √(2^k / k!) exp(−x′²) x′^k, · · · ) 〉

Page 64:

Experiments with different kernels

www.antidot.net/wiss2015/SVMvaryingC.html#Non-linear-kernels

Page 65:

SVM in multiclass

one-vs-the-rest

N_C binary classifiers (but each one involves the whole dataset)

At prediction time, choose the class with the maximum decision value

one-vs-one

N_C (N_C − 1) / 2 binary classifiers

At prediction time, vote

Page 66:

SVM in scikit-learn

SVC: Support Vector Classification

sklearn.svm.LinearSVC

based on the liblinear library

strategy: one-vs-the-rest

only the linear kernel

loss can be 'hinge' or 'squared_hinge'

sklearn.svm.SVC

based on libsvm

multiclass strategy: one-vs-one

kernel can be linear, polynomial, RBF, sigmoid, or precomputed

only the hinge loss
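A minimal sketch of the two estimators; the parameters shown are just the ones relevant to this slide, not a tuned configuration:

    from sklearn.svm import LinearSVC, SVC

    linear_clf = LinearSVC(C=1.0, loss='squared_hinge')  # one-vs-the-rest, linear kernel only
    kernel_clf = SVC(C=1.0, kernel='rbf')                # one-vs-one, several kernels available

    # Both expose the usual interface:
    #   clf.fit(X_train, y_train); clf.predict(X_test)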

Page 67:

Contents

1 Problem definition

2 Extracting features from text files

3 Algorithms for classification
    Naïve Bayes
    Support Vector Machine (SVM)
    Tuning parameters
        Cross validation
        Grid search

4 Conclusion

Page 68:

Cross validation I

http://scikit-learn.org/stable/modules/cross_validation.html

Overfitting

Estimating the parameters on the test set can lead to overfitting: the parameters are the best for this particular test set but not in the general case.

Train, test and validation dataset

A solution:

tweak the parameters on the test set

validate on a separate validation dataset

drawback: only little data remains for training

Page 69:

Cross validation II

Cross validation

k-fold cross validation

Split the training data into k partitions of the same size

Train the model on k − 1 partitions

Then evaluate on the remaining partition
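A minimal sketch of k-fold cross validation with a recent scikit-learn (the module was sklearn.cross_validation at the time of the tutorial, sklearn.model_selection since 0.18); the classifier is a placeholder choice:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import cross_val_score

    train = fetch_20newsgroups(subset='train')
    X = TfidfVectorizer().fit_transform(train.data)

    scores = cross_val_score(MultinomialNB(), X, train.target, cv=5)  # 5-fold CV
    print(scores.mean(), scores.std())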

Page 70:

Cross validation III (figure slide)

Page 71:

Grid Search

http://scikit-learn.org/stable/modules/grid_search.html

Grid search

Test each value for each parameter

a brute-force algorithm to find the best value for each parameter

In scikit-learn

Automatically runs k × (number of parameter combinations) trainings

Keeps the best model

Demo with scikit-learn: http://www.antidot.net/wiss2015/grid_search_20newsgroups.html
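A hedged sketch of a grid search over a small text-classification pipeline; the grid values are illustrative, not the ones from the demo notebook:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import GridSearchCV

    train = fetch_20newsgroups(subset='train')

    pipeline = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])
    grid = {
        'tfidf__ngram_range': [(1, 1), (1, 2)],
        'clf__alpha': [0.01, 0.1, 1.0],
    }

    # Runs cv x (number of grid points) trainings and keeps the best model.
    search = GridSearchCV(pipeline, grid, cv=3)
    search.fit(train.data, train.target)
    print(search.best_params_, search.best_score_)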

Page 72:

Contents

1 Problem definition

2 Extracting features from text files

3 Algorithms for classification

4 Conclusion
    Methodology

Page 73:

1 Problem definition
    Supervised classification
    Evaluation metrics

2 Extracting features from text files
    Bag of words model
    Term frequency inverse document frequency (tfidf)

3 Algorithms for classification
    Naïve Bayes
    Support Vector Machine (SVM)
    Tuning parameters
        Cross validation
        Grid search

4 Conclusion
    Methodology

Page 74:

Methodology

To solve a problem using Machine Learning, you have to:

1 Understand the data

2 Choose an evaluation measure

3 Be able to test the model

4 Find the main features

5 Try the algorithms, with different parameters

Page 75:

Conclusion

Machine Learning has a lot of applications

With libraries like scikit-learn, no need to implement algorithms yourself

Page 76:

Questions ?

Page 77:

References

Machine Learning in Python :

http://scikit-learn.org

Alex Smola's very good lecture on Machine Learning at CMU:

http://alex.smola.org/teaching/10-701-15/

Kernels : https://www.youtube.com/watch?v=0Nis-oMLbDs

SVM : https://www.youtube.com/watch?v=bsbpqNIKQzU

Page 78:

Bernoulli Naïve Bayes

Features

x_i = 1 iff word i is present in the document, else x_i = 0. The number of occurrences of word i doesn't matter.

Bernoulli

For each feature i, P(x_i | y = k) = P(i | y = k) x_i + (1 − P(i | y = k)) (1 − x_i). The absence of a feature is explicitly taken into account.

Estimation of P(i | y = k)

P(i | y = k) = (1 + number of documents in class k that contain word i) / (number of documents in class k)
