kNN & Naïve Bayes Hongning Wang CS@UVa


Jan 19, 2016

Transcript
Page 1

kNN & Naïve Bayes

Hongning Wang, CS@UVa

Page 2

Today’s lecture

• Instance-based classifiers
  – k nearest neighbors
  – Non-parametric learning algorithm

• Model-based classifiers
  – Naïve Bayes classifier
    • A generative model
  – Parametric learning algorithm

Page 3

How to classify this document?

[Figure: documents in a vector space representation, clustered into Sports, Politics, and Finance classes, with one unlabeled document marked "?".]

Page 4

Let’s check the nearest neighbor

[Figure: the same vector space plot; the single nearest neighbor of the unlabeled document "?" is highlighted.]

Are you confident about this?

Page 5

Let’s check more nearest neighbors

• Ask the k nearest neighbors
  – Let them vote

[Figure: the same vector space plot; the k nearest neighbors of "?" are highlighted and vote on its label.]

Page 6

Probabilistic interpretation of kNN

• Approximate Bayes decision rule in a subset of data around the testing point

• Let V be the volume of the d-dimensional ball around x that contains the k nearest neighbors of x, N the total number of training instances, N₁ the number of instances in class 1, and k₁ the number of the k nearest neighbors that come from class 1. Then we have

$$p(x \mid y=1) = \frac{k_1}{N_1 V}, \qquad p(y=1) = \frac{N_1}{N}, \qquad p(x) = \frac{k}{NV}$$

With Bayes rule:

$$p(y=1 \mid x) = \frac{p(x \mid y=1)\, p(y=1)}{p(x)} = \frac{\frac{k_1}{N_1 V} \cdot \frac{N_1}{N}}{\frac{k}{NV}} = \frac{k_1}{k}$$

That is, estimating the posterior reduces to counting the nearest neighbors from each class.
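As a concrete illustration of this counting view, here is a minimal numpy sketch (my own, not from the slides) that estimates p(y=1|x) as k₁/k:

```python
import numpy as np

def knn_posterior(X_train, y_train, x, k=5):
    """Estimate p(y=1|x) as k1/k: the fraction of the k nearest
    neighbors of x (Euclidean distance) that belong to class 1."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k nearest neighbors
    k1 = np.sum(y_train[nearest] == 1)           # neighbors from class 1
    return k1 / k

# Toy usage: six 2-D points, binary labels
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_posterior(X, y, np.array([5.5, 5.5]), k=3))  # -> 1.0
```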

Page 7

kNN is close to optimal

• Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes error rate

• Decision boundary
  – 1NN: Voronoi tessellation

A non-parametric estimate of the posterior distribution

Page 8

Components in kNN

• A distance metric
  – Euclidean distance / cosine similarity (see the sketch after this list)

• How many nearby neighbors to look at
  – k

• Instance look-up
  – Efficiently search for nearby points
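A quick sketch of the two distance metrics named above (plain numpy; for unit-length vectors, ranking by Euclidean distance and by cosine similarity agree):

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return np.linalg.norm(a - b)

def cosine_sim(a, b):
    """Cosine of the angle between two feature vectors (higher = more similar)."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.array([1.0, 2.0, 0.0]), np.array([2.0, 1.0, 1.0])
print(euclidean(a, b), cosine_sim(a, b))
```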


Page 9

Effect of k

• Choice of k influences the “smoothness” of the resulting classifier


Page 10

Effect of k

• Choice of k influences the “smoothness” of the resulting classifier

[Figure: resulting decision boundary with k=1]

Page 11

Effect of k

• Choice of k influences the “smoothness” of the resulting classifier

[Figure: resulting decision boundary with k=5]

Page 12

Effect of k

• Large k -> smoother decision boundary

• Small k -> more complex decision boundary

[Figure: error vs. model complexity; one curve for error on the training set and one for error on the testing set, with the x-axis running from larger k (lower complexity) to smaller k (higher complexity).]
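One standard way to navigate this trade-off, in line with the error curves above, is to pick k on held-out data; a minimal sketch (function names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return np.bincount(y_train[nearest]).argmax()  # majority vote

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 9, 15)):
    """Return the k with the lowest validation error, plus all errors."""
    errors = {}
    for k in candidates:
        preds = [knn_predict(X_train, y_train, x, k) for x in X_val]
        errors[k] = np.mean(np.array(preds) != y_val)
    return min(errors, key=errors.get), errors
```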

Page 13

Efficient instance look-up

• Recall MP1
  – In the Yelp_small data set, there are 629K reviews for training and 174K reviews for testing
  – Assume we have a vocabulary of 15K
  – Complexity of brute-force kNN: O(N_train × N_test × V), where N_train is the training corpus size, N_test the testing corpus size, and V the feature size

Page 14

Efficient instance look-up

• Exact solutions
  – Build an inverted index for the documents
    • Special mapping: word -> document list
    • Speed-up is limited when the average document length is large

Dictionary     Postings
information    Doc1, Doc2
retrieval      Doc1
retrieved      Doc2
is             Doc1, Doc2
helpful        Doc1, Doc2
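A minimal sketch of such an index as a plain Python dict (illustrative, not the course's implementation):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

docs = {"Doc1": "information retrieval is helpful",
        "Doc2": "information retrieved is helpful"}
index = build_inverted_index(docs)
print(index["information"])  # ['Doc1', 'Doc2']
print(index["retrieval"])    # ['Doc1']
```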

Page 15

Efficient instance look-up

• Exact solutions
  – Build an inverted index for the documents
    • Special mapping: word -> document list
    • Speed-up is limited when the average document length is large
  – Parallelize the computation
    • Map-Reduce
      – Map training/testing data onto different reducers
      – Merge the nearest k neighbors from the reducers (sketched below)
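The merge step can be sketched with a heap, assuming each reducer emits (distance, label) candidate pairs (a simplification of a real Map-Reduce job):

```python
import heapq

def merge_k_nearest(partial_results, k):
    """Merge per-reducer candidate lists of (distance, label) pairs
    and keep the k globally nearest neighbors."""
    return heapq.nsmallest(k, (pair for part in partial_results for pair in part))

reducer_outputs = [[(0.9, "Sports"), (2.1, "Finance")],
                   [(0.4, "Sports"), (1.7, "Politics")]]
print(merge_k_nearest(reducer_outputs, k=3))
# [(0.4, 'Sports'), (0.9, 'Sports'), (1.7, 'Politics')]
```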

Page 16

Efficient instance look-up

• Approximate solutions
  – Locality sensitive hashing
    • Similar documents -> (likely) same hash values

[Figure: documents mapped into buckets by a hash function h(x).]

Page 17

Efficient instance look-up

• Approximate solutions
  – Locality sensitive hashing
    • Similar documents -> (likely) same hash values
    • Construct the hash function such that similar items map to the same "buckets" with high probability
      – Learning-based: learn the hash function from annotated examples, e.g., must-link, cannot-link
      – Random projection

Page 18

Random projection

• Approximate the cosine similarity between vectors
  – $h_r(D) = \operatorname{sign}(r^T D)$, where $r$ is a random unit vector
  – Each $r$ defines one hash function, i.e., one bit in the hash value

[Figure: two document vectors D_x and D_y separated by angle θ, with three random unit vectors r₁, r₂, r₃; each rᵢ contributes one bit, giving D_x and D_y their hash codes (shown as 110 and 101).]

Page 19

Random projection

• Approximate the cosine similarity between vectors
  – $h_r(D) = \operatorname{sign}(r^T D)$, where $r$ is a random unit vector
  – Each $r$ defines one hash function, i.e., one bit in the hash value

[Figure: the same construction with a different draw of r₁, r₂, r₃; here D_x and D_y receive identical hash codes (101 and 101), illustrating that vectors with a small angle between them are likely to collide.]

Page 20

Random projection

• Approximate the cosine distance between vectors
  – $h_r(D) = \operatorname{sign}(r^T D)$, where $r$ is a random unit vector
  – Each $r$ defines one hash function, i.e., one bit in the hash value
  – Provable approximation error: two vectors at angle θ agree on one bit with probability $1 - \theta/\pi$
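A minimal numpy sketch of this signing scheme (my own illustration; one random unit vector per bit):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection_hash(X, n_bits=8, R=None):
    """Hash each row of X to an n_bits binary code: bit i is 1 iff r_i . x > 0."""
    if R is None:
        R = rng.normal(size=(n_bits, X.shape[1]))      # random directions
        R /= np.linalg.norm(R, axis=1, keepdims=True)  # normalize to unit vectors
    return (X @ R.T > 0).astype(int), R

X = np.array([[1.0, 0.2, 0.0],
              [0.9, 0.3, 0.1],    # similar to row 0
              [0.0, 0.1, 1.0]])   # dissimilar
codes, R = random_projection_hash(X, n_bits=6)
print(codes)  # rows 0 and 1 share most bits; row 2 differs in more positions
```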

Page 21

Efficient instance look-up

• Effectiveness of random projection
  – 1.2M images, 1000 dimensions
  – 1000x speed-up

Page 22

Weight the nearby instances

• When the data distribution is highly skewed, frequent classes might dominate the majority vote
  – They occur more often among the k nearest neighbors simply because they have large volume

[Figure: the same vector space plot; the neighborhood of the unlabeled document "?" is dominated by the largest class.]

Page 23

Weight the nearby instances

• When the data distribution is highly skewed, frequent classes might dominate the majority vote
  – They occur more often among the k nearest neighbors simply because they have large volume

• Solution
  – Weight the neighbors in voting, e.g., by inverse distance $w_i = 1/d(x, x_i)$ or by similarity $w_i = \operatorname{sim}(x, x_i)$ (a sketch follows)
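A sketch of inverse-distance weighted voting under that choice (illustrative):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-9):
    """Each of the k nearest neighbors votes with weight 1/distance,
    so close neighbors count more than far ones."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] + eps)  # inverse-distance weight
    return max(votes, key=votes.get)
```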

Page 24

Summary of kNN

• Instance-based learning
  – No training phase
  – Assign a label to a testing case by its nearest neighbors
  – Non-parametric
  – Approximates the Bayes decision boundary in a local region

• Efficient computation
  – Locality sensitive hashing
    • Random projection

Page 25

Recall optimal Bayes decision boundary

$$f(X) = \arg\max_y P(y \mid X)$$

[Figure: the joint densities $p(X \mid y=1)\,p(y=1)$ and $p(X \mid y=0)\,p(y=0)$ plotted over $X$; the optimal Bayes decision boundary lies where the curves cross, separating the regions $\hat{y}=0$ and $\hat{y}=1$; the overlapping tails correspond to the false positive and false negative errors.]

Page 26

Estimating the optimal classifier

$$f(X) = \arg\max_y P(y \mid X) = \arg\max_y \underbrace{P(X \mid y)}_{\text{class conditional density}}\ \underbrace{P(y)}_{\text{class prior}}$$

       text  information  identify  mining  mined  is  useful  to  from  apple  delicious   Y
D1      1         1          1        1       0     1     1     1    0      0        0      1
D2      1         1          0        0       1     1     1     0    1      0        0      1
D3      0         0          0        0       0     1     0     0    0      1        1      0

V binary features

#parameters: $|Y|-1$ for the class prior, $|Y| \times (2^V - 1)$ for the class conditional density

Requirement: $|D| \gg |Y| \times (2^V - 1)$

Page 27

We need to simplify this

• Assume features are conditionally independent given the class label:

$$P(X \mid y) = \prod_i p(x_i \mid y)$$

  – E.g., $p(x_1, x_2 \mid y) = p(x_1 \mid y)\, p(x_2 \mid y)$, say for $x_1$ = 'white house' and $x_2$ = 'obama'

This does not mean 'white house' is independent of 'obama'!

Page 28

Conditional vs. marginal independence

• Features are not necessarily marginally independent of each other

• However, once we know the class label, features become independent of each other
  – Knowing it is already political news, observing 'obama' tells us little extra about the occurrence of 'white house'

Page 29

Naïve Bayes classifier

$$f(X) = \arg\max_y P(y \mid X) = \arg\max_y P(X \mid y)\, P(y) = \arg\max_y \prod_{i=1}^{|d|} P(x_i \mid y)\, P(y)$$

(class conditional density × class prior)

#parameters: $|Y|-1$ for the class prior and $|Y| \times V$ for the class conditional density, vs. $|Y| \times (2^V - 1)$ without the independence assumption: computationally feasible

Page 30

Naïve Bayes classifier

$$f(X) = \arg\max_y P(y \mid X) = \arg\max_y P(X \mid y)\, P(y) \quad \text{(by Bayes rule)}$$

$$\phantom{f(X)} = \arg\max_y \prod_{i=1}^{|d|} P(x_i \mid y)\, P(y) \quad \text{(by the independence assumption)}$$

[Figure: the Naïve Bayes graphical model; the class node y points to each feature node x₁, x₂, x₃, ..., x_v.]

Page 31

Estimating parameters

• Maximum likelihood estimator: relative frequency counts, e.g.,

$$\hat{P}(y) = \frac{c(y)}{|D|}, \qquad \hat{P}(x_i = 1 \mid y) = \frac{c(x_i = 1,\, y)}{c(y)}$$

  – From the table below: $\hat{P}(y=1) = 2/3$, $\hat{P}(\text{text}=1 \mid y=1) = 2/2$, $\hat{P}(\text{apple}=1 \mid y=1) = 0/2$

       text  information  identify  mining  mined  is  useful  to  from  apple  delicious   Y
D1      1         1          1        1       0     1     1     1    0      0        0      1
D2      1         1          0        0       1     1     1     0    1      0        0      1
D3      0         0          0        0       0     1     0     0    0      1        1      0
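These counts are easy to compute; a compact sketch for the binary-feature (Bernoulli) case, using the table above as toy data (variable names are mine):

```python
import numpy as np

def train_nb(X, y):
    """MLE for Bernoulli Naive Bayes: class priors and per-class
    word occurrence rates. X is a binary doc-by-word matrix."""
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    cond = np.array([X[y == c].mean(axis=0) for c in classes])  # P(x_i=1|y=c)
    return classes, prior, cond

X = np.array([[1,1,1,1,0,1,1,1,0,0,0],
              [1,1,0,0,1,1,1,0,1,0,0],
              [0,0,0,0,0,1,0,0,0,1,1]])
y = np.array([1, 1, 0])
classes, prior, cond = train_nb(X, y)
print(prior)    # [1/3, 2/3] for classes [0, 1]
print(cond[1])  # P(word=1 | y=1): 1.0 for 'text', 0.0 for 'apple'
```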

Page 32

Enhancing Naïve Bayes for text classification I

• The frequency of words in a document matters

– In log space:

$$f(d) = \arg\max_y\ \log P(y) + \sum_{i=1}^{|d|} c(x_i, d)\, \log P(x_i \mid y)$$

(class bias $\log P(y)$; feature vector $c(x_i, d)$; model parameters $\log P(x_i \mid y)$)

Essentially, we are estimating different language models!

Page 33

Enhancing Naïve Bayes for text classification

• For the binary case

$$f(d) = \operatorname{sgn}\!\left( \log \frac{P(y=1)}{P(y=0)} + \sum_{i=1}^{|d|} c(x_i, d) \log \frac{P(x_i \mid y=1)}{P(x_i \mid y=0)} \right) = \operatorname{sgn}(w^T x)$$

where

$$w = \left( \log \frac{P(y=1)}{P(y=0)},\ \log \frac{P(x_1 \mid y=1)}{P(x_1 \mid y=0)},\ \dots,\ \log \frac{P(x_v \mid y=1)}{P(x_v \mid y=0)} \right)$$

$$x = \left( 1,\ c(x_1, d),\ \dots,\ c(x_v, d) \right)$$

A linear model with vector space representation? We will come back to this topic later.
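A small sketch of this reduction to a linear scorer, assuming the probabilities have already been estimated and smoothed so that no ratio is zero (the example values are invented):

```python
import numpy as np

def nb_as_linear(prior, cond, counts):
    """Score a document by sgn(w^T x), where w stacks the log prior-odds and
    per-word log likelihood ratios, and x stacks a bias 1 and word counts."""
    w = np.concatenate(([np.log(prior[1] / prior[0])],
                        np.log(cond[1] / cond[0])))  # log ratios per word
    x = np.concatenate(([1.0], counts))
    return np.sign(w @ x)                            # +1 -> class 1, -1 -> class 0

prior = np.array([0.4, 0.6])
cond = np.array([[0.2, 0.5, 0.3],    # P(x_i=1|y=0), smoothed
                 [0.6, 0.1, 0.3]])   # P(x_i=1|y=1), smoothed
print(nb_as_linear(prior, cond, counts=np.array([2.0, 0.0, 1.0])))  # 1.0
```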

Page 34

Enhancing Naïve Bayes for text classification II

• Usually, features are not truly conditionally independent

• Relax the conditional independence assumption with N-gram language models, e.g., condition each word on its predecessor: $P(X \mid y) = \prod_i P(x_i \mid x_{i-1}, y)$

Page 35

Enhancing Naïve Bayes for text classification III

• Sparse observation
  – If some word never co-occurs with class y in the training data, its maximum likelihood estimate is $P(x_i \mid y) = 0$
  – Then, no matter what values the other features take, $P(X \mid y) = \prod_i P(x_i \mid y) = 0$

• Smoothing the class conditional density
  – All smoothing techniques we have discussed for language models are applicable here

Page 36

Maximum a Posteriori estimator

• Adding pseudo instances
  – Priors: $P(y)$ and $P(x \mid y)$, which can be estimated from a related corpus or manually tuned
  – MAP estimator for Naïve Bayes, e.g., additive smoothing with $\delta$ pseudo instances per word:

$$\hat{P}(x_i \mid y) = \frac{c(x_i, y) + \delta}{\sum_j c(x_j, y) + \delta V}$$
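A sketch of that additive smoothing for word counts (δ = 1 gives Laplace smoothing; this is an illustration, not necessarily the slide's exact estimator):

```python
import numpy as np

def map_estimate(counts, delta=1.0):
    """counts[c, i] = number of occurrences of word i in class c.
    Returns smoothed P(x_i|y=c) with delta pseudo counts per word."""
    V = counts.shape[1]
    return (counts + delta) / (counts.sum(axis=1, keepdims=True) + delta * V)

counts = np.array([[3, 0, 1],    # class 0 word counts
                   [0, 5, 2]])   # class 1 word counts
print(map_estimate(counts))      # no zero probabilities remain
```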

Page 37

Summary of Naïve Bayes

• Optimal Bayes classifier
  – Naïve Bayes with independence assumptions

• Parameter estimation in Naïve Bayes
  – Maximum likelihood estimator
  – Smoothing is necessary

Page 38

Today’s reading

• Introduction to Information Retrieval
  – Chapter 13: Text classification and Naive Bayes
    • 13.2 Naive Bayes text classification
    • 13.4 Properties of Naive Bayes
  – Chapter 14: Vector space classification
    • 14.3 k nearest neighbor
    • 14.4 Linear versus nonlinear classifiers