Machine Learning in Practice, Lecture 8
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Transcript
Page 1: Machine Learning in Practice Lecture 8

Machine Learning in Practice, Lecture 8

Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute

Page 2: Machine Learning in Practice Lecture 8

Plan for the Day

- Announcements: should be finalizing plans for term project
- Weka helpful hints
- Spam dataset
- Overcoming some limits of linear functions
- Discussing ordinal attributes in light of linear functions

Page 3: Machine Learning in Practice Lecture 8

Weka Helpful Hints

Page 4: Machine Learning in Practice Lecture 8

Feature Selection

Feature selection algorithms pick out a subset of the features that work best. Usually they evaluate each feature in isolation.

[Screenshot: click here to start setting up feature selection]

Page 5: Machine Learning in Practice Lecture 8

Feature Selection

(Build slide: same text as Page 4.)

[Screenshot: next click in the feature selection setup]

Page 6: Machine Learning in Practice Lecture 8

Feature Selection

[Screenshot: next click in the feature selection setup]

Page 7: Machine Learning in Practice Lecture 8

Feature Selection

Page 8: Machine Learning in Practice Lecture 8

Feature Selection

[Screenshot: now pick your base classifier just like before]

Page 9: Machine Learning in Practice Lecture 8

Feature Selection

[Screenshot: finally, configure the feature selection]

Page 10: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

[Screenshot: first click in the configuration dialog]

Page 11: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

[Screenshot: select ChiSquaredAttributeEval]

Page 12: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

[Screenshot: next click]

Page 13: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

[Screenshot: select Ranker]

Page 14: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

[Screenshot: next click]

Page 15: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

[Screenshot: set the number of features you want]

Page 16: Machine Learning in Practice Lecture 8

Setting Up Feature Selection

- The number you pick should not be larger than the number of features available
- The number should not be larger than the number of coded examples you have
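
The same setup can be scripted through Weka's Java API. A minimal sketch, assuming Weka 3 is on the classpath; the file name spam.arff and the choice of 50 features are placeholders, not the lecture's settings:

```java
import java.util.Random;

import weka.attributeSelection.ChiSquaredAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureSelectionSetup {
    public static void main(String[] args) throws Exception {
        // Load the data; "spam.arff" is a placeholder file name.
        Instances data = DataSource.read("spam.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Chi-squared scores each attribute in isolation against the class.
        ChiSquaredAttributeEval evaluator = new ChiSquaredAttributeEval();

        // Ranker orders attributes by score; keep the top 50.
        // Don't ask for more features than the data actually has.
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(50);

        // Wrapping the base classifier redoes selection in each training fold.
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(evaluator);
        asc.setSearch(ranker);
        asc.setClassifier(new NaiveBayes());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println("Kappa: " + eval.kappa());
    }
}
```

Wrapping the base classifier in AttributeSelectedClassifier means feature selection is redone inside each cross-validation fold, so held-out data never influences which features are kept.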

Page 17: Machine Learning in Practice Lecture 8

Examining Which Features are Most Predictive

You can find a ranked list of features in the Performance Report if you use feature selection.

[Screenshot: the report shows each feature's predictiveness score and frequency]
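
To inspect a ranked list programmatically rather than through a report, attribute selection can also be run on its own. A sketch under the same assumptions (spam.arff is a placeholder):

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.ChiSquaredAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankFeatures {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("spam.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Run attribute selection on its own to inspect the ranking directly.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new ChiSquaredAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);

        // Prints each attribute with its chi-squared score, best first.
        System.out.println(selector.toResultsString());
    }
}
```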

Page 18: Machine Learning in Practice Lecture 8

Spam Data Set

Page 19: Machine Learning in Practice Lecture 8

Spam Data Set

- Word frequencies
- Runs of $, !, capitalization
- All attributes numeric
- Class: Spam versus NotSpam

* Which algorithm will work best?

Page 20: Machine Learning in Practice Lecture 8

Spam Data Set

- Decision Trees: .85 kappa
- SMO (linear function): .79 kappa
- Naïve Bayes: .6 kappa
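
A sketch of how a comparison like this could be reproduced with 10-fold cross-validation; spam.arff is a placeholder, and the exact kappa values will depend on the data and fold seed:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareOnSpam {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("spam.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Decision tree, linear SVM, and Naive Bayes, evaluated identically.
        Classifier[] learners = { new J48(), new SMO(), new NaiveBayes() };
        for (Classifier learner : learners) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(learner, data, 10, new Random(1));
            System.out.printf("%s kappa = %.2f%n",
                    learner.getClass().getSimpleName(), eval.kappa());
        }
    }
}
```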

Page 21: Machine Learning in Practice Lecture 8

What did SMO learn?

Page 22: Machine Learning in Practice Lecture 8

Decision tree model

Page 23: Machine Learning in Practice Lecture 8

More on Linear Functions… exploring the idea of nonlinearity

Page 24: Machine Learning in Practice Lecture 8

Limits of linear functions

Page 25: Machine Learning in Practice Lecture 8

Numeric Prediction with the CPU Data

- Predicting CPU performance from computer configuration
- All attributes are numeric, as well as the output

Page 26: Machine Learning in Practice Lecture 8

Numeric Prediction with the CPU Data

- Could discretize the output and predict good, mediocre, or bad performance
- Numeric prediction allows you to make arbitrarily many distinctions

Page 27: Machine Learning in Practice Lecture 8

Linear Regression

R-squared = .87
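
A sketch of fitting this model through Weka's Java API. cpu.arff is one of the sample datasets bundled with Weka, though the exact fit statistic depends on the evaluation setup:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CpuLinearRegression {
    public static void main(String[] args) throws Exception {
        // cpu.arff is one of the sample datasets bundled with Weka.
        Instances data = DataSource.read("cpu.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Fit on all data to inspect the learned weights.
        LinearRegression lr = new LinearRegression();
        lr.buildClassifier(data);
        System.out.println(lr); // prints the linear function's coefficients

        // Cross-validate to estimate fit on unseen data.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new LinearRegression(), data, 10, new Random(1));
        System.out.println("R = " + eval.correlationCoefficient());
    }
}
```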

Page 28: Machine Learning in Practice Lecture 8

Outliers

** Notice that here it’s the really high values that fit the line the least well. That’s not always the case.

Page 29: Machine Learning in Practice Lecture 8

The two most highly weighted features

Page 30: Machine Learning in Practice Lecture 8

Exploring the Attribute Space

* Identify outliers with respect to typical attribute values.

Page 31: Machine Learning in Practice Lecture 8

The two most highly weighted features

Within 1 standard deviation of the mean value
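
One way to check which values fall within one standard deviation of the mean is to print per-attribute statistics. A minimal sketch, again using Weka's bundled cpu.arff:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeRanges {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("cpu.arff");

        // Print mean and the one-standard-deviation range per numeric
        // attribute; values far outside this range are candidate outliers.
        for (int i = 0; i < data.numAttributes(); i++) {
            if (!data.attribute(i).isNumeric()) {
                continue; // skip nominal attributes
            }
            double mean = data.meanOrMode(i);
            double sd = Math.sqrt(data.variance(i));
            System.out.printf("%s: mean=%.2f, 1-sd range=[%.2f, %.2f]%n",
                    data.attribute(i).name(), mean, mean - sd, mean + sd);
        }
    }
}
```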

Page 32: Machine Learning in Practice Lecture 8

Trees for Numeric Prediction

- Looks like we may need a representation that allows for a nonlinear solution
- Regression trees can handle a combination of numeric and nominal attributes
- M5P computes a linear regression function at each leaf node of the tree
- Look at CPU performance data and compare a simple linear regression (R = .93) with M5P (R = .98)
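
A sketch of that comparison; the R values obtained will depend on the cross-validation seed and Weka version:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareCpuModels {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("cpu.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // M5P grows a tree with a linear regression at each leaf, so it can
        // capture nonlinearity that a single global regression misses.
        Classifier[] models = { new LinearRegression(), new M5P() };
        for (Classifier model : models) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.printf("%s: R = %.2f%n",
                    model.getClass().getSimpleName(),
                    eval.correlationCoefficient());
        }
    }
}
```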

Page 33: Machine Learning in Practice Lecture 8

Results on CPU data with M5P

[Chart annotations: more data here; biggest outliers here]

Page 34: Machine Learning in Practice Lecture 8

Results with M5P

[Chart annotations: more data here; biggest outliers here]

Page 35: Machine Learning in Practice Lecture 8

Multi-Layer Networks can learn arbitrarily complex functions

Page 36: Machine Learning in Practice Lecture 8

Multilayer Perceptron
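
A sketch of training Weka's MultilayerPerceptron on the CPU data. The hidden-layer size and epoch count below are illustrative assumptions, not the settings used in the lecture:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CpuMlp {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("cpu.arff");
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("5");  // one hidden layer of 5 units (assumed)
        mlp.setTrainingTime(500);  // training epochs (assumed)

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));
        System.out.println("R = " + eval.correlationCoefficient());
    }
}
```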

Page 37: Machine Learning in Practice Lecture 8

Best Results So Far

Page 38: Machine Learning in Practice Lecture 8

Forcing a Linear Function

- Note that it weights the features differently than the linear regression
- Partly because of normalization
- Regression trees split on MMAX; the neural network emphasizes MMIN

Page 39: Machine Learning in Practice Lecture 8

Review of Ordinal Attributes

Page 40: Machine Learning in Practice Lecture 8

Feature Space Design for Linear Functions

- Often features will be numeric (continuous values)
- May be more likely to generalize properly with discretized values
- We discussed the fact that you lose ordering and distance
- With respect to linear functions, it may be more important that you lose the ability to think in terms of ranges
- Explicitly coding ranges allows for a simple form of nonlinearity (see the sketch below)
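
For reference, a sketch of discretizing numeric attributes with Weka's unsupervised Discretize filter; the bin count of 4 is an arbitrary choice:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeRanges {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("cpu.arff"); // any numeric dataset
        data.setClassIndex(data.numAttributes() - 1);

        // Equal-width binning into 4 ranges per numeric attribute
        // (the class attribute is left untouched).
        Discretize disc = new Discretize();
        disc.setBins(4);
        disc.setInputFormat(data);

        Instances binned = Filter.useFilter(data, disc);
        System.out.println(binned.attribute(0)); // now a nominal range attribute
    }
}
```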

Page 41: Machine Learning in Practice Lecture 8

Ordinal Values

Weka technically does not have ordinal attributes, but you can simulate them with "temperature coding"! Try to represent "If X is less than or equal to .35".

Values: .2  .25  .28  .31  .35  .45  .47  .52  .6  .63
Bins:   A   B   C   D

Temperature-coded features:
- A
- A or B
- A or B or C
- A or B or C or D

Page 42: Machine Learning in Practice Lecture 8

Ordinal Values

(Build slide: same content as Page 41.)

Page 43: Machine Learning in Practice Lecture 8

Ordinal Values

(Same content as Page 41, adding the question:)

Now how would you represent X <= .35?

Page 44: Machine Learning in Practice Lecture 8

Ordinal Values

(Same content as Page 41, adding the answer:)

Now how would you represent X <= .35?

Feat2 = 1 (Feat2 is the second temperature-coded feature, "A or B")
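
A minimal sketch of temperature coding in plain Java. The slide does not give the exact bin boundaries, so the cut points below are assumptions chosen so that the second feature ("A or B") fires exactly when X <= .35:

```java
// A toy illustration of temperature coding: one cumulative binary
// feature per bin boundary. CUTS below are assumed boundaries for
// bins A, B, and C (everything above the last cut falls in D).
public class TemperatureCoding {
    static final double[] CUTS = { 0.28, 0.35, 0.47 };

    // featureIndex 1 = "A", 2 = "A or B", 3 = "A or B or C"
    static int feature(double x, int featureIndex) {
        return x <= CUTS[featureIndex - 1] ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] values = { .2, .25, .28, .31, .35, .45, .47, .52, .6, .63 };
        for (double x : values) {
            // Feat2 = 1 exactly when x <= .35, the test from the slide.
            System.out.printf("x=%.2f  Feat1=%d  Feat2=%d  Feat3=%d%n",
                    x, feature(x, 1), feature(x, 2), feature(x, 3));
        }
    }
}
```

With these cuts, Feat2 = 1 reproduces the test "X <= .35" as a single binary feature, which is the simple form of nonlinearity a linear function can exploit.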

Page 45: Machine Learning in Practice Lecture 8

Take Home Message

- Linear functions cannot learn interactions between attributes
- If you need to account for interactions, use:
  - Multiple layers
  - Tree-like representations
  - Attributes that represent ranges
- Later in the semester we'll talk about other approaches