Machine Learning in Practice Lecture 8 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
Dec 30, 2015
Plan for the Day
* Announcements
* Should be finalizing plans for term project
* Weka helpful hints
* Spam Dataset
* Overcoming some limits of linear functions
* Discussing ordinal attributes in light of linear functions
Feature Selection
* Feature selection algorithms pick out a subset of the features that work best
* Usually they evaluate each feature in isolation
* Click here to start setting up feature selection
Feature Selection
* Feature selection algorithms pick out a subset of the features that work best
* Usually they evaluate each feature in isolation
* Now click here
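The "evaluate each feature in isolation" idea can be sketched as scoring every feature on its own and keeping the top-ranked ones. This is a minimal illustration, not Weka's implementation: here the per-feature score is the absolute Pearson correlation with a binary class, and all names and the toy data are made up.

```python
# Sketch of univariate feature selection: score each feature in
# isolation, then keep the k best. Scoring metric (absolute Pearson
# correlation with a 0/1 class) is an illustrative choice, not Weka's.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def rank_features(rows, labels, k):
    """rows: feature vectors; labels: 0/1 class; returns top-k feature indices."""
    n_feats = len(rows[0])
    scores = []
    for j in range(n_feats):
        col = [r[j] for r in rows]
        scores.append((abs(pearson(col, labels)), j))
    scores.sort(reverse=True)          # best score first
    return [j for _, j in scores[:k]]

# Toy data: feature 0 tracks the label, feature 1 is noise.
rows = [[1, 5], [2, 1], [8, 4], [9, 2]]
labels = [0, 0, 1, 1]
print(rank_features(rows, labels, 1))  # -> [0]
```

Note the weakness this sketch shares with real univariate selectors: a feature that is only useful in combination with another feature will score poorly on its own.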
Setting Up Feature Selection
* The number of features you pick should not be larger than the number of features available
* It should also not be larger than the number of coded examples you have
Examining Which Features are Most Predictive
You can find a ranked list of features in the Performance Report if you use feature selection
* Predictiveness score
* Frequency
Spam Data Set
* Word frequencies
* Runs of $, !, capitalization
* All attributes numeric
* Spam versus NotSpam
* Which algorithm will work best?
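To make the feature list above concrete, here is a hypothetical sketch of how spam features of this flavor could be computed from raw text. The real spam dataset uses a fixed vocabulary and more attributes; the function name, vocabulary, and example message below are all invented for illustration.

```python
# Hypothetical spam-style feature extraction: per-word frequency
# percentages plus the length of the longest run of capital letters.
import re

def spam_features(text, vocab):
    """Return [freq(w) for w in vocab] + [longest capital-letter run]."""
    words = re.findall(r"[a-z]+", text.lower())
    total = max(len(words), 1)
    freqs = [100.0 * words.count(w) / total for w in vocab]
    cap_runs = re.findall(r"[A-Z]+", text)           # runs in original text
    longest_cap_run = max((len(r) for r in cap_runs), default=0)
    return freqs + [longest_cap_run]

feats = spam_features("BUY now!!! Free money, free offer", ["free", "money"])
print(feats)  # word frequencies for "free" and "money", then cap-run length 3
```

Every value is numeric, which matches the slide: the whole instance is a vector of numbers plus a Spam/NotSpam label.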
Numeric Prediction with the CPU Data
* Predicting CPU performance from computer configuration
* All attributes are numeric, as is the output
Numeric Prediction with the CPU Data
* Could discretize the output and predict good, mediocre, or bad performance
* Numeric prediction allows you to make arbitrarily many distinctions
Outliers
** Notice that here it’s the really high values that fit the line the least well. That’s not always the case.
Trees for Numeric Prediction
* Looks like we may need a representation that allows for a nonlinear solution
* Regression trees can handle a combination of numeric and nominal attributes
* M5P computes a linear regression function at each leaf node of the tree
* Look at the CPU performance data and compare a simple linear regression (R = .93) with M5P (R = .98)
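The idea behind a model tree like M5P can be sketched very simply: split the data on a threshold, then fit a separate linear regression at each leaf, so the overall prediction is piecewise linear. This is a one-split toy version, not M5P's actual algorithm (which chooses splits, prunes, and smooths).

```python
# One-split model-tree sketch: a fixed split threshold partitions the
# data, and each side gets its own least-squares line. Real M5P learns
# the splits and smooths the leaf models; this only shows the idea.

def fit_line(points):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom if denom else 0.0
    return slope, (sy - slope * sx) / n

def model_tree_predict(points, split, x):
    """Route x to the left or right leaf, then apply that leaf's line."""
    left = [p for p in points if p[0] <= split]
    right = [p for p in points if p[0] > split]
    m, b = fit_line(left if x <= split else right)
    return m * x + b

# Piecewise-linear toy data: flat below x=5, steep above.
data = [(1, 2), (2, 2), (4, 2), (6, 10), (8, 14), (9, 16)]
print(model_tree_predict(data, 5, 3))  # -> 2.0 (flat left leaf)
print(model_tree_predict(data, 5, 7))  # -> 12.0 (steep right leaf)
```

A single global regression line cannot fit this data well, which is exactly why the leaf-local models buy extra accuracy on nonlinear targets.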
Forcing a Linear Function
* Note that it weights the features differently than the linear regression
* Partly because of normalization
* Regression trees split on MMAX; NN emphasizes MMIN
Feature Space Design for Linear Functions
* Often features will be numeric (continuous values)
* They may be more likely to generalize properly with discretized values
* We discussed the fact that you lose ordering and distance
* With respect to linear functions, it may be more important that you lose the ability to think in terms of ranges
* Explicitly coding ranges allows for a simple form of nonlinearity
Ordinal Values
* Weka technically does not have ordinal attributes
* But you can simulate them with "temperature coding"!
* Try to represent "If X less than or equal to .35"

.2 .25 .28 .31 .35 .45 .47 .52 .6 .63
A B C D

* A
* A or B
* A or B or C
* A or B or C or D
Now how would you represent X <= .35?
Feat2 = 1
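The temperature-coding trick above can be sketched directly: each cumulative feature tests whether X falls at or below a bin boundary, so a range condition like "X <= .35" collapses to a single feature test. The exact bin edges are an assumption here; the slide only shows the sorted values and the bins A..D.

```python
# Sketch of "temperature coding" for an ordinal attribute. EDGES are
# assumed boundaries between bins A|B, B|C, C|D. Feat_i = 1 iff the
# value falls in bin A through the i-th bin, i.e. x <= EDGES[i-1], so
# "X <= .35" is exactly "Feat2 = 1" (the 'A or B' indicator).
EDGES = [0.25, 0.35, 0.52]  # illustrative bin boundaries

def temperature_code(x):
    """Return [Feat1, Feat2, Feat3] as cumulative <=-edge indicators."""
    return [1 if x <= e else 0 for e in EDGES]

for v in [0.2, 0.31, 0.47, 0.63]:
    print(v, temperature_code(v))
# "X <= .35" holds exactly when the second feature is 1.
```

Because each cumulative indicator is itself a single binary feature, a linear function can now express the range test with one weight, which is the "simple form of nonlinearity" the earlier slide mentioned.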