CS 461: Machine Learning, Lecture 3 (Winter 2009, 1/24/09). Dr. Kiri Wagstaff, [email protected]



Transcript
  • CS 461: Machine Learning, Lecture 3. Dr. Kiri Wagstaff, [email protected]


  • Questions? Homework 2; project proposal; Weka; other questions from Lecture 2.


  • Review from Lecture 2: Representation and feature types (numeric, discrete, ordinal). Decision trees: nodes, leaves, greedy, hierarchical, recursive, non-parametric. Impurity: misclassification error, entropy. Evaluation: confusion matrix, cross-validation.


  • Plan for Today: Decision trees: regression trees, pruning, rules; benefits of decision trees. Evaluation: comparing two classifiers. Support Vector Machines: classification (linear discriminants, maximum margin); learning (optimization); non-separable classes; regression.


  • Remember Decision Trees?


  • Algorithm: Build a Decision Tree. [Alpaydin 2004, The MIT Press]


  • Building a Regression Tree: same algorithm, different criterion. Instead of impurity, use mean squared error (in the local region). Predict the mean output for the node; computing the training error is then the same as computing the variance for the node. Keep splitting until the node error is acceptable (error < threshold); the node then becomes a leaf.

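The regression-tree criterion above can be sketched in plain Python. This is a minimal single-feature version; the function names, the exhaustive threshold search, and the default `threshold=0.01` are illustrative choices, not from the lecture:

```python
def node_error(ys):
    """MSE of predicting the node mean -- equal to the variance of ys."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_split(xs, ys):
    """Greedy search for the threshold minimizing weighted child error."""
    best = None
    for t in sorted(set(xs))[:-1]:          # candidate split points
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = (len(left) * node_error(left)
               + len(right) * node_error(right)) / len(ys)
        if best is None or err < best[1]:
            best = (t, err)
    return best

def build_tree(xs, ys, threshold=0.01):
    """Split recursively; a node becomes a leaf when its error is acceptable."""
    if node_error(ys) < threshold:
        return ("leaf", sum(ys) / len(ys))  # leaf predicts the mean output
    t, _ = best_split(xs, ys)
    li = [i for i, x in enumerate(xs) if x <= t]
    ri = [i for i, x in enumerate(xs) if x > t]
    return ("split", t,
            build_tree([xs[i] for i in li], [ys[i] for i in li], threshold),
            build_tree([xs[i] for i in ri], [ys[i] for i in ri], threshold))
```

On a toy sample whose outputs cluster around 1.0 and 5.0, a single split drives both children's variance to zero and they become leaves.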

  • Turning Trees into Rules. [Alpaydin 2004, The MIT Press]


  • Comparing Two Algorithms (Chapter 14)


  • Machine Learning Showdown! McNemar's Test

    Under H0, we expect e01 = e10 = (e01 + e10)/2. The test statistic is (|e01 - e10| - 1)^2 / (e01 + e10); accept H0 if it is less than the chi-squared critical value X^2(alpha, 1). [Alpaydin 2004, The MIT Press]

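The accept/reject rule above can be written out directly. A minimal sketch; the function names and the alpha = 0.05 critical value are my choices, while the statistic itself is the standard continuity-corrected one:

```python
def mcnemar_statistic(e01, e10):
    """Continuity-corrected McNemar statistic.
    e01: cases classifier A gets wrong but B gets right; e10: the reverse."""
    return (abs(e01 - e10) - 1) ** 2 / (e01 + e10)

CHI2_CRIT = 3.84  # chi-squared critical value for alpha = 0.05, 1 dof

def same_error_rate(e01, e10, critical=CHI2_CRIT):
    """Accept H0 (the two classifiers have the same error rate) if the
    statistic falls below the critical value."""
    return mcnemar_statistic(e01, e10) < critical
```

For example, disagreement counts (10, 20) give a statistic of 2.7, below 3.84, so H0 is accepted; (5, 30) gives about 16.5, so H0 is rejected.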

  • Support Vector Machines (Chapter 10)


  • Linear Discrimination: model the class boundaries (not the data distribution). Learning: maximize accuracy on the labeled data. Inductive bias: the form of the discriminant used. [Alpaydin 2004, The MIT Press]

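The linear discriminant referred to here is g(x) = w . x + b, with the predicted class read off its sign. A minimal sketch (the function names are mine):

```python
def discriminant(w, b, x):
    """g(x) = w . x + b; the class boundary is the set where g(x) = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Predict +1 on one side of the boundary, -1 on the other."""
    return 1 if discriminant(w, b, x) >= 0 else -1
```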

  • How to find the best w, b? E(w|X) is the error with parameters w on sample X, and w* = arg min_w E(w|X). The gradient is the vector of partial derivatives [dE/dw_1, ..., dE/dw_d].

    Gradient descent: start from a random w and update w iteratively in the negative direction of the gradient. [Alpaydin 2004, The MIT Press]


  • Gradient Descent. [Alpaydin 2004, The MIT Press]

    (Figure: the error E(w) decreases from E(wt) to E(wt+1) as w steps from wt to wt+1 along the negative gradient.)

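The update pictured in the figure can be sketched generically; the step size `eta`, the step count, and the quadratic example surface below are illustrative, not from the slide:

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Start from w0 and repeatedly step in the negative gradient direction:
    w <- w - eta * grad(w)."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Example error surface E(w) = (w0 - 3)^2 + (w1 + 1)^2, minimized at (3, -1).
grad_E = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
```

Starting from (0, 0), the iterates converge to the minimizer (3, -1).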

  • Support Vector Machines: maximum-margin linear classifiers. Imagine: an army ceasefire (a buffer zone kept as wide as possible between the two sides).

    How to find the best w, b? Quadratic programming.


  • Optimization (primal formulation): N + d + 1 parameters. [Alpaydin 2004, The MIT Press]


  • Optimization (dual formulation): N parameters. Where did w and b go? We know w = sum_i alpha_i y_i x_i, and b can be recovered from any support vector (any i with alpha_i > 0). [Alpaydin 2004, The MIT Press]

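The recovery of w and b from the dual solution can be made concrete. A sketch; the toy two-point problem in the example is mine (for it, alphas = (0.5, 0.5) satisfies the dual constraints, and the max-margin solution is w = (1, 0), b = 0):

```python
def recover_w(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x_i; only support vectors (alpha_i > 0)
    contribute to the sum."""
    w = [0.0] * len(xs[0])
    for a, y, x in zip(alphas, ys, xs):
        for j, xj in enumerate(x):
            w[j] += a * y * xj
    return w

def recover_b(w, alphas, ys, xs):
    """b = y_s - w . x_s for any support vector s (alpha_s > 0)."""
    for a, y, x in zip(alphas, ys, xs):
        if a > 0:
            return y - sum(wj * xj for wj, xj in zip(w, x))
```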

  • What if Data isn't Linearly Separable? Embed the data in a higher-dimensional space. Explicit: basis functions (new features); visualization of 2D -> 3D. Implicit: kernel functions (a new dot product/similarity): polynomial, RBF/Gaussian, sigmoid (see the SVM applet). We still need to find a linear hyperplane. Add slack variables to permit some errors.

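The three kernels named on the slide have standard closed forms; a sketch, with the `degree`, `c`, `gamma`, `kappa`, and `theta` defaults as illustrative choices:

```python
import math

def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel: (x . z + c)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """RBF/Gaussian kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, kappa=1.0, theta=0.0):
    """Sigmoid kernel: tanh(kappa * (x . z) + theta)."""
    return math.tanh(kappa * sum(a * b for a, b in zip(x, z)) + theta)
```

Each replaces the dot product in the dual formulation, so the hyperplane is linear in the implicit feature space without ever computing the new features.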

  • Example: Orbital Classification. A linear SVM has been flying on the EO-1 Earth orbiter since Dec. 2004, classifying every pixel into four classes (snow, water, ice, land) using 12 features (of the 256 collected). [Castano et al., 2005] (Figure: Hyperion image and classified output.)


  • SVM in Weka: SMO (Sequential Minimal Optimization), faster than QP-based versions. Try the linear and RBF kernels.


  • Summary: What You Should Know. Decision trees: regression trees, pruning, rules; benefits of decision trees. Evaluation: comparing two classifiers (McNemar's test). Support Vector Machines: classification (linear discriminants, maximum margin); learning (optimization); non-separable classes.


  • Next Time. Reading: Evaluation (read Ch. 14.7); Support Vector Machines (read Ch. 10.1-10.4, 10.6, 10.9).

    Questions to answer from the reading are posted on the website. Three volunteers: Sen, Jimmy, and Irvin.

