Overview of Today's Lecture

• Last Time: course introduction
• Reading assignment posted to class webpage
• Don't get discouraged
• Today: introduction to "Supervised Machine Learning"
• Our first ML algorithm: K-nearest neighbor
• HW 0 out online
• Create a dataset of "fixed-length feature vectors"
• Due next Tuesday Sept 19 (4 PM)
• Instructions for handing in HW0 coming soon
Given
• A collection of positive examples of some concept/class/category (i.e., members of the class) and, possibly, a collection of negative examples (i.e., non-members)

Produce
• A description that covers (includes) all (most) of the positive examples and none (few) of the negative examples
(which, hopefully, properly categorizes most future examples!)

The Key Point!

Note: one can easily extend this definition to handle more than two classes.
Example

[Figure: two columns of shapes, Positive Examples and Negative Examples, followed by a query shape: how does this symbol classify?]

• Concept
  • Solid red circle in a regular polygon
• What about?
  • Figures with red solid circles not in a larger red circle
  • Figures on the left side of the page, etc.
HW0 – Your "Personal Concept"

• Books I like/dislike
• Movies I like/dislike
• WWW pages I like/dislike
• "time will tell" concepts
  • Stocks to buy
  • Medical treatment (at time t, predict outcome at time t + ∆t)
• Sensory interpretation
  • Face recognition (see text)
  • Handwritten digit recognition
  • Sound recognition
• Hard-to-program functions
HW0 – Your "Personal Concept"

• Step 2: Choose a feature space
  • We will use fixed-length feature vectors
    • Choose N features
    • Each feature has Vᵢ possible values
  • Each example is represented by a vector of N feature values (i.e., is a point in the feature space)
    e.g., <red, 50, round>
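To make this concrete, here is a minimal sketch (mine, not from the lecture; the feature names and values are invented for illustration) of examples as fixed-length feature vectors:

```python
# Every example is described by the same N features (here N = 3).
# Feature names and values are hypothetical, for illustration only.
FEATURE_NAMES = ("color", "size", "shape")

# Each example is a point in the feature space, e.g. <red, 50, round>.
example = ("red", 50, "round")

# A labeled dataset pairs each feature vector with its class label.
dataset = [
    (("red", 50, "round"), "+"),    # a positive example
    (("blue", 12, "square"), "-"),  # a negative example
]
```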
Standard Feature Types
for representing training examples – a source of "domain knowledge"

[Figure: a tree of feature values: closed at the root, with children polygon (leaves: triangle, square) and continuous (leaves: circle, ellipse)]

• Nominal (Boolean is a special case)
  • No relationship among possible values
    e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)
• Linear (or Ordered)
  • Possible values of the feature are totally ordered

… $2 million
• Movie is drama? (action, sci-fi, …)
• Movies I like/dislike (e.g., TiVo)
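Feature types matter when an algorithm (e.g., K-NN, later in this lecture) has to compare examples. A hedged sketch of one common convention, not something the lecture specifies: nominal values either match or they don't, while linear values can be compared by a normalized difference.

```python
def feature_distance(a, b, feature_type, value_range=1.0):
    """Distance between two values of a single feature.

    This is a common convention, assumed here for illustration.
    """
    if feature_type == "nominal":
        # No order among values: only "same" (0) or "different" (1).
        return 0.0 if a == b else 1.0
    if feature_type == "linear":
        # Totally ordered values: normalized absolute difference.
        return abs(a - b) / value_range
    raise ValueError(f"unknown feature type: {feature_type}")
```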
HW0: Creating Your Dataset

• Movie
  • Average age of actors
  • Number of producers
  • Percent female actors
• Studio
  • Number of movies made
  • Average movie gross
  • Percent movies released in US
• Director/Producer
  • Years of experience
  • Most prevalent genre
  • Number of award-winning movies
  • Average movie gross
• Actor
  • Gender
  • Has previous Oscar award or nominations
  • Most prevalent genre

Create your feature space
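Purely as an illustration (the feature choices and encoding below are my own, not HW0's required format), one movie example might be written down like this:

```python
# Hypothetical encoding of one HW0 movie example: a mix of linear
# (numeric) and nominal features, plus the class label to predict.
movie_example = {
    "avg_actor_age": 37.0,            # linear
    "num_producers": 3,               # linear
    "pct_female_actors": 45.0,        # linear
    "most_prevalent_genre": "drama",  # nominal
    "label": "like",                  # my concept: movies I like/dislike
}
```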
HW0: Creating Your Dataset

David Jensen's group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data:
Back to Supervised Learning

One way learning systems differ is in how they represent concepts:
[Figure: training examples flow into different learners, each with its own concept representation: Backpropagation → neural net; C4.5, CART → decision tree; AQ, FOIL → rules (e.g., Φ ← X∧Y, Φ ← Z); SVMs → linear threshold (e.g., if 5x1 + 9x2 − 3x3 > 12 then +)]
Feature Space

If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space.
[Figure: a 3-D feature space with axes Size, Color, and Weight; one query example "?" plotted at <Big, Gray, 2500>]
A “concept” is then a (possibly disjoint) volume in this space.
Supervised Learning = Learning from Labeled Examples

• Most common & successful form of ML

[Figure: Venn diagram: a region of feature space encloses the + examples, with the − examples outside]

• Examples – points in multi-dimensional "feature space"
• Concepts – "function" that labels points in feature space
Empirical Learning and Venn Diagrams

Concept = A or B (disjunctive concept)
Examples = labeled points in feature space
Concept = a label for a set of points
[Figure: Venn diagram in feature space: two regions, A and B, enclose the + examples; the − examples lie outside both]
Aspects of an ML System

• "Language" for representing examples
• "Language" for representing "concepts"
• Technique for producing a concept "consistent" with the training examples
• Technique for classifying new instances

Each of these limits the expressiveness/efficiency of the supervised learning algorithm.
(1-NN ≡ one nearest neighbor)

K-NN Algorithm

Collect the K nearest neighbors and select the majority classification (or somehow combine their classes).
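A minimal sketch of the algorithm as stated above, assuming numeric feature vectors and Euclidean distance (the lecture does not fix a distance measure):

```python
import math
from collections import Counter

def knn_classify(query, dataset, k):
    """Classify `query` by majority vote among its k nearest neighbors.

    `dataset` is a list of (feature_vector, label) pairs; features are
    assumed numeric so Euclidean distance applies.
    """
    # Sort training examples by distance to the query point; keep k.
    neighbors = sorted(dataset, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Majority vote over the k nearest labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example: two of the three nearest neighbors are "+", so K=3 votes "+".
data = [((0.0, 0.0), "-"), ((1.0, 2.0), "+"), ((1.0, 3.0), "+")]
print(knn_classify((1.0, 2.0), data, k=3))  # -> "+"
```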
• What should K be?
  • It probably is problem dependent
  • Can use tuning sets (later) to select a good setting for K

[Figure: tuning-set error rate plotted against K = 1, 2, 3, 4, 5; note: shouldn't really "connect the dots" (why?)]
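A sketch of the tuning-set idea, reusing the `knn_classify` helper above; the candidate K values and the particular train/tuning split are assumptions for illustration:

```python
def choose_k(train_set, tune_set, candidate_ks=(1, 2, 3, 4, 5)):
    """Pick the K with the lowest error rate on a held-out tuning set."""
    def error_rate(k):
        mistakes = sum(knn_classify(x, train_set, k) != y
                       for x, y in tune_set)
        return mistakes / len(tune_set)
    return min(candidate_ks, key=error_rate)
```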
Some Common Jargon

• Classification
  • Learning a discrete-valued function
• Regression
  • Learning a real-valued function

IBL is easily extended to regression tasks (and to multi-category classification), as sketched below.
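One standard way to extend IBL/K-NN to regression, sketched here as an assumption (variants exist, e.g., distance-weighted averages): average the neighbors' real-valued outputs instead of taking a majority vote.

```python
import math

def knn_regress(query, dataset, k):
    """Predict a real value: the mean output of the k nearest neighbors."""
    neighbors = sorted(dataset, key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(y for _, y in neighbors) / k
```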
Variations on a Theme

• IB1 – keep all examples
• IB2 – keep the next instance if incorrectly classified by using the previous instances
  • Uses less storage
  • Order dependent
  • Sensitive to noisy data

(From Aha, Kibler and Albert in ML Journal)
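A sketch of IB2's storage rule as described above; using the 1-NN helper from earlier for the classification test is my simplification, not the paper's exact procedure:

```python
def ib2(stream):
    """Keep an instance only if the stored ones misclassify it (IB2)."""
    stored = []
    for x, y in stream:
        # The first instance is always kept; later ones only on a mistake.
        if not stored or knn_classify(x, stored, k=1) != y:
            stored.append((x, y))
    return stored
```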
Variations on a Theme (cont.)

• IB3 – extend IB2 to more intelligently decide which examples to keep (see article)
  • Better handling of noisy data
• Another idea – cluster groups, keep "examples" from each (median/centroid)
Next Time

• Finish K-NN
• Begin linear separators