• We’ll meet 30 times this term (may or may not include exam in this count)
• We’ll meet on FRIDAY this and next week, in order to cover material for HW 1 (plus I have some business travel this term)
• Default: we WILL meet on Friday unless I announce otherwise
learning, reinforcement learning, SVMs, use of prior knowledge
• Other machine learning material covered in Bioinformatics CS 576/776, Jerry Zhu’s CS 838
• to understand what a learning system should do
• to understand how (and how well) existing systems work
• Issues in algorithm design
• Choosing algorithms for applications
Academic Misconduct (also on course homepage)
All examinations, programming assignments, and written homeworks must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the Academic Misconduct Guide for Students). Hence, for example, code for programming assignments must not be developed in groups, nor should code be shared. You are encouraged to discuss ideas, approaches and techniques broadly with your peers, the TAs or the instructor, but not at a level of detail where specific implementation issues are described by anyone. If you have any questions on this, please ask the instructor before you act.
• Employed by first machine learning systems, in 1950s
• Samuel’s Checkers program
• Michie’s MENACE: Matchbox Educable Noughts and Crosses Engine
• Prior to these, some people believed computers could not improve at a task with experience
• Memorize I/O pairs and perform exact matching with new inputs
• If computer has not seen precise case before, it cannot apply its experience
• Want computer to “generalize” from prior experience
Some Settings in Which Learning May Help
• Given an input, what is the appropriate response (output/action)?
• Game playing – board state/move
• Autonomous robots (e.g., driving a vehicle) – world state/action
• Video game characters – state/action
• Medical decision support – symptoms/
Broad Paradigms of Machine Learning
• Inducing Functions from I/O Pairs
• Decision trees (e.g., Quinlan’s C4.5 [1993])
• Connectionism / neural networks (e.g., backprop)
• Nearest-neighbor methods
• Genetic algorithms
• SVMs
• Learning without Feedback/Teacher
• Conceptual clustering
• Self-organizing systems
• Discovery systems
IID (Completion of Lec #2)
• We are assuming examples are IID: independent and identically distributed
• E.g., we are ignoring temporal dependencies (covered in time-series learning)
• E.g., we assume the learner has no say in which examples it gets (covered in active learning)
• Note: mappings on previous slide are not necessarily 1-to-1
• Bad for the first mapping?
• Good for the second (in fact, it’s the goal!)
Empirical Learning: Task Definition
• Given
• A collection of positive examples of some concept/class/category (i.e., members of the class) and, possibly, a collection of negative examples (i.e., non-members)
• Produce
• A description that covers (includes) all/most of the positive examples and none/few of the negative examples (and, hopefully, properly categorizes most future examples!)
Note: one can easily extend this definition to handle more than two classes
If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space.
[Figure: an example plotted as a point in a 3-D feature space with axes Size, Color, and Weight; the point (Big, 2500, Gray) is marked with a “?”]
A “concept” is then a (possibly disjoint) volume in this space.
• More formally, a “concept” is of the form
  ∀x, y, z  F(x, y, z) → Member(x, Class1)
Aspects of an ML System
• “Language” for representing classified examples
• “Language” for representing “concepts”
• Technique for producing a concept “consistent” with the training examples
• Technique for classifying new instances
Each of these limits the expressiveness/efficiency of the supervised learning algorithm.
Collect K nearest neighbors, select the majority classification (or somehow combine their classes)
• What should K be?
• It probably is problem dependent
• Can use tuning sets (later) to select
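The idea above – collect the K nearest neighbors and take a majority vote – can be sketched as follows. This is a minimal illustration, not code from the course; the toy examples and the use of Euclidean distance are assumptions.

```python
# A minimal k-NN classifier sketch; the toy data and the choice of
# Euclidean distance are illustrative assumptions.
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature vector."""
    # Sort stored examples by Euclidean distance to the query
    by_dist = sorted(train, key=lambda ex: math.dist(ex[0], query))
    # Collect the k nearest neighbors and take a majority vote
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

examples = [((1.0, 1.0), "pos"), ((1.2, 0.9), "pos"),
            ((5.0, 5.0), "neg"), ((5.1, 4.8), "neg"), ((4.9, 5.2), "neg")]
print(knn_classify(examples, (1.1, 1.0), k=3))  # -> pos
```

Note how K acts as a smoothing knob: K = 1 memorizes the training set, while a larger K averages over more of the local neighborhood – which is why a tuning set is a reasonable way to pick it.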
HW0 – Create Your Own Dataset (repeated from lecture #1)
• Think about before next class
• Read HW0 (on-line)
• Google to find:
• UCI archive (or UCI KDD archive)
• UCI ML archive (UCI ML repository)
• More links in HW0’s web page
Standard Feature Types for representing training examples – a source of “domain knowledge”
• Nominal
• No relationship among possible values
  e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)
• Linear (or Ordered)
• Possible values of the feature are totally ordered
  e.g., size ∈ {small, medium, large} ← discrete
• Discrete
• tokens (char strings, w/o quote marks and spaces)
• Continuous
• numbers (int’s or float’s)
• If only a few possible values (e.g., 0 & 1) use discrete
• i.e., merge nominal and discrete-ordered (or convert discrete-ordered into 1, 2, …)
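The conversion just described – mapping an ordered discrete feature onto integers 1, 2, … while leaving nominal features as unordered tokens – can be sketched like this. The feature names and value sets are illustrative assumptions, not the HW0 dataset.

```python
# Sketch of the encoding described above: ordered discrete values map to
# integers 1, 2, ..., nominal values stay as unordered tokens, and
# continuous values stay numeric. Feature definitions are illustrative.
SIZE_ORDER = {"small": 1, "medium": 2, "large": 3}

def encode(example):
    """example: {"size": ..., "color": ..., "weight": ...}"""
    return {
        "size": SIZE_ORDER[example["size"]],  # ordered -> 1, 2, 3
        "color": example["color"],            # nominal: left as a token
        "weight": example["weight"],          # continuous: left as a number
    }

print(encode({"size": "medium", "color": "red", "weight": 2.5}))
# -> {'size': 2, 'color': 'red', 'weight': 2.5}
```

The integer encoding preserves the total order of the linear feature, so a distance function can meaningfully subtract two sizes, which it cannot do for colors.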
• We will ignore hierarchical info and only use the leaf values (common approach)
HW0: Creating Your Dataset
Ex: IMDB has a lot of data that are not discrete or continuous or binary-valued for the target function (category)
[Diagram: entity-relationship sketch of the IMDB data –
Studio (Name, Country, List of movies) –Made→ Movie
Director/Producer –Directed→ Movie
Actor (Name, Year of birth, Gender, Oscar nominations, List of movies) –Acted in→ Movie
Movie (Title, Genre, Year, Opening Wkend BO receipts, List of actors/actresses, Release season)]
HW0: Representing as a Fixed-Length Feature Vector
<discuss on chalkboard><discuss on chalkboard>
Note: some advanced ML approaches do not require such “feature mashing” (e.g., ILP)
David Jensen’s group at UMass uses Naïve Bayes and other ML algorithms on the IMDB
First Algorithm in Detail
• K-Nearest Neighbors / Instance-Based Learning (k-NN/IBL)
• Distance functions
• Kernel functions
• Feature selection (applies to all ML algorithms)
• Classification
• Learning a discrete-valued function
• Regression
• Learning a real-valued function
IBL easily extended to regression tasks (and to multi-category classification)
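The regression extension is exactly the classification scheme with the majority vote replaced by an average of the neighbors’ real-valued outputs. A minimal sketch (toy data and Euclidean distance are assumptions):

```python
# Sketch of k-NN regression: predict by averaging the k nearest
# neighbors' real-valued outputs. Toy data is illustrative.
import math

def knn_regress(train, query, k=3):
    """train: list of (feature_vector, y) pairs with real-valued y."""
    by_dist = sorted(train, key=lambda ex: math.dist(ex[0], query))
    nearest = by_dist[:k]
    # Average the neighbors' outputs instead of voting
    return sum(y for _, y in nearest) / len(nearest)

data = [((0.0,), 0.0), ((1.0,), 1.1), ((2.0,), 1.9), ((3.0,), 3.2)]
print(knn_regress(data, (1.5,), k=2))  # -> 1.5 (average of 1.1 and 1.9)
```

Multi-category classification needs no change at all: the majority vote already handles any number of class labels.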
• IB2 – keep next instance if incorrectly classified by using previous instances
• Uses less storage (good)
• Order dependent (bad)
• Sensitive to noisy data (bad)
(From Aha, Kibler and Albert in ML Journal)
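The IB2 rule above can be sketched in a few lines: an instance is stored only if the instances stored so far would misclassify it. Using 1-NN with Euclidean distance here is an assumption of this sketch, not a detail from the slide.

```python
# Sketch of the IB2 idea (after Aha, Kibler & Albert): keep a training
# instance only if the instances kept so far misclassify it.
# 1-NN with Euclidean distance is assumed; train must be non-empty.
import math

def ib2(train):
    """train: ordered list of (feature_vector, label) pairs."""
    kept = [train[0]]  # the first example is always stored
    for x, label in train[1:]:
        nearest = min(kept, key=lambda ex: math.dist(ex[0], x))
        if nearest[1] != label:      # misclassified by stored instances,
            kept.append((x, label))  # so keep it
    return kept

train = [((0.0,), "a"), ((0.2,), "a"), ((5.0,), "b"), ((5.1,), "b")]
print(len(ib2(train)))  # -> 2: one "a" and the first "b" suffice
```

The sketch also makes the two “bad” bullets concrete: reordering `train` changes which instances get kept, and a mislabeled example is guaranteed to be stored, since by definition it is misclassified.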
Variations on a Theme (cont.)
• IB3 – extend IB2 to more intelligently decide which examples to keep (see article)
• Better handling of noisy data
• Another idea – cluster groups, keep one example from each (median/centroid)
• Less storage, faster lookup
• Collect k nearest neighbors
• Give them to some supervised ML algorithm
• Apply learned model to test example
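The three steps above can be sketched as follows. The choice of a one-feature least-squares line as the local supervised learner is an illustrative assumption; any supervised algorithm could be plugged in.

```python
# Sketch of locally trained models: gather the k nearest neighbors,
# fit a simple supervised model (here a 1-D least-squares line) on just
# those neighbors, then apply it to the query. Illustrative only.
def local_linear_predict(train, query_x, k=3):
    """train: list of (x, y) pairs with scalar x and real-valued y."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - query_x))[:k]
    xs = [x for x, _ in nearest]
    ys = [y for _, y in nearest]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    denom = sum((x - mx) ** 2 for x in xs)
    if denom == 0:                  # all neighbors share one x value
        return my
    slope = sum((x - mx) * (y - my) for x, y in nearest) / denom
    return my + slope * (query_x - mx)  # evaluate local fit at the query

data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (10.0, 0.0)]
print(local_linear_predict(data, 1.5, k=3))  # -> 3.0
```

Because the model is refit per query, distant outliers (like the point at x = 10 above) never influence the local prediction.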
Forward vs. Backward Feature Selection
Forward:
• Faster in early steps because fewer features to test
• Fast for choosing a small subset of the features
• Misses useful features whose usefulness requires other features (feature synergy)
Backward:
• Fast for choosing all but a small subset of the features
• Preserves useful features whose usefulness requires other features
• Example: area important, features = length, width
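Greedy forward selection can be sketched as below. The `score` callback (e.g., tuning-set accuracy of k-NN on that feature subset) is an assumption of this sketch; the toy score at the end reproduces the synergy failure from the slide, where length and width are only useful together.

```python
# Greedy forward feature selection sketch: start with no features and
# repeatedly add the single feature that most improves the score.
# `score(feature_list)` is an assumed callback, e.g. tuning-set accuracy.
def forward_select(all_features, score, max_features=None):
    chosen = []
    best = score(chosen)
    while len(chosen) < (max_features or len(all_features)):
        candidates = [f for f in all_features if f not in chosen]
        if not candidates:
            break
        # Try each remaining feature; keep the best single addition
        f_best = max(candidates, key=lambda f: score(chosen + [f]))
        new = score(chosen + [f_best])
        if new <= best:      # no single feature helps: stop early
            break
        chosen.append(f_best)
        best = new
    return chosen

# Toy score for the slide's synergy case: length and width only help
# *together*, each alone scores 0, so forward selection stops at once
# and misses the pair (backward elimination would have kept both).
toy = lambda fs: 1.0 if {"length", "width"} <= set(fs) else 0.0
print(forward_select(["length", "width", "color"], toy))  # -> []
```

Backward elimination is the mirror image: start from all features and greedily drop the one whose removal hurts the score least, which is why it preserves synergistic pairs at the cost of scoring large subsets early on.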
Questions about IBL (Breiman et al. – CART book)
• Computationally expensive to save all examples; slow classification of new examples
• Addressed by IB2/IB3 of Aha et al. and work of A. Moore (CMU; now Google)
• Is this really a problem?
Questions about IBL (cont.)
• Intolerant of noise
• Addressed by IB3 of Aha et al.
• Addressed by k-NN version
• Addressed by feature selection – can discard the noisy feature
• Intolerant of irrelevant features
• Since algorithm is very fast, can experimentally choose good feature sets (Kohavi, Ph.D. – now at Amazon)
• High sensitivity to choice of similarity (distance) function
• Euclidean distance might not be the best choice
• Handling non-numeric features and missing feature values is not natural, but doable
• How might we do this? (Part of HW1)
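One common approach from the instance-based learning literature is a HEOM-style distance; this is a sketch of one possibility, not necessarily the scheme intended for HW1. Nominal features contribute a 0/1 mismatch, numeric features a range-normalized difference, and a missing value (written "?") counts as maximally different.

```python
# Sketch of a HEOM-style distance for mixed feature types; one common
# possibility, not necessarily the HW1 answer. "?" marks a missing value.
def heom(a, b, numeric_ranges):
    """a, b: feature dicts; numeric_ranges: {feature: (min, max)}."""
    total = 0.0
    for f in a:
        x, y = a[f], b[f]
        if x == "?" or y == "?":
            d = 1.0                             # missing -> max distance
        elif f in numeric_ranges:
            lo, hi = numeric_ranges[f]
            d = abs(x - y) / (hi - lo)          # normalized numeric diff
        else:
            d = 0.0 if x == y else 1.0          # nominal: 0/1 mismatch
        total += d * d
    return total ** 0.5                         # Euclidean-style combine

ranges = {"weight": (0.0, 10.0)}
a = {"color": "red", "weight": 2.0}
b = {"color": "red", "weight": 7.0}
print(heom(a, b, ranges))  # -> 0.5
```

Normalizing each numeric feature by its range keeps any one feature from dominating the distance just because its units happen to produce large numbers.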
• No insight into task (learned concept not interpretable)