Introduction to Machine Learning, Lecture 4. Slides based on Francisco Herrera course on Data Mining. Albert Orriols i Puig, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
Transcript
Page 1: Lecture4 - Machine Learning

Introduction to Machine Learning

Lecture 4
Slides based on Francisco Herrera course on Data Mining

Albert Orriols i Puig
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture4 - Machine Learning

Recap of Lecture 3

Typically, ML techniques have been divided into different paradigms:

Inductive learning

Explanation-based learning

Analogy-based learning

Evolutionary learning

Connectionist learning

Slide 2 Artificial Intelligence Machine Learning

Page 3: Lecture4 - Machine Learning

Recap of Lecture 3

Problems that we’ll study

1. Data classification: C4.5, kNN, Naïve Bayes …

2. Statistical learning: SVM

3. Association analysis: Apriori

4. Link mining: PageRank

5. Clustering: k-means

6. Reinforcement learning: Q-learning, XCS

7. Regression

8. Genetic Fuzzy Systems

Page 4: Lecture4 - Machine Learning

Today’s Agenda

Situation: Where Are We?

Classification

Prediction

Clustering

Association

Data Mining Systems

Page 5: Lecture4 - Machine Learning

Situation: Where Are We?

The input consists of examples described by different characteristics

Page 6: Lecture4 - Machine Learning

Situation: Where Are We?

What can we do with a bunch of examples? It depends on the type of examples we have.

Classification: Find the class to which a new instance belongs
E.g.: Find whether a new patient has cancer or not

Numeric prediction: A variation of classification in which the output consists of numeric values instead of discrete classes
E.g.: Find the frequency of cancerous cells found

Regression: Find a function that fits your examples
E.g.: Find a function that controls your chain process

Association: Find associations among your problem attributes or variables
E.g.: Find relations such as: a patient with high blood pressure is more likely to suffer a heart attack

Clustering: Process to cluster/group the instances into classes
E.g.: Group clients whose purchases are similar

Page 7: Lecture4 - Machine Learning

Data Classification

[Figure: data classification workflow. Labels: Dataset (information based on experience), Training set, Test set, Learner (knowledge extraction), Model, New instance, Predicted Output]

Page 8: Lecture4 - Machine Learning

Example of Data Classification

[Figure: Data Set, Classification Model. How?]

The classification model can be implemented in several ways:
• Rules
• Decision trees
• Mathematical formulae

Page 9: Lecture4 - Machine Learning

Classification as a Two-Step Process

Model usage: to classify future or unknown objects

Estimate the accuracy of the model
The known label of each test sample is compared with the label predicted by the system
The accuracy rate is the proportion of test examples that are correctly classified by the model
The test set is independent of the training set

If the experts think that the model is acceptable
Then, use the model to predict unknown examples
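The accuracy estimate described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy` and the example labels are hypothetical and do not come from the slides.

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of test examples whose predicted label equals the known label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical known labels of an independent test set and the labels
# predicted by some model (made up for illustration, not from the slides).
known = ["cancer", "healthy", "healthy", "cancer", "healthy"]
predicted = ["cancer", "healthy", "cancer", "cancer", "healthy"]

test_accuracy = accuracy(known, predicted)
print(test_accuracy)  # 4 of the 5 test examples are classified correctly
```

The test labels must come from examples that were not used for training; otherwise the estimate tends to be optimistic.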

Page 10: Lecture4 - Machine Learning

Going to Real World

Definition: Given a collection of annotated data (in this case katydids and grasshoppers), decide what type of insect the following one is

[Figure: pictures labeled katydids and grasshoppers]

Page 11: Lecture4 - Machine Learning

Going to Real World

How can I put a katydid or a grasshopper into my computer?

Page 12: Lecture4 - Machine Learning

Going to Real World

Thus, the classification problem has been reduced to

Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid

We have an observation with abdomen length 5.1 and antennae length 7: which class is it?
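One way to make the question concrete is to store each insect as a pair of numbers plus a label and classify the new observation by its nearest neighbours. The sketch below copies the ten rows of the table; the choice of kNN (a method the lecture names but does not detail here), of k = 3, and of Euclidean distance are assumptions made for illustration, and the name `knn_classify` is hypothetical.

```python
import math
from collections import Counter

# Rows of the insect table: (abdomen length, antennae length, class).
INSECTS = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]


def knn_classify(abdomen, antennae, data=INSECTS, k=3):
    """Return the majority class among the k training examples closest to the query.

    Assumptions: Euclidean distance on the two raw features, k = 3 by default.
    """
    query = (abdomen, antennae)
    by_distance = sorted(data, key=lambda row: math.dist(query, row[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


prediction = knn_classify(5.1, 7.0)
print(prediction)
```

With these settings the three closest examples are insects 7, 5 and 1, so the majority vote is Katydid. A different k or distance measure could change the answer on other queries.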

Page 13: Lecture4 - Machine Learning

Going to Real World

Actually, we could write that

How do I classify this domain?

Page 14: Lecture4 - Machine Learning

How to Create Classification Models

We will study some of these methods:

The decision tree C4.5

The instance-based classifier kNN

The probabilistic classifier Naïve Bayes

Page 15: Lecture4 - Machine Learning

Regression or Prediction

Prediction vs. data classification

Similarities: Both learn from a data set

Difference:
In classification, each example has a class associated
In prediction, each example has a numerical value associated

Page 16: Lecture4 - Machine Learning

How to Extract a Model?

Prediction works analogously to data classification
Use an algorithm to build a model
Use this model to predict the new unknown example

Types of regression
Linear and multiple regression
Non-linear regression

Two of the most-used approaches to regression
Neural networks
Fuzzy rule-based systems
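The two steps (build a model, then use it on a new example) can be illustrated with the simplest case listed on this slide, linear regression with one input variable. The sketch below uses the ordinary least-squares formulas; the function names `fit_line` and `predict` and the data points are hypothetical, chosen so that the points lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples (not from the slides): points on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

model = fit_line(xs, ys)          # step 1: build the model
new_value = predict(model, 4.0)   # step 2: predict a new example
print(model, new_value)
```

Unlike a classifier, the model returns a numerical value rather than a class label.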

Page 17: Lecture4 - Machine Learning

Clustering

The clustering problem

Given a database D = {t1, t2, …, tn} of transactions and an integer value k, the clustering problem is to define a mapping f: D → {1, …, k} where each ti is assigned to one cluster Kj, 1 <= j <= k

Main difference with classification
In classification, each example is labeled with a class
In clustering, examples are not labeled

Examples of clustering
Segment a customer database based on similar buying patterns
Group houses in a town into neighborhoods based on similar features
Identify new plant species
Identify similar web usage patterns
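The mapping f: D → {1, …, k} can be made concrete with k-means, the clustering method named in the course outline. The slides do not give the algorithm, so the following is a minimal sketch under stated assumptions: points are tuples of numbers, distance is Euclidean, the first k points seed the centroids (a simplification; real implementations usually seed randomly), and the name `k_means` and the data are hypothetical.

```python
import math


def k_means(points, k, iterations=10):
    """Return (assignment, centroids); assignment[i] in 1..k is the cluster of points[i]."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: map each point to its nearest centroid (1-based index).
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) + 1
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j + 1]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


# Hypothetical unlabeled examples (not from the slides): two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.3)]
assignment, centroids = k_means(points, k=2)
print(assignment)
```

Note that no class labels are used anywhere: the grouping comes only from the distances between examples.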

Page 18: Lecture4 - Machine Learning

Example of Clustering

Put these people in different clusters

Which are the keys?

Define what’s similar

Group similar things into clusters

Size of the clusters?

Which type of clustering do I want?

Hierarchical clustering?

Partition-based clustering?

Page 19: Lecture4 - Machine Learning

Are They Similar?


Page 20: Lecture4 - Machine Learning

How to Group the Elements?


Page 21: Lecture4 - Machine Learning

Which Type of Clustering?

Many types of clustering

Hierarchical: Nested set of clusters

Partition-based: One set of clusters

Incremental: Elements handled one at a time

Simultaneous: All elements handled together

Overlapping/non-overlapping

[Figure: hierarchical clustering vs. partition-based clustering]

Page 22: Lecture4 - Machine Learning

Association Rules

Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn} where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I

The association rule problem is to identify all the rules of the form

X → Y

with minimum support and confidence
Support: Fraction of transactions which contain both X and Y

Confidence: Measures how often items in Y appear in transactions that contain X

Page 23: Lecture4 - Machine Learning

Example Association Rules

I = {Beer, Bread, Jelly, Milk, PeanutButter}

Support of {Bread, PeanutButter} is 60%
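Support and confidence are simple ratios over the transaction database, which the following sketch makes explicit. The transaction table of the slide did not survive in this transcript, so the five transactions below are an assumption, chosen only so that the support of {Bread, PeanutButter} comes out at the 60% quoted above; the function names are hypothetical as well.

```python
ITEMS = {"Beer", "Bread", "Jelly", "Milk", "PeanutButter"}

# Assumed transactions (NOT taken from the slides): an illustrative database
# over ITEMS in which 3 of 5 transactions contain both Bread and PeanutButter.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """How often the items of y appear in transactions that contain x."""
    support_x = support(x, transactions)
    if support_x == 0:
        raise ValueError("no transaction contains x; confidence is undefined")
    return support(x | y, transactions) / support_x


bread_pb_support = support({"Bread", "PeanutButter"}, TRANSACTIONS)
bread_to_pb_confidence = confidence({"Bread"}, {"PeanutButter"}, TRANSACTIONS)
print(bread_pb_support, bread_to_pb_confidence)
```

On this assumed database the rule Bread → PeanutButter has support 3/5 and confidence 3/4, since Bread appears in four transactions and PeanutButter accompanies it in three of them.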

Page 24: Lecture4 - Machine Learning

Example Association Rules


Page 25: Lecture4 - Machine Learning

Before Finishing…

Some environments that contain algorithms to perform data classification, regression, clustering and association rule mining

KEEL: http://www.keel.es

Weka: http://www.cs.waikato.ac.nz/ml/weka/

Rapid Miner: http://rapid-i.com/content/blogcategory/38/69/

Page 26: Lecture4 - Machine Learning

Next Class

Start with data classification
C4.5

Page 27: Lecture4 - Machine Learning

Introduction to Machine Learning

Lecture 4
Slides based on Francisco Herrera course on Data Mining

Albert Orriols i Puig
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull