Course on Data Mining (581550-4) — course overview: Intro/Ass. Rules, Episodes, Clustering, Classification, KDD Process, Text Mining, Appl./Summary, Home Exam (lectures 24./26.10. – 28.11.)
Course on Data Mining (581550-4)
Evaluation of classification methods
Decision tree induction
A decision tree is a tree where
• internal node = a test on an attribute
• branch = an outcome of the test
• leaf node = class label or class distribution
[Figure: a small example tree with test nodes A?, B?, C?, D? and a "Yes" leaf]
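To make the node/branch/leaf terminology concrete, here is a minimal sketch of such a tree in Python; the class and field names are illustrative assumptions, not part of the original slides.

```python
# A decision tree node: an internal node tests one attribute and has one
# child per test outcome; a leaf node carries a class label instead.
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute  # attribute tested here (None for a leaf)
        self.children = {}          # test outcome -> child Node
        self.label = label          # class label (None for an internal node)

    def classify(self, sample):
        """Follow the test outcomes down the tree until a leaf is reached."""
        if self.label is not None:
            return self.label                      # leaf: return its class
        outcome = sample[self.attribute]           # outcome of this node's test
        return self.children[outcome].classify(sample)
```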
Decision tree generation
Two phases of decision tree generation (a construction sketch in code follows the list):
• tree construction
o at start, all the training examples at the root
o partition examples based on selected attributes
o test attributes are selected based on a heuristic or a statistical measure
• tree pruning
o identify and remove branches that reflect noise or outliers
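Here is a simplified ID3-style sketch of the construction phase in Python. It reuses the Node class from the earlier sketch, assumes categorical attributes, and uses information gain (entropy reduction) as the attribute selection measure; the function names are illustrative, not Quinlan's original code, and pruning is omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Greedy selection: the attribute with the highest information gain."""
    def remainder(attr):
        # Weighted entropy of the partitions induced by attr;
        # minimizing it is the same as maximizing information gain.
        score = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels)
                      if row[attr] == value]
            score += len(subset) / len(labels) * entropy(subset)
        return score
    return min(attributes, key=remainder)

def build_tree(rows, labels, attributes):
    """Top-down recursive divide-and-conquer construction (no pruning)."""
    if len(set(labels)) == 1:          # all examples in one class -> leaf
        return Node(label=labels[0])
    if not attributes:                 # no tests left -> majority-class leaf
        return Node(label=Counter(labels).most_common(1)[0][0])
    attr = best_attribute(rows, labels, attributes)
    node = Node(attribute=attr)
    for value in set(row[attr] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        node.children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attributes if a != attr])
    return node
```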
Decision tree induction – Classical example: play tennis?
Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
Training set from Quinlan's ID3
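As a usage sketch, the training set above can be encoded and passed to the illustrative build_tree function from the previous slide; with this data, information gain picks outlook as the root test.

```python
attributes = ["outlook", "temperature", "humidity", "windy"]
raw = [  # the 14 play-tennis examples from the table above
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"),
    ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"),
    ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"),
    ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"),
    ("rain", "mild", "high", "true", "N"),
]
rows = [dict(zip(attributes, r[:4])) for r in raw]
labels = [r[4] for r in raw]
tree = build_tree(rows, labels, attributes)
print(tree.attribute)   # -> "outlook"
```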
Decision tree obtained with ID3 (Quinlan 86)
[Figure: the tree produced by ID3 for the play-tennis data. The root tests outlook: sunny → test humidity (high → N, normal → P); overcast → P; rain → test windy (true → N, false → P).]
From a decision tree to classification rules
• One rule is generated for each path in the tree from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are generally simpler to understand than trees
IF outlook = sunny AND humidity = normal
THEN play tennis
[Figure: the same play-tennis decision tree as above.]
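The path-to-rule conversion can be sketched as a short recursive traversal of the Node structure used in the earlier sketches (the function name and output format are my own):

```python
def tree_to_rules(node, conditions=()):
    """Emit one IF-THEN rule per path from the root to a leaf."""
    if node.label is not None:  # leaf: the path's conditions form the rule body
        body = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN class = {node.label}"]
    rules = []
    for value, child in node.children.items():
        rules += tree_to_rules(child, conditions + ((node.attribute, value),))
    return rules

for rule in tree_to_rules(tree):   # tree built from the play-tennis data
    print(rule)
# one of the printed rules:
# IF outlook = sunny AND humidity = normal THEN class = P
```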
Decision tree algorithms
• Basic algorithm
o constructs a tree in a top-down recursive divide-and-conquer manner
o attributes are assumed to be categorical
o greedy (may get trapped in local maxima)
• Many variants: ID3, C4.5, CART, CHAID
o main difference: divide (split) criterion / attribute selection measure
Bayesian classification
• Probabilistic prediction: Bayesian methods can predict multiple hypotheses, weighted by their probabilities
• Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
• The classification problem may be formalized using a-posteriori probabilities:
P(C|X) = probability that the sample tuple X = <x1, …, xk> is of class C
• For example: P(class = N | outlook = sunny, windy = true, …)
• Idea: assign to sample X the class label C such that P(C|X) is maximal
• If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as their i-th attribute within class C
• If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function (a combined sketch follows this list)
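These per-attribute estimates are combined under the naive assumption (implicit in estimating each P(xi|C) separately) that attributes are independent within a class, so that P(C|X) ∝ P(C) · P(x1|C) · … · P(xk|C). Below is a minimal sketch for the categorical case; the continuous case would replace the relative-frequency lookup with a Gaussian density evaluation. All names are illustrative.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate the priors P(C) and per-class value counts for P(xi|C)."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    counts = defaultdict(Counter)        # (class, attribute) -> value counts
    for row, c in zip(rows, labels):
        for attr, value in row.items():
            counts[(c, attr)][value] += 1
    return priors, counts

def classify_nb(sample, priors, counts):
    """Assign the class C that maximizes P(C) * prod_i P(xi|C)."""
    def score(c):
        p = priors[c]
        for attr, value in sample.items():
            value_counts = counts[(c, attr)]
            p *= value_counts[value] / sum(value_counts.values())
        return p
    return max(priors, key=score)

# Usage with the play-tennis rows/labels encoded earlier:
priors, counts = train_naive_bayes(rows, labels)
print(classify_nb({"outlook": "overcast", "temperature": "hot",
                   "humidity": "high", "windy": "false"},
                  priors, counts))      # -> "P"
```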
Evaluation of classification methods
• Partition: training-and-testing (large data sets)
o use two independent data sets, e.g., training set (2/3), test set (1/3)
• Cross-validation (moderate data sets)
o divide the data set into k subsamples
o use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation; sketched in code after this list)
• Bootstrapping: leave-one-out (small data sets)
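A minimal sketch of the k-fold scheme (illustrative Python; the fit and accuracy parameters stand in for any classifier's training and evaluation functions, and are assumptions rather than anything from the slides):

```python
import random

def k_fold_accuracy(rows, labels, k, fit, accuracy):
    """k-fold cross-validation: each subsample serves as test data once;
    with k equal to the number of samples this becomes leave-one-out."""
    idx = list(range(len(rows)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k roughly equal subsamples
    scores = []
    for fold in folds:
        train_idx = [i for i in idx if i not in fold]
        model = fit([rows[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        scores.append(accuracy(model,
                               [rows[i] for i in fold],
                               [labels[i] for i in fold]))
    return sum(scores) / k                  # average accuracy over the folds
```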
Summary (1)
• Classification is an extensively studied problem
• Classification is probably one of the most widely used data mining techniques, with a lot of extensions
Summary (2)
• Scalability is still an important issue for database applications
• Research directions: classification of non-relational data, e.g., text, spatial, and multimedia data
Thanks to Jiawei Han from Simon Fraser University for his slides, which greatly helped in preparing this lecture!
Also thanks to Fosca Giannotti and Dino Pedreschi from Pisa for their slides on classification.
References
• C. Apte and S. Weiss. Data mining with decision trees and decision rules. Future Generation Computer Systems, 13, 1997.
• F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. Using Data Mining Techniques in Fiscal Fraud Detection. In Proc. DaWak'99, First Int. Conf. on Data Warehousing and Knowledge Discovery, Sept. 1999.
• F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. A Classification-based Methodology for Planning Audit Strategies in Fraud Detection. In Proc. KDD-99, ACM-SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, Aug. 1999.
• J. Catlett. Megainduction: machine learning on very large databases. PhD Thesis, Univ. Sydney, 1991.
• P. K. Chan and S. J. Stolfo. Metalearning for multistrategy and parallel learning. In Proc. 2nd Int. Conf. on Information and Knowledge Management, pages 314-323, 1993.
• J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
• J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
• L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
• P. K. Chan and S. J. Stolfo. Learning arbiter and combiner trees from partitioned data for scaling machine learning. In Proc. KDD'95, August 1995.
• J. Gehrke, R. Ramakrishnan, and V. Ganti. Rainforest: A framework for fast decision tree construction of large datasets. In Proc. 1998 Int. Conf. Very Large Data Bases, pages 416-427, New York, NY, August 1998.
• B. Liu, W. Hsu and Y. Ma. Integrating classification and association rule mining. In Proc. KDD’98, New York, 1998.
• J. Magidson. The CHAID approach to segmentation modeling: Chi-squared automatic interaction detection. In R. P. Bagozzi, editor, Advanced Methods of Marketing Research, pages 118-159. Blackwell Business, Cambridge, Massachusetts, 1994.
• M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. In Proc. 1996 Int. Conf. Extending Database Technology (EDBT'96), Avignon, France, March 1996.
• S. K. Murthy. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.
• J. R. Quinlan. Bagging, boosting, and C4.5. In Proc. 13th Natl. Conf. on Artificial Intelligence (AAAI'96), pages 725-730, Portland, OR, Aug. 1996.
• R. Rastogi and K. Shim. PUBLIC: A decision tree classifier that integrates building and pruning. In Proc. 1998 Int. Conf. Very Large Data Bases, pages 404-415, New York, NY, August 1998.
• J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. In Proc. 1996 Int. Conf. Very Large Data Bases, pages 544-555, Bombay, India, Sept. 1996.
• S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, 1991.
• D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing. The MIT Press, 1986.