Data Mining By: Thai Hoa Nguyen Pham

Dec 21, 2015



Data Mining

Define Data Mining
Classification
Association
Clustering


Define Data Mining

Also known as KDD (Knowledge Discovery in Databases).

Data mining is the semiautomatic process of analyzing data to find useful patterns.

Why semiautomatic?

Because the data usually has to be preprocessed by hand before mining, and the patterns found have to be postprocessed by hand afterward.


Examples of Data Mining

A simple example is a clothing retail store.

A data mining system could be used to list the customers who often buy t-shirts during the summer season.

Another example is the urban legend that Walmart used data mining to find a correlation between customers buying beer and customers buying baby diapers, and then put the two aisles close together to increase profits.
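The t-shirt example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase log, the `SUMMER_MONTHS` set, and the `frequent_summer_buyers` helper are all invented here and do not come from the slides.

```python
from collections import Counter

# Hypothetical purchase log: (customer, item, month). Invented for illustration.
purchases = [
    ("ann", "t-shirt", 6),
    ("ann", "t-shirt", 7),
    ("ann", "jeans", 7),
    ("bob", "t-shirt", 12),
    ("bob", "t-shirt", 7),
    ("cat", "t-shirt", 8),
    ("cat", "t-shirt", 8),
    ("cat", "t-shirt", 6),
]

SUMMER_MONTHS = {6, 7, 8}  # assumption: northern-hemisphere summer


def frequent_summer_buyers(log, item, min_purchases):
    """Customers who bought `item` at least `min_purchases` times in summer."""
    counts = Counter(
        customer
        for customer, bought, month in log
        if bought == item and month in SUMMER_MONTHS
    )
    return sorted(c for c, n in counts.items() if n >= min_purchases)


often = frequent_summer_buyers(purchases, "t-shirt", 2)
```

What counts as "often" is a choice; here it is the `min_purchases` threshold.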


Classification

Suppose the items in a database have already been put into classes. A problem arises when a new item is added to the database.

The class of the new item is unknown, so it has to be predicted from the item's other attributes. Classification rules solve this problem.


Example of a rule

∀ P, P.degree = masters and P.income > 75,000 => P.credit = excellent

∀ P, P.degree = bachelors and P.income < 50,000 => P.credit = bad
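The two rules above can be read as a tiny classifier. A minimal Python sketch, assuming a hypothetical `Person` record with `degree` and `income` fields; returning `None` when neither rule applies is also an assumption, since the slides give no rule for that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    degree: str    # e.g. "masters" or "bachelors"
    income: float  # yearly income


def credit_by_rules(p: Person) -> Optional[str]:
    """Apply the two example rules; return None when neither rule fires."""
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    if p.degree == "bachelors" and p.income < 50_000:
        return "bad"
    return None  # assumption: the slides do not cover this case
```

For example, `credit_by_rules(Person("masters", 80_000))` falls under the first rule, while a person with a masters degree and an income of 60,000 is matched by neither rule.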


Decision Tree Classifiers

A widely used technique for classification.

Each internal node holds a function or predicate that is evaluated on the item being classified.

Each leaf node is associated with a class.


Example of Decision Tree Classifiers

[Figure: a decision tree with its root, functions (internal nodes), and classes (leaf nodes) labeled.]


Example of Decision Tree Classifiers

The internal nodes, or functions, are inside the boxes: degree (the root) and income.

The leaf nodes, or associated classes, are the four circles: bad, average, good, and excellent.
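A tree of this shape can be sketched in Python as nested dictionaries: internal nodes carry a function that picks a branch, and leaves are plain class labels. The branch names and income thresholds below are made up for illustration, because the figure's actual values are not in the transcript.

```python
# Internal nodes hold a function that picks a branch; leaves hold a class.
# The branch labels and income thresholds are hypothetical.

def income_bachelors(person):
    return "low" if person["income"] < 50_000 else "high"


def income_masters(person):
    return "low" if person["income"] < 75_000 else "high"


TREE = {
    "function": lambda person: person["degree"],  # root: test the degree
    "branches": {
        "bachelors": {
            "function": income_bachelors,
            "branches": {"low": "bad", "high": "average"},
        },
        "masters": {
            "function": income_masters,
            "branches": {"low": "good", "high": "excellent"},
        },
    },
}


def classify(node, person):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, dict):  # still at an internal node
        branch = node["function"](person)
        node = node["branches"][branch]
    return node  # a leaf: the class label
```

Calling `classify(TREE, {"degree": "masters", "income": 80_000})` follows the root's "masters" branch and then the income test. A degree that the tree has no branch for raises a `KeyError` in this sketch.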


Association

An example of an association for beer and diapers would be:

Beer => Diapers

As already mentioned, this association just means that customers who buy beer often buy diapers, too.


Association Rules

Support is a measure of what fraction of the population satisfies both the antecedent and the consequent. Consider the association below:

milk => screwdrivers

If only a tiny percentage of purchases include both milk and screwdrivers, the rule has low support and is not worth much attention. An association with higher support is worth more attention than one with lower support.
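Support as defined above can be computed directly from a list of purchases. A minimal sketch, assuming each purchase is a set of item names and the list is not empty; the `baskets` data is invented for this example.

```python
def support(transactions, antecedent, consequent):
    """Fraction of transactions containing both the antecedent and the consequent."""
    both = set(antecedent) | set(consequent)
    matching = sum(1 for t in transactions if both <= set(t))
    return matching / len(transactions)


# Hypothetical shopping baskets, invented for this example.
baskets = [
    {"milk", "bread"},
    {"milk", "screwdrivers"},
    {"bread"},
    {"milk", "bread", "cereal"},
]

# Only one of the four baskets contains both milk and screwdrivers.
milk_screwdrivers = support(baskets, {"milk"}, {"screwdrivers"})
```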


Association Rules 2

Confidence is a measure of how often the consequent is true when the antecedent is true.

bread => milk

For example, if the association above had a confidence of 50 percent, it means that 50 percent of the purchases that include bread also include milk. The other purchases that include bread contain other items instead of milk.
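Confidence, as defined above, is measured only over the purchases that contain the antecedent. A minimal sketch, assuming each purchase is a set of item names; the data is invented, and returning 0 when the antecedent never occurs is a convention chosen here.

```python
def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0  # assumption: report 0 when the antecedent never occurs
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)


# Hypothetical shopping baskets, invented for this example.
bread_baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
]

# Four baskets contain bread; two of those also contain milk.
bread_milk = confidence(bread_baskets, {"bread"}, {"milk"})
```

Confidence is not symmetric: in the same data, milk => bread is measured over the three baskets that contain milk, so it has a different value.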


Clustering

Clustering refers to finding clusters of points in the given data and grouping the points into different subsets.

Widely used clustering techniques: hierarchical clustering, agglomerative clustering, and divisive clustering.


Types of Clustering

Hierarchical: clustering that arranges the data into a hierarchy of nested clusters, either by merging or by splitting.

Agglomerative: starts by building small clusters, then progressively merges them into larger clusters.

Divisive: begins with the whole set and successively divides it into smaller clusters.


Example of agglomerative hierarchical clustering

An example of agglomerative clustering, in which the separate elements of a set are merged step by step, each merge forming an internal node, until the last merge, “abcdef”, is reached.
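The merging process can be sketched in Python. The slides do not say how the distance between clusters is measured, so this sketch assumes single linkage (the distance between the closest members) on one-dimensional points. The coordinates are invented so that the final merge is “abcdef”.

```python
from itertools import combinations


def agglomerate(points):
    """Single-linkage agglomerative clustering of labeled 1-D points.

    Returns the labels of the merged clusters in the order they were formed.
    """
    # Start with one cluster per element: label -> list of coordinates.
    clusters = {label: [x] for label, x in points.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the two clusters whose closest members are nearest each other.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: min(
                abs(x - y)
                for x in clusters[pair[0]]
                for y in clusters[pair[1]]
            ),
        )
        merged = "".join(sorted(a + b))
        clusters[merged] = clusters.pop(a) + clusters.pop(b)
        merges.append(merged)
    return merges


# Hypothetical coordinates, chosen so the merge order is easy to follow.
points = {"a": 0.0, "b": 10.0, "c": 11.0, "d": 15.0, "e": 16.5, "f": 19.0}
merge_order = agglomerate(points)
# merge_order is ["bc", "de", "def", "bcdef", "abcdef"]
```

Each entry in `merge_order` corresponds to one internal node of the hierarchy, and the last entry is the root that contains every element.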


Other types of mining

Text Mining: applies data mining techniques to textual documents.

An example is a tool that forms clusters from the pages that users have visited. If a user asks for sites containing the keyword “Japan”, the tool lists the sites that use the keyword “Japan” the most.

Data Visualization: helps users examine large volumes of data and detect patterns visually. Instead of finding problems by reading text, users can look at maps and charts that pinpoint where a problem is with a color-coding scheme.


Example of Text Mining

This example shows what happens when a user searches for “Japan”. The points closer to the center of the circle have more information on Japan. We can think of the points as websites or research articles.
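The idea of ranking pages by how much they say about a keyword can be sketched as a simple word count. This is a rough illustration, not the tool shown in the slide: the `pages` data is invented, and splitting on whitespace ignores punctuation and word forms.

```python
def rank_by_keyword(documents, keyword):
    """Return document names sorted by how often they mention the keyword."""
    keyword = keyword.lower()
    counts = {
        name: text.lower().split().count(keyword)
        for name, text in documents.items()
    }
    # Keep only documents that mention the keyword, most mentions first.
    hits = [name for name, n in counts.items() if n > 0]
    return sorted(hits, key=lambda name: counts[name], reverse=True)


# Hypothetical pages, invented for this example.
pages = {
    "travel": "japan travel guide visit japan and see japan in spring",
    "food": "sushi comes from japan",
    "cars": "a short history of german cars",
}
ranking = rank_by_keyword(pages, "Japan")
```

In the picture's terms, the first names in `ranking` would be the points closest to the center of the circle.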


Example of Data Visualization

We could say a number of things about this example. For instance, the map could depict poverty levels or which states grow the most apples.

