Page 1: Data Mining Chun-Hung Chou g834008@alumni.nthu.edu.tw.

Data Mining

Chun-Hung Chou, g834008@alumni.nthu.edu.tw

Page 2:

Outline

• Data Mining Overview

• Functionalities

• Examples

• Q & A

Page 3:

What is Data Mining?

• Searching for knowledge (interesting patterns) in your data

• A process that uses a variety of data analysis tools to discover patterns and relationships in data.

• Uses tools from Computer Science and Artificial Intelligence as well as Statistics.

Page 4:

Why do we need data mining?

– Large number of records (cases) (10^8 to 10^12 bytes)

– High-dimensional data (variables) (10 to 10^4 attributes)

– Only a small portion, typically 5% to 10%, of the collected data is ever analyzed.

– Data that may never be explored continues to be collected out of fear that something that may prove important in the future may be missing.

– Magnitude of data precludes most traditional analysis ANOVA/PC/

Page 5:

Goals of Data Mining

• Prediction: using some variables or fields in the data set to predict unknown or future values of other variables of interest

– Produce a model, expressed as executable code, which can be used to perform classification, prediction, estimation or other similar tasks

• Description: finding patterns describing the data that can be interpreted by humans

– Gain an understanding of the analyzed system by uncovering patterns and relationships in large data sets

Page 6:

Procedure of Data Mining

• State the problem

• Collect the data

• Perform preprocessing

• Estimate the model (mine the data)

• Interpret the model & draw the conclusions

Page 7:

State the problem

– Domain-specific knowledge and experience are necessary in order to come up with a meaningful problem statement

– Requires close interaction between the data mining expert and the application expert

– This cooperation does not stop in the initial phase; it continues during the entire data mining process

Page 8:

Collect the data

– Designed experiment: the data-generation process is under the control of an expert

– Observational approach: random data generation, which the expert cannot influence

Page 9:

Preprocessing the data

• Outlier detection

a) Detect and eventually remove outliers as a part of the preprocessing phase

b) Develop robust modeling methods that are insensitive to outliers

• Scaling, encoding and selecting features

a) Variables with different scales

b) Dimensionality reduction
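
As a minimal sketch of the scaling step for variables with different scales, the following rescales each variable to the range [0, 1] (min-max scaling). The slides do not name a specific scaling method, so min-max scaling is an assumption here, and the function name and the age/income values are hypothetical.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread to rescale; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical data: two variables measured on very different scales.
ages = [20, 30, 40, 60]                    # years
incomes = [20000, 50000, 80000, 120000]    # dollars

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

print(scaled_ages)
print(scaled_incomes)
```

After scaling, both variables lie in the same range, so a distance-based technique is no longer dominated by the variable with the larger raw values.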

Page 10:

Estimate the model

• Selection and implementation of the appropriate data mining technique

Page 11:

Interpret the model & draw conclusions

• Decision making

• Validate the result

Page 12:

Potential Applications

– Fraud Detection

– Manufacturing Processes

– Targeting Markets

– Scientific Data Analysis

– Risk Management

– Web Intelligence

– Bioinformatics

– …

Page 13:

Data Mining Myths

• Data mining tools need no guidance.

• Data mining models explain behavior.

• Data mining requires no data analysis skill.

• Data mining eliminates the need to understand your business and your data.

• Data mining tools are “different” from statistics.

Page 14:

Data Mining Functionalities

• Concept/Class Description

• Association Analysis

• Classification Analysis

• Cluster Analysis

• Outlier Analysis

• Evolution Analysis

Page 15:

Concept Description

Generate descriptions for characterization and comparison of data

• Characterization: summarizes and describes a collection of data, e.g. mean, distribution, percentile, ...

• Comparison: summarizes and distinguishes one collection of data from other collection(s) of data
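
As an illustrative sketch of characterization and comparison, the following summarizes one collection with its mean, median and quartiles, then contrasts two collections by their means. The function names and the sample values are hypothetical, and comparing means is only one simple way to distinguish two collections.

```python
import statistics


def characterize(values):
    """Summarize one collection: mean, median and quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


def compare_means(first, second):
    """Distinguish two collections by the difference of their means."""
    return statistics.mean(second) - statistics.mean(first)


# Hypothetical measurements for two groups.
group_a = [10, 12, 14, 16, 18]
group_b = [20, 22, 24, 26, 28]

print(characterize(group_a))
print(compare_means(group_a, group_b))
```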

Page 16:

Association Analysis

Goal: find interesting relationships among items in a given data set

Page 17:

Association Analysis

Example: Market Basket Analysis, an example of rule-based machine learning

• Customer Analysis

– Market Basket Analysis uses the information about what a customer purchases to give us insight into who they are and why they make certain purchases

• Product Analysis

– Market Basket Analysis gives us insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to purchase
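
One common way to find products that tend to be purchased together is to score item pairs by support (the fraction of baskets containing both items) and confidence (the fraction of baskets with the first item that also contain the second). The slides do not spell out these measures, so the sketch below is an assumption about how the analysis could be carried out; the function name and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))            # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue                           # pair is too rare to report
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

This brute-force pair count is fine for a handful of baskets; larger data sets normally need a pruning algorithm such as Apriori.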

Page 18:

Classification Analysis

Goal:

Build a model that describes a predetermined set of data classes or concepts, and use the model for prediction

Page 19:

Classification Analysis

Methods: decision tree, Bayesian network, Bayesian belief network, neural network, k-nearest neighbor, case-based reasoning, genetic algorithm, rough sets, fuzzy logic, SVM/SOM, …
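
To make one of the listed methods concrete, here is a minimal sketch of k-nearest neighbor classification: a new object receives the majority class among its k closest labeled examples. Euclidean distance, the function name and the training data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote of its k nearest examples.

    `train` is a list of (feature_tuple, label) pairs.
    """
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples with two numeric features each.
training_set = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 9.5), "high"),
    ((8.5, 9.0), "high"),
]

print(knn_predict(training_set, (1.2, 1.5)))
print(knn_predict(training_set, (9.0, 9.0)))
```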

Page 20:

Cluster Analysis

Goal:

Grouping a set of physical or abstract objects into classes of similar objects

Page 21:

Cluster Analysis

• Methods:

Partitioning methods: k-means

Hierarchical methods: top-down, bottom-up

Density-based methods: arbitrary shapes

Grid-based methods: cells

Model-based methods: best fit of given model
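
The partitioning approach can be sketched with a small k-means loop: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The starting centroids are passed in explicitly so that the run is repeatable; that choice, the function name and the sample points are assumptions made for this illustration.

```python
import math


def k_means(points, centroids, iterations=100):
    """Cluster `points` around the given starting `centroids`."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)   # keep a centroid that lost all points

        if new_centroids == centroids:
            break                           # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = k_means(points, [(1, 1), (8, 8)])

print(final_centroids)
print(final_clusters)
```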

Page 22:

Outlier Analysis

Outlier: data that can be considered inconsistent with the rest of a given data set

Goal: find an efficient method to mine the outliers

Page 23:

Outlier Analysis

Method:

- Statistical-Based Outlier Detection

- Distance-Based Outlier Detection

- Deviation-Based Outlier Detection
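
As a sketch of the distance-based approach, an object can be flagged as an outlier when at least a given fraction of the other objects lie farther away from it than a chosen distance. The threshold values, the function name and the sample points below are assumptions for illustration, and this brute-force version compares every pair of points.

```python
import math


def distance_based_outliers(points, radius, min_fraction):
    """Return points for which at least `min_fraction` of the other points
    lie farther away than `radius`."""
    outliers = []
    for i, p in enumerate(points):
        far = sum(1 for j, q in enumerate(points)
                  if j != i and math.dist(p, q) > radius)
        if far / (len(points) - 1) >= min_fraction:
            outliers.append(p)
    return outliers


# Hypothetical data: four nearby points and one distant point.
data = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]

print(distance_based_outliers(data, radius=3, min_fraction=0.9))
# prints [(10, 10)]
```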

Page 24:

Evolution Analysis

• Goal:

Describe and model regularities or trends for objects whose behavior changes over time

Page 25:

Evolution Analysis

• Method:

Statistical Method

Trend Analysis

Similarity Search in Time-Series Analysis

Sequential Pattern Mining

Periodicity Analysis
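
Trend analysis is often started by smoothing a time series so that the long-term movement stands out from short-term fluctuation. The following is a minimal sketch using a simple moving average; the slides do not say which smoothing technique was intended, and the function name, window size and series values are hypothetical.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in `series`."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Hypothetical measurements taken at regular time intervals.
readings = [3, 5, 4, 6, 8, 7, 9]

print(moving_average(readings, window=3))
# prints [4.0, 5.0, 6.0, 7.0, 8.0]
```

The smoothed values rise steadily, which makes the upward trend easier to see than in the raw readings.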

Page 26:

Example

V6A event : DT 1 poly Dep BLC

Root cause: FUR-DPA-02 (VPLPDT1 : DT#1 POLY DEP)

Page 27:

Example

• Result

Page 28:

Question & Suggestion

Page 29:

Thanks !