
Introduction of Data Mining

Apr 14, 2015



rubinwang007

A brief introduction to data mining, including a definition of the term.
Transcript
Page 1: Introduction of Data Mining

Data Mining

Student Name:

Page 2

"The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases." (Fayyad et al., 1996)

Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and subsequent knowledge from large databases.

Page 3

The most common standard processes are:
- CRISP-DM (Cross-Industry Standard Process for Data Mining)
- SEMMA (Sample, Explore, Modify, Model, and Assess)
- KDD (Knowledge Discovery in Databases)

Page 4

[Figure: the six-step CRISP-DM process arranged around the data sources: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Model Building, (5) Testing and Evaluation, (6) Deployment]

Page 5

CRISP-DM provides a systematic and orderly way to conduct data mining projects.

This process has six steps. First, an understanding of the business issues to be addressed and an understanding of the data are developed concurrently. Then the data are prepared for modeling, after which they are modeled. Next, the model results are evaluated. Finally, the models are deployed for regular use.

Page 6

SEMMA
- Sample: generate a representative sample of the data
- Explore: visualize and give a basic description of the data
- Modify: select variables and transform variable representations
- Model: use a variety of statistical and machine learning models
- Assess: evaluate the accuracy and usefulness of the models

Page 7

The main difference between CRISP-DM and SEMMA is that CRISP-DM takes a more comprehensive approach to data mining projects, one that includes understanding the business and the relevant data, whereas SEMMA implicitly assumes that the project's goals and objectives, along with the appropriate data sources, have already been identified and understood.

Page 8

A wide range of data mining methods is available today for handling large volumes of data in almost any domain. Four common methods are:
- Classification
- Clustering
- Association
- Sequence discovery

Page 9

Classification learns patterns from past data (a set of information, such as traits, variables, or features, describing previously labeled items, objects, or events) in order to assign new instances with unknown labels to their respective groups or classes.

The objective of classification is to analyze the historical data stored in a database and automatically generate a model that can predict future behavior.
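To make the idea concrete, here is a minimal Python sketch of one simple classification technique, k-nearest neighbours (k-NN), chosen only for brevity; the slides do not name a specific algorithm. The customer features, class labels, and the function name `knn_predict` are hypothetical. k-NN is instance-based: instead of fitting explicit parameters, it keeps the labeled history and uses it directly as its "model".

```python
from collections import Counter
from math import dist


def knn_predict(history, new_point, k=3):
    """Assign new_point the majority label among its k nearest labeled records."""
    # Rank the previously labeled records by Euclidean distance to the new instance.
    nearest = sorted(history, key=lambda record: dist(record[0], new_point))[:k]
    # Majority vote over the labels of those k neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical data: ((age, purchases per year), known customer class).
history = [
    ((25, 2), "low-value"),
    ((30, 3), "low-value"),
    ((45, 20), "high-value"),
    ((50, 25), "high-value"),
    ((35, 18), "high-value"),
]

print(knn_predict(history, (48, 22)))  # high-value
print(knn_predict(history, (27, 2)))   # low-value
```

Choosing k is a design trade-off: a small k follows the training data closely, while a larger k smooths out noisy labels.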

Page 10

Cluster analysis is an exploratory data analysis tool for solving classification problems in which the classes are not known in advance.

The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the degree of association is strong among members of the same cluster and weak among members of different clusters.
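As an illustration, here is a minimal sketch of k-means, one widely used clustering algorithm (the slide itself does not prescribe one). It alternates between assigning each case to its nearest cluster centre and moving each centre to the mean of its members, so members of the same cluster end up close together. The points, the choice of two clusters, the fixed starting centroids, and the fixed iteration count are illustrative assumptions; practical implementations usually choose starting centroids randomly (or with k-means++) and stop once assignments no longer change.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical cases described by two numeric attributes.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
print(clusters)
```

Here the three points near (1.5, 1.5) form one cluster and the three near (8.5, 8.5) form the other, without any labels being supplied.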

Page 11

Association rule mining is a popular data mining method that is often used as an example when explaining what data mining is, and what it can do, to a less technically savvy audience.

Association rule mining aims to find interesting relationships (affinities) between variables (items) in large databases. For example, a recession is associated with a decline in house prices.
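Association rules are commonly scored by support (the fraction of transactions that contain every item in the rule) and confidence (the fraction of transactions containing the left-hand side that also contain the right-hand side). The sketch below assumes hypothetical shopping baskets, hypothetical thresholds, and hypothetical helper names (`support`, `pair_rules`), and simply brute-forces every single-item rule {a} -> {b}. Algorithms such as Apriori avoid this exhaustive enumeration on large databases by pruning candidate itemsets that cannot meet the minimum support.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force every rule {lhs} -> {rhs} between two single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        both = support(transactions, {a, b})
        if both < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: share of transactions with lhs that also contain rhs.
            confidence = both / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]

for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these baskets only bread and butter co-occur often enough: the sketch reports bread -> butter (support 0.60, confidence 0.75) and butter -> bread (support 0.60, confidence 1.00). The asymmetry shows why confidence is computed per direction.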

Page 12

Sequence discovery is the identification of associations over time.

When appropriate information is available (e.g., the identity of each customer in a retail shop), a temporal analysis can be conducted to identify behaviour over time.

This provides a considerable amount of information that could be used to increase sales or to detect fraud.
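One simple way to look for such patterns, sketched below under assumed data, is to order each customer's purchases by time and count how many customers bought item a at some point before item b. The customer IDs, items, days, and the function name `ordered_pair_counts` are hypothetical. Dedicated sequential-pattern algorithms (for example GSP or PrefixSpan) handle longer sequences and minimum-support pruning far more efficiently.

```python
from collections import Counter
from itertools import combinations


def ordered_pair_counts(histories):
    """Count, for each ordered pair (a, b), how many customers bought a before b."""
    counts = Counter()
    for purchases in histories.values():
        # Sort one customer's (day, item) records by day so "before" is meaningful.
        timeline = [item for _, item in sorted(purchases)]
        # combinations() preserves list order, so a was bought no later than b.
        pairs = {(a, b) for a, b in combinations(timeline, 2) if a != b}
        counts.update(pairs)  # a set, so each customer counts once per pair
    return counts


# Hypothetical purchase logs: customer id -> list of (day, item).
histories = {
    "c1": [(1, "phone"), (5, "case"), (20, "charger")],
    "c2": [(3, "phone"), (4, "case")],
    "c3": [(2, "case"), (9, "phone")],
    "c4": [(7, "phone"), (30, "charger")],
}

counts = ordered_pair_counts(histories)
for (first, then), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{first} then {then}: {n} customer(s)")
```

Here two customers bought a phone before a case and two bought a phone before a charger, while only one bought a case before a phone; unlike a plain association rule, the order of events matters.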

Page 13

Commercial tools:
- SPSS: PASW (formerly Clementine)
- SAS: Enterprise Miner
- IBM: Intelligent Miner
- StatSoft: Statistical Data Miner

Free and/or open-source tools:
- Weka
- RapidMiner
- …

Page 14

Data mining is considered a powerful analytical tool that helps decision-makers understand the past and predict the future. However, there are common myths and mistakes associated with this field. Common myths are that data mining:
- provides instant solutions/predictions
- is not yet viable for business applications
- requires a separate, dedicated database
- can only be done by those with advanced degrees
- is only for large firms that have lots of customer data

Page 15

Data mining refers to developing business intelligence from the data that an organization collects, organizes, and processes.

Data mining techniques are being used by organizations to gain a better understanding of their customers and their own operations.