Page 1: Intro data mining lingkup

Taken from © Copyright 2007, Natasha B

Introduction to Data Mining

Informatics

Page 2: Intro data mining lingkup

Outline

Motivation: Why Data Mining?

What is Data Mining?

Data Mining Applications

Issues in Data Mining

Page 3: Intro data mining lingkup

Data vs. Information

Society produces massive amounts of data: business, science, medicine, economics, sports, …
Potentially valuable resource
Raw data is useless: techniques are needed to automatically extract information
  Data: recorded facts
  Information: patterns underlying the data

Page 4: Intro data mining lingkup

Multidisciplinary Field

Data Mining draws on:
  Database Technology
  Statistics
  Artificial Intelligence (Machine Learning, Neural Networks)
  Machine Learning
  Visualization
  Other Disciplines

Page 5: Intro data mining lingkup

Terminology

Gold Mining
Knowledge mining from databases
Knowledge extraction
Data/pattern analysis
Knowledge Discovery in Databases (KDD)
Information harvesting
Business intelligence

Page 6: Intro data mining lingkup

KDD Process

Database -> Selection, Transformation (Data Preparation) -> Training Data -> Data Mining -> Model, Patterns -> Evaluation, Verification

Page 7: Intro data mining lingkup

Data Mining Tasks

Exploratory Data Analysis
Predictive Modeling: Classification and Regression
Descriptive Modeling
  Cluster analysis/segmentation
Discovering Patterns and Rules
  Association/Dependency rules
  Sequential patterns
  Temporal sequences
Deviation detection

Page 8: Intro data mining lingkup

Data Mining Tasks

Concept/Class description: Characterization and discrimination
  Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
  Multi-dimensional or single-dimensional association
  age(X, “20-29”) ^ income(X, “60-90K”) => buys(X, “TV”)
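
A rule like the one above is usually scored by how often it holds. The following is a minimal sketch, assuming support and confidence as the scoring measures (the slide does not name them) and using invented customer records; the names `customers`, `matches`, and `rule_support_confidence` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy records; the values are invented for illustration only.
customers: List[Dict[str, str]] = [
    {"age": "20-29", "income": "60-90K", "buys": "TV"},
    {"age": "20-29", "income": "60-90K", "buys": "TV"},
    {"age": "20-29", "income": "60-90K", "buys": "radio"},
    {"age": "30-39", "income": "60-90K", "buys": "TV"},
    {"age": "20-29", "income": "30-60K", "buys": "radio"},
]


def matches(record: Dict[str, str], conditions: Dict[str, str]) -> bool:
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(key) == value for key, value in conditions.items())


def rule_support_confidence(
    records: List[Dict[str, str]],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> Tuple[float, float]:
    """Score the rule antecedent => consequent on the given records."""
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(
        matches(r, antecedent) and matches(r, consequent) for r in records
    )
    support = n_both / len(records)  # share of all records where the rule holds
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_support_confidence(
    customers,
    antecedent={"age": "20-29", "income": "60-90K"},
    consequent={"buys": "TV"},
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

The rule is multi-dimensional because its antecedent combines two attributes (age and income); a single-dimensional rule would use conditions on one attribute only.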

Page 9: Intro data mining lingkup

Data Mining Tasks

Classification and Prediction

Finding models (functions) that describe and distinguish classes or concepts for future prediction

Example: classify countries based on climate, or classify cars based on gas mileage

Presentation: IF-THEN rules, decision tree, classification rule, neural network

Prediction: Predict some unknown or missing numerical values
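
The difference between the two tasks can be sketched as follows: classification assigns a class label, while prediction estimates a number. This is only an illustration; the mileage thresholds, the class names, and the car data are invented, and the least-squares line is one possible prediction model chosen here, not one named on the slide.

```python
from typing import List, Tuple


def classify_car(mpg: float) -> str:
    """Classification presented as IF-THEN rules (hypothetical thresholds)."""
    if mpg >= 30:
        return "economical"
    if mpg >= 20:
        return "average"
    return "thirsty"


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical cars: weight in tonnes and gas mileage in miles per gallon.
weights = [1.0, 2.0, 3.0]
mileage = [40.0, 30.0, 20.0]

slope, intercept = fit_line(weights, mileage)

# Prediction: estimate the missing mileage of a 2.5-tonne car.
predicted = intercept + slope * 2.5

# Classification: assign the predicted mileage to a class.
print(predicted, classify_car(predicted))
```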

Page 10: Intro data mining lingkup

Data Mining Tasks

Cluster analysis
  Class label is unknown: group data to form new classes
  Example: cluster houses to find distribution patterns
  Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity
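
The clustering principle can be checked numerically. The sketch below assumes Euclidean distance as the inverse of similarity (a small distance means high similarity) and uses invented house coordinates that are already grouped; `clusters`, `mean_intra_distance`, and `mean_inter_distance` are hypothetical names.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical house locations (x, y), already grouped into two clusters.
clusters: Dict[str, List[Point]] = {
    "A": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "B": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}


def mean_intra_distance(groups: Dict[str, List[Point]]) -> float:
    """Average distance between pairs of points in the same cluster."""
    pairs = [
        dist(p, q)
        for points in groups.values()
        for p, q in combinations(points, 2)
    ]
    return sum(pairs) / len(pairs)


def mean_inter_distance(groups: Dict[str, List[Point]]) -> float:
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for points_a, points_b in combinations(groups.values(), 2)
        for p in points_a
        for q in points_b
    ]
    return sum(pairs) / len(pairs)


intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)

# A good grouping has a small intra-cluster distance (high similarity inside
# a class) and a large inter-cluster distance (low similarity between classes).
print(f"intra={intra:.2f}, inter={inter:.2f}")
```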

Page 11: Intro data mining lingkup

Data Mining Applications

Science: Chemistry, Physics, Medicine
  Biochemical analysis
  Remote sensors on a satellite
  Telescopes: star/galaxy classification
  Medical image analysis

Page 12: Intro data mining lingkup

Data Mining Applications

Bioscience
  Sequence-based analysis
  Protein structure and function prediction
  Protein family classification
  Microarray gene expression

Page 13: Intro data mining lingkup

Data Mining Applications

Pharmaceutical companies, Insurance and Health care, Medicine
  Drug development
  Identify successful medical therapies
  Claims analysis, fraudulent behavior
  Medical diagnostic tools
  Predict office visits

Page 14: Intro data mining lingkup

Data Mining Applications

Financial Industry, Banks, Businesses, E-commerce
  Stock and investment analysis
  Identify loyal customers vs. risky customers
  Predict customer spending
  Risk management
  Sales forecasting

Page 15: Intro data mining lingkup

Data Mining Applications

Retail and Marketing
  Customer buying patterns/demographic characteristics
  Mailing campaigns
  Market basket analysis
  Trend analysis

Page 16: Intro data mining lingkup

Data Mining Applications

Database analysis and decision support
  Market analysis and management: target marketing, customer relation management, market basket analysis, cross selling, market segmentation
  Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
  Fraud detection and management

Page 17: Intro data mining lingkup

Data Mining Applications

Sports and Entertainment
  IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat
Astronomy
  JPL and the Palomar Observatory discovered 22 quasars with the help of data mining

Page 18: Intro data mining lingkup

DATA MINING EXAMPLES

Grocery store
NBA
Banking and Credit Card scoring
Fraud detection
Personalization & Customer Profiling
Campaign Management and Database Marketing

Page 19: Intro data mining lingkup

Data Mining Challenges

Computationally expensive to investigate all possibilities

Dealing with noise/missing information and errors in data

Choosing appropriate attributes/input representation

Finding the minimal attribute space

Finding adequate evaluation function(s)

Extracting meaningful information

Not overfitting

Page 20: Intro data mining lingkup

Summary

Data mining: discovering interesting patterns from large amounts of data

A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
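
The steps listed above can be sketched as a chain of small functions. This is only an illustration of the ordering: the records, the attribute names, the decade age bands, and the counting-based mining step are all invented assumptions, and every function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]

# Hypothetical raw records from two sources; None marks a missing value.
store_a: List[Record] = [
    {"id": 1, "age": 23, "item": "TV"},
    {"id": 2, "age": None, "item": "radio"},
]
store_b: List[Record] = [
    {"id": 3, "age": 27, "item": "TV"},
    {"id": 4, "age": 45, "item": "radio"},
]


def clean(records: List[Record]) -> List[Record]:
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources: List[Record]) -> List[Record]:
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records: List[Record], attributes: List[str]) -> List[Record]:
    """Data selection: keep only the attributes relevant to the task."""
    return [{a: r[a] for a in attributes} for r in records]


def transform(records: List[Record]) -> List[Dict[str, str]]:
    """Transformation: discretize age into decade bands such as 20-29."""
    out = []
    for r in records:
        low = r["age"] // 10 * 10
        out.append({"age_band": f"{low}-{low + 9}", "item": r["item"]})
    return out


def mine(records: List[Dict[str, str]]) -> Counter:
    """Data mining: count how often each (age band, item) pair occurs."""
    return Counter((r["age_band"], r["item"]) for r in records)


def evaluate(found: Counter, min_count: int) -> Dict[Tuple[str, str], int]:
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in found.items() if c >= min_count}


data = integrate(clean(store_a), clean(store_b))
data = select(data, ["age", "item"])
patterns = evaluate(mine(transform(data)), min_count=2)

# Knowledge presentation: here simply printed.
print(patterns)
```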

Page 21: Intro data mining lingkup

Summary

Mining can be performed in a variety of information repositories

Data mining functionalities: characterization, association, classification, clustering, outlier and trend analysis, etc.

Classification of data mining systems

Major issues in data mining

Page 22: Intro data mining lingkup

Kinds of Data Mining

Decision Tree Learning
Clustering
Neural Networks
Association Rules
Support Vector Machines
Genetic Algorithms
Nearest Neighbor Method

Page 23: Intro data mining lingkup

DECISION TREE FOR THE CONCEPT “Play Tennis”

Day  Outlook   Temp  Humidity  Wind    PlayTennis
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No

Mitchell, 1997
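
One way to see which attribute a decision-tree learner would test first on this table is to compute an attribute-selection measure. The sketch below assumes information gain (the measure used by ID3), which the slide itself does not name; `COLUMNS`, `ROWS`, `entropy`, and `information_gain` are hypothetical names.

```python
from collections import Counter
from math import log2
from typing import List, Sequence

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]

# The 14 training examples D1..D14 from the table, in order.
ROWS = [
    line.split()
    for line in """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
""".strip().splitlines()
]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * log2(n / total) for n in Counter(labels).values()
    )


def information_gain(rows: List[List[str]], attribute: str) -> float:
    """Entropy reduction obtained by splitting the rows on one attribute."""
    idx = COLUMNS.index(attribute)
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[idx] for row in rows}:
        subset = [row[-1] for row in rows if row[idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {a: information_gain(ROWS, a) for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)

for attribute, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attribute:<9}{gain:.3f}")
print("Best first split:", best)
```

Under this measure the attribute with the highest gain becomes the root test, and the same computation is repeated on each resulting subset.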

Page 24: Intro data mining lingkup

DECISION TREE FOR THE CONCEPT “Play Tennis”

[Mitchell,1997]
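
The tree figure itself is not preserved in this transcript. The sketch below writes out, as IF-THEN rules, the tree usually given for this data set in Mitchell (1997), with Outlook at the root, and checks it against the 14 training examples; treat it as an assumption about the missing figure. The names `play_tennis` and `EXAMPLES` are hypothetical.

```python
from typing import List, Tuple


def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    """Decision tree with Outlook at the root, written as IF-THEN rules."""
    if outlook == "Sunny":
        # Sunny days depend on humidity.
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        # Overcast days are always "Yes" in the training data.
        return "Yes"
    # Remaining case is Rain, which depends on wind.
    return "Yes" if wind == "Weak" else "No"


# (Outlook, Humidity, Wind, PlayTennis) for examples D1..D14.
# Temp is omitted because this tree never tests it.
EXAMPLES: List[Tuple[str, str, str, str]] = [
    ("Sunny", "High", "Weak", "No"),
    ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"),
    ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),
    ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"),
    ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"),
    ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"),
    ("Rain", "High", "Strong", "No"),
]

# Collect the 1-based numbers of any examples the tree misclassifies.
errors = [
    day
    for day, (outlook, humidity, wind, label) in enumerate(EXAMPLES, start=1)
    if play_tennis(outlook, humidity, wind) != label
]
print("Misclassified training examples:", errors)
```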