WELCOME TO THE SEMINAR ON DATA MINING By Ujjwal Kumar MSC 635

Page 2: Data mining

Overview

• Introduction
• Why Data Mining?
• Goals of Data Mining
• Architecture of Data Mining
• Data Mining – On what kind of data?
• Data Mining techniques
• Advantages of Data Mining
• Data mining tools/software
• Conclusion
• References

Page 3: Data mining

Introduction

• Data mining is extracting or “mining” knowledge from large amounts of data.
• It is also popularly referred to as KDD (Knowledge Discovery from Data): the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the Web, other massive information repositories, or data streams.

Page 4: Data mining

Why Data mining?

• The explosive growth in data: from terabytes to petabytes
• Data collection and data availability
• Major sources of abundant data: business, science, society and everyone
• We have to find a more effective way to use these data in the decision support process than just using traditional query languages.

Page 5: Data mining

Goals of Data Mining

• Prediction: how certain attributes within the data will behave in the future.
• Identification: identify the existence of an item, an event, or an activity.
• Classification: partition the data into categories.
• Optimization: optimize the use of limited resources.
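The prediction goal can be illustrated with a small least-squares trend line. This is only a sketch under assumptions: the monthly sales figures are hypothetical, and a straight-line fit is just one of many possible prediction models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x); the line passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predict the attribute value for a future x using the fitted line."""
    return a + b * x


# Hypothetical monthly sales (illustrative data, not from the seminar).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]
a, b = fit_line(months, sales)
print(predict(a, b, 6))  # -> 20.0
```

With real data the points will not lie exactly on a line; the fit then minimizes the sum of squared prediction errors.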

Page 6: Data mining

Architecture of Data Mining

• The architecture of a typical data mining system may have the following major components:
• Database, data warehouse, World Wide Web, or other information repository
• Database or data warehouse server
• Knowledge base
• Data mining engine
• Pattern evaluation module
• User interface

Page 7: Data mining

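[Figure: Architecture of a typical data mining system. Layers, from top to bottom: User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; data cleaning, integration and selection; Database, Data Warehouse, World Wide Web and Other Info Repositories. A Knowledge Base is shown alongside these layers.]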
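To make the flow through these components concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not a description of any real system: in-memory lists stand in for the database and web sources (the server layer is reduced to a function call), a dictionary stands in for the knowledge base, and the hypothetical functions `clean_integrate_select`, `mining_engine`, `evaluate_patterns` and `user_interface` each play one layer of the figure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical knowledge base: domain knowledge used to judge which patterns are
# interesting. Here it holds only a minimum co-occurrence count.
KNOWLEDGE_BASE = {"min_count": 2}


def clean_integrate_select(*sources):
    """Data cleaning, integration and selection: merge all sources, normalize
    item names and drop empty records."""
    records = []
    for source in sources:
        for record in source:
            items = {item.strip().lower() for item in record if item.strip()}
            if items:
                records.append(items)
    return records


def mining_engine(records):
    """Data mining engine: count how often each pair of items occurs together."""
    counts = Counter()
    for items in records:
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate_patterns(counts, kb):
    """Pattern evaluation: keep only the patterns the knowledge base deems interesting."""
    return {pair: n for pair, n in counts.items() if n >= kb["min_count"]}


def user_interface(patterns):
    """User interface: present the surviving patterns to the user."""
    for (first, second), n in sorted(patterns.items()):
        print(f"{first} + {second}: seen together {n} times")


# Hypothetical sources standing in for a store database and a web log.
store_db = [["Computer", "Software"], ["computer", "software", "printer"], []]
web_log = [["computer ", "printer"], ["software"]]

records = clean_integrate_select(store_db, web_log)
user_interface(evaluate_patterns(mining_engine(records), KNOWLEDGE_BASE))
```

Keeping each layer as a separate function mirrors the figure: any one stage (for example, the interestingness test) can be swapped without touching the others.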

Page 8: Data mining

Data Mining – On what kind of data?

• Data mining should be applicable to any kind of data repository, as well as to transient data, such as data streams.
• The challenges and techniques of mining may differ for each of the repository systems.

Page 9: Data mining

Data Mining – On what kind of data?

• Relational Databases
• Transactional Databases
• Temporal Databases
• Object-Relational Databases
• Spatial Databases
• Text Databases and Multimedia Databases
• Legacy Databases

Page 10: Data mining

Data Mining techniques

• Classification

• Clustering

• Association

Page 11: Data mining

Classification
• Classification is the process of finding a model that describes and distinguishes data classes or concepts.
• The model is then used to classify a new item, i.e., to identify the class to which it belongs.
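One simple way to "find a model" and then "classify the new item" is a nearest-centroid classifier: the model is the average feature vector of each class, and a new item gets the label of the closest average. This is a sketch under assumptions: the slides do not name an algorithm, and the (age, income) data and risk labels below are hypothetical.

```python
import math


def train_centroids(samples):
    """Learn the model: the mean feature vector (centroid) of each class.
    `samples` is a list of (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}


def classify(model, features):
    """Assign a new item to the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical training data: (age, annual income in thousands) -> credit-risk label.
training = [
    ((25, 20), "high risk"), ((30, 25), "high risk"),
    ((45, 80), "low risk"), ((50, 90), "low risk"),
]
model = train_centroids(training)
print(classify(model, (48, 85)))  # -> low risk
```

Training and classification are kept as separate steps to match the two bullets: first the model is found from labeled data, then it is applied to unlabeled items.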

Page 12: Data mining

Clustering
• In general, class labels are not present in the training data, simply because they are not known to begin with.
• The objects are clustered or grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.

• Example: An insurance company could use clustering to group clients by their age, location and types of insurance purchased.
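The insurance example can be sketched with k-means, one common clustering algorithm (the slides do not name a specific one). Assumptions: each hypothetical client is reduced to two numbers, age and number of policies purchased; location and insurance type would need a numeric encoding before they could be used; and the first k points serve as initial centroids so that the result is deterministic.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means: no class labels are needed. Starts from the first k points
    as centroids (a simplifying assumption) and alternates assign/update steps."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return clusters, centroids


# Hypothetical clients: (age, number of insurance policies purchased).
clients = [(22, 1), (60, 3), (25, 1), (58, 4), (23, 2), (63, 3)]
clusters, centroids = k_means(clients, k=2)
print(clusters)  # -> [[(22, 1), (25, 1), (23, 2)], [(60, 3), (58, 4), (63, 3)]]
```

No labels are supplied: the younger and older client groups emerge from the data alone, which is what distinguishes clustering from classification.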

Page 13: Data mining

Clustering
• Group data into clusters
• Similar data is grouped in the same cluster
• Dissimilar data is grouped in different clusters
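The principle stated on these clustering slides (similar items together, dissimilar items apart) can be checked numerically. This minimal sketch compares the average within-cluster and between-cluster Euclidean distances for two hypothetical groupings of the same four points; using distance as the measure of dissimilarity is an assumption, not the only possible choice.

```python
import math
from itertools import combinations


def mean_distance(pairs):
    """Average Euclidean distance over (point, point) pairs; assumes at least one pair."""
    pairs = list(pairs)
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def intra_cluster_distance(clusters):
    """Average distance between points in the SAME cluster (lower = more similar)."""
    return mean_distance(pair for cluster in clusters for pair in combinations(cluster, 2))


def inter_cluster_distance(clusters):
    """Average distance between points in DIFFERENT clusters (higher = more dissimilar)."""
    return mean_distance(
        (p, q) for c1, c2 in combinations(clusters, 2) for p in c1 for q in c2
    )


# Two hypothetical groupings of the same four points.
good = [[(1, 1), (1, 2)], [(8, 8), (8, 9)]]  # similar points kept together
bad = [[(1, 1), (8, 8)], [(1, 2), (8, 9)]]   # dissimilar points mixed together

for name, grouping in (("good", good), ("bad", bad)):
    print(name, round(intra_cluster_distance(grouping), 2),
          round(inter_cluster_distance(grouping), 2))
```

The "good" grouping has a small within-cluster distance and a large between-cluster distance; the "bad" grouping reverses both, violating the principle.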

Page 14: Data mining

Association rules
• “An association algorithm creates rules that describe how often events have occurred together.”
• Example: buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]

Where X is a variable representing a customer. A support of 1% means that 1% of all the transactions under analysis show that a computer and software were purchased together. A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that they will buy software as well.
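The support and confidence in the example can be computed directly from a list of transactions. The sketch below assumes a hypothetical set of 200 transactions, 4 of which contain a computer and 2 of which contain both a computer and software, so that the numbers match the rule above.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent => consequent: support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transactions: 2 with computer + software, 2 with computer only
# (plus a printer), and 196 unrelated ones, for 200 in total.
transactions = (
    [{"computer", "software"}] * 2
    + [{"computer", "printer"}] * 2
    + [{"milk", "bread"}] * 196
)

s = support(transactions, {"computer", "software"})
c = confidence(transactions, {"computer"}, {"software"})
print(f"support={s:.0%}, confidence={c:.0%}")  # -> support=1%, confidence=50%
```

Here support is 2/200 = 1%, and confidence is (2/200) / (4/200) = 50%: of the 4 computer buyers, 2 also bought software.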

Page 15: Data mining

Advantages of Data Mining
• Provides new knowledge from existing data: public databases, government sources, company databases
• Old data can be used to develop new knowledge
• New knowledge can be used to improve services or products
• Improvements lead to bigger profits and more efficient service

Page 16: Data mining

Uses of Data Mining
• Sales/marketing analysis: marketing strategies, advertisements

• Risk analysis and management: finance and investment, manufacturing and production

• Text mining (newsgroups, email, documents) and web mining

Page 17: Data mining

Uses of Data Mining
• DNA and bio-data analysis: effectiveness of treatments, identifying new drugs

• Fraud detection: identifying people misusing the system, financial transactions

• Customer care: identifying customer needs

Page 18: Data mining

Data Mining tools

• Intelligent Miner, IBM SPSS Modeler

• Enterprise Miner

• ODM

• GhostMiner

• RapidMiner

Page 19: Data mining

Conclusion

• Data mining is a “decision support” process in which we search for patterns of information in data.
• This technique is used for many types of data.
• It overlaps with machine learning, statistics, artificial intelligence, databases and visualization.

Page 20: Data mining

References

• Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”

• http://www.oracle.com/

• http://www.datamininglab.com/

• http://www.onlinelibrary.wiley.com

• http://www.cs.sjsu.edu/

Page 21: Data mining

Thank You