Data Mining. Priyabrata Satapathy, M.Tech 1st Year, SIS NO.-MCS12121
Page 1: Data mining

Data Mining

Priyabrata Satapathy, M.Tech 1st Year

SIS NO.-MCS12121

Page 2: Data mining

Contents

What is data mining.

Why data mining is needed.

Data, Information, Knowledge.

Data mining & KDD.

Data Warehouses.

Data Cleaning.

Applications of Data mining.

Page 3: Data mining

What is Data Mining

Data mining (knowledge discovery in databases):

Extraction of interesting information or patterns from data in large databases.

Knowledge discovery in databases (KDD) is the process of identifying valid, useful, and ultimately understandable patterns in data from large databases.

Page 4: Data mining

Why Data Mining is Needed

Data mining is needed to provide tools for discovering knowledge from data.

Data mining turns a large collection of data into knowledge.

Page 5: Data mining

The Data

Data are any facts, numbers, or text that can be processed by a computer.

Operational or transactional data: such as sales, cost, inventory, payroll, and accounting.

Metadata: data about the data itself, such as logical database design or data dictionary definitions.

Nonoperational data: such as industry sales, forecast data, and macroeconomic data.

Page 6: Data mining

The Information

The patterns, associations, or relationships among all this data can provide information.

For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.
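The point-of-sale example can be sketched in a few lines of Python. This is only an illustration: the `transactions` records and the `units_by_product_and_month` helper are hypothetical and do not come from the slides. It shows raw data (individual sales) turned into information (which products sell, and when).

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (product, month, units sold).
transactions = [
    ("milk", "2024-01", 3),
    ("bread", "2024-01", 2),
    ("milk", "2024-02", 5),
    ("bread", "2024-02", 1),
    ("milk", "2024-02", 2),
]


def units_by_product_and_month(records):
    """Sum the units sold for each (product, month) pair."""
    totals = defaultdict(int)
    for product, month, units in records:
        totals[(product, month)] += units
    return dict(totals)


sales_summary = units_by_product_and_month(transactions)
for (product, month), units in sorted(sales_summary.items()):
    print(product, month, units)
```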

Page 7: Data mining

The Knowledge

Information can be converted into knowledge about historical patterns and future trends.

For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior.

Page 8: Data mining

Data, Information & Knowledge

Page 9: Data mining

Data Mining & KDD

Data cleaning: used to remove noise and inconsistent data.

Data integration: where multiple data sources may be combined.

Data selection: where data relevant to the analysis task are retrieved from the database.

Data transformation: where data are transformed or consolidated into forms appropriate for mining, for example by performing summary operations.

Data mining: an essential process where intelligent methods are applied in order to extract data patterns.
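The KDD steps listed on this slide can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions: the two store data sets, the function names, and the "above-average spender" rule standing in for a real mining method are all hypothetical.

```python
from statistics import mean

# Hypothetical records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "amount": 20.0},
    {"customer": "c2", "amount": None},
]
store_b = [
    {"customer": "c3", "amount": 50.0},
    {"customer": "c1", "amount": 15.0},
    {"customer": "c4", "amount": 5.0},
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, customers):
    """Data selection: keep only the customers relevant to the task."""
    return [r for r in records if r["customer"] in customers]


def transform(records):
    """Data transformation: summarize to a total amount per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


def mine(totals):
    """Data mining (toy stand-in): customers spending above the average."""
    average = mean(totals.values())
    return sorted(c for c, total in totals.items() if total > average)


combined = integrate(clean(store_a), clean(store_b))
selected = select(combined, {"c1", "c3"})
totals = transform(selected)
patterns = mine(totals)
print(totals, patterns)
```

Each function maps to one step, so a step can be swapped for a more realistic method without touching the others.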

Page 10: Data mining

Data Mining & KDD

Page 11: Data mining

Data Warehouse

A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and residing at a single site.

A data warehouse is constructed through a process of data cleaning, data integration, data transformation, data loading, and data refreshing.

Page 12: Data mining

Data Cleaning

Data that is to be analyzed by data mining techniques can be incomplete, noisy, and inconsistent.

Data cleaning routines attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data.

Page 13: Data mining

Missing Values

We can handle missing values in data by:

Ignoring the tuple.

Filling in the missing value manually.

Using a global constant to fill in the missing values.

Using a measure such as the mean or median to fill in the missing value.

Using the most probable value to fill in the missing value.
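Three of these strategies (ignoring the tuple, a global constant, and the mean or median) can be illustrated with plain Python. The `ages` list and the helper names are hypothetical. Manual filling and the most probable value are left out, since the latter would need a predictive model that the slides do not specify.

```python
from statistics import mean, median

# Hypothetical attribute values; None marks a missing value.
ages = [25, None, 35, 30, None, 90]


def drop_missing(values):
    """Ignore the tuple: keep only the observed values."""
    return [v for v in values if v is not None]


def fill_constant(values, constant):
    """Replace every missing value with one global constant."""
    return [constant if v is None else v for v in values]


def fill_with(values, statistic):
    """Replace missing values with a statistic of the observed values."""
    fill = statistic(drop_missing(values))
    return [fill if v is None else v for v in values]


dropped = drop_missing(ages)
constant_filled = fill_constant(ages, -1)
mean_filled = fill_with(ages, mean)
median_filled = fill_with(ages, median)
print(dropped, constant_filled, mean_filled, median_filled, sep="\n")
```

In this sample the mean (45) is pulled upward by the value 90, while the median (32.5) is not, which is one reason the median is often preferred for skewed data.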

Page 14: Data mining

Noisy Data

Noisy data is data that contains errors. To handle noisy data:

Binning: binning methods smooth a sorted data value by consulting its neighborhood, that is, the values around it.

Regression: data can be smoothed by fitting the data values to a function.

Outlier analysis: outliers may be detected by clustering. Similar values are grouped into clusters, and values that fall outside the clusters are treated as outliers.
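As an illustration of binning, the sketch below smooths values by bin means: the data are sorted, split into equal-size bins, and each value is replaced by the mean of its bin. Smoothing by bin means is one common variant chosen here as an assumption, since the slide does not say which variant is meant. The `prices` list and the function name are hypothetical.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Hypothetical price data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
print(smoothed_prices)
```

Regression and clustering-based outlier detection follow the same idea of replacing or flagging values based on their neighbors, but they are not shown here.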

Page 15: Data mining

Applications of Data Mining

Data mining for Financial data analysis.

Data mining for Retail and Telecommunication Industries.

Data mining for Science and Engineering.

Data mining and Recommender systems.

Page 16: Data mining

Thank You