Future Institute Of Engineering & Management, COMPUTER SCIENCE AND ENGINEERING DEPARTMENT. SEMINAR ON: DATA MINING AND ITS APPLICATIONS, BY ABHIJIT KARMAKAR (43), PRATAP KARMAKAR (07), SAYAN CHAUDHURI (17) & SHRISH KUMAR (55)


Apr 13, 2017


Sayan Chaudhuri
Transcript
Page 1: Future Institute Of Engineering & Management

Future Institute Of Engineering & Management

COMPUTER SCIENCE AND ENGINEERING DEPARTMENT

SEMINAR ON: DATA MINING AND ITS APPLICATIONS

BY ABHIJIT KARMAKAR (43), PRATAP KARMAKAR (07), SAYAN CHAUDHURI (17) & SHRISH KUMAR (55)

Page 2: Future Institute Of Engineering & Management

Why Data Mining ?

Methodology of Knowledge discovery in Databases.

Business Applications of Data Mining

Data Mining Functionalities

Page 3: Future Institute Of Engineering & Management

What is Data Mining ?

Process of semi-automatically analyzing large databases to find patterns that are:
• Valid: hold on new data with some certainty
• Novel: non-obvious to the system
• Useful: it should be possible to act on the pattern
• Understandable: humans should be able to interpret the pattern

Page 4: Future Institute Of Engineering & Management

Where to use it?

1) Banking: loan/credit card approval
2) Customer relationship management
3) Targeted marketing
4) Fraud detection: telecommunications, financial transactions
5) Manufacturing and production
6) Biomedical implementation

Page 5: Future Institute Of Engineering & Management

“Necessity is the Mother of Invention”

Data explosion problem
• Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
• Need to convert such data into knowledge and information

Applications
• Business management
• Production control
• Market analysis
• Engineering design
• Science exploration

Page 6: Future Institute Of Engineering & Management

Developments in Computer Hardware

• Powerful and affordable computers
• Data collection equipment
• Storage media
• Communication and networking

Page 7: Future Institute Of Engineering & Management

Evolution of Database Technology

• Data collection, database creation
• Data management: data storage and retrieval, database transaction processing
• Data analysis and understanding: data mining and data warehousing

Page 8: Future Institute Of Engineering & Management

Steps of the KDD process

1. Goal identification: define the problem, relevant prior knowledge, and the goals of the application

2. Creating a target data set: data selection

3. Data preprocessing (may take 60%-80% of the effort!):
• removal of noise or outliers
• strategies for handling missing data fields
• accounting for time sequence information

4. Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation

Page 9: Future Institute Of Engineering & Management

Continued…

5. Data mining:
• Choosing functions of data mining: summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s): which models or parameters
• Search for patterns of interest

6. Presentation and evaluation: visualization, transformation, removing redundant patterns, etc.

7. Taking action: incorporating into the performance system, documenting, reporting to interested parties
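Step 3 (preprocessing) and part of step 4 (transformation) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `preprocess`, the sample `ages` column, the plausible range 0-120, median filling, and min-max scaling are all choices made here, not something the slides prescribe.

```python
from statistics import median

def preprocess(values, low, high):
    """Clean one numeric column: blank out-of-range readings, fill gaps, rescale."""
    # Noise/outlier removal: readings outside [low, high] are treated as missing.
    cleaned = [v if v is not None and low <= v <= high else None for v in values]
    # Missing-field strategy (assumed here): fill with the median of known values.
    known = [v for v in cleaned if v is not None]
    fill = median(known)
    filled = [fill if v is None else v for v in cleaned]
    # Transformation: min-max scaling to the range [0, 1].
    lo, hi = min(filled), max(filled)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [(v - lo) / span for v in filled]

# Hypothetical column: one missing value (None) and one obvious outlier (999).
ages = [25, None, 40, 999, 35]
scaled = preprocess(ages, low=0, high=120)
```

Both the missing value and the outlier end up replaced by the median (35) before scaling, so the smallest age maps to 0 and the largest plausible age maps to 1.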

Page 10: Future Institute Of Engineering & Management

Data Warehouse

Repository of multiple heterogeneous data sources, organized under a unified schema at a single site in order to facilitate management decision making.

Data warehouse technology includes:
• Data cleaning
• Data integration
• On-Line Analytical Processing (OLAP): techniques that support multidimensional analysis and decision making with the following functionalities: summarization, consolidation, aggregation, viewing information from different angles

But additional data analysis tools are needed for:
• classification
• clustering
• characterization of data changing over time
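The OLAP idea of aggregating one measure and viewing it from different angles can be illustrated with a small roll-up. This is a minimal sketch: the `sales` rows, the dimension names, and the `roll_up` helper are hypothetical and stand in for what a real warehouse server would do.

```python
from collections import defaultdict

# Hypothetical fact table: (region, month, product, sales amount).
sales = [
    ("East", "Jan", "milk", 120),
    ("East", "Jan", "bread", 80),
    ("East", "Feb", "milk", 100),
    ("West", "Jan", "milk", 90),
    ("West", "Feb", "bread", 60),
]
DIMENSIONS = ("region", "month", "product")

def roll_up(rows, by):
    """Sum the sales measure, keeping only the dimensions listed in `by`."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)

by_region = roll_up(sales, ["region"])                 # one angle
by_region_month = roll_up(sales, ["region", "month"])  # a finer angle
grand_total = roll_up(sales, [])                       # full aggregation
```

Each call answers a question the analyst already knows how to ask. Classification or clustering needs separate tools because those tasks look for structure that no fixed grouping key exposes.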

Page 11: Future Institute Of Engineering & Management

Data-rich, Information-poor State

• Abundance of data and need for powerful data analysis tools
• “Data tombs”: data archives that are seldom visited
• Important decisions are made not on the information-rich data stored in databases but on a decision maker’s intuition
• No tool to extract knowledge embedded in vast amounts of data
• Current expert system technology: users or domain experts manually input knowledge, which is time-consuming, costly, and prone to biases and errors

Page 12: Future Institute Of Engineering & Management

Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process.

[Figure: KDD process flow: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection and Data Transformation -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Page 13: Future Institute Of Engineering & Management

Architecture of a Typical Data Mining System

[Figure: layered architecture, bottom to top: Databases and Data Warehouse (data cleaning & data integration, filtering) -> Database or data warehouse server -> Data mining engine -> Pattern evaluation -> Graphical user interface]

Page 14: Future Institute Of Engineering & Management

Difference between Database and Data Mining

Database
• Find all credit applicants with last name of Smith.
• Identify customers who have purchased more than $10,000 in the last month.
• Find all customers who have purchased milk.

Data Mining
• Find all credit applicants who are poor credit risks. (classification)
• Identify customers with similar buying habits. (clustering)
• Find all items which are frequently purchased with milk. (association rules)
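The milk example makes the contrast concrete. In the sketch below, the `baskets` data, the helper name `frequent_with`, and the 0.5 confidence threshold are all assumptions made for illustration. The first expression is a database-style lookup with a fixed condition, while the function searches for items that co-occur with milk often enough to be interesting.

```python
from collections import Counter

# Hypothetical market baskets, one set of items per customer visit.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "cereal"},
]

# Database-style query: the condition is fully specified up front.
bought_milk = [i for i, b in enumerate(baskets) if "milk" in b]

def frequent_with(item, baskets, min_confidence):
    """Mining-style question: which items appear in at least `min_confidence`
    of the baskets that contain `item`?"""
    with_item = [b for b in baskets if item in b]
    if not with_item:
        return {}
    counts = Counter(other for b in with_item for other in b if other != item)
    return {other: n / len(with_item)
            for other, n in counts.items()
            if n / len(with_item) >= min_confidence}

rules = frequent_with("milk", baskets, min_confidence=0.5)
```

Here bread appears in 3 of the 4 milk baskets and cereal in 2 of 4, so both pass the threshold, while eggs (1 of 4) does not.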

Page 15: Future Institute Of Engineering & Management

Data mining models and tasks

Page 16: Future Institute Of Engineering & Management

Basic data mining tasks

Classification maps data into predefined groups or classes
• Supervised learning
• Pattern recognition
• Prediction

Regression is used to map a data item to a real-valued prediction variable.

Clustering groups similar data together into clusters.
• Unsupervised learning
• Segmentation
• Partitioning
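The three tasks differ mainly in what they are given and what they return. The sketch below uses deliberately simple stand-ins chosen here, not by the slides: a 1-nearest-neighbour rule for classification, a least-squares line for regression, and a one-dimensional k-means loop for clustering. All data values and function names are hypothetical.

```python
import math

def classify_1nn(train, point):
    """Classification: return the label of the nearest labelled example."""
    return min(train, key=lambda ex: math.dist(ex[0], point))[1]

def fit_line(xs, ys):
    """Regression: least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def kmeans_1d(values, centers, rounds=10):
    """Clustering: group unlabelled numbers around the given starting centers."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Supervised: labels are supplied with the training examples.
train = [((1.0, 1.0), "good"), ((1.2, 0.8), "good"), ((4.0, 4.5), "poor")]
label = classify_1nn(train, (3.8, 4.0))

# Real-valued prediction from a fitted line.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Unsupervised: no labels, only the values themselves.
clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0.0, 5.0])
```

Classification needs labelled examples, regression returns a number rather than a class, and clustering receives no labels at all, which is why the slide files it under unsupervised learning.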

Page 17: Future Institute Of Engineering & Management

Continued….

Summarization maps data into subsets with associated simple descriptions.
• Characterization
• Generalization

Link Analysis uncovers relationships among data.
• Affinity analysis
• Association rules
• Sequential analysis: determines sequential patterns
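Sequential analysis can be illustrated by counting how often one item is bought before another. This is a rough sketch under assumptions: the `histories` data, the helper `sequential_pairs`, and the support threshold of 2 are invented for the example, and real sequential-pattern miners handle longer patterns and time gaps.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each ordered by time.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone"],
    ["phone", "case"],
]

def sequential_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a bought before b, that occur in at
    least `min_support` sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() keeps the input order, so each pair is (earlier, later);
        # the set gives each sequence at most one vote per pair.
        counts.update({(a, b) for a, b in combinations(seq, 2) if a != b})
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = sequential_pairs(histories, min_support=2)
```

Unlike an association rule, the pair is directional: "phone then case" is supported by two histories, while "case then phone" is supported by only one and is dropped.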

Page 18: Future Institute Of Engineering & Management

Data Mining Applications

• Data mining is an interdisciplinary field with wide and diverse applications
• There exist nontrivial gaps between data mining principles and domain-specific applications

Some application domains
• Financial data analysis
• Retail industry
• Telecommunication industry
• Biological data analysis

Page 19: Future Institute Of Engineering & Management

Data Mining for Financial Data Analysis

Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality

Design and construction of data warehouses for multidimensional data analysis and data mining
• View the debt and revenue changes by month, by region, by sector, and by other factors
• Access statistical information such as max, min, total, average, trend, etc.

Loan payment prediction/consumer credit policy analysis
• Loan payment performance
• Consumer credit rating

Page 20: Future Institute Of Engineering & Management

Data Mining for Retail Industry

Retail industry: huge amounts of data on sales, customer shopping history, etc.

Applications of retail data mining
• Identify customer buying behaviors
• Discover customer shopping patterns and trends
• Improve the quality of customer service
• Achieve better customer retention and satisfaction
• Enhance goods consumption ratios
• Design more effective goods transportation and distribution policies

Page 21: Future Institute Of Engineering & Management

Data Mining for Telecomm. Industry

A rapidly expanding and highly competitive industry with a great demand for data mining
• Understand the business involved
• Identify telecommunication patterns
• Catch fraudulent activities
• Make better use of resources
• Improve the quality of service

Multidimensional analysis of telecommunication data
• Intrinsically multidimensional: calling time, duration, location of caller, location of callee, type of call, etc.

Page 22: Future Institute Of Engineering & Management

Biomedical Data Analysis

Protein Folding: Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").

Extra-Terrestrial Intelligence: Scanning satellite receptions for possible transmissions from other planets.

Page 23: Future Institute Of Engineering & Management

Prevalence of Data Mining

Your data is already being mined, whether you like it or not.
• Many web services require that you allow access to your information [for data mining] in order to use the service.
• Google mines email data in Gmail accounts to present account owners with ads.
• Facebook requires users to allow access to info from non-Facebook pages.

Facebook privacy policy:

"We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services and other users of Facebook, to supplement your profile."

This allows access to your blog RSS feed (rather innocuous), as well as information obtained through partner sites (worthy of concern).

Page 24: Future Institute Of Engineering & Management

Data Mining Controversies

A notable one: Facebook's Beacon advertising program (reported on Slashdot some years ago).

What Beacon does: “when you engage in consumer activity at a [Facebook] partner website, such as Amazon, eBay, not only will Facebook record that activity, but your Facebook connections will also be informed of your purchases or actions.”

Facebook currently offers users no way to opt out of Beacon (once it has been activated?). Users can close their accounts, but account data is never deleted.


Page 26: Future Institute Of Engineering & Management

Thank You