Ahmed Moussa 30-Feb-2010
Page 1: Data mining

Ahmed Moussa, 30-Feb-2010

Page 2: Data mining

What is Data Mining?

Page 3: Data mining

The Evolution of Data Analysis

Page 4: Data mining

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales last March? Drill down to Other." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What’s likely to happen to unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups | Prospective, proactive information delivery |

- RDBMS: Relational database management system.

- ODBC: Open Database Connectivity, a standard software API for accessing database management systems (DBMS).

- OLAP: Online analytical processing, an approach to quickly answering multi-dimensional analytical queries.

- SPSS: Statistical Package for the Social Sciences, a computer program used for statistical analysis. In 2009 it was rebranded as PASW.

Page 5: Data mining

Results of Data Mining Include

Page 6: Data mining

Results of Data Mining Include

• Forecasting what may happen in the future.

• Classifying people or things into groups by recognizing patterns.

• Clustering people or things into groups based on their attributes.

• Associating what events are likely to occur together.

• Sequencing what events are likely to lead to later events.
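The "Associating" result is usually expressed as rules scored by two measures: support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). A minimal Python sketch over hypothetical basket data; the function name `pair_rules` and the 0.4 support threshold are illustrative assumptions, and only item pairs are considered (full algorithms such as Apriori also search larger itemsets).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]

def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) tuples for item
    pairs whose support meets min_support."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Count each unordered pair once per transaction (sorted for a stable key).
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Confidence of a -> b: share of transactions with a that also contain b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

for a, b, support, confidence in pair_rules(transactions):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With this data, butter -> bread has confidence 1.00 (every basket with butter also contains bread), while bread -> milk has 0.67.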


Page 11: Data mining

Data mining is not

• Crunching of bulk data

• “Blind” application of algorithms

• Going to find relationships where none exist

• Presenting data in different ways

• A database-intensive task

• A difficult-to-understand technology requiring an advanced degree in computer science

Page 12: Data mining

Data Mining Is

• A class of techniques that find patterns in data.

• A user-centric, interactive process which leverages analysis technologies and computing power.

• A group of techniques that find relationships that have not previously been discovered.

• Not reliant on an existing database.

• A relatively easy task that requires knowledge of the business problem/subject matter expertise.


Page 17: Data mining

Examples of What People are Doing with Data Mining:

• Fraud/Non-Compliance Anomaly Detection
  - Isolate the factors that lead to fraud, waste and abuse
  - Target auditing and investigative efforts more effectively

• Credit/Risk Scoring

• Intrusion Detection

• Parts Failure Prediction

• Recruiting/Attracting Customers

• Maximizing Profitability (cross-selling, identifying profitable customers)

• Service Delivery and Customer Retention
  - Build profiles of customers likely to use which services
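Anomaly detection, the first application listed, often starts with a simple statistical screen before any model is built. A minimal Python sketch, assuming a single numeric field such as a claim amount: it uses the modified z-score (based on the median and the median absolute deviation, MAD), which is less distorted by the outliers it is looking for than a mean-based z-score, with the commonly used cutoff of 3.5. The data, the function name `flag_anomalies`, and the threshold are illustrative assumptions; practical fraud detection usually combines many attributes.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold."""
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # No spread around the median: this simple rule cannot rank outliers.
        return []
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # for normally distributed data.
    return [
        (i, x) for i, x in enumerate(amounts)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]

# Hypothetical claim amounts; one is far out of line with the rest.
claims = [120.0, 95.5, 130.0, 110.0, 101.0, 4999.0, 125.0, 99.0]
print(flag_anomalies(claims))
```

Flagged records are candidates for the targeted auditing mentioned above, not proof of fraud.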



Page 25: Data mining

Examples of What People are Doing with Data Mining:

The right offer for the right customer through the right channel at the right time.

Page 26: Data mining

How Can We Do Data Mining?

• A standard process

• Existing data

• Software technologies

• Situational expertise


The data mining process must be reliable and repeatable by people with little data mining background.

Page 28: Data mining

Phases and Tasks

Each of the six phases is listed with its tasks; each task is followed by the outputs it produces.

Business Understanding
• Determine Business Objectives: Background; Business Objectives; Business Success Criteria
• Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
• Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
• Collect Initial Data: Initial Data Collection Report
• Describe Data: Data Description Report
• Explore Data: Data Exploration Report
• Verify Data Quality: Data Quality Report

Data Preparation
• Data Set: Data Set Description
• Select Data: Rationale for Inclusion / Exclusion
• Clean Data: Data Cleaning Report
• Construct Data: Derived Attributes; Generated Records
• Integrate Data: Merged Data
• Format Data: Reformatted Data

Modeling
• Select Modeling Technique: Modeling Technique; Modeling Assumptions
• Generate Test Design: Test Design
• Build Model: Parameter Settings; Models; Model Description
• Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
• Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
• Review Process: Review of Process
• Determine Next Steps: List of Possible Actions; Decision

Deployment
• Plan Deployment: Deployment Plan
• Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
• Produce Final Report: Final Report; Final Presentation
• Review Project: Experience Documentation


Page 31: Data mining

Phases and Tasks

A) Business Understanding
• Determine Business Objectives: Background; Business Objectives; Business Success Criteria
• Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
• Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques


Page 33: Data mining

Phases and Tasks

B) Data Understanding
• Collect Initial Data: Initial Data Collection Report
• Describe Data: Data Description Report
• Explore Data: Data Exploration Report
• Verify Data Quality: Data Quality Report
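The Describe Data and Verify Data Quality tasks often come down to simple per-field profiling. A minimal Python sketch, assuming records arrive as a list of dictionaries and that None or an empty string marks a missing value; the function name `data_quality_report`, the field names, and the sample records are hypothetical.

```python
def data_quality_report(records, fields):
    """Per-field profile: missing count, distinct values, and min/max of numeric values.
    None or an empty string is treated as missing (an assumption for this sketch)."""
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        present = [v for v in values if v is not None and v != ""]
        numeric = [v for v in present if isinstance(v, (int, float))]
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report

# Hypothetical customer records with a few quality problems.
customers = [
    {"id": 1, "age": 34, "region": "North"},
    {"id": 2, "age": None, "region": "South"},
    {"id": 3, "age": 51, "region": ""},
    {"id": 4, "age": 29, "region": "North"},
]

for field, stats in data_quality_report(customers, ["id", "age", "region"]).items():
    print(field, stats)
```

The missing counts and value ranges feed directly into the Data Quality Report and help decide what the Clean Data task must handle.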


Page 35: Data mining

Phases and Tasks

C) Data Preparation
• Data Set: Data Set Description
• Select Data: Rationale for Inclusion/Exclusion
• Clean Data: Data Cleaning Report
• Construct Data: Derived Attributes; Generated Records
• Integrate Data: Merged Data
• Format Data: Reformatted Data
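The Data Preparation tasks can be sketched as one small pipeline. A minimal Python sketch, assuming two hypothetical source tables (customer records and transactions); the function name `prepare`, the field names, and the cleaning rule (normalizing region codes) are illustrative choices, not a prescribed procedure.

```python
# Hypothetical source tables: customer master data and transaction records.
customers = [
    {"id": 1, "birth_year": 1980, "region": " north "},
    {"id": 2, "birth_year": None, "region": "South"},
    {"id": 3, "birth_year": 1975, "region": "south"},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 40.0},
]

def prepare(customers, transactions, as_of_year):
    """Hypothetical pipeline mirroring the Data Preparation tasks."""
    # Select data: drop customers without birth_year (rationale: age is a required input).
    selected = [c for c in customers if c["birth_year"] is not None]
    # Integrate data: total transaction amounts per customer, merged in below.
    spend = {}
    for t in transactions:
        spend[t["customer_id"]] = spend.get(t["customer_id"], 0.0) + t["amount"]
    prepared = []
    for c in selected:
        region = c["region"].strip().upper()  # Clean data: normalize region codes
        age = as_of_year - c["birth_year"]    # Construct data: derived attribute
        total = spend.get(c["id"], 0.0)
        # Format data: fixed column order (id, region, age, total_spend).
        prepared.append((c["id"], region, age, total))
    return prepared

print(prepare(customers, transactions, as_of_year=2010))
```

Each step's decisions (why customer 2 was dropped, how regions were normalized) are what the Rationale for Inclusion/Exclusion and Data Cleaning Report outputs record.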


Page 37: Data mining

Phases and Tasks

D) Modeling
• Select Modeling Technique: Modeling Technique; Modeling Assumptions
• Generate Test Design: Test Design
• Build Model: Parameter Settings; Models and Model Description
• Assess Model: Model Assessment; Revised Parameter Settings
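The Generate Test Design, Build Model, and Assess Model tasks can be illustrated with a deliberately tiny model. A minimal Python sketch, assuming one numeric feature and a binary label: the test design is a holdout split, the model is a one-feature threshold rule (a "decision stump"), and assessment is holdout accuracy. The data and the names `holdout_split`, `build_stump`, and `assess` are hypothetical; a real project would typically use a modeling library and richer metrics.

```python
def holdout_split(rows, every=4):
    """Test design: hold out every `every`-th row for testing, train on the rest.
    (A random split is more common; this rule keeps the sketch reproducible.)"""
    train = [r for i, r in enumerate(rows) if i % every != every - 1]
    test = [r for i, r in enumerate(rows) if i % every == every - 1]
    return train, test

def build_stump(train):
    """Build model: choose the threshold t maximizing training accuracy
    for the rule 'predict 1 if x >= t'."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in train}):
        acc = sum((x >= t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def assess(threshold, test):
    """Assess model: accuracy on rows the model never saw."""
    return sum((x >= threshold) == bool(y) for x, y in test) / len(test)

# Hypothetical rows: (support calls last quarter, churned? 1 = yes, 0 = no).
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1),
        (7, 1), (8, 1), (2, 0), (6, 1), (3, 0), (7, 1)]

train, test = holdout_split(rows)
threshold = build_stump(train)
print(f"threshold={threshold}, holdout accuracy={assess(threshold, test):.2f}")
```

A perfect score on three held-out rows says little on its own; a larger test set or cross-validation would give a more trustworthy Model Assessment.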


Page 39: Data mining

Phases and Tasks

E) Evaluation
• Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
• Review Process: Review of Process
• Determine Next Steps: List of Possible Actions; Decision


Page 41: Data mining

Phases and Tasks

F) Deployment
• Plan Deployment: Deployment Plan
• Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
• Produce Final Report: Final Report; Final Presentation
• Review Project: Experience Documentation

Page 42: Data mining

Data mining success story

The US Internal Revenue Service needed to improve customer service and scheduled its workforce to provide faster, more accurate answers to questions.

Page 43: Data mining

Data mining success story

The US Drug Enforcement Administration needed to be more effective in its drug “busts” and analyzed suspects’ cell phone usage to focus investigations.

Page 44: Data mining

Data mining success story

HSBC needed to cross-sell more effectively by identifying profiles that would be interested in higher-yielding investments and reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
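Results like this are commonly read off a cumulative gains analysis: rank prospects by a model's score and find how much of the list must be mailed to capture a given share of revenue. A minimal Python sketch, assuming each prospect has a model score and a revenue figure observed in a past campaign or holdout, and that mailing cost is proportional to the number of pieces mailed. The function name `fraction_to_capture` is hypothetical, and the numbers are invented for illustration; they are not HSBC's data.

```python
def fraction_to_capture(scored, target_share):
    """Rank (score, revenue) pairs by descending score and return the smallest
    fraction of the list whose cumulative revenue reaches target_share of the total."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total = sum(revenue for _, revenue in ranked)
    captured = 0.0
    for i, (_, revenue) in enumerate(ranked, start=1):
        captured += revenue
        if captured >= target_share * total:
            return i / len(ranked)
    return 1.0

# Invented prospects: (model score, campaign revenue).
scored = [(0.91, 500), (0.85, 420), (0.80, 0), (0.72, 300), (0.65, 150),
          (0.50, 0), (0.40, 60), (0.30, 0), (0.20, 40), (0.10, 0)]

share = fraction_to_capture(scored, 0.95)
print(f"Mail {share:.0%} of the list to capture 95% of revenue")
```

Under the proportional-cost assumption, mailing only the top-ranked fraction cuts mail costs by one minus that fraction.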

Page 45: Data mining

Final Comments

Page 46: Data mining

Data mining can be used by any organization that needs to find patterns or relationships in its data.


By following a standard data mining methodology such as the phases above, analysts can have reasonable assurance that their data mining efforts will produce useful, repeatable, and valid results.
