Chapter 1 Initial Description of Data Mining in Business

Jan 27, 2015

Transcript
Page 1: Chapter 1 Initial Description of Data Mining in Business

Chapter 1: Initial Description of Data Mining in Business

Prepared by: Dr. Tsung-Nan Tsai

Page 2: Chapter 1 Initial Description of Data Mining in Business

Contents

Introduces data mining concepts

Presents typical business data applications

Explains the meaning of key concepts

Gives a brief overview of data mining tools

Outlines the remaining chapters of the book

Page 3: Chapter 1 Initial Description of Data Mining in Business

Definition

DATA MINING: refers to the analysis of the large quantities of data that are stored in computers.
• exploration & analysis
• by automatic means
• of large quantities of data
• to discover actionable patterns & rules

Data mining is a way to use the massive quantities of data that businesses generate.

GOAL: improve marketing, sales, and customer support through better understanding of customers

Page 4: Chapter 1 Initial Description of Data Mining in Business

Retail Outlets

Bar coding & scanning generate masses of data
• customer service (grocery stores can quickly process the purchases and accurately determine product prices)
• inventory control (determine the quantity of items of each product on hand; supply chain management)

MICROMARKETING
CUSTOMER PROFITABILITY ANALYSIS
MARKET-BASKET ANALYSIS

Page 5: Chapter 1 Initial Description of Data Mining in Business

Political Data Mining

Grossman et al., 10/18/2004, Time, 38

2004 Election

Republicans: VoterVault
• From mid-1990s
• About 165 million voters
• Massive get-out-the-vote drive for those expected to vote Republican

Democrats: Demzilla
• Also about 165 million voters
• Names typically have 200 to 400 information items

Page 6: Chapter 1 Initial Description of Data Mining in Business

Medical Diagnosis

J. Morris, Health Management Technology Nov 2004, 20, 22-24

Electronic Medical Records

Associated Cardiovascular Consultants
• 31 physicians
• 40,000 patients per year, southern New Jersey

Data mined to identify efficient medical practice
• Enhance patient outcomes
• Reduced medical liability insurance

Page 7: Chapter 1 Initial Description of Data Mining in Business

Mayo Clinic

Swartz, Information Management Journal Nov/Dec 2004, 8

IBM developed EMR program
• Complete records on almost 4.4 million patients.
• Doctors can ask how the last 100 Mayo patients with the same gender, age, and medical history responded to particular treatments.

Page 8: Chapter 1 Initial Description of Data Mining in Business

Business Uses of Data Mining

Toyota used data mining of its data warehouse to determine more efficient transportation routes, reducing time-to-market by an average of 19 days.

Banks used data mining in soliciting credit card customers.

Insurance and telecommunication companies used DM to detect fraud.

Manufacturing firms used DM in quality control.

Many …..

Page 9: Chapter 1 Initial Description of Data Mining in Business

Business Uses of Data Mining

1. Customer profiling
• Identify the profitability of customer subsets

2. Targeting
• Determine characteristics of most profitable customers

3. Market-Basket Analysis
• Determine correlation of purchases by profile (customers)
• Cross-selling
• Part of Customer Relationship Management

Page 10: Chapter 1 Initial Description of Data Mining in Business

What is needed to do DM?

DM requires the identification of a problem, along with data collection that can lead to a better understanding of the market.

Computer models provide statistical or other means of analysis.

Two general types of DM studies:

1. Hypothesis testing: expressing a theory about the relationship between actions and outcomes, then testing it against data.

2. Knowledge discovery: a preconceived notion may not be present; instead, relationships are identified by looking at the data (correlation analysis).
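A minimal sketch of the knowledge-discovery style, assuming correlation analysis means scanning every pair of variables for a strong linear relationship. The variable names and monthly figures are invented for illustration; they do not come from the slides.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures (assumption, for illustration only).
data = {
    "promotion_spend": [10, 20, 30, 40, 50],
    "sales_volume": [12, 24, 33, 46, 52],
    "returns": [5, 3, 6, 2, 4],
}

# Knowledge discovery: no prior theory, so examine every pair of variables.
correlations = {
    (a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)
}

# The pair with the largest absolute correlation is a candidate relationship.
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
```

A hypothesis-testing study would instead start from one stated pair (for example, promotion spend and sales volume) and check only that relationship.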

Page 11: Chapter 1 Initial Description of Data Mining in Business

Reasons why Data Mining is now effective

Data are there

Data are warehoused (computerized)
• Walmart: 35 thousand queries per week

Computing economically available

Competitive pressure

Commercial products available

Page 12: Chapter 1 Initial Description of Data Mining in Business

Trends

Every business is service
• hotel chains record your preferences
• car rental companies the same
• service versus price
  – credit card companies
  – long distance providers
  – airlines
  – computer retailers

Page 13: Chapter 1 Initial Description of Data Mining in Business

Trends

Information as Product
• Custom Clothing Technology Corporation - fit jeans, other clothing

INFORMATION BROKERING
• IMS - collects prescription data from pharmacies, sells to drug firms
• AC Nielsen - TV

Page 14: Chapter 1 Initial Description of Data Mining in Business

Trends

Commercial Software Available
• using statistical, artificial intelligence tools that have been developed
• Enterprise Miner (SAS)
• Intelligent Miner (IBM)
• Clementine (SPSS)
• PolyAnalyst (Megaputer)
• Specialty products

Page 15: Chapter 1 Initial Description of Data Mining in Business

Fingerhut’s DM models

Fingerhut used segmentation, decision tree, regression analysis, and neural modeling tools, with SAS for regression analysis and SPSS for neural networks.

The segmentation model combines order and basic demographic data with Fingerhut’s product offerings.

Neural network models were used to identify patterns in mailings and in order-filling telephone calls.

Goal:
• Create new mailings targeted at customers with the greatest potential payoff.
• Create a catalog containing products that those customers are interested in, such as furniture, telephones…

Page 16: Chapter 1 Initial Description of Data Mining in Business

How Data Mining Is Being Used

U.S. Government
• track down Oklahoma City bombers, Unabomber, many others
• Treasury department - international funds transfers, money laundering
• Internal Revenue Service

Page 17: Chapter 1 Initial Description of Data Mining in Business

How Data Mining Is Used

Firefly
• asks members to rate music and movies
• subscribers clustered
• clusters get custom-designed recommendations

Page 18: Chapter 1 Initial Description of Data Mining in Business

Warranty Claims Routing

Diesel engine manufacturer
• stream of warranty claims
• examine each by expert
• determine whether charges are reasonable & appropriate
• think of expert system to automate claims processing

Page 19: Chapter 1 Initial Description of Data Mining in Business

Data mining application areas

Application Area            Applications                       Specifics
Retailing                   Affinity positioning               Position products effectively
                            Cross-selling                      Find more products for customers
Banking                     Customer relationship management   Identify customer value; develop programs to maximize revenue
Credit card management      Lift                               Identify effective market segments
                            Churn                              Identify likely customer turnover
                            Fraud detection
Insurance                   Fraud detection                    Identify claims meriting investigation
Telecommunications          Churn                              Identify likely customer turnover
Telemarketing               Online information                 Aid telemarketers with easy data access
Human Resource Management   Churn                              Identify potential employee turnover

Page 20: Chapter 1 Initial Description of Data Mining in Business

Retailing

Affinity positioning is based upon the identification of products that the same customer is likely to want.
• Example: cold medicine and tissues

Cross-selling: knowledge of which products go together can be used to market the complementary product.
• Grocery stores do that through product shelf positioning.

Grocery stores generate mountains of cash register data. Current technology enables grocers to look at customers who have defected from a store, their purchase history, and characteristics of other potential defectors.

Page 21: Chapter 1 Initial Description of Data Mining in Business

Cross-selling

USAA insurance
• doubled number of products held by average customer due to data mining
• detailed records on customers
• predict products they might need

Fidelity Investments
• regression - what makes customer loyal

Page 22: Chapter 1 Initial Description of Data Mining in Business

Banking

CRM involves the application of technology to monitor customer service, a function that is enhanced through data mining support.

DM applications in finance include predicting the prices of equities involving a dynamic environment with surprise information, some of which might be inaccurate …

Only 3% of the customers at Norwest bank provided 44% of their profits.

CRM products enable banks to define and identify customer and household relationships.
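The Norwest figure (3% of customers, 44% of profits) is a profit-concentration measure. A minimal sketch of how such a share could be computed from per-customer profit figures follows; the function name and the ten profit values are hypothetical and are not Norwest data.

```python
def profit_share_of_top(profits, top_fraction):
    """Share of total profit contributed by the most profitable customers.

    profits: per-customer profit figures
    top_fraction: fraction of customers to count as "top" (e.g. 0.03)
    """
    ranked = sorted(profits, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical annual profit per customer (assumption, for illustration only).
profits = [900, 40, 35, 30, 25, 20, 15, 10, 5, 0]

# Share of profit that comes from the top 10% of these customers.
share = profit_share_of_top(profits, 0.10)
```

With real data, some customers may have negative profit, in which case the top group can account for more than 100% of the total.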

Page 23: Chapter 1 Initial Description of Data Mining in Business

Retaining Good Customers

Customer loss:
• Banks - Attrition
• Cellular Phone Companies - Churn
• study who might leave, why

Southern California Gas

– customer usage, credit information

– direct mail contact - most likely best billing plan

– who is price sensitive

Who should get incentives, whom to keep

Page 24: Chapter 1 Initial Description of Data Mining in Business

Credit card management

Bank credit card marketing promotions typically generate 1,000 responses to mailed solicitations – a response rate of about 1%. The rate is improved significantly through data mining analysis.

DM tools used by banks include credit scoring which is a quantified analysis of credit applicants with respect to predictions of on-time loan repayment. (Data covering deposits, savings, loans, credit card, insurance…).

These credit scores can be used to make accept/reject recommendations, as well as to establish the size of a credit line.

ATM machines could be rigged up with electronic sales pitches for products that a particular customer is likely to be interested in.
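A short worked example of the response-rate arithmetic above. The 1,000 responses and the 1% rate come from the slide; the model-selected group (10,000 prospects, 300 responders) is a hypothetical outcome used only to show how lift is calculated.

```python
# Figures taken from the slide above.
responses = 1_000
response_rate = 0.01  # "about 1%"

# Implied size of the mailing: responses divided by the response rate.
implied_mailings = round(responses / response_rate)


def lift(targeted_responders, targeted_mailed, baseline_rate):
    """Ratio of the response rate in a model-selected group to the baseline rate."""
    return (targeted_responders / targeted_mailed) / baseline_rate


# Hypothetical outcome (assumption): a scoring model selects 10,000 prospects
# and 300 of them respond.
model_lift = lift(300, 10_000, response_rate)
```

A lift above 1 means the selected group responds at a higher rate than an untargeted mailing would.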

Page 25: Chapter 1 Initial Description of Data Mining in Business

Fairbank & Morris

Credit card company’s most valuable asset: INFORMATION ABOUT CUSTOMERS

Signet Banking Corporation
• obtained behavioral data from many sources
• built predictive models
• aggressively marketed balance transfer card

First Union
• who will move soon - improve retention

Page 26: Chapter 1 Initial Description of Data Mining in Business

Telecommunications

• Retention of customers for telemarketing is very difficult. The phenomenon of a customer switching carriers is referred to as churn, a fundamental concept in telemarketing as well as in other fields.
• One communications company estimated that 1/3 of churn is due to poor call quality, and up to 1/2 is due to poor equipment.
• A cellular fraud prevention system monitors traffic to spot problems with faulty telephones. When a telephone begins to go bad, telemarketing personnel are alerted to contact the customer and suggest bringing the equipment in for service.
• Another way to reduce churn is to protect customers from subscription and cloning (duplication) fraud. Fraud prevention systems provide verification that is transparent to legitimate subscribers.

Page 27: Chapter 1 Initial Description of Data Mining in Business

Human resource management

• Business intelligence is a way to truly understand markets, competitors, and processes.
• Software technology such as data warehouses, data marts, online analytical processing (OLAP), and data mining can be used to improve a firm’s profitability.
• In HRM, the analysis can lead to the identification of individuals who are liable to leave the company unless additional compensation or benefits are provided.
• HRM would identify the right people so that organizations could treat them well and retain them (reduce churn).

Page 28: Chapter 1 Initial Description of Data Mining in Business

Methodology and Tools

Analyzing data
• Given management goals and that management can translate knowledge into action

Page 29: Chapter 1 Initial Description of Data Mining in Business

Basic Styles

Top-Down: HYPOTHESIS TESTING
• SUPERVISED
• have a theory, experiment to prove or disprove
• SCIENCE

Bottom-Up: KNOWLEDGE DISCOVERY
• UNSUPERVISED
• start with data, see new patterns
• CREATIVITY

Page 30: Chapter 1 Initial Description of Data Mining in Business

Hypothesis Testing

Generate theory

Determine data needed

Get data

Prepare data

Build computer model

Evaluate model results
• confirm or reject hypotheses

Page 31: Chapter 1 Initial Description of Data Mining in Business

Generate Theory

Systematically tie different input sources together (MENTAL MODEL)

What causes sales volume?
• sales rep performance
• economy, seasonality
• product quality, price, promotion, location

Page 32: Chapter 1 Initial Description of Data Mining in Business

Generate Theory

Brainstorm:
• diverse representatives for broad coverage of perspectives (electronic)
• keep under control (keep positive)
• generate testable hypotheses

Page 33: Chapter 1 Initial Description of Data Mining in Business

Define Data Needed

Determine data needed to test hypothesis
• Lucky - query existing database
• More often - gather: pull together from diverse databases, survey, buy

Page 34: Chapter 1 Initial Description of Data Mining in Business

Locate Data

Usually scattered or unavailable

Sources:
• warranty claims
• point-of-sale data (cash register records)
• medical insurance claims
• telephone call detail records
• direct mail response records
• demographic data, economic data

PROFILE: counts, summary statistics, cross-tabs, cleanup
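The PROFILE step can be sketched with the standard library alone. The records, field names, and values below are hypothetical point-of-sale rows, assumed only for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical point-of-sale records (assumption, for illustration only).
records = [
    {"region": "North", "channel": "store", "amount": 20.0},
    {"region": "North", "channel": "mail", "amount": 35.0},
    {"region": "South", "channel": "store", "amount": 15.0},
    {"region": "South", "channel": "store", "amount": None},  # missing value
    {"region": "North", "channel": "store", "amount": 50.0},
]

# Counts: how many records fall in each region.
counts = Counter(r["region"] for r in records)

# Cross-tab: how many records fall in each (region, channel) cell.
crosstab = Counter((r["region"], r["channel"]) for r in records)

# Summary statistics on the amounts that are present, plus a missing count
# that feeds the cleanup step.
amounts = [r["amount"] for r in records if r["amount"] is not None]
summary = {
    "n": len(amounts),
    "missing": len(records) - len(amounts),
    "mean": mean(amounts),
    "median": median(amounts),
}
```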

Page 35: Chapter 1 Initial Description of Data Mining in Business

Prepare Data for Analysis

Summarize:
• too much - no discriminant information
• too little - swamped with useless detail

Process for computer: ASCII, spreadsheet

Data encoding: how data are recorded can vary - may have been collected with specific purpose

Textual data: avoid if possible (may need to code)

Missing values: missing salary - use mean?
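The "use mean?" option for a missing salary can be sketched as follows. The function name and salary values are hypothetical, and mean imputation is only one possible treatment: it keeps the average unchanged but understates the spread of the data.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


# Hypothetical salary column with two missing entries (assumption).
salaries = [40_000, None, 55_000, 61_000, None]
filled = impute_mean(salaries)
```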

Page 36: Chapter 1 Initial Description of Data Mining in Business

Build and Evaluate Model

Build Computer Model
• Choose the appropriate modeling tools and algorithms
• Training and test data sets

Determine if hypotheses supported
• statistical practice
• test rule-based systems for accuracy

Requires both business and analytic knowledge

Page 37: Chapter 1 Initial Description of Data Mining in Business

SUPERVISED

Dorn, National Underwriter Oct 18, 2004, 34,39

Health care fraud
• Use statistics to identify indicators of fraud or abuse
• Can rapidly sort through large databases
• Identify patterns different from norm
• Moderately successful

But only effective on schemes already detected

To benefit firm, need to identify fraud before paying claim

Page 38: Chapter 1 Initial Description of Data Mining in Business

Knowledge Discovery

Machine learning?
• Usually need intelligent analyst

Directed: explain value of some variable

Undirected: no dependent variable selected
• identify patterns

Use undirected to recognize relationships; use directed to explain once found

Page 39: Chapter 1 Initial Description of Data Mining in Business

Directed

Goal-oriented

Examples:
• If discount applies, impact on products
• Who is likely to purchase credit insurance?
• Predicted profitability of new customer
• What to bundle with a particular package

Identify sources of preclassified data

Prepare data for analysis

Build & train computer model

Evaluate

Page 40: Chapter 1 Initial Description of Data Mining in Business

Identify Data Sources

Best - existing corporate data warehouse
• data clean, verified, consistent, aggregated

Usually need to generate
• most data in form most efficient for designed purpose
• historical sales data often purged for dormant customers (but you need that information)

Page 41: Chapter 1 Initial Description of Data Mining in Business

Prepare Data

Put in needed format for computer

Make consistent in meaning

Need to recognize what data are missing
• change in balance = new - old

add missing but known-to-be-important data

Divide data into training, test, evaluation

Decide how to treat outliers
• statistically biasing, but may be most important
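Two of the preparation steps above can be sketched together: adding the derived field (change in balance = new - old) and dividing the data into training, test, and evaluation sets. The 60/20/20 proportions, the function name, and the account records are assumptions for illustration; the slides do not specify them.

```python
import random


def split_three_ways(rows, train=0.6, test=0.2, seed=0):
    """Shuffle rows and split them into training, test, and evaluation sets.

    Whatever remains after the training and test shares goes to evaluation.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


# Hypothetical account records (assumption, for illustration only).
accounts = [
    {"id": i, "old_balance": 100 * i, "new_balance": 100 * i + 25}
    for i in range(10)
]

# Derived field that is missing from the raw data but known to be important.
for row in accounts:
    row["balance_change"] = row["new_balance"] - row["old_balance"]

training, test, evaluation = split_three_ways(accounts)
```

Shuffling before splitting avoids putting, for example, only the oldest accounts into the training set.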

Page 42: Chapter 1 Initial Description of Data Mining in Business

Build & Train Model

Regression - human builds (selects independent variables, IVs)

Automatic systems train
• give it data, let it hammer

OVERFITTING: fit the data

TEST SET: a means to evaluate model against data not used in training
• tune weights before using to evaluate

Page 43: Chapter 1 Initial Description of Data Mining in Business

Evaluate Model

ERROR RATE: proportion of classifications in evaluation set that were wrong

too little training: poor fit on training data and poor error rate

optimal training: good fit on both

too much training: great fit on training data and poor error rate
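The error rate defined above is a simple proportion. In this minimal sketch the labels and the two lists of predictions are hypothetical, and the "gap" variable is an illustrative way to see the too-much-training pattern, not a measure named in the slides.

```python
def error_rate(predicted, actual):
    """Proportion of classifications that were wrong."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Hypothetical labels and predictions (assumption, for illustration only).
train_actual = ["churn", "stay", "stay", "churn", "stay"]
train_predicted = ["churn", "stay", "stay", "churn", "stay"]

eval_actual = ["churn", "stay", "stay", "churn", "stay"]
eval_predicted = ["churn", "stay", "churn", "stay", "stay"]

train_error = error_rate(train_predicted, train_actual)
eval_error = error_rate(eval_predicted, eval_actual)

# A great fit on training data with a much worse error rate on the
# evaluation set is the "too much training" pattern described above.
gap = eval_error - train_error
```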

Page 44: Chapter 1 Initial Description of Data Mining in Business

Undirected Discovery

What items sell together? Strawberries & cream
• Directed: What items sell with tofu? tabasco

Long distance caller market segmentation
• Uniform usage - weekday & weekend, spikes on holidays
• After segmentation: high & uniform except for several months of nothing
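The two questions above can be contrasted in a short sketch: the undirected question counts every pair of items bought together, while the directed question fixes one item (tofu) and counts what accompanies it. The baskets are invented for illustration and simple co-occurrence counting is an assumed stand-in for a full market-basket method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (assumption, for illustration only).
baskets = [
    {"strawberries", "cream", "milk"},
    {"strawberries", "cream"},
    {"tofu", "tabasco", "milk"},
    {"tofu", "tabasco"},
    {"milk", "bread"},
]

# Undirected: count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def sells_with(item):
    """Directed: count the items bought in the same basket as `item`."""
    return Counter(
        other
        for basket in baskets
        if item in basket
        for other in basket
        if other != item
    )


with_tofu = sells_with("tofu")
```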

Page 45: Chapter 1 Initial Description of Data Mining in Business

UNSUPERVISED

Dorn, National Underwriter Oct 18, 2004, 34,39

Health care fraud
• Look at historical claim submissions
• Build ad hoc model to compare with current claims
• Assign similarity score to fraudulent claims
• Predict fraud potential
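The slides do not say how the score is computed. As one assumed stand-in, the sketch below scores each current claim by how far its amount sits from the historical norm, measured in standard deviations. The claim amounts, names, and the threshold of 3 are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical historical claim amounts (assumption, for illustration only).
historical = [100, 110, 90, 105, 95]
mu, sigma = mean(historical), stdev(historical)


def deviation_score(amount):
    """Distance of a claim amount from the historical mean, in standard deviations."""
    return abs(amount - mu) / sigma


# Hypothetical current claims (assumption).
current = {"claim_a": 102, "claim_b": 180}
scores = {name: deviation_score(amount) for name, amount in current.items()}

# Assumed cutoff: claims more than 3 standard deviations from the norm
# are set aside for investigation before payment.
flagged = [name for name, score in scores.items() if score > 3]
```

A real model would compare many claim attributes at once; a single amount is used here only to keep the idea visible.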

Page 46: Chapter 1 Initial Description of Data Mining in Business

Undirected Process

Identify data sources

Prepare data

Build & train computer model

Evaluate model

Apply model to new data

Identify potential targets for undirected

Generate new hypotheses to test

Page 47: Chapter 1 Initial Description of Data Mining in Business

Generate hypotheses

Any commonalities in data?

Are they useful?
• Many adults watch children’s movies
• chaperones are an important market segment
• they probably make final decision

When hypothesis is generated, that determines data needed

Page 48: Chapter 1 Initial Description of Data Mining in Business

Bank Case Study

Directed knowledge discovery to recognize likely prospects for home equity loan
• training set - current loan holders
• developed model for propensity to borrow
• got continuous scores, ranked customers
• sent top 11% material

Undirected: segmented market into clusters
• in one, 39% had both business & personal accounts
• cluster had 27% of the top 11%

Hypothesis: people use home equity to start business
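The ranking step in the case study can be sketched as follows. Only the 11% cutoff comes from the slide; the customer identifiers, the propensity scores, and the cluster membership are generated placeholders, not the bank's data.

```python
def top_fraction(scores, fraction):
    """Return the customers in the top `fraction` when ranked by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return ranked[:cutoff]


# Hypothetical propensity-to-borrow scores for 100 customers (assumption).
# The formula only produces 100 distinct placeholder values between 0 and 0.99.
scores = {f"cust{i:03d}": (i * 37 % 100) / 100 for i in range(100)}

# Directed step: rank customers and mail the top 11%.
mailed = top_fraction(scores, 0.11)

# Undirected step, sketched: a hypothetical cluster of customers, and the
# share of the mailed group that falls inside it.
cluster = {f"cust{i:03d}" for i in range(0, 100, 4)}
cluster_share = len(set(mailed) & cluster) / len(mailed)
```

In the case study, finding that one cluster held a disproportionate share of the top-scored customers is what prompted the hypothesis about home equity and business start-ups.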

Page 49: Chapter 1 Initial Description of Data Mining in Business

Data mining products and data sets

A good source to view current DM products is www.KDnuggets.com.

The UCI Machine Learning Repository is a source of very good data mining datasets at www.ics.uci.edu/~mlearn/MLOther.html.

Weka DM software at http://www.cs.waikato.ac.nz/ml/weka/

Tanagra DM software at http://eric.univ-lyon2.fr/~ricco/tanagra/index.html