Introduction to Data Mining
Data Mining for Business Analytics
Introduction to Data Science
Kent State University
Spring 2015 – Class 1
These slides incorporate the result of input/collaborations with Maytal Saar-Tsechansky, Claudia Perlich, and Foster Provost.

Data Mining for Business Analytics - Week 1

Dec 07, 2015

SailorOlive

data mining basics
An example business problem

• TelCo, a major telecommunications firm, wants to investigate its problem with customer attrition, or “churn”

• Let’s consider this for now as a marketing problem only

How would you go about targeting some customers with a special offer, prior to contract expiration? Think about what data should be available for you to use.

Moneyball

• The story of Oakland A's general manager Billy Beane's successful attempt to put together a baseball club on a budget by employing computer-based data analysis to draft his players.

Roles in Data Science

• Data Scientist
  • Understand the potential
  • Can translate from business to execution
  • Ability to evaluate proposal and execution
  • Can do the actual modeling
  • Applied statistician x computer scientist

Roles in Data Science

• Data Scientist
  • Understand the potential
  • Can translate from business to execution
  • Ability to evaluate proposal and execution
  • Can do the actual modeling
  • Applied statistician x computer scientist
• Collaborator in a data-centric project
  • Can translate from business to execution
• Managing a data-mining project
  • Understanding the potential
  • Ability to evaluate a proposal and its execution
  • Ability to interface with a broad variety of people
• Strategist, Investor, …
  • Envisions opportunities, comes up with novel ideas, evaluates the promise of new ideas, designs data science projects/companies conceptually

Learning Goals

• Approach business problems data-analytically
  • Think carefully & systematically about whether & how data can improve performance

• Be able to interact competently on the topic of data mining for business analytics
  • Know the basics of the data mining processes, techniques, and concepts well enough to do so

• Receive hands-on experience mining data
  • You should be able to follow up on ideas or opportunities that present themselves

From data & business to strategy

HURRICANE FRANCES was on its way, barreling across the Caribbean, threatening a direct hit on Florida's Atlantic coast. Residents made for higher ground, but far away, in Bentonville, Ark., executives at Wal-Mart Stores decided that the situation offered a great opportunity for one of their newest data-driven weapons, something that the company calls predictive technology.

A week ahead of the storm's landfall, Linda M. Dillman, Wal-Mart's chief information officer, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier. Backed by the trillions of bytes' worth of shopper history that is stored in Wal-Mart's data warehouse, she felt that the company could "start predicting what's going to happen, instead of waiting for it to happen," as she put it.

From NY Times

Big Data Hype?

Gartner Hype Cycle

What do we really care about?

Why data mining? Why now?

• Confluence of 4 technical advances
  • Storage
    • Disk densities have been doubling each year
    • A $100 disk today has over 1,000,000x more capacity/$ than the disks of 30 years ago
  • Networking
    • Data can be transferred easily between collection, storage, and use; eBusiness systems do it as a matter of course (with fewer data errors)
  • Algorithms
    • Advanced algorithms from machine learning, pattern recognition, and applied statistics have become mature enough for mainstream use
  • Computing power
    • Processing power has been doubling every 1.5 years or so (Moore’s law)
    • Laptops are more powerful than the supercomputers of yesteryear
• Note: all four of these are essential for effective, successful data mining
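
The doubling figures above compound quickly. As a rough check, the sketch below computes the growth factor 2^(years / doubling period) using the slide's doubling periods and its 30-year comparison window. It assumes the doubling rates held steady over the whole window, and treating disk density as a stand-in for capacity per dollar is also an assumption.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times larger a quantity becomes after `years`
    if it doubles once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Processing power: doubling every 1.5 years (the slide's Moore's-law figure).
compute_30y = growth_factor(30, 1.5)  # 2**20

# Disk density: doubling every year (the slide's storage figure).
storage_30y = growth_factor(30, 1.0)  # 2**30

print(f"compute over 30 years: about {compute_30y:,.0f}x")
print(f"storage over 30 years: about {storage_30y:,.0f}x")
```

Under these assumptions both factors exceed one million, which is in line with the "over 1,000,000x" claim on the slide.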

What really is Data Mining?

A process for using information technology to extract useful (non-trivial, hopefully actionable) knowledge from large bodies of data

• A set of principles, concepts, and techniques that structure thinking and analysis of data

• Extracts useful information and knowledge from large volumes of data by following a process with reasonably well defined steps

• Changes the way you think about data and its role in business

Data Opportunities

• Volume of data

• Variety of data

• Powerful computers

• Better algorithms

Data Mining Process Outline

• Business Understanding

• Data Understanding

• Data Preparation

• Modeling

• Evaluation

• Deployment
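
As a minimal sketch of how the six phases above might line up in code, the toy example below walks through them on synthetic churn data. Everything in it is hypothetical: the `support_calls` feature, the way churn is generated, and the single-threshold rule used as the "model" are illustration choices, not the course's method.

```python
import random

# Business understanding: decide which customers to target with a retention offer.

# Data understanding: each hypothetical record is (support_calls, churned).
random.seed(0)


def make_customer():
    calls = random.randint(0, 9)
    # Synthetic assumption: more support calls means churn is more likely.
    churned = random.random() < calls / 10
    return calls, churned


customers = [make_customer() for _ in range(400)]

# Data preparation: hold out part of the data for evaluation.
train, test = customers[:300], customers[300:]


def accuracy(threshold, rows):
    """Share of rows where the rule 'calls >= threshold' matches the churn label."""
    hits = sum((calls >= threshold) == churned for calls, churned in rows)
    return hits / len(rows)


# Modeling: pick the call-count threshold that fits the training data best.
best_threshold = max(range(0, 11), key=lambda t: accuracy(t, train))

# Evaluation: measure the rule on customers it was not fitted to.
test_accuracy = accuracy(best_threshold, test)
print(f"threshold={best_threshold}, holdout accuracy={test_accuracy:.2f}")


# Deployment: apply the fitted rule to a new customer.
def should_target(support_calls):
    return support_calls >= best_threshold
```

In a real project the phases are revisited repeatedly rather than run once in order; the linear layout here is only for readability.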

Data Mining Process

Business data mining is a process

Diagram labels: Science, Craft, Creativity, Common Sense, Process

Mini case: What data might TelCo mine to help with churn management?

Types of Data Mining Tasks

Many business problems have as an important component one of these data mining tasks:

• Affinity grouping (a.k.a. “associations”, “market-basket analysis”)
  • What items are commonly purchased together?
• Similarity Matching
  • What other companies are like our best small business customers?
• Description/Profiling
  • What does “normal behavior” look like?
• Clustering
  • Do my customers form natural groups?
• Predictive Modeling (including causal modeling & link prediction)
  • Will customer X churn next month/default on her loan?
  • How much would prospect X spend?
  • Who might be good “friends” on our social networking site?

Slide margin labels: Unsupervised, Supervised
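
To make the affinity-grouping question ("What items are commonly purchased together?") concrete, here is a minimal sketch that counts how often pairs of items share a basket. The baskets and item names are made up for illustration, and plain pair counting is only the simplest form of market-basket analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; item names are invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort the items so each pair is always stored in one canonical order.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = share of all baskets that contain both items.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

# List pairs from most to least frequent (ties broken alphabetically).
for pair, share in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, f"{share:.1f}")
```

No labels are needed for this count, which is why affinity grouping is usually treated as an unsupervised task.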

This is NOT a course about…

• Statistics

• Database Querying
  • SQL

• Data Warehousing

• Regression Analysis
  • Explanatory vs. Predictive Modeling

Data Mining versus…

• Data Warehousing / Storage
  • Data warehouses coalesce data from across an enterprise, often from multiple transaction-processing systems
• Querying / Reporting (SQL, Excel, QBE, other GUI-based querying)
  • Very flexible interface to ask factual questions about data
  • No modeling or sophisticated pattern finding
  • Most of the cool visualizations
• OLAP – On-line Analytical Processing
  • OLAP provides easy-to-use GUI to explore large data collections
  • Exploration is manual; no modeling
  • Dimensions of analysis preprogrammed into OLAP system

Data Mining versus…

• Traditional statistical analysis
  • Mainly based on hypothesis testing or estimation / quantification of uncertainty
  • Should be used to follow up on data mining’s hypothesis generation
• Automated statistical modeling (e.g., advanced regression)
  • This is one type of data mining, usually based on linear models
  • Massive databases allow non-linear alternatives
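
The contrast between linear models and non-linear alternatives can be sketched on a toy curve. The example below fits an ordinary least squares line and a 3-nearest-neighbor average to hypothetical, noise-free data from y = x²; nearest neighbors is just one convenient non-linear method chosen for illustration, not one the slides name.

```python
# Hypothetical, noise-free training data from a curved relationship: y = x**2.
train_x = [i / 10 for i in range(-30, 31)]
train_y = [x * x for x in train_x]

# Linear model: ordinary least squares fit of y = a + b*x.
n = len(train_x)
mean_x = sum(train_x) / n
mean_y = sum(train_y) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
sxx = sum((x - mean_x) ** 2 for x in train_x)
b = sxy / sxx
a = mean_y - b * mean_x


def predict_linear(x):
    return a + b * x


def predict_knn(x, k=3):
    """Non-linear alternative: average y over the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


# Evaluate on points that lie between the training points.
test_x = [i / 10 + 0.05 for i in range(-30, 30)]


def mean_abs_error(predict):
    return sum(abs(predict(x) - x * x) for x in test_x) / len(test_x)


linear_error = mean_abs_error(predict_linear)
knn_error = mean_abs_error(predict_knn)
print(f"linear MAE = {linear_error:.2f}, 3-NN MAE = {knn_error:.2f}")
```

The straight line cannot follow the curve, while the neighbor average tracks it closely because the training points are dense. That density is what large databases provide.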

Answering business questions with these techniques…

• Who are the most profitable customers?
  • Database querying
• Is there really a difference between profitable customers and the average customer?
  • Statistical hypothesis testing
• But who really are these customers? Can I characterize them?
  • OLAP (manual search), Data mining (automated pattern finding)
• Will some particular new customer be profitable? How much revenue should I expect this customer to generate?
  • Data mining (predictive modeling)
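
The first three question types above can be sketched side by side on a toy customer table. All records, field names, and the choice of a one-sided permutation test are assumptions made for illustration; real work would use far more data and a test suited to it.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical customer records: (customer_id, region, age, profit).
customers = [
    ("c1", "east", 34, 120.0), ("c2", "west", 51, 310.0),
    ("c3", "east", 45, 95.0), ("c4", "west", 62, 280.0),
    ("c5", "east", 29, 60.0), ("c6", "west", 58, 330.0),
    ("c7", "east", 38, 150.0), ("c8", "west", 47, 90.0),
]

# Database querying: a factual question, "who are the most profitable customers?"
top3 = sorted(customers, key=lambda c: c[3], reverse=True)[:3]

# OLAP-style exploration: aggregate along a dimension fixed in advance (region).
by_region = defaultdict(list)
for _, region, _, profit in customers:
    by_region[region].append(profit)
avg_profit_by_region = {r: mean(p) for r, p in by_region.items()}

# Hypothesis testing: are the top customers really older, or could it be chance?
top_ages = [c[2] for c in top3]
rest_ages = [c[2] for c in customers if c not in top3]
observed = mean(top_ages) - mean(rest_ages)

# Permutation test: reshuffle ages and see how often a gap this large appears.
random.seed(1)
all_ages = top_ages + rest_ages
trials = 2000
extreme = 0
for _ in range(trials):
    random.shuffle(all_ages)
    diff = mean(all_ages[:3]) - mean(all_ages[3:])
    if diff >= observed:
        extreme += 1
p_value = extreme / trials  # one-sided permutation p-value

print([c[0] for c in top3], avg_profit_by_region, round(p_value, 3))
```

The query and the aggregation only report what is in the table, and the test only checks a difference someone already suspected. Finding which attributes characterize profitable customers, or predicting profit for a new one, is where data mining comes in.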
