More value from data using Data Mining Allan Mitchell SQL Server MVP
Page 1: More value from data using Data Mining Allan Mitchell SQL Server MVP.

More value from data using Data Mining

Allan Mitchell, SQL Server MVP

Who am I

• SQL Server MVP
• SQL Server Consultant
• Joint author on Wrox Professional SSIS book
• Worked with SQL Server since version 6.5
• www.SQLDTS.com and www.SQLIS.com
• Partner of SQL Know How

Today’s Schedule

• What is data mining (overview)
• Data mining terminology
• Myths around data mining
• Excel add-in for Office 2007
  – Demo: Setup
  – Demo: Key Influencers
  – Demo: Categories
  – Demo: Make a Prediction
  – Demo: “Other stuff”, if time
• Questions and answers

What is Data Mining

• The process of using statistical techniques to discover subtle relationships between data items, and the construction of predictive models based on them. The process is not the same as just using an OLAP tool to find exceptional items. Generally, data mining is a very different and more specialist application than OLAP, and uses different tools from different vendors. Normally the users are different, too. OLAP vendors have had little success with their data mining efforts.

Source: The OLAP Report

What does Data Mining Do?

Explores Your Data

Finds Patterns

Performs Predictions

[Diagram labels: Query, Reporting, Analysis; Data Mining; What; Why; How]

Comparative Benefits: Predictive Projects versus Nonpredictive Projects

[Bar chart: Predictive versus Nonpredictive projects compared across Technology, Productivity, and Business Process Enhancement; vertical axis from 0% to 80%]

Source: IDC, 2003

Data Mining terminology

• Mining structure
• Mining model
• Mining algorithm
• Training dataset
• Testing dataset
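The last two terms can be made concrete with a small sketch. It assumes a simple random holdout split; the function name `split_train_test`, the 30% test fraction, and the sample rows are all hypothetical and are not taken from the slides or from SQL Server.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = int(round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical customer rows, for illustration only.
customers = [{"CustID": i, "Income": 20000 + 1000 * i} for i in range(10)]
training, testing = split_train_test(customers)
print(len(training), len(testing))  # 7 3
```

The model is built from the training rows only; the testing rows are held back so that predictions can be checked against outcomes the model has not seen.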

SQL Server 2005 Algorithms

• Decision Trees
• Clustering
• Time Series
• Sequence Clustering
• Association
• Naïve Bayes
• Neural Net
• Plus: Linear and Logistic Regression

Sequence Clustering

• Applied to
  – Click stream analysis
  – Customer segmentation with sequence data
  – Sequence prediction
• Mix of clustering and sequence technologies
• Group individuals based on their profiles including sequence data
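To give a feel for what "sequence data" looks like to an algorithm, here is a minimal sketch that turns one click stream into first-order transition probabilities (how often each page is followed by each other page). This is an assumption about a reasonable representation, not a description of the SQL Server implementation; the function name and the click stream are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from one ordered sequence."""
    counts = defaultdict(Counter)
    # Count each (current state -> next state) pair in order.
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical click stream for a single visitor.
clicks = ["home", "search", "product", "search", "product", "basket"]
probs = transition_probabilities(clicks)
print(probs["product"])  # {'search': 0.5, 'basket': 0.5}
```

Visitors whose transition tables look alike could then be grouped by a clustering step, which is the "mix" the slide refers to.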

Time Series

• Applied to
  – Forecast sales
  – Web hits prediction
  – Stock value estimation
• Patented technique from Microsoft Research
• Uses regression tree technology to describe and predict series values
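The idea of predicting a series from its own past values can be sketched in a few lines. Note the substitution: this example uses a plain least-squares autoregression on a single lag, not a regression tree, and it is not the Microsoft algorithm. The function names and the sales figures are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of series[t] = intercept + slope * series[t - 1]."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("series is constant; slope is undefined")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return mean_y - slope * mean_x, slope


def forecast(series, steps):
    """Roll the fitted one-lag model forward, feeding each prediction back in."""
    intercept, slope = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical monthly sales figures.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]
print(forecast(monthly_sales, 2))  # [150.0, 160.0]
```

A tree-based approach would, roughly speaking, fit different regressions like this one in different regions of the data rather than a single line for the whole series.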

Clustering

• Applied to
  – Segmentation: customer grouping, mailing campaign
  – Also supports classification and regression
• Expectation Maximization
  – Probabilistic clustering
• K-Means
  – Distance based
• Clusters both discrete and continuous values
  – Discrete values are “binarized”
• Anomaly detection
• Check variable independence
  – “Predict Only” attributes not used for clustering
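The distance-based (K-Means) variant and the "binarized" discrete values can be illustrated together. This is a minimal sketch under stated assumptions: "binarized" is taken to mean one indicator per category, the starting centroids are hand-picked, and the rows are made up. None of the names below come from SQL Server.

```python
import math


def binarize(value, categories):
    """Encode a discrete value as one 0/1 indicator per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def nearest(point, centroids):
    """Index of the centroid closest to point by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to group means."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical rows: (gender, age scaled to the range 0..1).
rows = [("Male", 0.10), ("Male", 0.15), ("Female", 0.80), ("Female", 0.90)]
points = [binarize(g, ["Male", "Female"]) + [age] for g, age in rows]
centroids, labels = k_means(points, centroids=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]
```

Expectation Maximization differs in that each row gets a probability of belonging to every cluster instead of a single hard assignment.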

Clustering: Discrete

[Figure: clusters labelled Son, Daughter, and Parent, plotted by gender (Male, Female) and Age]

Clustering: Anomaly Detection

[Figure: clusters labelled Son, Daughter, and Parent, plotted by gender (Male, Female) and Age]
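The slide does not say how anomalies are scored, so the following is only one plausible reading: treat a row as unusual when it lies far from every cluster centre. The centroids, points, and threshold are hypothetical, and a probabilistic model would more likely score rows by likelihood than by distance.

```python
import math


def anomaly_scores(points, centroids):
    """Distance from each point to its nearest centroid; larger means more unusual."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


def flag_anomalies(points, centroids, threshold):
    """Return the points whose nearest-centroid distance exceeds threshold."""
    scores = anomaly_scores(points, centroids)
    return [p for p, s in zip(points, scores) if s > threshold]


# Hypothetical cluster centres as (gender flag, age).
centroids = [(0.0, 10.0), (1.0, 40.0)]
points = [(0.0, 12.0), (1.0, 38.0), (0.0, 45.0)]
print(flag_anomalies(points, centroids, threshold=5.0))  # [(0.0, 45.0)]
```

The third row is flagged because its combination of values does not sit close to either cluster, even though each value on its own is ordinary.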

dm data flow

[Diagram labels: Cube, Historical Dataset, New Dataset, Data Transform (SSIS), Reporting, Mining Models, Model Browsing, Prediction, LOB Application]

the steps to a successful model

Source: MS BOL (SQL Server Books Online)

DMX

-- 1. Define the model: column types, the key, and the predictable column.
CREATE MINING MODEL CreditRisk
(
    CustID LONG KEY,
    Gender TEXT DISCRETE,
    Income LONG CONTINUOUS,
    Profession TEXT DISCRETE,
    Risk TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

-- 2. Train the model from historical data.
-- [CreditData] is a hypothetical data source name; DMX reads relational
-- rows through a source query such as OPENQUERY.
INSERT INTO CreditRisk
    (CustID, Gender, Income, Profession, Risk)
OPENQUERY([CreditData],
    'SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers')

-- 3. Predict Risk, and its probability, for new customers.
SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
FROM CreditRisk
PREDICTION JOIN
    OPENQUERY([CreditData],
        'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS NewCustomers
ON CreditRisk.Gender = NewCustomers.Gender
AND CreditRisk.Income = NewCustomers.Income
AND CreditRisk.Profession = NewCustomers.Profession

Myths around data mining

• You have to be a propeller head
• It’s a new concept
• Only works with SSAS cubes

Excel 2007 DMAddin

• DM visualisation
• Table analysis
• Create session models/permanent models
• Connect to SSAS for full-blown models
• Intuitive interface

Demos

• Setup
• Key Influencers
• Categories
• Make a prediction
• Other sexy stuff

Resources

• Loads to be honest (DMX, API to name two things)

• Big Subject but very sexy

Contact Details

[email protected]