A New Era for Predictive Analytics with SPSS
Page 1: 05 predictive with spss

A New Era for Predictive Analytics with SPSS

Page 2: 05 predictive with spss

© 2012 IBM Corporation

The Mining Metaphor

Gold Mining, Diamond Mining, Data Mining

Page 3: 05 predictive with spss

What is Data Mining?

An early definition: Finding patterns in your data which you can use to do your business better.

– It’s about patterns
– It’s about something you can use: practical things
– It’s about business

A recent definition:

▪ Business-oriented discovery of patterns across all forms of data
▪ Produces insight and a predictive capability
▪ Deployment of predictions throughout the enterprise

Page 4: 05 predictive with spss

What is Data Mining?

Information Retrieval + Information Extraction + Information Analysis

Discover new, previously unknown information

Page 5: 05 predictive with spss

IBM SPSS Supports the Predictive Enterprise

Delivering Profitable Revenue Growth & Operational Efficiency

▪ Capture a complete perspective
– Survey customers & constituents
– Leverage structured, semi-structured & unstructured data

▪ Predict behavior and preferences
– Statistics for deeper insight
– Data & text mining for predictive modeling

▪ Act on results
– Deploy scoring models for dynamic decisions
– Directly affect business process with event integration

Page 6: 05 predictive with spss

IBM SPSS: Our core value proposition

SPSS’s goal is to apply analytics to optimize decisions at every contact point, made possible by enabling pervasive, predictive, real-time decisions at the point of impact.

Page 7: 05 predictive with spss

SPSS Predictive Analytic Platform

▪ SPSS Data Collection
– Collects additional attitudinal data for advanced analytics, typically through surveys

▪ SPSS Statistics
– Expand analytics capabilities to Professional Business User / Statistician
– Add advanced statistical analysis to PM

▪ SPSS Modeler
– Provide predictive analytics using data mining & text mining methods for key parts of the business
– Predict future outcomes and understand what influences them

▪ SPSS Collaboration & Deployment Services
– Analytical asset management across multiple analysts
– Audit, security, refresh
– Provide a web service interface

▪ SPSS Analytic Server
– Provide Big Data connectivity to SPSS Modeler
– Translates SPSS Modeler Server requests into Hadoop jobs

▪ SPSS Analytical Decision Management
– Business scenario analysis
– Complex rules for operational decision management

Page 8: 05 predictive with spss

SPSS Modeler 16 Editions

• SPSS Modeler Gold - Enables organizations to build predictive models to improve business processes and help people or systems make the right decisions each time. It combines and integrates predictive analytics, rules, scoring, and optimization techniques to deliver recommended actions at the point of impact.
SPSS Modeler Premium + C&DS + Analytical Decision Management

• SPSS Modeler Premium - Offers a range of advanced algorithms and capabilities including text analytics, entity analytics, social network analysis, and automated modeling and preparation techniques to address a multitude of business problems and analytic requirements on almost any type of data.
SPSS Modeler Professional + Text Analytics Workbench

• SPSS Modeler Professional - Includes a range of advanced algorithms, data manipulation, and automated modeling and preparation techniques to build predictive models and uncover hidden patterns in structured data.

Page 9: 05 predictive with spss

R is gaining in popularity. Do not walk away from R opportunities: it's not a competitor.

Are you ready?

▪ EMBRACE:
– Integrate R algorithms (e.g. Random Forest)
– Generate R charts
– Use R functions for data preparation
– Make R available for non-programmers

▪ EXTEND:
– Scalability (e.g. database pushback)
– Leverage R engines of other vendors like SAP HANA
– Enterprise deployment
– Big Data (Analytic Server)

Page 10: 05 predictive with spss

Introducing CRISP-DM Methodology & SPSS Modeling Techniques

Page 11: 05 predictive with spss


Modeler Interface

Stream Canvas

Stream, Outputs & Model Manager

Palettes Nodes

Page 12: 05 predictive with spss

Visual Programming with Modeler

– Visual programming
– Based on icons ("nodes")
– Pick nodes from palette & place them on the bench
– Edit their attributes
– Connect to specify flow of data ("streams")

Page 13: 05 predictive with spss

Yellow Nugget or Yellow Diamond

Is the result of generating a predictive model.

Can be exported to PMML to be reused outside of Modeler, for example in Java applications, SAS, or IBM InfoSphere Streams using the Data Mining Toolkit, …
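To illustrate what reusing a PMML model outside of Modeler can look like, here is a minimal Python sketch that scores one record against a regression model stored as PMML. The PMML fragment is hand-written for illustration (it was not exported from Modeler), the field names `tenure`, `calls` and `spend` are hypothetical, and only the `RegressionModel` / `NumericPredictor` elements of the PMML standard are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (assumption: not a real Modeler export).
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="3">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="calls" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="calls"/>
      <MiningField name="spend" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="tenure" coefficient="2.0"/>
      <NumericPredictor name="calls" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text, record):
    """Return intercept + sum(coefficient * input value) for one record."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        total += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return total


prediction = score(PMML_DOC, {"tenure": 12, "calls": 4})
print(prediction)
```

A production consumer would normally rely on a full PMML scoring engine rather than hand-parsing the XML; the sketch only shows that the exported file carries everything needed to score.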

Page 14: 05 predictive with spss

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

1. Business Understanding
Project objectives and requirements understanding, data mining problem definition

2. Data Understanding
Initial data collection and familiarization, data quality problems identification

3. Data Preparation
Table, record and attribute selection, data transformation and cleaning

4. Modeling
Modeling techniques selection and application, parameters calibration

5. Evaluation
Business objectives & issues achievement evaluation

6. Deployment
Result model deployment, repeatable data mining process implementation

Page 15: 05 predictive with spss

2. Data Understanding

Initial data collection and familiarization, data quality problems identification

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Page 16: 05 predictive with spss

Reading Data

Modeler reads a variety of different file types, including data stored in spreadsheets and databases, using the nodes within the Sources palette.

Page 17: 05 predictive with spss

Getting to Know your Data

– Data Audit Node
– Distribution Node
– Histogram Node
– …
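As a rough analogy for the kind of per-field summary a data audit gives, the Python sketch below counts valid and missing values and reports basic statistics for each field. It is not Modeler's implementation; the `records` sample and the `audit` helper are made up for illustration.

```python
import statistics

# Hypothetical sample; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "region": "north"},
    {"age": 45, "income": None, "region": "south"},
    {"age": None, "income": 61000.0, "region": "north"},
    {"age": 29, "income": 38000.0, "region": None},
]


def audit(rows):
    """Summarize every field: valid/missing counts plus basic statistics."""
    report = {}
    for field in rows[0]:
        values = [r[field] for r in rows if r[field] is not None]
        summary = {"valid": len(values), "missing": len(rows) - len(values)}
        if values and all(isinstance(v, (int, float)) for v in values):
            # Numeric field: range and mean.
            summary.update(min=min(values), max=max(values),
                           mean=statistics.fmean(values))
        else:
            # Categorical field: number of distinct categories.
            summary["distinct"] = len(set(values))
        report[field] = summary
    return report


report = audit(records)
for field, summary in report.items():
    print(field, summary)
```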

Page 18: 05 predictive with spss

3. Data Preparation

Table, record and attribute selection, data transformation and cleaning

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Page 19: 05 predictive with spss

Data Manipulation in Modeler

To prepare the data before analysis:
• Eliminate missing values
• Remove unwanted fields from analysis
• Derive new fields
• Merge and match data

Intermediate nodes in Modeler:
• Record operation nodes
• Field operation nodes

▪ CLEM language is case sensitive
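The four preparation steps listed above can be sketched in plain Python. This is only an analogy to what record and field operation nodes do, not CLEM or a Modeler API; the `customers` and `regions` data and all field names are hypothetical.

```python
# Hypothetical input tables.
customers = [
    {"id": 1, "age": 34, "spend": 1200.0, "visits": 12, "notes": "vip"},
    {"id": 2, "age": None, "spend": 300.0, "visits": 3, "notes": ""},
    {"id": 3, "age": 51, "spend": 800.0, "visits": 4, "notes": ""},
]
regions = {1: "north", 3: "south"}  # second source, keyed by customer id

# 1. Eliminate records that contain missing values (record operation).
complete = [c for c in customers if all(v is not None for v in c.values())]

# 2. Remove unwanted fields from analysis (field operation).
trimmed = [{k: v for k, v in c.items() if k != "notes"} for c in complete]

# 3. Derive a new field from existing ones (field operation).
derived = [{**c, "spend_per_visit": c["spend"] / c["visits"]} for c in trimmed]

# 4. Merge with the second source on the key field (record operation).
merged = [{**c, "region": regions[c["id"]]} for c in derived if c["id"] in regions]

for row in merged:
    print(row)
```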

Page 20: 05 predictive with spss

CLEM language: The Expression Builder

Page 21: 05 predictive with spss

4. Modeling

Modeling techniques selection and application, parameters calibration

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Page 22: 05 predictive with spss

Sampling or Partitioning your Data

• May not want to use all records
• Score your model with the remaining data
• May wish to examine a subgroup separately
• May assist us with building a predictive model (oversampling)
• Keep in mind that the sampling method must fit the problem at hand

– If the customers are similar and I want to reduce the size of the dataset for modeling, then I can use simple sampling.
– But if you want to sample directly from a database with customers of different types, you may want to draw a complex sample.
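The difference between simple sampling, a basic form of complex sampling (here: stratified by customer type), and partitioning can be sketched as follows. The helper names and the `customers` data are hypothetical, and stratification is only one of several complex sampling designs.

```python
import random
from collections import defaultdict


def simple_sample(rows, fraction, rng):
    """Draw a simple random sample of the given fraction."""
    return rng.sample(rows, round(len(rows) * fraction))


def stratified_sample(rows, key, fraction, rng):
    """Sample the same fraction inside each group so small groups stay represented."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def partition(rows, train_fraction, rng):
    """Split records into a training part and a holdout part used for scoring."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
customers = [{"id": i, "type": "retail" if i < 90 else "corporate"}
             for i in range(100)]

simple = simple_sample(customers, 0.1, rng)
stratified = stratified_sample(customers, "type", 0.1, rng)
train, holdout = partition(customers, 0.7, rng)
print(len(simple), len(stratified), len(train), len(holdout))
```

With simple sampling the small "corporate" group may be missed entirely; the stratified version guarantees at least one record per group.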

Page 23: 05 predictive with spss

Matching Data to the Modeling Tool

• For example, if we want to use Rule Induction, we will need to think about:
– How the algorithm handles missing data
– The output that is created (binary versus larger splits)
– What we are trying to predict (numeric or binary target?)
– Which format the input predictors have to be in

Page 24: 05 predictive with spss

Modeling Techniques in Modeler

• Supervised Techniques (Predictive Models)
To model an output variable based on several input variables, to predict future cases where the outcome is unknown
– Neural Networks, Rule Induction (C5.0, CHAID, QUEST & C&RT)
– Decision List, Binary Classifier
– Linear Regression, Logistic Regression and Discriminant
– Generalized Linear Models

• Unsupervised Techniques (Clustering)
No field to predict, used to group similar records within the data
– Kohonen Networks, K-Means, TwoStep, Anomaly

• Association Rules
To search for things that typically occur together
– Apriori, CARMA, GRI and SLRM

• Data Reduction
– PCA/Factor Analysis, Feature Selection

• Sequence Detection Models
– Sequence

• Time Series

• Text Mining

Page 25: 05 predictive with spss

SPSS Modeling Techniques

Association Models

Page 26: 05 predictive with spss

Association Models

– Association rules search for things (events, purchases, attributes) that typically occur together in the data
– They find the patterns in data that you could find manually using visualization techniques such as the Web node (yikes!), but they do so much faster and can explore more complex patterns
– Used to answer questions such as:
• Do customers who buy fruit usually buy cheese?
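The fruit-and-cheese question can be made concrete with the three standard measures used to rank association rules: support, confidence and lift. The sketch below computes them for a single rule on a made-up list of baskets; it does not search for rules the way Apriori or CARMA do, and the `baskets` data is hypothetical.

```python
# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"fruit", "cheese", "bread"},
    {"fruit", "cheese"},
    {"fruit", "milk"},
    {"cheese", "wine"},
    {"fruit", "cheese", "wine"},
]


def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent <= b)
    with_consequent = sum(1 for b in baskets if consequent <= b)
    with_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = with_both / n  # share of all baskets containing both sides
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    # Lift compares the confidence with the baseline rate of the consequent.
    lift = confidence / (with_consequent / n) if with_consequent else 0.0
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, {"fruit"}, {"cheese"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift close to 1 means the two items co-occur about as often as chance would suggest, so a high confidence alone is not enough to call a rule interesting.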

Page 27: 05 predictive with spss


Output

Page 28: 05 predictive with spss

SPSS Modeling Techniques

Segmentation Models

Page 29: 05 predictive with spss

Segmentation or Clustering Models

– Clustering techniques segment data into groups of cases/records/customers that have similar patterns of input fields
– Used in market segmentation studies whose aim is to find distinct types of customers so they can be targeted more effectively
– Used to answer questions such as:
• How can I group my customers to target them with the right marketing campaign?
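To show how a clustering technique groups similar records, here is a small K-Means sketch in plain Python. It is a textbook version with fixed starting centroids, not Modeler's K-Means node; the customer points (visits per month, average basket value) are invented for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the members; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 20), (2, 22), (1.5, 18), (10, 95), (11, 105), (9, 100)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids)
print(clusters)
```

In practice the fields would be rescaled first, since K-Means relies on distances and a field with a large range would otherwise dominate.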

Page 30: 05 predictive with spss


Clusters Output

Page 31: 05 predictive with spss

SPSS Modeling Techniques

Classification & Statistical Models

Page 32: 05 predictive with spss

Predictive or Classification Models

– Algorithms that are used to make predictions or forecasts based on historical data
– Automatic classification lets customers have the software determine the best algorithm, or customers can choose a specific algorithm such as Neural Networks, Logistic Regression, Time Series, etc.
– Used to answer questions such as:
• What predicts whether a customer will leave?
• What predicts whether this employee will be a super-star?
• How many umbrellas will I sell in the next three months in Chicago?
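As a toy illustration of learning a prediction from historical data, the sketch below fits a one-split rule (a "decision stump", the simplest possible form of rule induction) to a made-up churn history. It is far simpler than C5.0 or CHAID and assumes the feature has at least two distinct values; the `history` data and field names are hypothetical.

```python
def fit_stump(rows, feature, target):
    """Find the threshold on one numeric feature that misclassifies the fewest rows."""
    values = sorted({r[feature] for r in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for label_above in (True, False):
            # Predicted label is label_above when feature > threshold, else its opposite.
            errors = sum(
                1 for r in rows
                if ((r[feature] > threshold) == label_above) != r[target]
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]


def predict(value, threshold, label_above):
    return (value > threshold) == label_above


# Hypothetical history: number of complaints and whether the customer left.
history = [
    {"complaints": 0, "left": False}, {"complaints": 1, "left": False},
    {"complaints": 1, "left": False}, {"complaints": 2, "left": False},
    {"complaints": 4, "left": True}, {"complaints": 5, "left": True},
    {"complaints": 6, "left": True},
]

threshold, label_above = fit_stump(history, "complaints", "left")
print(threshold, label_above, predict(5, threshold, label_above))
```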

Page 33: 05 predictive with spss


Output

Page 34: 05 predictive with spss

5. Evaluation

Business objectives & issues achievement evaluation

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Page 35: 05 predictive with spss

6. Deployment

Result model deployment, repeatable data mining process implementation

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Page 36: 05 predictive with spss

Deployment Family: Products

▪ IBM SPSS Collaboration and Deployment Services
– A foundation for managing and deploying analytics

▪ IBM SPSS Analytical Decision Management
– Integrates analytics and business knowledge to deliver optimal outcomes

Page 37: 05 predictive with spss

IBM SPSS Modeler Deployment Options

▪ Client (Desktop)
– Access local files
– Connect to operational databases
– Connect to Cognos BI
– Processing performed on local installation

▪ Client/Server
– Data operations/processing on server
– In-database data mining
– SQL pushback for PureData and Hadoop platforms
– Modeler Batch
– SUSE Linux Enterprise Server 10 (zLinux)
– Inclusion in Smart Analytics System for Power (AIX)

Page 38: 05 predictive with spss


What’s New & Hot

Page 39: 05 predictive with spss

Predictive Analytics for Big Data

Get more accurate models with a bigger volume and variety of data

- Read data from Hadoop
- Write back to Hadoop
- Export your models to Streams
- Prepare your data on Hadoop
- A few models can run on Hadoop
- R analytic capabilities in SPSS

Page 40: 05 predictive with spss

Bring Analytics on Big Data to Everyone

Automatic Summarization
• Top findings in data ranked by “interestingness” and association strength
• Plain language synopsis

Automatic Exploration
• Guided presentation by selecting fields of interest
• Dynamic Visual Insights
• Users can refine auto-generated parameters

Automatic Modeling
• Auto selection of best models and detection of strongest relationships: Decision Tree (CHAID) and Key Driver Reports (based on linear and logistic regression)

Sharing of Output
• Collaboration with peers
• Tablet optimization

SPSS Analytics Catalyst

CRISP-DM Methodology

Page 41: 05 predictive with spss

Monte Carlo Simulation

▪ Generate simulated data
▪ Fit distributions from existing data
▪ Evaluate the simulation

Example Use Cases:
– A retailer wants to simulate alternative sales scenarios to identify which strategy is most likely to hit its targets
– A parts manufacturer is interested in modeling storage costs by simulating different scenarios for future part orders against stock supplies and excess order fees
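The retailer use case can be sketched with Python's standard library: fit a distribution to past monthly sales for each strategy, generate many simulated years, and evaluate how often the yearly target is hit. The sales figures, the target of 1200 and the choice of a normal distribution are all assumptions made for illustration, not what the Modeler simulation nodes do internally.

```python
import random
import statistics

# Hypothetical monthly sales observed under two strategies.
history = {
    "baseline": [92, 101, 88, 97, 104, 95, 90, 99],
    "discount": [120, 70, 135, 98, 150, 84, 128, 95],
}


def fit_normal(samples):
    """Fit a normal distribution (assumed shape) to existing data."""
    return statistics.fmean(samples), statistics.stdev(samples)


def simulate_year(mu, sigma, rng, months=12):
    """Generate one simulated year; monthly sales cannot go below zero."""
    return sum(max(0.0, rng.gauss(mu, sigma)) for _ in range(months))


def probability_of_hitting(target, mu, sigma, rng, trials=5000):
    """Evaluate the simulation: share of simulated years that reach the target."""
    hits = sum(simulate_year(mu, sigma, rng) >= target for _ in range(trials))
    return hits / trials


rng = random.Random(2013)  # fixed seed so the run is repeatable
results = {
    name: probability_of_hitting(1200, *fit_normal(sales), rng)
    for name, sales in history.items()
}
best = max(results, key=results.get)
print(results, "->", best)
```

The discount strategy is more volatile month to month, yet its higher average makes it far more likely to reach the yearly target, which is the kind of trade-off the simulation is meant to expose.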

Page 42: 05 predictive with spss

Geospatial Data Mining: Understanding Geohashes

▪ Space-time Boxes use geohashes and timestamps to locate where and when entities exist

▪ A geohash is a unique identifier that uses latitude and longitude to create an alphanumeric string

▪ Its precision depends on its length; longer geohash = better precision

▪ For example, geohash dr5ru7 is midtown Manhattan...but how do we know?
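One way to answer "how do we know?" is to decode the geohash by hand. In the standard geohash scheme each character encodes 5 bits (base-32 alphabet without a, i, l, o), and the bits alternately halve the longitude and latitude ranges, starting with longitude. The sketch below follows that public algorithm; the function name `decode_geohash` is our own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Return ([lat_min, lat_max], [lon_min, lon_max]) for a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True  # bits alternate: longitude first, then latitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            interval = lon_range if use_lon else lat_range
            mid = (interval[0] + interval[1]) / 2
            if bit:
                interval[0] = mid  # keep the upper half
            else:
                interval[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lat_range, lon_range


lat_range, lon_range = decode_geohash("dr5ru7")
lat = sum(lat_range) / 2
lon = sum(lon_range) / 2
print(round(lat, 4), round(lon, 4))
```

The center of the decoded box is close to latitude 40.7565, longitude -73.9874, which falls in midtown Manhattan. Each extra character adds 5 more halvings, which is why a longer geohash gives better precision.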

Page 43: 05 predictive with spss

What Exactly is a Space-Time Box?

▪ Space-time Boxes extend geohashes to include a third dimension: time
▪ Space-time Boxes ‘bin’ events in 3-D space and time
▪ Density (i.e. size) of the Space-time Box is a required input
▪ Can help analysts understand proximity between entities, verify relationships

dr5ru7|2013-01-01 00:00:00|2013-01-01 00:15:00

Geohash | Start timestamp | End timestamp
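A key in the format shown above can be built by truncating an event's timestamp to the start of its time bin. The following sketch assumes bins are aligned to midnight and that the geohash has already been computed; the function `space_time_box` and the 15-minute default are illustrative choices, not Modeler's implementation.

```python
from datetime import datetime, timedelta


def space_time_box(geohash, when, minutes=15):
    """Build a 'geohash|start|end' key for the time bin containing `when`."""
    width = timedelta(minutes=minutes)
    midnight = when.replace(hour=0, minute=0, second=0, microsecond=0)
    bins_elapsed = (when - midnight) // width  # whole bins since midnight
    start = midnight + bins_elapsed * width
    end = start + width
    fmt = "%Y-%m-%d %H:%M:%S"
    return f"{geohash}|{start.strftime(fmt)}|{end.strftime(fmt)}"


# Two hypothetical events in the same area, a few minutes apart.
box_a = space_time_box("dr5ru7", datetime(2013, 1, 1, 0, 7, 30))
box_b = space_time_box("dr5ru7", datetime(2013, 1, 1, 0, 14, 59))
print(box_a)
print(box_a == box_b)
```

Entities whose events fall into the same box were in roughly the same place at roughly the same time, which is what makes the key useful for proximity analysis.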

Page 44: 05 predictive with spss

IBM SPSS Modeler Embraces R

1. SPSS Modeler allows the user to build and score R models within the Modeler interface

2. SPSS Modeler allows the use of R functions for data preparation and chart/output creation

3. The Custom Dialog Builder for R allows the user to create custom nodes that run R algorithms, functions, or outputs

4. These custom nodes can be shared with other users and they do not require the end user to know any R code

Page 45: 05 predictive with spss


Use R to build a custom node

Page 46: 05 predictive with spss

The world of analytics made easy for everyone

Bouchra Denis Antoine Danil

Page 47: 05 predictive with spss

I am Sandra, a data analyst.

USER

CODE

Page 48: 05 predictive with spss

Sadly, SPSS Modeler cannot do

EVERYTHING

Page 49: 05 predictive with spss

SPSS Modeler Marketplace: App Store for Analytics

Page 50: 05 predictive with spss
Page 51: 05 predictive with spss
Page 52: 05 predictive with spss

Spatial

Plot insightful interactive maps to explore your data

Visualize new patterns

Page 53: 05 predictive with spss

Spatial, Social

Enhance your client understanding with social data!

Analyse public opinion!

Page 54: 05 predictive with spss

Spatial, Social, Databases

Connect to NoSQL databases!

Connect to Bluemix in 2 clicks!

Connect to Big SQL and Hadoop!

Page 55: 05 predictive with spss

Spatial, Social, Databases, Models

For our Business Partners

Predict which customers will come back and how much they will spend

Implemented in a BI solution for a large retailer to generate enterprise-grade reporting

Page 56: 05 predictive with spss

Spatial, Social, Databases, Models, and many more!

Come to our booth to try them out

More than 30 new functionalities

Page 57: 05 predictive with spss
Page 58: 05 predictive with spss

Potential growth

A lot of code already available in packages

R is a widely used language

Survey of use (bar chart; axis from 0 % to 70 %): R, IBM SPSS Statistics, RapidMiner, SAS, Weka, Microsoft SQL Server, Matlab, IBM SPSS Modeler

Page 59: 05 predictive with spss

Value

SPSS Modeler Marketplace

SPSS Modeler BRAND

SPSS Modeler USERS

IBM PARTNERS

NODE DEVELOPERS

Page 60: 05 predictive with spss


Q&A