TECHNOLOGY STRATEGIES FOR BIG DATA ANALYTICS
BERNARD BLAIS, PRINCIPAL, GLOBAL TECHNOLOGY PRACTICE
Copyright © 2012, SAS Institute Inc. All rights reserved.
May 10, 2015
THE CHALLENGE?
VOLUME, VARIETY, VELOCITY
[Figure: data size, today vs. the future]
Technology Checklist for Big Data Analytics
A flexible architecture that supports many data types and usage patterns
Upstream use of analytics to optimize data relevance
Real-time visualization and advanced analytics to accelerate understanding and action
Collaborative approaches to align Business and IT executives
THE ANALYTICS LIFECYCLE
How can you create competitive advantage?
IDENTIFY / FORMULATE PROBLEM
DATA PREPARATION
DATA EXPLORATION
TRANSFORM & SELECT
BUILD MODEL
VALIDATE MODEL
DEPLOY MODEL
EVALUATE / MONITOR RESULTS
BUSINESS MANAGER: Domain Expert, Makes Decisions, Evaluates Processes and ROI
IT SYSTEMS / MANAGEMENT: Model Validation, Model Deployment, Model Monitoring, Data Preparation
DATA SCIENTIST: Data Exploration, Data Visualization
DATA MINER / STATISTICIAN: Exploratory Analysis, Descriptive Segmentation, Predictive Modeling
HIGH-PERFORMANCE ANALYTICS KEY COMPONENTS
DEPLOY FASTER DECISIONS
PREPARE BIGGER DATA
DEVELOP BETTER RESULTS
CORE OPPORTUNITY
In Memory
Grid Computing / In Memory
In Database / In Memory
HIGH-PERFORMANCE ANALYTICS SAS® GRID COMPUTING
HIGH-PERFORMANCE ANALYTICS SAS® IN-DATABASE
HOW DO WE MANAGE DATA IN THE PHYSICAL WORLD?
1. Acquire
2. Determine Relevance
3. Store
Trash / Cache / Storage
HOW DO WE MANAGE INFORMATION IN THE IT WORLD?
DATA
Data Acquisition
Data Transformations
Data Normalization
Queries
Systems / Users
Relevance is traditionally determined at query time: “Acquire, Store, Analyze”
A Big Data Analytics strategy requires a new approach: “Stream it, Score it, Store it”
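The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not SAS functionality: the event fields, the `relevance_score` rule, and the 0.5 threshold are all hypothetical. Each event is scored as it arrives, and only the events judged relevant are kept for storage.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, float]


def relevance_score(event: Event) -> float:
    """Hypothetical rule: larger deviations from a baseline are more relevant."""
    deviation = abs(event["value"] - event["baseline"])
    return min(1.0, deviation / 10.0)


def stream_score_store(events: Iterable[Event], threshold: float = 0.5) -> Iterator[Event]:
    """Score each event as it streams in and yield only the relevant ones."""
    for event in events:
        score = relevance_score(event)
        if score >= threshold:
            # Keep the score with the event so later queries can reuse it.
            yield {**event, "score": score}


# Hypothetical readings arriving as a stream.
incoming: List[Event] = [
    {"value": 20.0, "baseline": 19.0},
    {"value": 31.0, "baseline": 19.0},
    {"value": 12.0, "baseline": 19.0},
]
stored: List[Event] = list(stream_score_store(incoming))
print(len(stored), "of", len(incoming), "events stored")
```

In the "Acquire, Store, Analyze" pattern all three events would be stored and filtered later at query time; here the relevance decision is made upstream, before storage.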
INFORMATION MANAGEMENT
DECISIONS / ACTIONS / DATA
RAW RELEVANT DATA
LOW COST STORAGE
ENTERPRISE STREAM IT, SCORE IT, STORE IT
CUSTOMER CASE STUDY TRADITIONAL ANALYTICS PROCESS
DATA EXPLORATION
MODEL DEVELOPMENT
MODEL DEPLOYMENT
3 HRS
CUSTOMER CASE STUDY HIGH-PERFORMANCE ANALYTICS PROCESS
12 minutes
Past Approach
• Daily process begins with flat file creation at 6:30am – SLA delivered at ~9:30am.
• File transferred to SQL Server, limited to ~350K customer records based on specific criteria.
• 300 step process to support data mining life cycle.
30 MINUTES TO SCORE ~350K customers
In-Database Approach
• Daily process begins at 4:00am with EDW load.
• All operational data loaded directly to EDW. No flat file or intermediate processing is needed.
• 10 step process
• Scoring and customer selection done in-database against ALL customer rows
4 MINUTES TO SCORE ~40M customers
Business Value
- Scope of customer analysis: 350K vs. 40M
- Monthly collections: $1M-$3M per month
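The idea of scoring in-database, rather than extracting a flat file and scoring elsewhere, can be sketched as follows. This is an illustration under stated assumptions, not the customer's implementation: Python's built-in `sqlite3` stands in for the EDW, and the `customers` table, its columns, the scorecard weights, and the cutoff are all hypothetical. The point is that the model is expressed as SQL, so every row is scored where the data lives.

```python
import sqlite3

# sqlite3 stands in for the enterprise data warehouse (EDW) in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, "
    "balance REAL, days_past_due INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 500.0, 0), (2, 1200.0, 45), (3, 300.0, 90), (4, 2500.0, 20)],
)

# Hypothetical linear scorecard; the weights and cutoff are made up.
INTERCEPT, W_BALANCE, W_DAYS, CUTOFF = 0.0, 0.01, 0.5, 30.0

# Scoring and customer selection both run inside the database,
# against all rows, with no flat file or intermediate extract.
rows = conn.execute(
    """
    SELECT customer_id, score
    FROM (
        SELECT customer_id,
               ? + ? * balance + ? * days_past_due AS score
        FROM customers
    )
    WHERE score >= ?
    ORDER BY score DESC
    """,
    (INTERCEPT, W_BALANCE, W_DAYS, CUTOFF),
).fetchall()
selected_ids = [customer_id for customer_id, _ in rows]
print(selected_ids)
conn.close()
```

Only the selected customer IDs leave the database, which is what removes the intermediate file transfer in the in-database approach described above.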
HIGH-PERFORMANCE ANALYTICS SAS® IN-MEMORY ANALYTICS
EXPLORATION AND VISUALIZATION IN-MEMORY ARCHITECTURE
> 1.1 BILLION RECORDS
10 SECONDS
MODEL DEVELOPMENT & DEPLOYMENT IN-MEMORY ARCHITECTURE
82 SECONDS
5½ HRS
CUSTOMER CASE STUDY Customer Segmentation
Billions of Purchase Transactions
Tailored and Real-time Marketing Campaigns
CUSTOMER CASE STUDY TRADITIONAL ANALYTICS PROCESS
DATA EXPLORATION
MODEL DEVELOPMENT
MODEL DEPLOYMENT
167 Hours
CUSTOMER CASE STUDY IN-MEMORY ANALYTICS PROCESS
DATA EXPLORATION
MODEL DEVELOPMENT
MODEL DEPLOYMENT
167 Hours
84 SECONDS
Bottom-line Impact: Tens of Millions of Dollars
SAS HIGH-PERFORMANCE ANALYTICS
BEST PRACTICE Business Analytics Maturity Assessment
Overview: Two-day on-site discovery session focused on understanding the client’s business and IT objectives, key initiatives, existing information management and analytics architecture, top challenges, and priorities.
Process:
• Review current business requirements, timeframes, critical success factors, and key business metrics (e.g. customer retention, customer acquisition).
• Review operational data sources to support business priorities.
• Review analytical priorities, strategy, process, and gaps.
Deliverables:
• Technology roadmap to optimize the client’s current and future IT-enabled analytical process.
• Projected high-level ROI analysis resulting from proposed analytical architecture and process improvements.
SAS PROVEN VALUE PROPOSITION ACROSS MULTIPLE INDUSTRIES
INDUSTRY
COMPANY
USE CASE
VALUE
FINANCIAL SERVICES
PUBLIC SECTOR TELCO RETAIL SERVICES
Risk Management
Revenue Leakage
Campaign Optimization
Inventory Management
Promotions Management
• 356X faster risk calculations
• Faster in/out markets
• Better able to audit
• Detect issues pre-refund
• 15% better campaign response rates
• Markdown optimization – from 30 hours to 2 hours
• More precise than competition
• Coupon redemption rate +15%
USE CASE In-database Model Scoring
Overview: The largest customer behavior marketing company in the world, Catalina Marketing analyzes and predicts shoppers’ buying behaviors to generate customized point-of-sale color coupons, advertisements, and informational messages for retail stores and pharmacies nationwide.
Process and Deliverables: Using in-database scoring, Catalina Marketing automated the execution of scoring models against its entire database of 140 million consumers.
Impact: Catalina Marketing has reduced its model-scoring times from 4.5 hours to around 60 seconds using SAS Scoring Accelerator. As a result, it can use more complex, varied models and obtain analytical results faster for more efficient, reliable decisions, improving brand performance on behalf of its food, drug, and mass advertising and marketing partners.
Marketing campaigns are now implemented in days, versus more than one month before.
60 SECONDS
4½ HRS
RETAIL
USE CASE Credit Risk on Banking Data
Overview: Data source: a bank loan portfolio covering 3 million loans, 5,000 stress scenarios, and 40 time horizons, modeled with a transition matrix approach.
Process and Deliverables: Estimates of credit losses under stress over multiple horizons. Completed compute time: under 3 minutes.
Impact: Fast estimates of credit losses under stress over multiple horizons enable the bank to make changes to lending practices throughout the day.
3 MINUTES
FINANCIAL SERVICES
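To give a sense of the workload, the sketch below counts the loss estimates implied by the portfolio figures above and shows what one transition-matrix calculation might look like. It is a toy illustration only, not the bank's model: the three rating states, the matrix values, the exposure, and the loss-given-default figure are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Scale implied by the slide: one loss estimate per
# loan, stress scenario, and time horizon.
LOANS, SCENARIOS, HORIZONS = 3_000_000, 5_000, 40
total_estimates = LOANS * SCENARIOS * HORIZONS

# Hypothetical one-period transition matrix under a single stress scenario.
# States: 0 = performing, 1 = delinquent, 2 = default (absorbing).
STRESSED: Matrix = [
    [0.90, 0.08, 0.02],
    [0.30, 0.50, 0.20],
    [0.00, 0.00, 1.00],
]


def step(dist: List[float], matrix: Matrix) -> List[float]:
    """Advance a state distribution by one period (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def cumulative_default(start_state: int, matrix: Matrix, horizons: int) -> List[float]:
    """Cumulative default probability at each horizon 1..horizons."""
    dist = [0.0] * len(matrix)
    dist[start_state] = 1.0
    result = []
    for _ in range(horizons):
        dist = step(dist, matrix)
        result.append(dist[-1])  # mass in the absorbing default state
    return result


# Expected loss for one hypothetical loan: exposure x loss given default x PD.
EXPOSURE, LGD = 100_000.0, 0.4
pd_by_horizon = cumulative_default(0, STRESSED, HORIZONS)
expected_loss = [EXPOSURE * LGD * p for p in pd_by_horizon]
print(f"{total_estimates:,} estimates; loss at horizon 1: {expected_loss[0]:,.0f}")
```

Repeating a calculation like this for every loan and scenario is what makes the workload large enough that compute time determines how often the estimates can be refreshed.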
USE CASE Text Mining on Unstructured Data
Overview: The USA’s National Highway Traffic Safety Administration holds 700,000 accident reports covering vehicle makes and models, manufacturing date, purchase date, failures, mileage, number of cylinders, car components, accident information, etc.
Process and Deliverables: Text mining on accident reports: analyze, understand, validate, and predict contents. Report on content categorization. The text mining process runs in 1 minute 22 seconds on a High-Performance Analytics Server, instead of 5½ hours on a regular server.
Impact: A 99% time improvement means the whole process can now be considered an ITERATIVE, DYNAMIC process.
An analyst can run it 20 times before lunch, each time fine-tuning the model and improving the output, instead of maybe twice during the whole week.
82 SECONDS
5½ HRS
PUBLIC SECTOR
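The figures on this slide can be checked with simple arithmetic. The short sketch below uses only the two run times quoted above; the function names are illustrative.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the new run is."""
    return before_seconds / after_seconds


def time_reduction_pct(before_seconds: float, after_seconds: float) -> float:
    """Percentage of the original run time that was eliminated."""
    return 100.0 * (1.0 - after_seconds / before_seconds)


before = 5.5 * 3600  # 5½ hours on a regular server, in seconds
after = 60 + 22      # 1 minute 22 seconds on the high-performance server

factor = speedup(before, after)
reduction = time_reduction_pct(before, after)
twenty_runs_minutes = 20 * after / 60

print(f"speedup: about {factor:.0f}x")
print(f"time reduction: {reduction:.1f}%")
print(f"20 runs take about {twenty_runs_minutes:.0f} minutes")
```

The reduction works out to more than 99%, and twenty consecutive runs fit in under half an hour, which is consistent with the "20 times before lunch" claim.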
USE CASE Forecasting On Smart Meter Data
Overview: Oklahoma Gas & Electric Company (OG&E) serves nearly 800,000 customers in Oklahoma and western Arkansas. It was named the 2011 Utility of the Year.
OG&E uses SAS Analytics to forecast energy demand, plan for future changes to its energy portfolio, and optimize programs that encourage wiser use of energy.
Process and Deliverables: Use smart meter data coming from customers every 15 minutes (versus once a month) to create and measure the effectiveness of programs that reduce energy consumption.
Impact: What previously took one to three days can now be done in a matter of hours.
"We've gone from receiving 12 records per year for each customer to over 30,000 records per year."
30,000 records
12 records
UTILITIES
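The jump from 12 to over 30,000 records follows directly from the 15-minute read interval. The sketch below works through the arithmetic; it assumes, purely for illustration, that every customer has a smart meter that reports at every interval, and it rounds the customer count up to 800,000.

```python
MINUTES_PER_DAY = 24 * 60
READ_INTERVAL_MINUTES = 15
DAYS_PER_YEAR = 365
CUSTOMERS = 800_000  # "nearly 800,000" on the slide, rounded up here

reads_per_day = MINUTES_PER_DAY // READ_INTERVAL_MINUTES
reads_per_customer_per_year = reads_per_day * DAYS_PER_YEAR

# Monthly reads gave 12 records per customer per year.
growth_factor = reads_per_customer_per_year / 12

# Upper bound on yearly volume if every customer reports every interval.
total_reads_per_year = reads_per_customer_per_year * CUSTOMERS

print(f"{reads_per_day} reads per day")
print(f"{reads_per_customer_per_year:,} reads per customer per year")
print(f"{growth_factor:,.0f}x more records per customer")
print(f"up to {total_reads_per_year:,} reads per year in total")
```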
CONCLUSION What High-Performance Analytics Really Means
It’s not just about incredible speed; it’s also about:
Confidence: No more sampling, subsetting, summarizing
Accuracy: More complex models, more variables
Efficiency: Leverage the Analytical Brain on valuable tasks
Agility: Adapt and (re)Act faster
Technology Checklist for Big Data Analytics
A flexible architecture that supports many data types and usage patterns
Upstream use of analytics to optimize data relevance
Real-time visualization and advanced analytics to accelerate understanding and action
Collaborative approaches to align Business and IT executives