Company Confidential - For Internal Use Only
Copyright © 2014, SAS Institute Inc. All rights reserved.
REAL-TIME AND BIG DATA ANALYTICS
HYPE OR REALITY?
JONAS LIE-NIELSEN, PRINCIPAL SOLUTION ARCHITECT, NORDIC ENTERPRISE ANALYTICS COE
"Big data is what happened when
the cost of storing information
became less than the cost of making
the decision to throw it away." George Dyson
BIG DATA THE ERA OF ABUNDANCE
Discovery-centric
Everything is
permitted unless it is
forbidden
Focus on value
Technology empowered
TWO ERAS … TWO MINDSETS
THE CHANGE FROM SCARCITY TO ABUNDANT THINKING
scarcity: cost
abundant: value
BIG DATA INFORMATION SOURCES
Source: Gartner (September 2013)
BIG DATA BUSINESS USE CASES
Source: Hortonworks
HADOOP
WHAT IS HADOOP? DICTIONARY DEFINITION
“Hadoop is one way of using a set of cheap
computers to store an enormous amount of data
and then to process that data in parallel."
WHAT IS HADOOP? AS A DATA PLATFORM, STORAGE COSTS ARE MUCH LOWER…
[Chart: total cost ($0 to $18,000,000) by number of gigabytes (1, 10, 100, 1000) for Hadoop, Teradata Warehouse Appliance, Oracle Exadata and IBM Netezza]
WHAT IS HADOOP? MAKING HADOOP EASY AND ENTERPRISE READY…
WHAT IS HADOOP? BUZZWORDS: MAP AND REDUCE – PROCESSING DATA!
• Computing power of lots of small servers
• Standard processing approach
• Custom coding to exploit the environment
• Designed for batch processing
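To make the "map" and "reduce" buzzwords concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample lines are illustrative assumptions, and a real cluster would run the mappers and reducers in parallel on many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on a list of input lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: two lines of text standing in for a large file.
lines = ["big data big value", "data in Hadoop"]
word_counts = run_job(lines)
```

Because each mapper call only sees one record and each reducer call only sees one key, both phases can in principle be spread over many small servers, which is the point of the bullets above.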
BIG DATA ADOPTION RATES
The adoption of Hadoop has started
• Biggest adoption is of Apache Hadoop (pure open source)
• The pure players (Cloudera, Hortonworks, MapR) are coming fast
• Low current adoption: Cloudera currently has 350 customers
• Fast growing: Hortonworks has added 250 new customers in the last 5 quarters
• The big vendors follow straight after, with IBM in front
WHY SHOULD YOU CARE?
SOME OF THE ORGANISATIONS THAT PUBLICLY STATE
USE OF HADOOP
HOW DOES SAS DEAL WITH THIS?
BIG DATA HOW TO PROCESS IT?
BIG DATA FOCUS IS SHIFTING TO STREAMING DATA ANALYSIS
FOR LOW LATENCY DECISION MAKING
SAS DEPLOYMENT PATTERNS
[Diagram: three patterns, each pairing Teradata with SAS: Traditional SAS, In-Database and In-Memory. Labels: Data, Memory; 67xx, 27xx, Teradata 720]
• In-Database: even with In-Database processing there will still be some work performed on the SAS server
• In-Memory: even with In-Memory processing there will still be some work performed on the SAS server
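The difference between the traditional and the in-database pattern can be sketched with Python's built-in `sqlite3` module standing in for the database. This is an illustration only, not SAS or Teradata code; the `sales` table, its rows and both function names are assumptions made up for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)


def traditional_totals(connection):
    """Traditional pattern: pull every row to the client, then aggregate there."""
    totals = {}
    rows_moved = 0
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
        rows_moved += 1
    return totals, rows_moved


def in_database_totals(connection):
    """In-database pattern: the database aggregates, only the results travel."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


trad, trad_rows = traditional_totals(conn)
indb, indb_rows = in_database_totals(conn)
```

Both functions return the same totals, but the second moves one row per region instead of one row per sale. The client still assembles the final dictionary, which mirrors the note that some work remains on the SAS server.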
SAS HIGH PERFORMANCE ANALYTICS SOLUTIONS
BIG DATA WHY HIGH PERFORMANCE ANALYTICS?
“We had one customer who was spending about five and a
half hours building an attribution model. With high-
performance data mining, they’re now building it in about
three minutes. Plus, we were able to get a factor of about
two times more lift, meaning millions of dollars for the
customer in terms of return on investment.”
~ Wayne Thompson, Chief Data Scientist, SAS
“Since in-memory processing is so fast, the time to
process advanced analytics on big data is reduced. This
frees up more time to actually think differently, experiment
with different approaches, fine-tune your champion model,
and eventually increase predictive power.”
~
[Diagram: data volume (large data to big data) against analytics (medium computation to heavy computation). Items shown: analytic models based on samples; omni-channel marketing optimization; full-scale high-frequency analytic models; social media analytics; sensor/log analytics; real-time analytics]
SAS HIGH LEVEL ARCHITECTURE
[Diagram]
• Data sources: DWH (in-database), SAP BW, other sources, XML & files, data streams
• Engines: SAS Compute Grid, Event Streaming Engine, real-time decision engine, In-Memory Analytics Engine(s)
• Capabilities: data management, forecasting, advanced analytics, high performance analytics
SAS IN-DATABASE
THE VALUE OF INDB PERFORMANCE
From: DS2IP_Hive Demo Slides
[Chart: initial performance numbers. Time (s), 0 to 200, against number of rows, 0 to 25,000,000. Series: Hadoop INDB, Not INDB, Greenplum INDB]
THE GOAL OF EP: RUN TK IN THE DATABASE
[Diagram, two panels]
• TK runs on Client (Old): Database (Data, Data, Data); SAS Server (SAS Procs, TK)
• TK runs in Database (New): Database (Data, Data, Data; Database Processes with EP and TK); SAS Server (SAS Procs)
SAS AND HADOOP
SAS FOR HADOOP TECHNOLOGY DIRECTION
• SAS/Access to Hadoop - Extract data from Hadoop into SAS (diagram: SAS, Hive, Impala)
• Embedded Process - Push some SAS processing to Hadoop with Map Reduce (diagram: SAS, Score A, Code A)
• In-Memory Analytics - Use Hadoop for storage persistence and commodity computing (diagram: SAS, HPA, LASR)
Some inspiration: https://www.youtube.com/watch?v=J3b8nMUMo4Y
SAS FOR HADOOP SAS / ACCESS
1. SAS/Access to Hadoop: push some SAS processing to Hadoop via Hive QL
• SAS/Access to Hadoop
• SAS/Access to Cloudera Impala
[Diagram: SAS server connects through SAS/ACCESS to data in Hadoop]
SAS FOR HADOOP SAS / EMBEDDED PROCESS
2. SAS Embedded Process: push SAS data management processing to Hadoop with Map Reduce
• SAS/Scoring Accelerator for Hadoop
• SAS/Code Accelerator for Hadoop (July 2014)
• SAS/Data Quality Accelerator for Hadoop (July 2014)

proc ds2;
  /* thread: roughly equivalent to a mapper */
  thread map_program;
    method run();
      set dbmslib.intab;
      /* program statements */
    end;
  endthread;
  run;

  /* program wrapper */
  data hdf.data_reduced;
    dcl thread map_program map_pgm;
    method run();
      set from map_pgm threads=N;
      /* reduce steps */
    end;
  enddata;
  run;
quit;

[Diagram: SAS server sends SAS Data Step and DS2 jobs to Hadoop]
SAS HADOOP ARCHITECTURE IN-MEMORY SOLUTION
4. In-Memory Analytics: process in memory, use Hadoop for storage persistence and commodity computing
SAS ANALYTIC HADOOP ENVIRONMENT
[Diagram: web clients and applications connect to the SAS® LASR Analytic Server, a set of SAS® In-Memory nodes on top of Hadoop]
• Applications: Visual Analytics, Visual Statistics, Visual Scenario Designer, In-Memory Statistics, Visual Data Builder
• Data in via: Streaming (ESP), Distributed (EP/SQOOP), Query (SQL/FTP etc …)
• Data sources: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
SAS TYPICAL SOLUTION (ANALYTICS FOCUSED)
[Diagram]
• Data sources: CDR, probes & NE counters, alarms, social data, unstructured data (high volume; incremental, streaming & batch)
• Hadoop: data lake, DQ gate, analytical tables
• SAS analytics: Data Integration, In-Memory Analytics Engine(s) (read, deploy)
• Deploy targets: DWH (in-database), operational system (in-database)
HIGH LEVEL ARCHITECTURE
SAS® IN-MEMORY STATISTICS FOR HADOOP
DISTRIBUTED DEPLOYMENT ON COMMODITY HARDWARE (DEDICATED RACK)
[Diagram]
• Web-based client: SAS® Studio
• Server tier: mid-tier, workspace server, metadata server (optional)
• Blade environment: in-memory store (SAS® LASR™ Analytic Server)
• MPP datastore: Hadoop (Cloudera, Hortonworks) with SAS Embedded Process
• Sources: other RDBMS, nonrelational, click stream, PC files, Hadoop
• Diagram notes: "Not part of IMSTAT"; "Can be separated"
SAS IMSTAT FEATURE SUMMARY
www.sas.com/hpa
ANALYTICAL LIFECYCLE: DATA MANAGEMENT & EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT
Data Management
• APPEND
• BALANCE
• TABLEINFO
• COLUMNINFO
• COMPUTE
• DELETEROWS
• DROPTABLE
• FETCH
• PARTITION
• PROMOTE
• PURGETEMPTABLES
• SCHEMA
• SCORE
• SET
• TABLE
• UPDATE
Data Exploration
• DISTINCT
• BOXPLOT
• CORR
• CROSSTAB
• DISTRIBUTIONINFO
• FREQUENCY
• HISTOGRAM
• KDE
• MDSUMMARY
• PERCENTILE
• SUMMARY
• TOPK
• GROUPBY
Descriptive Modeling
• CLUSTER
• ARM
Predictive Modeling
• ASSESS
• DECISIONTREE
• FORECAST
• GENMODEL
• GLM
• LOGISTIC
• OPTIMIZE
• RANDOMWOODS
• Regression Trees
Recommender
• CLUSTER
• KNN
• ARM (Rule mining)
• SVD
• ENSEMBLE methods
Text Analytics
• Parsing
• SVD
• Topic generation
• Document projection
• Sentiment analysis
Deployment
• SCORE
Miscellaneous
• EXTERNAL (C API)
• FREE
• REPLAY
• SAVE
• STORE
IMXFER, sasiola, sashdat, Anyfile Reader
SAS ESP
ANALYTICS AND INSIGHTS ON STREAMING DATA
TECHNOLOGY INTEGRATION
• SAS® Event Stream Processing Engine is integrated into some SAS Solutions and can be deployed at the front-end of most others
• Complements batch and real-time capabilities of SAS solutions with streaming data analysis
STREAMING ANALYTICS
• Enables SAS analytic solutions to process streaming events
• Leverages analytical model results to provide real-time insights and action on streaming data
• Enables the deployment of additive and incremental analytic models on streaming data
SAS® EVENT STREAM PROCESSING ENGINE: 3 KEY CHARACTERISTICS
TECHNOLOGY
The SAS® ESP Engine provides the architecture to process streams of data and business events, on the move, prior to storage, when events happen
SPEED
The SAS® ESP Engine can process huge volumes of streaming data flowing at very high rates (millions of events/sec) with very short latency (<1 millisecond)
ACTIONABLE INTELLIGENCE
The SAS® ESP Engine filters/aggregates/correlates the stream to focus on and detect specific events, patterns or characteristics that will help the business
SAS EVENT STREAM PROCESSING: DATAFLOW CENTRIC
[Diagram: event streams enter the SAS Event Stream Processing Engine as DATA IN (Events) and leave it as DATA OUT (Events)]
Design of the rule model (called “Continuous Query”) using components (called “Windows”)
[Diagram: SOURCE1, SOURCE2 and SOURCE3 windows feed a chain of FILTER, CALCULATIONS, JOIN and THRESHOLD windows]
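The idea of a continuous query built from chained windows can be sketched with plain Python generators. This is a conceptual illustration, not the SAS ESP API: the function names, the event fields (`sensor`, `value`) and the rolling-mean calculation are all assumptions chosen for the example.

```python
from collections import deque


def source_window(events):
    """Source window: hand events to the query one at a time as they arrive."""
    for event in events:
        yield event


def filter_window(stream, sensor):
    """Filter window: keep only events from the given sensor."""
    for event in stream:
        if event["sensor"] == sensor:
            yield event


def calculation_window(stream, size):
    """Calculation window: attach the mean of the last `size` values."""
    recent = deque(maxlen=size)
    for event in stream:
        recent.append(event["value"])
        yield {**event, "rolling_mean": sum(recent) / len(recent)}


def threshold_window(stream, limit):
    """Threshold window: pass on only events whose rolling mean exceeds the limit."""
    for event in stream:
        if event["rolling_mean"] > limit:
            yield event


# Hypothetical event stream; in practice this would be an unbounded feed.
events = [
    {"sensor": "a", "value": 10.0},
    {"sensor": "b", "value": 99.0},
    {"sensor": "a", "value": 30.0},
    {"sensor": "a", "value": 50.0},
]

# The "continuous query" is simply the chain of windows.
query = threshold_window(
    calculation_window(filter_window(source_window(events), "a"), size=2),
    limit=25.0,
)
alerts = list(query)
```

Each event flows through the whole chain before the next one is read, so nothing has to be stored first. That matches the "on the move, prior to storage" description of the engine, although a real engine would also handle joins across several sources.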
BIG DATA TARGET ARCHITECTURE
[Diagram]
• Inputs: transactions, data streams, probes & sensors, alarms
• Data & Decision Management: Hadoop data store, In-Memory data store, streaming data, DWH; data store and analytical tables behind DQ gates
• Batch: statistical exploration & modelling, statistical programming, data exploration & visualization, high performance analytics (scoring, forecasting, modelling)
• Real time: event stream processing on the real-time data stream, producing alerts
• Outputs: Monitoring & Reporting (monitoring, operation center, SMS), Web Services, alarms