© Hortonworks Inc. 2013. Modern Data Architecture… for Predictive Analytics. David Smith, VP Marketing and Community, Revolution Analytics. John Kreisa, VP Strategic Marketing, Hortonworks.

The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Jan 26, 2015


Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.

Hadoop, a disruptive data processing framework, has had a large impact on today's data ecosystems. Enabling business users to apply their existing skills to Hadoop is necessary to encourage adoption and lets businesses get value from their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.

This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform, so that moving your R-powered analytics to Hadoop is simple

Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/
Transcript
Page 1: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

© Hortonworks Inc. 2013

Modern Data Architecture… for Predictive Analytics

David Smith, VP Marketing and Community, Revolution Analytics

John Kreisa, VP Strategic Marketing, Hortonworks

Page 2: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Your Presenters

• David Smith (@revodavid)
– VP Marketing and Community at Revolution Analytics
– Data Scientist, Blogger and co-author of An Introduction to R

• John Kreisa (@marked_man)
– VP Strategic Marketing, Hortonworks
– Over 20 years in data management as a developer and a marketer
– Avid camper

Page 3: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• R’s role in the MDA
• Q&A

Page 4: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Poll #1: What stage are you at in looking into Hadoop?

• Research
• Evaluation
• Trial
• Haven’t started research

Page 5: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Existing Data Architecture

[Diagram: layered data architecture]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST

Page 6: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Existing Data Architecture

[Diagram: layered data architecture]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

Data growth (Source: IDC)

• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
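
As a rough worked example of what the two volume figures above imply, the sketch below computes the overall growth factor and the compound annual growth rate between 2012 and 2020. Smooth compound growth between the two data points is an assumption of this sketch, not something the slide or IDC states. Under that assumption the figures work out to roughly a 14x increase, or a little under 40% per year.

```python
# Figures as quoted on the slide (attributed there to IDC).
ZB_2012 = 2.8   # zettabytes of data in 2012
ZB_2020 = 40.0  # projected zettabytes by 2020


def implied_growth(start, end, years):
    """Return (overall growth factor, compound annual growth rate)."""
    factor = end / start
    # Assumes the same percentage growth every year (illustrative only).
    cagr = factor ** (1 / years) - 1
    return factor, cagr


factor, cagr = implied_growth(ZB_2012, ZB_2020, 2020 - 2012)
print(f"Overall growth: {factor:.1f}x")
print(f"Implied annual growth: {cagr:.1%}")
```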

Page 7: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Modern Data Architecture Enabled

[Diagram: layered data architecture]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST

Page 8: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Hadoop Powers Modern Data Architecture

Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment.

[Diagram: Hadoop Cluster of compute & storage nodes]

Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
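
To make "distributed data processing" a little more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce pattern that Hadoop MapReduce applies across the nodes of a cluster. It is plain Python rather than Hadoop API code, and the block contents and function names are invented for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical data blocks, standing in for file blocks stored on different nodes.
blocks = [
    ["error disk full", "info job started"],
    ["info job finished", "error network timeout"],
    ["info job started"],
]


def map_block(lines):
    """Map step: emit (log level, 1) for each line in one block."""
    return [(line.split()[0], 1) for line in lines]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Each block is mapped independently of the others. On a real cluster that
# independence is what allows the map step to run in parallel on the nodes
# that already store the data; here the blocks are simply processed in a loop.
mapped = chain.from_iterable(map_block(block) for block in blocks)
counts = reduce_counts(shuffle(mapped))
print(counts)
```

Because the map step never looks outside its own block, adding nodes adds both storage and processing capacity, which is the scale-out property the slide refers to.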

Page 9: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Drivers for Hadoop Adoption

Driving Efficiency: Modern Data Architecture
Hadoop has a central role in next generation data architectures while integrating with existing data systems

Driving Opportunity: Business Applications
Use Hadoop to extract insights that enable new customer value and competitive edge

Big Data Sets
• Existing: Traditional, Server log, Clickstream
• Emerging: Sentiment/Social, Machine/Sensor, Geo-locations

Page 10: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Opportunity in types of data

1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents

Page 11: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Efficiency in the Modern Data Architecture

[Diagram: layered data architecture]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

• Drive efficiency via modern data architecture
• Store data once and access it in many ways
• Often referred to as a data lake or data repository
• Infrastructure platform driven
• IT-oriented, TCO based

Page 12: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Engineered for Interoperability

[Diagram: layered data architecture with partner products]

APPLICATIONS: BusinessObjects BI
DATA SYSTEM: RDBMS, EDW, MPP, HANA
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE

Page 13: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Requirements for Hadoop Adoption

3 Requirements for Hadoop’s Role in the Modern Data Architecture

• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics

Page 14: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Revolution R Enterprise Architecture

[Diagram: layered data architecture; legend marker = Revolution R Enterprise]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST

Page 15: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• R’s role in the MDA
• Q&A

Page 16: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Poll #2: Which of the following best describes your use of R and Hadoop?

• We have R + Hadoop in production
• We are testing R + Hadoop
• We have started to investigate but nothing is implemented
• No current plans

Page 17: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Revolution Confidential

What is the Open Source R Project?

The R Language: Object-Oriented Language for Stats, Math and Data Science
• Comprehensive data visualization and statistical modeling capabilities

The R Community: 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
• New graduates with data skills learn R

The R Ecosystem: 5000+ Freely Available Algorithms in CRAN
• Specialized methods for finance, economics, genomics, linguistics, and every data-driven domain

Page 18: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

R is open source and drives analytic innovation but has some limitations for Enterprises

• Bigger data sizes: Memory Bound → Big Data
• Speed of analysis: Single Threaded → Scale out, parallel processing, high speed
• Production support: Community Support → Commercial production support
• Innovation and scale: Innovative, 5000+ packages, Exponential growth → Combines with open source R packages where needed

Page 19: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Revolution R Enterprise

Revolution R Enterprise is the only commercial big data analytics platform based on the open source R statistical computing language

• Enterprise-Ready
• Cross-Platform
• Big Data Analytics
• High Performance Analytics
• Easier Build & Deploy

Page 20: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Modern Data Architecture: Extract and Analyze

• Ad-hoc Data Distillation
• Exploratory Data Analysis / Data Visualization
• Model Development

[Diagram: data flow through the Hadoop platform]

SOURCE DATA: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory (from DBs, JMS Queues, Files)
LOAD (HTTP, STREAM): SQOOP, FLUME, WebHDFS, NFS
PLATFORM: AMBARI, MAPREDUCE, YARN, HDFS, REST
DATA REFINEMENT: HIVE, PIG, CUSTOM
STRUCTURE: HCATALOG (metadata services)
LOAD: SQOOP/Hive, WebHDFS
Data Sources: CSV, DATABASES
INTERACTIVE: HIVE Server2
ANALYTICAL: Analytical Tools, rHadoop
Query/Visualization/Reporting/Analytical Tools and Apps
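
WebHDFS, one of the load paths named on this slide, exposes HDFS over a REST interface with URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and makes no network calls. The host name, file path and user name are hypothetical, and port 50070 is an assumption (it was the usual NameNode HTTP port in Hadoop releases of this period; your cluster may differ).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = {"op": op}
    if params:
        query.update(params)
    # quote() leaves "/" intact, so the HDFS path keeps its directory structure.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


# Hypothetical NameNode address and file path, for illustration only.
open_url = webhdfs_url(
    "namenode.example.com", 50070, "/data/clickstream/part-0000.csv", "OPEN"
)
list_url = webhdfs_url(
    "namenode.example.com", 50070, "/data/clickstream", "LISTSTATUS",
    params={"user.name": "analyst"},
)
print(open_url)
print(list_url)
```

An analytical tool running outside the cluster could issue an HTTP GET against the first URL to read the file, or against the second to list the directory, without needing Hadoop client libraries installed.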

Page 21: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

The Data Scientist’s Big Data Toolkit

• Statistical Tests
• Machine Learning
• Simulation
• Descriptive Statistics
• Data Visualization
• R Data Step
• Predictive Models
• Sampling

Page 22: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Parallel External-Memory Algorithms

[Diagram: multiple CPUs working in parallel within one SMP SERVER]

Page 23: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Parallel External-Memory Algorithms

[Diagram: multiple HADOOP NODEs working in parallel within one HADOOP CLUSTER]
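
The slides do not show how Revolution's parallel external-memory algorithms (PEMAs) are implemented, so the following is only a generic sketch of the underlying idea: summarize each chunk of data separately, then combine the small summaries. It fits a simple linear regression from per-chunk sufficient statistics. The `Stats` class, the function names and the sample chunks are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other):
        # Combining two summaries is plain element-wise addition.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def process_chunk(chunk):
    """Summarize one chunk of (x, y) rows; only this chunk needs to be in memory."""
    s = Stats()
    for x, y in chunk:
        s = s + Stats(1, x, y, x * x, x * y)
    return s


def fit(stats):
    """Turn the combined statistics into (intercept, slope)."""
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / (
        stats.n * stats.sxx - stats.sx ** 2
    )
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


# Hypothetical chunks, standing in for blocks of a file too large for memory.
# The rows follow y = 1 + 2x exactly, so the result is easy to check.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

# Each chunk could be summarized on a different CPU or Hadoop node;
# this sketch processes them one after another.
total = sum((process_chunk(c) for c in chunks), Stats())
intercept, slope = fit(total)
print(intercept, slope)
```

The same combine step applies whether the chunks are handled by the CPUs of one SMP server or by the nodes of a Hadoop cluster, since only the small `Stats` summaries have to be brought together.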

Page 24: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Modern Data Architecture with RRE7: In-Hadoop Predictive Analytics

• Production Data Distillation (e.g. Semantic Analysis)
• Production Model Processing / Re-Estimation
• Production Model Scoring

[Diagram: data flow through the Hadoop platform]

SOURCE DATA: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory (from DBs, JMS Queues, Files)
LOAD (HTTP, STREAM): SQOOP, FLUME, WebHDFS, NFS
PLATFORM: AMBARI, MAPREDUCE, YARN, HDFS, REST
DATA REFINEMENT: HIVE, PIG, CUSTOM
DISTILLED DATA FILES
STRUCTURE: HCATALOG (metadata services)
LOAD: SQOOP/Hive, WebHDFS
Data Sources: CSV, DATABASES
INTERACTIVE: HIVE Server2
ANALYTICAL: Analytical Tools, Revolution R Enterprise
Query/Visualization/Reporting/Analytical Tools and Apps

Page 25: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Hadoop As An R Engine

• Use Revolution R Enterprise PEMAs in Hadoop
– No need to change existing R code
• Simple R programming
– No need to “Think In MapReduce”
• Eliminate data movement to slash latencies
• Use Hadoop nodes as parallel R computation engines

Page 26: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Requirements for Hadoop Adoption

3 Requirements for Hadoop’s Role in the Modern Data Architecture

• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics

Page 27: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Poll #3: Which of the following would you most like to accomplish with R + Hadoop?

• Build a model to be put into production in Hadoop
• Build a model to be put into production elsewhere
• Create new data from Hadoop to supplement an existing analytics process
• Something else

Page 28: The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

Next Steps:

More about Revolution Analytics and Hadoop
http://www.revolutionanalytics.com/products/r-for-hadoop.php

Get started on Hadoop with Hortonworks Sandbox
http://hortonworks.com/sandbox

Follow us: @hortonworks, @RevolutionR