SAP HANA, HADOOP and other Big Data Tools

HANA, HADOOP - · PDF fileThe 3 V’s of Big Data Business Problem Technology Solution Backward-looking analysis Using data out of business applications SAP HANA Cloudera Hadoop

Feb 11, 2018


lydung
Transcript
Page 1

SAP HANA, HADOOP and other Big Data Tools

Page 2

Big Data: Why now?

• 85% of Top 500 enterprises will fail to exploit Big Data ²
• >30% of enterprises have no formal concept for data management ⁵
• x2: digital data globally doubles every two years ¹
• 90% of all data is unstructured and cannot be handled with traditional analytics tools ¹
• 10-50% cost reduction in production through Big Data exploitation ⁴
• 70% of all IT investment in 2015 will be Big Data driven ²

¹ IDC Predictions 2012
² Gartner, Predicts 2012
⁴ McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity
⁵ Economist Intelligence Unit 2011, Big data. Harnessing a game-changing asset

Page 3

The BI Ecosystem according to Forrester

(Diagram; labels only)
• Traditional data sources: CRM, ERP, Legacy apps
• New data sources: Public data, Sensors, Marketplace, Social media, Geo-location
• Relational: Relational OLTP, Scale-out relational
• Enterprise data warehouse: Traditional EDW, Column-store EDW, MPP EDW
• NoSQL (nonrelational): Object database, Graph database, Document database, Key-value
• Other database types shown: Mobile database, In-memory database, Database appliances, Cloud database

Source: Forrester Research, Inc.

Page 4

The facts behind in-memory

Cost of a Terabyte of Enterprise Disk Storage
• 1990 – in the region of USD 9 million
• 2013 – in the region of USD 100

Cost of a Terabyte of RAM
• 1990 – in the region of USD 106 million
• 2013 – in the region of USD 500
• i.e. over the last 20 years or so the price ratio of Memory to Storage has dropped from about 12:1 to 5:1
• But in absolute terms the price of RAM has dropped by a factor of about 200 000

Performance Comparison of Memory to Disk Read
• Enterprise Disk – between 4 and 13 million nanoseconds
• Memory – between 0.4 and 40 nanoseconds
• i.e. between 150 000 and 1 million times faster when already in memory
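
The price comparison on this slide can be checked with a few divisions. The following is a minimal sketch that uses only the approximate per-terabyte prices quoted on the slide; the variable names are illustrative and not part of the original deck.

```python
# Approximate prices per terabyte in USD, as quoted on the slide.
disk_usd = {1990: 9_000_000, 2013: 100}
ram_usd = {1990: 106_000_000, 2013: 500}

# How many times more expensive a terabyte of RAM is than a terabyte of disk.
ram_to_disk_1990 = ram_usd[1990] / disk_usd[1990]
ram_to_disk_2013 = ram_usd[2013] / disk_usd[2013]

# How far each price has fallen between the two years.
ram_price_drop = ram_usd[1990] / ram_usd[2013]
disk_price_drop = disk_usd[1990] / disk_usd[2013]

print(f"RAM vs disk price, 1990: about {ram_to_disk_1990:.1f} to 1")
print(f"RAM vs disk price, 2013: about {ram_to_disk_2013:.1f} to 1")
print(f"RAM price fell by a factor of {ram_price_drop:,.0f}")
print(f"Disk price fell by a factor of {disk_price_drop:,.0f}")
```

With these inputs RAM comes out at roughly 12 times the price of disk in 1990 and 5 times in 2013, and the RAM price itself falls by a factor of a little over 200 000, which is where the slide's rounded figures come from.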

Page 5

Positioning Big Data Technologies
November 2013

Approaching and beyond mainstream adoption

Hadoop SQL Interfaces

Hadoop Distribution

In-memory Analytics

Page 6

Big Data tools complement existing BI investment
They do not replace them - Yet

Existing data sources
• Business Applications: ERP, CRM, etc.
• Transactional OLTP DBMS

Data integration: ETL

• Data Warehouse
• Data Mart
• Cube
• Appliance

Business Intelligence Tools and analytical applications
• Reporting
• Dashboard
• OLAP
• Data & Text Mining

Page 7

Big Data tools complement existing BI investment
They do not replace them - Yet

New data sources
• Hadoop, NoSQL, Log-Data
• In-Memory Database
• Static data and flowing data
• Structured and unstructured data
• Real-time data processing and analysis
• Complex event processing
• Operational Intelligence
• Predictive Analytics

Existing data sources
• Business Applications: ERP, CRM, etc.
• Transactional OLTP DBMS

Data integration: ETL

• Data Warehouse
• Data Mart
• Cube
• Appliance

Business Intelligence Tools and analytical applications
• Reporting
• Dashboard
• OLAP
• Data & Text Mining

Page 8

The 3 V’s of Big Data

Legacy BI
• Business problem: Backward-looking analysis, using data out of business applications
• Technology solution (selected vendors): SAP Business Objects, IBM Cognos, MicroStrategy
• Data type/scalability: Structured; limited (2 – 3 TB in RAM)

High performance BI
• Business problem: Quasi-real-time, in-memory analysis, using data out of business applications; Complex Event Processing
• Technology solution (selected vendors): SAP HANA
• Data type/scalability: Structured; limited (1 PB in RAM)

"Hadoop" Ecosystem
• Business problem: Batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
• Technology solution (selected vendors): Cloudera Hadoop, Hortonworks Hadoop
• Data type/scalability: Structured or unstructured; quasi unlimited (20 – 30 PB)

Page 9

HADOOP vs In-Memory analytics

• How fast do you want your delivery made?
• What is being delivered?
• How much do you want to spend?
• Do you have specialist drivers?

Page 10

HADOOP vs In-Memory analytics

Hadoop (with Impala): MPV
• Good performance
• Capacity
• Easy to drive
• Affordable

Hadoop (without Impala): Long Haul Trucks
• Excellent capacity
• Drives overnight
• Moderate performance
• Needs a specialist driver’s license

IMA (in-memory analytics): Ferrari
• Sexy
• Very fast
• Limited luggage space

Page 11

HADOOP vs In-Memory analytics
Some Hadoop improvements

• Cloudera’s Hadoop offerings: when you buy the Trucks they throw in the MPVs for free
• Hadoop becomes easier and easier to use with the ecosystem of contributors and distributions, e.g. Cloudera’s Impala, Microsoft’s HDInsight, MapR’s Drill, Hortonworks’ Stinger Initiative
• Hadoop 2.0 brings YARN, Graph Analysis and Stream Processing
• The speed of improvements in HDFS/HBase/Hive/YARN: the gap between batch and real-time/low-latency is going to be cut fairly soon, e.g. from Hive 0.10 to 0.11, with the new ORC file format (the successor to RCFile), there is a performance boost >10x

Page 12

Use case segmentation drives solution design and technology selection

• Use case: Real-time reporting of SAP OLTP data, including joins and data transformations
  Potential tool: SAP HANA
• Use case: Summarise unstructured data logs (scheduled)
  Potential tool: HADOOP MAP/REDUCE
• Use case: Real-time reporting of summarised data logs, with joins to other non-OLTP data
  Potential tool: IMPALA
• Use case: Near real-time reporting of social media data
  Potential tool: IMPALA + HADOOP MAP/REDUCE (scheduled to collect recent social media data)
• Use case: Real-time reporting of recent OLTP data joined with recent social media data
  Potential tool: HANA + HADOOP MAP/REDUCE (scheduled to collect recent social media data and load into HANA)
• Use case: Image analysis processing (scheduled)
  Potential tool: HADOOP MAP/REDUCE (scheduled job runs sophisticated analysis of video files and stores results in a structured file)
• Use case: Image analysis reporting
  Potential tool: IMPALA (to report on results file)
• Use case: Predictive analysis reporting (comparing OLTP and non-OLTP data)
  Potential tool: HANA + HADOOP MAP/REDUCE (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
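
To make the "summarise unstructured data logs" use case more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps. It does not use any Hadoop API; the log format, the severity levels and all function names are assumptions made up for illustration, and a real scheduled job would run the same pattern distributed across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw log lines; the format is invented for this example.
LOG_LINES = [
    "2013-11-01 10:00:01 ERROR payment timeout",
    "2013-11-01 10:00:02 INFO login ok",
    "2013-11-01 10:00:05 ERROR payment timeout",
    "malformed line",
    "2013-11-01 10:01:00 WARN disk usage high",
]

KNOWN_LEVELS = {"INFO", "WARN", "ERROR"}


def map_line(line):
    """Map step: emit (severity, 1) for a parsable line, nothing otherwise."""
    parts = line.split()
    if len(parts) >= 3 and parts[2] in KNOWN_LEVELS:
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def summarise(lines):
    """Run map, shuffle and reduce over an iterable of log lines."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    return dict(reduce_group(key, values) for key, values in shuffle(mapped).items())


summary = summarise(LOG_LINES)
print(summary)
```

The result is a small structured summary (one count per severity level), which is the kind of output that a SQL layer such as Impala could then report on, as the slide's third use case suggests.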

Page 13

The NEW Real time analytics with SAP HANA & Hadoop
Integrate and federate

(Architecture diagram; labels only)
• Computing engine: SAP HANA (In-Memory), Hadoop (MapReduce/Batch)
• Data sources (SAP and non-SAP): SAP ERP/DW, Sybase ASE & IQ, Sybase ESP, 3rd party DBMS
• Integration paths: SLT, DXC, ETL, SAP DS, Smart Access
• UI/Front end analytics: SAP LIVE & UI Analytics, Mobile & Embedded Applications, non-SAP BI

Page 14

Learning some of the language of Big Data

Jaspersoft, Karmasphere Studio, Talend, Pentaho, Continuity, NoSQL, MongoDB, Cassandra, CouchDB, Redis, Riak, Neo4j, Platfora, Tableau, Splunk, Shep, Hadoop, MapReduce, ZooKeeper, Avro, Nutch, HDFS, Matlab, R, Python, JRuby, Ruby, Java, C++, Kafka, InfoChimps, Skytree, Greenplum, Aster, GoPivotal, Hive, Pig, HBase, Chukwa, YARN

Page 15

The other Big Data tools
Once you have a data store and a means of accessing the data.

• Operational Intelligence Platform
• Video search, audio search and content analytics
• Text search
• Graph databases
• Complex event processing
• In-memory data grid
• Speech recognition
• Pattern recognition

Page 16

Some new roles in data/analytics
The coming of age of data in the enterprise

• The Data Scientist
• The Chief Data Officer
• Data Explorer
• Campaign Expert
• Data Security Officer
• Business Solution Architect / Domain Expert
• Data Hygienist / Data Steward

50%: Big Data talent gap expected until 2018

Page 17

Predictive analytics for transport, logistics & retail

The information-driven transport, logistics & retail provider

External online sources: Facebook, Twitter, LinkedIn, Google+, YouTube, TomTom, MarketWatch, Financial Times, Bloomberg

Existing customer base: High-Tech / Pharma, Manufacturing / FMCG, Commerce Sector, Households / SME

New customer base: Financial Industry, Public Authorities, Market Research, SME Retail

Commercial data services: Address Verification, Market Intelligence, Supply Chain Monitoring, Environmental Statistics

Business functions: Marketing and Sales, Product Management, Operations, New Business

Data flows:
• Order volume, received service quality
• Customer sentiment and feedback
• Location, destination, availability
• Network flow data
• Real-time incidents
• Market and customer intelligence
• Location, traffic density, directions, delivery sequence
• Continuous sensor data

Use cases:
1. Real-time route optimization: Delivery routes are dynamically calculated based on delivery sequence, traffic conditions and recipient status.
2. Consolidated pickup and delivery: Carriers of multiple existing fleets are leveraged to pick up or deliver shipments along routes they would take anyway.
3. Strategic network planning: Long-term demand forecasts for transport capacity are generated in order to support strategic investments into the network.
4. Operational capacity planning: Short- and mid-term capacity planning allows optimal utilization and scaling of manpower and resources.
5. Customer loyalty management: Public customer information is mapped against business parameters in order to predict churn and initiate countermeasures.
6. Service improvement and product innovation: A comprehensive view on customer requirements and service quality is used to enhance the product portfolio.
7. Risk evaluation and resilience planning: By tracking and predicting events that lead to supply chain disruptions, the resilience level of transport services is increased.
8. Market intelligence for SME: Supply chain monitoring data is used to create market intelligence reports for small and medium-sized companies.
9. Financial demand and supply chain analytics: A micro-economic view is created on global supply chain data that helps financial institutions improve their rating and investment decisions.
10. Address verification: Fleet personnel verify recipient addresses, which are transmitted to a central address verification service provided to retailers and marketing agencies.
11. Environmental intelligence: Sensors attached to delivery vehicles produce fine-meshed statistics on pollution, traffic density, noise, parking spot utilization etc.

Page 18

smartPORT logistics
developed by T-Systems, Deutsche Telekom Innovation Laboratories, SAP Research and Hamburg Port Authority

Greater efficiency for truck and container movements
The right information, in the right place, in time, predictable

• Only location-based information sent to driver, thanks to geo-fencing
• Precise communications thanks to real-time data and smart devices
• Stakeholder integration incl. port authority, forwarding agents, terminal and parking lot operators, plus others as required (sea shipping companies etc.)
• 5-10 minutes saved per tour means one more pick-up per day
• Portal provides transparency for all stakeholders, with role-based access
• Cloud solution collects all relevant real-time information in one place
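
The slide does not say how the geo-fencing is implemented. As a rough illustration of the idea only, the sketch below treats the fence as a circle around a centre point and uses the haversine formula to decide whether a truck is inside it. The centre coordinates, the 5 km radius and every function name are assumptions chosen for the example, not details of smartPORT logistics.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(position, centre, radius_km):
    """True if position lies within radius_km of the fence centre."""
    return haversine_km(*position, *centre) <= radius_km


# Hypothetical circular fence; coordinates are only roughly in the Hamburg area.
PORT_CENTRE = (53.54, 9.97)
FENCE_RADIUS_KM = 5.0


def messages_for_driver(position, messages):
    """Pass port messages on only while the truck is inside the fence."""
    if inside_geofence(position, PORT_CENTRE, FENCE_RADIUS_KM):
        return list(messages)
    return []


nearby_truck = (53.545, 9.98)
distant_truck = (52.52, 13.405)
print(messages_for_driver(nearby_truck, ["Gate 3 congested"]))
print(messages_for_driver(distant_truck, ["Gate 3 congested"]))
```

A production system would more likely use polygon fences that follow terminal and road boundaries; the circular check is simply the shortest way to show how a location filter keeps irrelevant messages away from drivers.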

Page 19

Health care & Pharma grids got smart
Transparency enhanced with predictive analytics

VOLUME, VELOCITY, VARIETY, VALUE

• 100 % compliance with legal requirements
• Up to 20 % lower costs 1)
• Up to 20 % reduction in HR costs thanks to automation
• Factor of 5.8: potential growth by 2015 2)
• Full transparency
• Seamless data flow
• Rapid reactions
• Patient controlled data distribution
• Integration, Consolidation, Optimization
• Processing & integrating smart data management
• Secured connection for error-free data transfer
• Optimization and automation of processes
• Pinpointing guzzlers
• Intelligent management of medical care
• Management of Devices
• Immediate availability of patient and poc data

Stakeholders: Physicians, Specialists, Family Doctors; Insurance; Hospitals & Pharma

Page 20

Summary

• Data Volumes are here to stay
• In-Memory Computing is becoming increasingly “affordable”
• Hadoop is not your Big Data answer; it is part of your BI and Big Data ecosystem
• The BI and Big Data ecosystem will likely benefit from other tools as well
• An Enterprise Data Strategy and Data Governance are critical to success

Page 21

Summary

Make sure you have two conversations in your enterprise

1. A Business Conversation about the business values from your BI Ecosystem
2. An IT Conversation to ensure your IT Organisation understands the new world of BI, the shortcomings, the strengths and roles of the component technologies

Page 22

Summary

“What matters is how - and why - vastly more data leads to vastly greater value creation. Designing and determining those links is typically in the province of top management”

but needs to be facilitated by the IT Organisation in Business terms

Page 23

A parting thought: Big Data’s 4 V’s

ANALYTICS creates VALUE
Value comes from knowing more than the rest

Page 24

QUESTIONS?

Page 25

BACKUP

Page 26

HADOOP Innovation #1: Much cheaper storage

$1 Million gets you:

• SAN Storage (software: HDS, bundled with hardware by HDS): 0.5 Petabytes, 200,000 IOPS, 8 Gbyte/sec; $2 - $10 per Gigabyte
• NAS File Servers (software: NetApp, bundled with hardware by NetApp): 1 Petabyte, 200,000 IOPS, 10 Gbyte/sec; $1 - $5 per Gigabyte
• Local Storage (software: open source Hadoop ecosystem, hardware self-assembled): 10 Petabytes, 400,000 IOPS, 250 Gbyte/sec; <$0.50 per Gigabyte
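
The per-gigabyte figures can be related to the capacities by dividing the budget by the capacity. This is a minimal sketch of that arithmetic using only the slide's numbers; it assumes decimal units (1 petabyte = 1 000 000 gigabytes), and the names are illustrative.

```python
BUDGET_USD = 1_000_000
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 10**6 GB

# Capacity in petabytes that the slide says $1 million buys.
capacity_pb = {
    "SAN Storage": 0.5,
    "NAS File Servers": 1,
    "Local Storage": 10,
}

# Implied cost per gigabyte for each storage type.
cost_per_gb = {
    name: BUDGET_USD / (petabytes * GB_PER_PB)
    for name, petabytes in capacity_pb.items()
}

for name, usd in cost_per_gb.items():
    print(f"{name}: ${usd:.2f} per gigabyte")
```

The implied costs are $2.00, $1.00 and $0.10 per gigabyte, which sit at the lower end of the slide's ranges for SAN and NAS and well inside the "<$0.50" quoted for local storage.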

Page 27

Learning the language of Big Data
Colour coding key

• Core Hadoop Kernel/Modules
• Hadoop DW Modules
• NoSQL DB Platforms
• MPP Analytics Platforms
• Programming Languages
• IDEs
• Data Hubs
• BI Suite
• Analysis and Visualisation
• Data Analysis Tool
• Data Integration Tool
• Startup - undefined

Page 28

How use case segmentation drives solution design and technology selection

Page 29

Gartner Hype Cycle for analytic applications
A great starting point for BI and Big Data use cases