Modernizing Business Intelligence and Analytics
Justin Erickson, Senior Director, Product Management
© Cloudera, Inc. All rights reserved.

Source PDF: cdn.govexec.com/media/ctd_analytic_db_final.pdf

Aug 09, 2020


Transcript
Page 1

Modernizing Business Intelligence and Analytics

Justin Erickson, Senior Director, Product Management

Page 2

Agenda

• What benefits can I achieve from modernizing my analytic DB?
• When and how do I migrate from current systems?
• How does it work in the cloud?

Page 3

Key Applications

• EDW Optimization: Use your EDW more efficiently by offloading workloads to Hadoop
• Data Preparation: Fast, flexible ETL over large data volumes, so data is always ready for your business
• Self-Service BI & Exploration: Fastest time-to-insights with a modern analytic database designed with Hadoop’s flexibility and agility

Page 4

Cloudera’s Analytic Database

• Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
• Hue: Intelligent SQL editor
• Navigator: Audit, lineage, encryption, key management, & policy lifecycles
• BI Partners: Integration with the leading BI tools
• Impala: Interactive query engine for BI & SQL analytics
• Hive-on-Spark: Large-scale ETL & batch processing engine
• Kudu: Data storage for fast & changing data

Multi-Storage, Multi-Environment

Page 5

Key Benefits
An analytic database designed for Hadoop

• High-Performance BI and SQL Analytics
• Flexibility for Data and Use Case Variety
• Cost-effective Scale for Today and Tomorrow
• Go Beyond SQL with an Open Architecture

Page 6

Analytic DB Anatomy
Built for self-service and hybrid cloud

Page 7

Anatomy of an Analytic Database
Cloudera: Decoupled by Design

Monolithic Analytic Database (one coupled system):
• Query Engine
• Storage Engine
• Catalog

Modern Analytic Database (decoupled components):
• Query Engine (Impala)
• Catalog (HMS)
• Storage (Kudu)
• Storage (S3)
• Storage (HDFS)
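The decoupling on this slide can be illustrated with a minimal Python sketch. This is a toy model only, not how Impala or the Hive Metastore (HMS) are implemented: the class names `Catalog`, `TableLocation`, and `QueryEngine`, the table names, and the paths are all hypothetical. It shows one shared catalog mapping tables to different storage systems, with any number of query engines resolving locations through it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableLocation:
    storage: str  # e.g. "hdfs", "s3", or "kudu"
    path: str     # hypothetical location string


@dataclass
class Catalog:
    """Toy stand-in for a shared metadata catalog (hypothetical API)."""
    tables: dict = field(default_factory=dict)

    def register(self, name: str, storage: str, path: str) -> None:
        self.tables[name] = TableLocation(storage, path)

    def resolve(self, name: str) -> TableLocation:
        return self.tables[name]


class QueryEngine:
    """Toy engine: it owns no data and only asks the catalog where data lives."""

    def __init__(self, catalog: Catalog) -> None:
        self.catalog = catalog

    def plan_scan(self, table: str) -> str:
        loc = self.catalog.resolve(table)
        return f"scan {table} from {loc.storage}:{loc.path}"


catalog = Catalog()
catalog.register("events", "s3", "example-bucket/events")      # hypothetical
catalog.register("orders", "hdfs", "/warehouse/orders")        # hypothetical
catalog.register("inventory", "kudu", "inventory_table")       # hypothetical

# Two independent engines share the same catalog and the same stored data.
bi_engine = QueryEngine(catalog)
etl_engine = QueryEngine(catalog)
print(bi_engine.plan_scan("events"))
print(etl_engine.plan_scan("orders"))
```

Because the engines hold no data of their own in this sketch, adding or removing an engine does not require moving or copying any table.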

Page 8

Pain Points
Traditional Monolithic Analytic Databases

Limited to SQL only
• Maintain data copies for non-SQL

Rigid Data Model
• Tightly coupled storage and compute

Static Sizing
• Major maintenance to add capacity/nodes

Poorly Designed for Cloud
• No elasticity or integration with object storage

Diagram labels: COMPUTE, STORE

Page 9

Benefits of Cloudera’s Modern Approach
Cloud-Native & On-Premise

Go Beyond SQL
• Open Architecture: Open formats and open storage
• Shared data across SQL and non-SQL workloads

Data Flexibility
• Faster, more agile data acquisition
• Data portability: Open formats and open storage

Cost-Effective Scalability
• Elastic scale on-prem or in the cloud
• Cloud-native pay-per-use and transience
• Proven at big data scale

Hybrid
• Runs across multi-cloud & on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.

Diagram label: Shared Data

Page 10

EDW Optimization
Expand the Value of Your Data Warehousing Landscape

Page 11

Motivations for Optimizing the EDW

• Cost containment for existing workloads
• Limited budget for expansion
• Unable to take on new workloads
• Unable to keep up with changing business needs
• Difficulty handling both fixed-SLA reports and self-service exploration
• Growing importance of self-service BI, advanced analytics, and cloud

Page 12

Existing EDW Landscape

Diagram labels:
• Data Sources
• ETL/Staging
• EDW
• Archive
• Data Marts
• Canned Reports
• Dashboards/Analytic Applications
• Non-SQL Workloads
• Self-Service BI/Ad Hoc

Page 13

Optimizing the EDW with Cloudera

• Cost-Effective Scale
  o Say yes to more without the risk
• Go Beyond SQL
  o Exploration, advanced analytics, and more all in one platform
• Modernize the Data Warehouse Landscape
  o Maximize the EDW while enabling iterative, self-service access/BI
  o Well-suited for on-prem, cloud, and hybrid deployments

90% less per TB vs RDBMS and 75% less vs Netezza

Augmented its Oracle EDW with multi-tenant Cloudera system with their BI tool configured to allow users to pull reports from both

Media Research Firm: Saved tens of millions by offloading DBMS to Cloudera in the cloud

Page 14

Modern Data Warehouse Environment

Diagram labels:
• Data Sources
• EDW
• Modern Data Platform: Analytic Database, Operational Database, Data Science & Engineering, Shared Data Layer
• Fixed Reports
• Flexible Reporting
• Dashboards/Analytic Applications
• Non-SQL Workloads
• Self-Service BI/Ad Hoc

Page 15

Navigator Optimizer
Built to help you through the optimization process

Phases:
• Evaluate: Evaluate the need for offload
• Plan: Impact analysis, prioritized plan
• Offload: Offload each workload
• Optimize: Optimize performance

Diagram labels: Workload Visibility; Offload Actions; Identify Use Cases; Objectives; Validate ROI, Cost; Initial POC; Impact Analysis; Risk Analysis; Estimate Effort; Prioritized Plan; Schema Design; Test & Validate; Fine Tuning Data Model on Hadoop; Optimize Queries for Performance

Page 16

Workload Visibility
Get insights into what’s happening today

Phase: Evaluate

Evaluate Queries
• Top queries
• Query duplication
• Query complexity
• Common access patterns

Evaluate Data Access
• Top tables, top columns
• Usage-based ER diagram
• All tables/columns in use

Evaluate POC
• Identify initial workload piece for PoC
• Get partitioning key suggestions
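To make "top queries", "query duplication", and "top tables" concrete, here is a minimal Python sketch over a small query log. It is not how Navigator Optimizer works internally; the regex-based normalization, the function names, and the sample log are all assumptions made for illustration, and a real tool would use a proper SQL parser.

```python
import re
from collections import Counter


def normalize(sql: str) -> str:
    """Replace literals and collapse whitespace so look-alike queries match."""
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'[^']*'", "?", s)            # string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)    # numeric literals
    return re.sub(r"\s+", " ", s)


def top_queries(log, n=3):
    """Most frequent query shapes; a count above 1 indicates duplication."""
    return Counter(normalize(q) for q in log).most_common(n)


def top_tables(log, n=3):
    """Most referenced tables, found naively after FROM and JOIN keywords."""
    pattern = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)", re.IGNORECASE)
    counts = Counter()
    for q in log:
        counts.update(t.lower() for t in pattern.findall(q))
    return counts.most_common(n)


# Hypothetical query log for illustration only.
query_log = [
    "SELECT * FROM sales WHERE region = 'EMEA'",
    "select *  from sales where region = 'APAC';",
    "SELECT s.id FROM sales s JOIN customers c ON s.cust_id = c.id WHERE s.amount > 100",
    "SELECT count(*) FROM customers",
]

print(top_queries(query_log, 1))
print(top_tables(query_log, 2))
```

The first two log entries differ only in case, spacing, and the literal value, so they collapse to one query shape with a count of 2.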

Page 17

Impact Analysis & Prioritized Plan
Understand what it takes to offload

Phase: Plan

Impact Analysis
• Focus efforts by identifying duplication
• Workload risk assessment based on complexity and best practices
• Understand query compatibility

Prioritized Plan
• Estimate effort
• Identify easiest pieces to start for fast success
• Prioritize workloads for offload
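One way to picture a prioritized plan is a simple sort over candidate workloads. The sketch below is illustrative only: the `Workload` fields, the 1 to 5 risk scale, and the ordering rule (compatible first, then lowest risk, then lowest effort) are assumptions, not the scoring the product uses.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    effort_days: float   # hypothetical effort estimate
    risk: int            # hypothetical scale: 1 (low) to 5 (high)
    compatible: bool     # True if its queries are expected to run unchanged


def prioritize(workloads):
    """Order workloads so the easiest, safest pieces come first."""
    return sorted(
        workloads,
        key=lambda w: (not w.compatible, w.risk, w.effort_days),
    )


# Hypothetical candidates for illustration only.
candidates = [
    Workload("nightly_etl", effort_days=20, risk=4, compatible=False),
    Workload("sales_dashboard", effort_days=5, risk=1, compatible=True),
    Workload("adhoc_reports", effort_days=8, risk=2, compatible=True),
]

plan = [w.name for w in prioritize(candidates)]
print(plan)
```

Sorting on a tuple keeps the rule easy to read and to change, for example by weighting effort ahead of risk.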

Page 18

Predictable Offload
Remove the guesswork

Phase: Offload

Understand offload requirements
• Determine most common workload patterns
• Develop data-/usage-driven offload strategy

Actionable recommendations
• Complexity assessment for riskier areas
• Focus efforts by identifying duplication
• Design recommendations for best results

Page 19

Optimizing within Hadoop
Maintain peak performance

Phase: Optimize

Understand usage and keep up with data needs
• Understand most common usage patterns
• Identify optimization opportunities
• Proactively adjust data models

Performance optimizations
• Best practice guidance for Hive and Impala
• Query performance optimization
• Increase platform adoption

Page 20

Built for hybrid cloud

Page 21

What’s Driving Analytics to the Cloud?
Big data deployments in cloud are accelerating:

● Executive Mandate: Minimize on-prem data center footprint
● Increased Agility: End-user self-service
● Elasticity: Optimize infrastructure usage
● Lower Overall TCO

Page 22

Most Organizations Are or Will be Hybrid Cloud

• 76% will embrace hybrid cloud (Gartner [1])
• 82% will have a multi-cloud strategy (RightScale [2])
• 50% will “repatriate” at least one public cloud workload back to private cloud or on-prem for cost reasons (451 [3])
• 50% of Cloudera’s cloud customers run a hybrid environment

Why is this a critical strategy?
• Portability & Cost
• Functionality
• Data Gravity

[1] Gartner, Market Trends: Cloud Adoption Trends Favor Public Cloud With a Hybrid Twist, 2015
[2] RightScale 2016 State of the Cloud Report
[3] 451 Research: AWS Lambda: new and exciting, old and rehashed, more vendor lock-in (or all the above)?, November 22, 2016

Page 23

Cost-Efficiencies & Flexibility in the Cloud
Primary Analytic Database Patterns

ETL: Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Object storage centric
▪ Cloud-native deployment

BI/Analytics: New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Long-running clusters
▪ Object storage or local storage
▪ Lift-and-shift deployment
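The "only pay for what you need" pattern comes down to simple arithmetic, sketched below in Python. Every number here (node count, hourly rates, hours of use, and the 30-day month) is a hypothetical placeholder chosen for easy arithmetic, not a Cloudera or cloud-provider price.

```python
def monthly_cost(nodes: int, hourly_rate: float, hours: float) -> float:
    """Cost of running `nodes` instances for `hours` at `hourly_rate` each."""
    return nodes * hourly_rate * hours


HOURS_IN_MONTH = 24 * 30   # simplifying assumption: a 30-day month
NODES = 10                 # hypothetical cluster size
ON_DEMAND_RATE = 1.00      # hypothetical price per node-hour
DISCOUNTED_RATE = 0.50     # hypothetical price per node-hour when always on

# A transient ETL cluster that runs only 4 hours per day.
transient_etl = monthly_cost(NODES, ON_DEMAND_RATE, 4 * 30)

# The same cluster kept running all month at the discounted rate.
long_running = monthly_cost(NODES, DISCOUNTED_RATE, HOURS_IN_MONTH)

# Busy hours per month above which the long-running cluster becomes cheaper.
break_even_hours = long_running / (NODES * ON_DEMAND_RATE)

print(transient_etl, long_running, break_even_hours)
```

Under these assumed prices the transient cluster is cheaper as long as it is needed for fewer than half the hours in the month, which is why usage frequency drives the choice between the two patterns.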

Page 24

Additive Benefits in the Cloud
Extending core performance, flexibility, scalability, and open architecture benefits

Add Use Cases, Analytics, and Data On-Demand
• Avoid the IT backlog with instant access to all data
• On-demand clusters query directly on shared object storage

Predictable Results Whenever You Want
• Consistent query performance, even during peak times
• Multi-tenancy via isolated clusters on shared data

Just-in-Time Resources
• Real-time capacity for your needs, as they change
• Elastically grow/shrink your cluster via decoupled architecture

Contention-Free ETL
• ETL anytime without impacting other workloads or risking SLAs
• Separate ETL clusters as-needed on shared data

Page 25

BI/Analytics in the Cloud
Three Architecture Options to Optimize Price/Performance

Transient BI (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user

Persistent BI (regular usage) (Default Choice)
Persistent clusters for BI anytime
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group

Persistent BI with Local Storage (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data

Diagram labels: Object Storage; Transient Cluster; Persistent Cluster; HDFS and/or Kudu

Page 26

Persistent BI on Object Storage
Best for elasticity (and speed vs transient)

● This is usually the best choice
● Best when workloads are:
  o Flexible and changing
  o Frequent during most working days
  o Not scheduled for fixed hours
● Benefits include:
  o Predictable results readily available
  o Full multi-tenant isolation
  o Common data in shared object storage
  o Grow/shrink for TCO efficiency
● Tradeoffs:
  o Per node perf of object storage (use more, cheaper nodes)

Persistent BI (regular usage) (Default Choice)
Persistent clusters for ready availability
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group

Diagram labels: Object Storage; Shared HMS DB; Persistent Cluster

Page 27

Persistent BI with Locally-Attached Storage
Best performance for consistent workloads

● Best when workloads are:
  o Regular and consistent
  o Consistently querying common data
  o Tight SLAs for performance
  o Fast changing data (that needs Kudu)
  o Running without object storage (e.g., Azure, GCE)
● Benefits include:
  o Faster performance per node on local data
  o Ability to query object storage for rest of data
● Tradeoffs:
  o Less elastic than object-storage-based clusters
  o Less isolation for multi-tenant workloads using same HDFS data
  o Cost if there are off-peak hours

Persistent BI with HDFS (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared HDFS data

Diagram labels: Object Storage; Persistent Cluster; Local HMS DB; HDFS and/or Kudu

Page 28

Transient BI on Object Storage
Best TCO for infrequent usage

● Best when workloads are:
  o Infrequent or scheduled
● Benefits include:
  o Lowest TCO with clusters only when needed
  o Full multi-tenant isolation
  o Common data in shared object storage
● Tradeoffs:
  o Delay to spin-up clusters when needed
  o Capability of BI users to spin up clusters
  o Per node perf of object storage (use more, cheaper nodes)

Transient BI (infrequent usage)
Spin up clusters when needed.
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user

Diagram labels: Object Storage; Cloudera Director; Shared HMS DB; Transient Cluster
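The "best when" criteria for the three cloud BI architectures can be read as a small decision rule. The Python sketch below is one possible encoding, not guidance from Cloudera: the `WorkloadProfile` fields, the order in which conditions are checked, and the returned labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    infrequent_or_scheduled: bool = False
    tight_sla: bool = False
    fast_changing_data: bool = False       # the slides tie this to Kudu
    object_storage_available: bool = True


def choose_architecture(p: WorkloadProfile) -> str:
    """Pick one of the three options; persistent on object storage is the default."""
    # Assumption: needs that call for local storage take precedence.
    if not p.object_storage_available or p.fast_changing_data or p.tight_sla:
        return "persistent BI with local storage"
    if p.infrequent_or_scheduled:
        return "transient BI on object storage"
    return "persistent BI on object storage"


print(choose_architecture(WorkloadProfile()))
print(choose_architecture(WorkloadProfile(infrequent_or_scheduled=True)))
print(choose_architecture(WorkloadProfile(tight_sla=True)))
```

A real decision would also weigh the tradeoffs listed on each slide, such as cluster spin-up delay and per-node performance on object storage, which this rule does not model.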

Page 29

Thank You

Justin Erickson