Pentaho Data Integration Best Architecture Practices
Matt Casters, Pentaho Chief Architect of Data Integration, Hitachi Vantara
May 21, 2020

Contents

• Introduction
• General advice
• Specific advice
• Practical examples
• Recap
• Q&A

Introduction: What Is “Data Integration Architecture”?

Introduction

• What is “data integration architecture”?
– High-level view on a (potential) DI solution
– Describes components and their relationships
– Taking into account all parts
– Avoiding details without skipping anything

Introduction

• Why do you need an architecture?
– Solutions get very complex
– Teams of engineers get large
– Conscious decisions on use of solution components
– Holistic views on security, quality, transparency, performance
– Allows for validation of high-level requirements
– Allows for the creation and validation of scenarios
– Clearly defines stakeholders

General Advice: Some Pointers in Setting up Solid Architectures for Solid Solutions

General Advice – Don’t Forget the Details…

• Learn the basics of the building blocks…
– PDI Best Practices #PWorld14
  • Standards, naming, …
– PDI Best Governance Practices #PWorld15
  • PM, CI, VCS, Testing, …
– Get expertise for all software components you use

General Advice – Whiteboarding

• Whiteboarding
– Is done with interested stakeholders
– Tries to combine knowledge from various parties
– Allows for quick high-level design
– It is just a starting point!
– Needs to be followed up and validated against scenarios
– Forget conviction: be ready to change your mind

General Advice – Scalability

• Parallelize on a high level
– Aggressive low-level parallelization can get you into trouble

• Remember to allow data to flow in swim lanes
– Parallelization of as much as possible
– “Sharding” and so on should be architected in

• Identify time window early on, assess HW needs
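
The swim-lane idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not PDI functionality: the record layout, the `device_id` key field and the function names are hypothetical. Records are assigned to a lane with a stable hash, and each lane is then processed by exactly one worker, so the parallelism sits at the lane level rather than inside each lane.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def lane_for(key: str, lanes: int) -> int:
    """Map a key to a lane with a stable hash (same key, same lane, every run)."""
    return zlib.crc32(key.encode("utf-8")) % lanes


def split_into_lanes(records, lanes, key_field="device_id"):
    """Group records into independent lanes; key_field is a hypothetical shard key."""
    buckets = [[] for _ in range(lanes)]
    for record in records:
        buckets[lane_for(record[key_field], lanes)].append(record)
    return buckets


def process_lane(lane_records):
    """Placeholder for the per-lane pipeline; here it only sums a value."""
    return sum(record["value"] for record in lane_records)


def run(records, lanes=4):
    """Process every lane in parallel, one worker per lane."""
    buckets = split_into_lanes(records, lanes)
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return list(pool.map(process_lane, buckets))
```

In a real deployment each lane would more likely be a separate process or server than a thread; the point of the sketch is that the shard key is chosen up front, as part of the architecture.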

General Advice – Transparency

• Great complexity requires transparency
– Something will always go wrong
– At the worst possible time

• As a rule:
– Always trace data moving between parts of the architecture
– When in doubt: add more logging, tracking and tracing

• Use components in the architecture that allow for monitoring
– Prefer servers that allow you to see what’s going on
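
One possible way to trace data moving between parts of an architecture is to wrap every handoff in a small helper that logs the start, the end, the row count and the outcome. The sketch below uses only the Python standard library; the component names, the `batch_id` field and the `stats` list are assumptions made for illustration.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("pipeline.trace")


@contextmanager
def traced_handoff(source, target, batch_id, stats):
    """Log and record one movement of data between two parts of the architecture."""
    started = time.monotonic()
    entry = {"source": source, "target": target, "batch_id": batch_id,
             "rows": 0, "status": "started"}
    stats.append(entry)
    log.info("handoff start %s -> %s batch=%s", source, target, batch_id)
    try:
        yield entry  # the caller sets entry["rows"] once the count is known
        entry["status"] = "ok"
    except Exception:
        entry["status"] = "failed"
        log.exception("handoff failed %s -> %s batch=%s", source, target, batch_id)
        raise
    finally:
        entry["seconds"] = time.monotonic() - started
        log.info("handoff end %s -> %s batch=%s rows=%s status=%s",
                 source, target, batch_id, entry["rows"], entry["status"])
```

A caller would write `with traced_handoff("queue", "staging", "b1", stats) as entry:` around the transfer and set `entry["rows"]` inside the block. The collected `stats` entries can later feed whatever monitoring the chosen servers expose.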

General Advice – Predictability

• Enormous workloads, batch jobs, put systems under stress

• Batches tend to grow bigger over time, causing more stress

• As a rule:
– If you can in any way, use micro-batching
– Chop up 1 large nightly workload into hundreds of small ones throughout the day

• Advantages:
– More frequent updates
– Predictable workload
– Fail-early scenario: problems are detected earlier
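
A minimal sketch of micro-batching, assuming the workload can be expressed as a stream of records and that a `load` callable (hypothetical here) writes one batch to the target. In practice the small batches would be triggered on a schedule throughout the day; the sketch splits by count only, to show the fail-early behavior.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive lists of at most batch_size records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_micro_batches(records, batch_size, load):
    """Load small batches one by one and stop at the first failure (fail early)."""
    loaded = 0
    for number, batch in enumerate(micro_batches(records, batch_size), start=1):
        try:
            load(batch)
        except Exception as error:
            raise RuntimeError(
                f"batch {number} failed after {loaded} rows loaded"
            ) from error
        loaded += len(batch)
    return loaded
```

Because each batch is small, a failure surfaces after a few rows instead of at the end of a nightly run, and the error message states how much work had already succeeded.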

Specific Advice: Advice for IoT and Others

Specific Advice – Hadoop

• Hadoop has itself become an ecosystem of software

• Select the software in the ecosystem to fit your ideal architecture

• Only select properly supported components, avoid bleeding edge

• Combat lack of transparency with extensive logging

• Follow the right sizing for your architecture, balance correctly
• Use it as a scalable part, not just as a “Database”

Specific Advice – IoT

• IoT is messy
– Varying data quality
– Data connectivity problems
– Late-arriving data
– Flash floods of data (low predictability)
– High complexity
– Varying data formats and versions
– Number of different devices can be high

Hitachi Vantara IoT Offerings

[Diagram of the offering stack. Labels: Connected Things; Operational Insights; Asset Intelligence; Maintenance Optimization; Manufacturing Optimization; Edge; Asset Avatar State; Core; Analytics; Foundry; Data Collection; Asset Management; Asset Avatar; Artificial Intelligence; Batch / Stream / Analytics; Data Blending / Orchestration; Asset Integration; Edge Analytics; Data Filtering; Data Transformation; Dashboard; Alerts / Notifications; Application Enablement]

Specific Advice – IoT

• Plan ahead for failure
• Use modern techniques like Metadata Injection

• Make extensive use of queues in any format

• Assume that things will go wrong in every scenario

• Design the architecture to cope with failures
• Design the architecture to report on statistics
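
The combination of queues, failure handling and statistics might look like the following sketch. It uses Python's in-process `queue` module as a stand-in for a real message broker; the JSON message layout, the `handle` callable and the dead-letter queue are assumptions for illustration.

```python
import json
import queue
from collections import Counter


def drain(inbox, dead_letters, handle, stats):
    """Process everything currently queued; park bad messages instead of crashing."""
    while True:
        try:
            raw = inbox.get_nowait()
        except queue.Empty:
            return stats
        stats["received"] += 1
        try:
            handle(json.loads(raw))
            stats["processed"] += 1
        except (ValueError, KeyError) as error:
            # Malformed or incomplete message: keep it for inspection or replay.
            dead_letters.put((raw, repr(error)))
            stats["failed"] += 1
```

Passing a `Counter()` as `stats` gives received, processed and failed counts per run, which is the kind of figure the architecture should be able to report on. Bad messages end up in `dead_letters` rather than stopping the consumer.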

Practical Examples: War Stories from the Field

Examples – Large Services Vendor

• Moving large amounts of small data packets around

• Picked the right tools, didn’t pick an overall architecture
• Different teams “working together” in different countries

• Architecture became secondary to the overall solution

• Technology was selected, not architecture

Examples – Large Services Vendor

• Carte servers got hammered thousands of times per second
– Use of a specific scheduler was mandated
– Running out of sockets, HTTP server buckling under the load

• Complaints about PDI startup times

• Overall performance too low

• Services called in to solve “critical” issues in our software

Examples – Large Services Vendor

• Don’t allow internal organizational needs to drive the architecture
• Don’t allow technology choices to drive the architecture
– And if you do, handle the implications

• To scale and ramp up performance, always queue, and handle queued tasks intelligently (not one at a time, for example)

• The performance of the whole is determined by the slowest link
– Consider this up front in the architecture
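
Both lessons can be put into rough numbers. The sketch below is a simplified model with made-up figures, not a measurement of Carte or PDI: it assumes each call carries a fixed overhead (for example an HTTP request or a job startup) plus a per-task cost, and that the stages of a pipeline run as a chain.

```python
def pipeline_throughput(stage_rates):
    """Rows per second of the whole chain: the slowest stage sets the pace."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the slowest stage."""
    return min(stage_rates, key=stage_rates.get)


def effective_rate(per_call_overhead_s, per_task_s, batch_size):
    """Tasks per second when every call carries batch_size queued tasks."""
    return batch_size / (per_call_overhead_s + per_task_s * batch_size)


# Hypothetical stage rates in rows per second.
stages = {"read": 5000, "transform": 1200, "write": 800}
slowest = bottleneck(stages)            # "write"
overall = pipeline_throughput(stages)   # 800

# Hypothetical costs: 10 ms overhead per call, 1 ms of work per task.
one_at_a_time = effective_rate(0.01, 0.001, 1)    # about 91 tasks per second
in_batches = effective_rate(0.01, 0.001, 100)     # about 909 tasks per second
```

Under these assumed costs, handing over 100 queued tasks per call gives roughly ten times the rate of calling once per task, because the fixed overhead is paid once per batch.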

Examples – Handling TV Set-top Data

• Periodic in nature, handling clicks
• Reading from MQTT, dumping data into Oracle for analysis

• Reported PDI performance trouble, services called in

• Small-scale test, predicted ten-fold increase in size, already in trouble

Examples – Handling TV Set-top Data

• MQTT: great for queuing and IoT

• Not always possible to read in parallel from queues!

• Oracle is an RDBMS, kills parallelism in architecture

Examples – Handling TV Set-top Data

• Consider partitioning large numbers of clients

• Consider data extraction for any data storage mechanism

Examples – Big Bank

• Processed a gazillion records every night
• Had a batch window of 2 hours
• Got a monster computer to do the job with 64 cores

• Ran complex data quality validations in PDI, hundreds of steps

• Got into a performance problem

• Needed extensive performance tuning

Examples – Big Bank

Pick 2:
• Good
• Fast
• Cheap

Examples – Big Bank

Pick 2:
• Lots of work
• In batch window
• On 1 server

Examples – Big Bank

• Consider up front whether HW choices will pin you down later

• Weigh the importance of specific requirements into the architecture
– Time vs complexity vs hardware in this case
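
The trade-off between time, complexity and hardware can be checked with simple arithmetic before any hardware is bought. The sketch below is an illustration with hypothetical numbers: the 2-hour window and the 64 cores come from the example above, while the row count, the per-core rate and the scaling efficiency are invented for the calculation.

```python
def required_rate(total_rows, window_hours):
    """Rows per second needed to finish inside the batch window."""
    return total_rows / (window_hours * 3600)


def fits_on_one_server(total_rows, window_hours, rows_per_core_per_s,
                       cores, efficiency=0.7):
    """Check whether a single server can finish in time.

    efficiency is the assumed fraction of ideal linear scaling that is
    actually achieved across cores.
    """
    capacity = rows_per_core_per_s * cores * efficiency
    return capacity >= required_rate(total_rows, window_hours)


# Hypothetical workload: 2 billion rows, 5,000 rows per second per core.
needed = required_rate(2_000_000_000, 2)                    # about 277,778 rows/s
in_window = fits_on_one_server(2_000_000_000, 2, 5000, 64)  # False
relaxed = fits_on_one_server(2_000_000_000, 4, 5000, 64)    # True
```

With these assumed figures the single server delivers about 224,000 rows per second and misses the 2-hour window, so one of the three constraints (amount of work, window, one server) has to give.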

Recap: PDI Best Architecture Practices

Recap

• Make an architecture up front, not as part of the documentation

• Be critical
• Be detailed
• Run scenarios against it
• Be ready to change your mind

• Get stakeholders involved
• Use PDI: Pessimistic Data Integration

Questions and Discussion
