Streaming Transformations Using Oracle Data Integration Michael Rainey | BIWA Summit 2017
Streaming with Oracle Data Integration

Mar 19, 2017

Michael Rainey
Transcript
Page 1: Streaming with Oracle Data Integration

Streaming Transformations Using Oracle Data Integration

Michael Rainey | BIWA Summit 2017

Page 2: Streaming with Oracle Data Integration

Introduction

• Michael Rainey - Technical Advisor
• Spreading the good word about Gluent products with the world
• Oracle Data Integration expertise
• Oracle ACE Director
• mRainey.co

we liberate enterprise data

Page 3: Streaming with Oracle Data Integration

What is “Streaming”

Page 4: Streaming with Oracle Data Integration

What is “Streaming”

• The processing and analysis of structured or “unstructured” data in real-time
• Why Streaming?
  • When speed (velocity) of data is key
  • Streaming data is processed in “time windows”, in memory, across a cluster of servers
• Examples:
  • Calculating a retail buying opportunity
  • Real-time cost calculations
  • IoT data analysis
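
The “time windows” idea can be illustrated with a small sketch. This is plain Python, not any streaming engine's API, and it assumes the simplest kind of window (fixed, non-overlapping, often called tumbling). The event data and the function name are hypothetical.

```python
from collections import defaultdict

def tumbling_window_sums(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows and sum each."""
    sums = defaultdict(float)
    for timestamp, value in events:
        # Integer division maps every timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        sums[window_start] += value
    return dict(sorted(sums.items()))

# Hypothetical IoT readings: (seconds since start, reading)
events = [(1, 2.0), (4, 3.0), (11, 5.0), (14, 1.0), (25, 4.0)]
window_sums = tumbling_window_sums(events, window_seconds=10)
print(window_sums)  # {0: 5.0, 10: 6.0, 20: 4.0}
```

A real engine would keep only the open windows in memory and spread the keys across a cluster; the grouping rule itself stays the same.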

Page 5: Streaming with Oracle Data Integration

Streaming data - Apache Kafka

“Publish-subscribe messaging rethought as a distributed commit log”

Image source: kafka.apache.org/
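
To make the “commit log” phrasing concrete, here is a minimal in-memory sketch. It is not the Kafka client API and leaves out partitions, replication, and persistence. The class and consumer names are hypothetical. The point it shows is that records are only appended, and each subscriber keeps its own read offset.

```python
class CommitLog:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def publish(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def poll(self, consumer, max_records=10):
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

log = CommitLog()
for order in ["order-1", "order-2", "order-3"]:
    log.publish(order)

first_read = log.poll("billing", max_records=2)   # reads offsets 0 and 1
second_read = log.poll("billing")                 # continues at offset 2
late_subscriber = log.poll("analytics")           # a new consumer starts at offset 0
```

Because reading does not remove records, a subscriber that joins late can still replay the full history.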

Page 6: Streaming with Oracle Data Integration

Enterprise Data Bus

Page 7: Streaming with Oracle Data Integration

Enterprise Data Bus

Page 8: Streaming with Oracle Data Integration

Stream processing - Apache Spark

• Scalable, fault-tolerant, high-throughput stream processing
• Spark Streaming receives live input data streams from various sources
• Continuous stream of data is known as a discretized stream or DStream
• Data is divided into mini-batches and processed by the Spark engine
• Operations such as join, filter, map, count, windowed computations, etc. are used to transform data in-flight
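
The mini-batch model can be sketched without Spark. The example below is plain Python rather than the DStream API, and it cuts batches by record count for simplicity, whereas Spark Streaming cuts them by time interval. The function names and input lines are hypothetical.

```python
from collections import Counter
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size records from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def process_batch(batch):
    """Apply filter, then map, then count to one mini-batch."""
    non_empty = [line for line in batch if line.strip()]               # filter
    words = [w.lower() for line in non_empty for w in line.split()]    # map
    return Counter(words)                                              # count

lines = ["Kafka Spark", "", "spark streaming", "kafka", "Spark"]
batch_counts = [process_batch(b) for b in micro_batches(lines, batch_size=2)]
totals = sum(batch_counts, Counter())
```

Each batch is processed independently, which is what lets an engine hand different batches to different workers.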

Page 9: Streaming with Oracle Data Integration

Why Oracle Data Integration?

Page 10: Streaming with Oracle Data Integration

Why Oracle Data Integration?

• Enterprise has invested heavily in ODI and/or GoldenGate
• Getting started with development languages (Python/pySpark, Java, etc.)
• Centralized metadata management
• Integrate with other data sources using a single interface
• Realized cost savings
• According to Gartner, 200% increase in maintenance costs when custom coding (https://www.gartner.com/doc/3432617/does-customcoded-data-integration-stack)

Page 11: Streaming with Oracle Data Integration

Streaming with Oracle Data Integration

Page 12: Streaming with Oracle Data Integration

Streaming with Oracle Data Integration

Real-time data replication

Streaming integration: OGG -> Kafka

Streaming integration: Kafka -> Spark Streaming

Page 13: Streaming with Oracle Data Integration

Relational database transactions to Kafka

Page 14: Streaming with Oracle Data Integration

Why GoldenGate with Kafka?

• GoldenGate
  • … is non-invasive
  • … has checkpoints for recovery
  • … moves data quickly
  • … is easy to set up
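
As an illustration of why checkpoints matter for recovery, the sketch below saves its position after every record and resumes from that position after a simulated outage. It shows the general technique only. It is not GoldenGate's checkpoint format or mechanism, and the file name, record values, and `fail_at` parameter are hypothetical.

```python
import json
import os
import tempfile

def replicate(source, target, checkpoint_path, fail_at=None):
    """Copy records from source to target, saving the position after each one."""
    position = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            position = json.load(f)["position"]  # resume where the last run stopped
    for index in range(position, len(source)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated outage")
        target.append(source[index])
        with open(checkpoint_path, "w") as f:
            json.dump({"position": index + 1}, f)

source = ["txn-1", "txn-2", "txn-3", "txn-4"]
target = []
checkpoint_path = os.path.join(tempfile.mkdtemp(), "replicat.chk")

try:
    replicate(source, target, checkpoint_path, fail_at=2)
except RuntimeError:
    pass  # the process "crashes" after delivering two records

replicate(source, target, checkpoint_path)  # the restart picks up at position 2
```

Without the checkpoint, the restart would either resend the first two records or have to guess where to begin.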

Page 15: Streaming with Oracle Data Integration

Why Oracle Data Integrator with Spark Streaming?

• Heterogeneous sources and targets
• Built to integrate all data
• Flexibility
• Reusable code templates (Knowledge Modules)
• Reusable Mappings
• ODI can adapt to your data warehouse - and not the other way around
• Flow based mappings

Page 16: Streaming with Oracle Data Integration

Getting started with streaming using Oracle Data Integration

Page 17: Streaming with Oracle Data Integration

Oracle GoldenGate for Big Data - Kafka Handler Setup

• Standard GoldenGate Extract/Pump processes to capture RDBMS data
• Replicat for Java parameter file & process group created and set up
• Kafka Producer properties and Kafka Handler configuration setup

Page 18: Streaming with Oracle Data Integration

GoldenGate for Kafka setup

• Kafka handler properties
  • Set properties for how GoldenGate interacts with Kafka
  • Format, transaction vs operation mode, etc.
• Kafka producer configuration

http://mrainey.co/ogg-kafka-oow
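
To give a feel for the two property files described above, here is a sketch that embeds sample contents as strings and runs a few sanity checks on them. The handler property names are recalled from the 12.2-era Kafka Handler documentation and should be verified against your release. The handler name `kafkahandler`, the topic name, the file name, and the broker address are all hypothetical values. The producer keys are standard Kafka producer settings.

```python
HANDLER_PROPERTIES = """
# Assumed GoldenGate for Big Data handler settings (verify names for your release)
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.TopicName=oggtopic
gg.handler.kafkahandler.format=delimitedtext
gg.handler.kafkahandler.mode=op
"""

PRODUCER_PROPERTIES = """
# Standard Kafka producer settings; the broker address is a placeholder
bootstrap.servers=localhost:9092
acks=1
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
"""

def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check(handler, producer):
    """Return a list of problems found in the two property sets."""
    prefix = "gg.handler.%s." % handler.get("gg.handlerlist", "")
    problems = []
    if handler.get(prefix + "type") != "kafka":
        problems.append("handler type is not kafka")
    if handler.get(prefix + "mode") not in ("tx", "op"):
        problems.append("mode must be tx (transaction) or op (operation)")
    if "bootstrap.servers" not in producer:
        problems.append("producer has no bootstrap.servers")
    return problems

handler = parse_properties(HANDLER_PROPERTIES)
producer = parse_properties(PRODUCER_PROPERTIES)
problems = check(handler, producer)
```

The `mode` setting corresponds to the “transaction vs operation mode” bullet: roughly, whether a message is sent per source transaction or per individual operation.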

Page 19: Streaming with Oracle Data Integration

Kafka and Oracle Data Integrator setup

Page 20: Streaming with Oracle Data Integration

Kafka and Oracle Data Integrator setup

Page 21: Streaming with Oracle Data Integration

Kafka and Oracle Data Integrator

• Create Model using Kafka Logical Schema
• Create Datastore
• Similar to standard “File” datastore, define file format and set up columns
• Only support for CSV
• Future formats may include JSON, Avro, etc.
• Add Datastore to mapping
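
Since the datastore treats each Kafka message as a delimited record with defined columns, the parsing step can be sketched as follows. This is plain Python, not code generated by ODI. The column names, types, and sample message are hypothetical.

```python
import csv
import io

# Hypothetical datastore definition: column name and a converter per column
COLUMNS = [("order_id", int), ("customer", str), ("amount", float)]

def parse_message(message, delimiter=","):
    """Turn one delimited message into a dict keyed by the datastore's column names."""
    row = next(csv.reader(io.StringIO(message), delimiter=delimiter))
    if len(row) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(row)))
    return {name: convert(value) for (name, convert), value in zip(COLUMNS, row)}

record = parse_message('1001,"Smith, Jane",25.50')
print(record)  # {'order_id': 1001, 'customer': 'Smith, Jane', 'amount': 25.5}
```

Using the `csv` module rather than `str.split` keeps quoted fields that contain the delimiter intact.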

Page 22: Streaming with Oracle Data Integration

Spark Streaming and Oracle Data Integrator

• Create Spark Data Server, Physical/Logical Schema
• Set Hadoop Data Server
• Add properties, such as checkpointing, asynchronous execution mode, etc.
• Additional properties can be added: http://spark.apache.org/docs/latest/configuration.html
• Spark Server is set up as Staging location
• Source Datastore from Kafka, Oracle DB, etc.
• Target Datastore is Cassandra, Oracle DB, etc.
• Code generated by KM is pySpark
• pySpark code can be added to filters, joins, other components for transformations
• Additional languages (Scala, Java) may be coming soon

Page 23: Streaming with Oracle Data Integration

Spark Streaming and Oracle Data Integrator

Enable the Streaming flag in the Physical design of a mapping.

To generate Spark code, set the Execute On Hint option to use the Spark data server as the staging location for your mapping.

Target IKM should not be set. Spark generated code will handle integration and load into target.

Page 24: Streaming with Oracle Data Integration

Tracking the process

When executing, the process will run continuously in the ODI Operator.

If the connection between the ODI Agent and Spark Agent is lost, it will reestablish itself after recovery.
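
The reconnect behavior can be pictured as a retry loop with a growing delay. The sketch below shows that general pattern only and makes no claim about how the ODI Agent implements it. All function names are hypothetical, and the default `sleep` is a no-op so the example runs instantly (pass `time.sleep` for real waiting).

```python
def run_with_reconnect(connect, max_attempts=5, sleep=lambda seconds: None):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = 1
    for attempt in range(1, max_attempts + 1):
        try:
            return connect(), attempt
        except ConnectionError:
            sleep(delay)
            delay *= 2
    raise ConnectionError("gave up after %d attempts" % max_attempts)

calls = []

def flaky_connect():
    """Hypothetical connection that fails twice, then recovers."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("agent unreachable")
    return "connected"

status, attempts = run_with_reconnect(flaky_connect)
```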

Page 25: Streaming with Oracle Data Integration

Recap

• Streaming is the “velocity” in data. AKA “Fast Data”
• Oracle Data Integrator and Oracle GoldenGate provide a framework for development and management of data streaming processes
• Big Data add-ons continue to support new technologies
• Build a streaming architecture using GoldenGate and ODI:
  • Metadata management
  • Integration of RDBMS data with “schema on read” data
  • Build upon the skills in-house

Page 26: Streaming with Oracle Data Integration

we liberate enterprise data

thank you!