Dataflow with Apache NiFi Aldrin Piri - @aldrinpiri Apache NiFi Crash Course DataWorks Summit 2017 – Munich 6 April 2017

Dataflow with Apache NiFi

Apr 13, 2017


Technology

Transcript
Page 1: Dataflow with Apache NiFi

Dataflow with Apache NiFi
Aldrin Piri - @aldrinpiri
Apache NiFi Crash Course
DataWorks Summit 2017 – Munich

6 April 2017

Page 2: Dataflow with Apache NiFi

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Key: 'Apache NiFi' Value: 'PMC Member'

Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'

Key: 'Working with NiFi Since' Value: '2010'

Page 3: Dataflow with Apache NiFi

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 4: Dataflow with Apache NiFi

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 5: Dataflow with Apache NiFi

Let's Connect A to B

Producers A.K.A. Things: Anything AND Everything

Internet!

Consumers
• User
• Storage
• System
• … More Things

Page 6: Dataflow with Apache NiFi

Moving data effectively is hard

Standards: http://xkcd.com/927/

Page 7: Dataflow with Apache NiFi

Why is moving data effectively hard?

→ Standards
→ Formats
→ "Exactly Once" Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ "That [person|team|group]"
→ Network
→ "Exactly Once" Delivery

Page 8: Dataflow with Apache NiFi

Let's Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let's consider the needs of a courier service

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers]

Icon credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Page 9: Dataflow with Apache NiFi

Great! I am collecting all this data! Let's use it!
Finding our needles in the haystack

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Storm / Spark / Flink / Apex, Core Data Center at HQ, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others, On Delivery Routes, Trucks, Deliverers]

Page 10: Dataflow with Apache NiFi

Why is moving data effectively hard when scoped internally?

→ Standards
→ Formats
→ "Exactly Once" Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ "That [person|team|group]"
→ Network
→ "Exactly Once" Delivery

Page 11: Dataflow with Apache NiFi

Let's Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Oh, that courier service is global

Page 12: Dataflow with Apache NiFi

Why is moving data effectively hard when scoped globally?

→ Standards
→ Formats
→ "Exactly Once" Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ "That [person|team|group]"
→ Network
→ "Exactly Once" Delivery

Page 13: Dataflow with Apache NiFi

The Unassuming Line: A Case Study
We've seen a few lines show up in the wild thus far

Internet! Inter- & intra-connections in our global courier enterprise

Spotlight: Arthur Lacôte, https://thenounproject.com/turo/

Page 14: Dataflow with Apache NiFi

Dataflow Line Anatomy 101
Let's dissect what this line typically represents

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

[Diagram labels: Script or Application, Data, Disparate Transport Mechanisms, Data, Script or Application]

Page 15: Dataflow with Apache NiFi

Dataflow Line Anatomy 201
Sometimes that transport is just more lines

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

[Diagram labels: Script or Application, Data, Line Inception, Data, Script or Application]

Page 16: Dataflow with Apache NiFi

Dataflow Line Anatomy 301
But those lines could also have components…

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

Page 17: Dataflow with Apache NiFi

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 18: Dataflow with Apache NiFi

Apache NiFi Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery / recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable / multi-role security
• Designed for extension
• Clustering
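
Two of the features above, backpressure and prioritized queuing, can be illustrated with a small bounded priority queue. This is only a sketch under stated assumptions: the `Connection` class, the `backpressure_threshold` parameter, and the numeric priority scheme are hypothetical names chosen for illustration and are not NiFi's API.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two processing steps (hypothetical, not NiFi code)."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self):
        # Backpressure: upstream should stop producing once the threshold is hit.
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue item unless backpressure applies; lower number leaves sooner."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Dequeue the highest-priority item, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=3)
accepted = [conn.offer(name, prio) for name, prio in
            [("bulk-log", 5), ("alert", 1), ("metric", 3), ("late", 1)]]
drained = [conn.poll() for _ in range(3)]
```

In this toy run the fourth offer is refused because the queue has reached its threshold, and the three accepted items leave in priority order rather than arrival order.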

Page 19: Dataflow with Apache NiFi

Apache NiFi Subproject: MiNiFi

→ Let me get the key parts of NiFi close to where data begins and provide bidirectional communication

→ NiFi lives in the data center. Give it an enterprise server or a cluster of them.

→ MiNiFi lives as close as possible to where data is born and is a guest on that device or system

Page 20: Dataflow with Apache NiFi

Let's revisit our courier service from the perspective of NiFi

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Storm / Spark / Flink / Apex, Core Data Center at HQ, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others, On Delivery Routes, Trucks, Deliverers; overlaid with Client Libraries, MiNiFi, and NiFi instances]

Page 21: Dataflow with Apache NiFi

Apache NiFi Managed Dataflow

SOURCES

REGIONAL INFRASTRUCTURE

CORE INFRASTRUCTURE

Page 22: Dataflow with Apache NiFi

NiFi is based on Flow Based Programming (FBP)

FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
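
The term mapping above can be sketched as a toy flow. This is a minimal illustration under stated assumptions, not NiFi's implementation: all class and method names (`FlowFile`, `Processor`, `FlowController`, `on_trigger`, `run_until_idle`) are hypothetical, and connections are modeled as unbounded deques for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: opaque content plus an attribute map."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Processor:
    """Black box: takes a FlowFile from one connection, emits to another."""

    def __init__(self, name, work, inbound, outbound):
        self.name = name
        self.work = work          # function FlowFile -> FlowFile
        self.inbound = inbound    # connection, modeled here as a deque
        self.outbound = outbound

    def on_trigger(self):
        """Process one queued FlowFile; report whether any work was done."""
        if self.inbound:
            self.outbound.append(self.work(self.inbound.popleft()))
            return True
        return False


class FlowController:
    """Scheduler: knows the processors and gives each one a turn."""

    def __init__(self, processors):
        self.processors = processors

    def run_until_idle(self):
        # The list forces every processor to be triggered in each round.
        while any([p.on_trigger() for p in self.processors]):
            pass


def to_upper(ff):
    return FlowFile(ff.content.upper(), dict(ff.attributes))


def tag(ff):
    return FlowFile(ff.content, {**ff.attributes, "routed": "true"})


source, middle, sink = deque(), deque(), deque()
source.append(FlowFile(b"hello", {"filename": "a.txt"}))
controller = FlowController([
    Processor("upper", to_upper, source, middle),
    Processor("tag", tag, middle, sink),
])
controller.run_until_idle()
result = sink[0]
```

Because the two processors only share a connection, each can run at its own rate; the controller simply keeps offering turns until no queue has work left.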

Page 23: Dataflow with Apache NiFi

FlowFiles & Data Agnosticism

→ NiFi is data agnostic!

→ But, NiFi was designed understanding that users can care about specifics and provides tooling to interact with specific formats, protocols, etc.

ISO 8601 - http://xkcd.com/1179/

Robustness principle

"Be conservative in what you do, be liberal in what you accept from others"

Page 24: Dataflow with Apache NiFi

FlowFiles are like HTTP data

HTTP Data

Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html

Content:
Hello world!

FlowFile

Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'

Content:
Binary Content*
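
The analogy can be made concrete with a short sketch that splits a raw HTTP message into header fields and body, then places them in a FlowFile-like structure of attributes plus opaque content. The `split_http_message` helper, the dictionary layout, and the `http.` attribute prefix are assumptions made for illustration; they are not NiFi's API or its attribute naming.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 13\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"Hello world!\n")

status, headers, body = split_http_message(raw)

# FlowFile-like shape: an attribute map (the "header") plus untouched bytes.
attributes = {"filename": "15650246997242", "path": "./",
              "fileSize": str(len(body))}
# Hypothetical convention: carry the HTTP headers along as prefixed attributes.
attributes.update({f"http.{name.lower()}": value
                   for name, value in headers.items()})
flowfile = {"attributes": attributes, "content": body}
```

The content bytes are never interpreted here, which mirrors the data-agnostic stance: only the attribute map is inspected or extended.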

Page 25: Dataflow with Apache NiFi

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 26: Dataflow with Apache NiFi

Extension / Integration Points

NiFi Term | Description
FlowFile Processor | Push/Pull behavior. Custom UI
Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.)
Controller Service | Used to enable reusable components / shared services throughout the flow
REST API | Allows clients to connect to pull information, change behavior, etc.

Architecture

[Diagram labels: OS/Host, JVM, Flow Controller, Web Server, Processor 1, Extension N, FlowFile Repository, Content Repository, Provenance Repository, Local Storage]

Standalone

Cluster

Page 27: Dataflow with Apache NiFi

NiFi Architecture – Repositories – Pass by reference

Excerpt of demo flow… What's happening inside the repositories…

         FlowFile    Content    Provenance
BEFORE   F1 → C1     C1         P1 → F1

AFTER    F2 → C1     C1         P3 → F2 – Clone (F1)
         F1 → C1                P2 → F1 – Route
                                P1 → F1 – Create
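
The pass-by-reference behavior on this slide can be sketched with three in-memory structures standing in for the repositories. This is a toy model under stated assumptions: the dictionaries, the `create`, `route`, and `clone` helpers, and the event tuples are hypothetical and only mirror the F/C/P labels used above.

```python
content_repo = {}     # content claim id -> bytes
flowfile_repo = {}    # FlowFile id -> content claim id (a pointer, not a copy)
provenance_repo = []  # append-only list of (event id, FlowFile id, event type)


def record(flowfile_id, event_type):
    event_id = f"P{len(provenance_repo) + 1}"
    provenance_repo.append((event_id, flowfile_id, event_type))


def create(flowfile_id, claim_id, data):
    content_repo[claim_id] = data
    flowfile_repo[flowfile_id] = claim_id
    record(flowfile_id, "CREATE")


def route(flowfile_id):
    # Routing changes where the FlowFile goes, not what it contains.
    record(flowfile_id, "ROUTE")


def clone(source_id, new_id):
    # The clone points at the same content claim; no bytes are duplicated.
    flowfile_repo[new_id] = flowfile_repo[source_id]
    record(new_id, "CLONE")


create("F1", "C1", b"payload")
route("F1")
clone("F1", "F2")
```

After these three steps both F1 and F2 refer to C1, the content store still holds a single entry, and the provenance list holds the Create, Route, and Clone events in order.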

Page 28: Dataflow with Apache NiFi

NiFi Architecture – Repositories – Copy on Write

Excerpt of demo flow… What's happening inside the repositories…

         FlowFile     Content           Provenance
BEFORE   F1 → C1      C1                P1 → F1 – CREATE

AFTER    F1 → C1      C2 (encrypted)    P2 → F1.1 – MODIFY
         F1.1 → C2    C1 (plaintext)    P1 → F1 – CREATE
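
Copy on write, as drawn on this slide, can be sketched in the same toy style: a modification writes its result to a new content claim and leaves the original untouched. The `modify` helper and the XOR-based `toy_encrypt` stand-in are assumptions for illustration only; the latter is not real encryption and neither is NiFi code.

```python
content_repo = {"C1": b"plaintext"}         # content claim id -> bytes
flowfile_repo = {"F1": "C1"}                # FlowFile id -> content claim id
provenance_repo = [("P1", "F1", "CREATE")]  # (event id, FlowFile id, event type)


def modify(flowfile_id, new_flowfile_id, new_claim_id, transform):
    """Write transformed content to a new claim; never overwrite the old one."""
    old_claim = flowfile_repo[flowfile_id]
    content_repo[new_claim_id] = transform(content_repo[old_claim])
    flowfile_repo[new_flowfile_id] = new_claim_id
    event_id = f"P{len(provenance_repo) + 1}"
    provenance_repo.append((event_id, new_flowfile_id, "MODIFY"))


def toy_encrypt(data):
    # Stand-in for real encryption: XOR with a fixed byte, illustration only.
    return bytes(b ^ 0x5A for b in data)


modify("F1", "F1.1", "C2", toy_encrypt)
```

Because C1 is still present after the modification, the earlier state of the data stays available alongside the new version F1.1.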

Page 29: Dataflow with Apache NiFi

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Demo

Community

Page 30: Dataflow with Apache NiFi

Learn, Share at Birds of a Feather
IOT, STREAMING & DATAFLOW

Thursday, April 6, 5:50 pm, Room 5

Page 31: Dataflow with Apache NiFi

Why NiFi?

→ Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes
  – Think of our courier example and organizations like it: inter vs. intra, domestically, internationally

→ Provide common tooling and extensions that are commonly needed but be flexible for extension
  – Leverage existing libraries and expansive Java ecosystem for functionality
  – Allow organizations to integrate with their existing infrastructure

→ Empower folks managing your infrastructure to make changes and reason about issues that are occurring
  – Data Provenance to show context and data's journey
  – User Interface/Experience a key component

Page 32: Dataflow with Apache NiFi

Learn more and join us!

Apache NiFi site
http://nifi.apache.org

Subproject MiNiFi site
http://nifi.apache.org/minifi/

Subscribe to and collaborate
[email protected]@nifi.apache.org

Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter
@apachenifi

Page 33: Dataflow with Apache NiFi

Our Lab for Today

→ We will be exploring some examples to work through creating a dataflow with Apache NiFi

→ Use Case: An urban planning board is evaluating the need for a new highway, dependent on current traffic patterns, particularly as other roadwork initiatives are under way. Integrating live data poses a problem because traffic analysis has traditionally been done using historical, aggregated traffic counts. To improve traffic analysis, the city planner wants to leverage real-time data to get a deeper understanding of traffic patterns. NiFi was selected for this real-time data integration.

→ Labs are available at http://tinyurl.com/nificrashcourse

Page 34: Dataflow with Apache NiFi

Thank You