Apache NiFi Crash Course - San Jose Hadoop Summit

Apr 13, 2017

Dataflow with Apache NiFi
Aldrin Piri - @aldrinpiri
Apache NiFi Crash Course
Hadoop Summit 2016 San Jose

29 June 2016

© Hortonworks Inc. 2011-2016. All Rights Reserved

Hortonworks: Powering the Future of Data

Key: 'Apache NiFi' Value: 'PMC Member'
Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since' Value: '2010'

Agenda

What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community

Let's Connect A to B

Producers, a.k.a. Things: Anything AND Everything

Internet!

Consumers: User, Storage, System, More Things

Moving data effectively is hard

Standards: http://xkcd.com/927/

Why is moving data effectively hard?

Standards, Formats, Exactly Once Delivery, Protocols, Veracity of Information, Validity of Information, Ensuring Security, Overcoming Security, Compliance, Schemas, Consumers Change, Credential Management, That [person|team|group], Network, Exactly Once Delivery

Let's Connect Lots of As to Bs to As to Cs to Bs to s to Cs to s

Let's consider the needs of a courier service

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers]

Icon credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Great! I am collecting all this data! Let's use it!

Finding our needles in the haystack

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Core Data Center at HQ, Server Cluster, Others, Storm / Spark / Flink / Apex, Kafka, Storm / Spark / Flink / Apex, On Delivery Routes, Trucks, Deliverers]

Why is moving data effectively hard when scoped internally?

Standards, Formats, Exactly Once Delivery, Protocols, Veracity of Information, Validity of Information, Ensuring Security, Overcoming Security, Compliance, Schemas, Consumers Change, Credential Management, That [person|team|group], Network, Exactly Once Delivery

Let's Connect Lots of As to Bs to As to Cs to Bs to s to Cs to s

Oh, that courier service is global

Why is moving data effectively hard when scoped globally?

Standards, Formats, Exactly Once Delivery, Protocols, Veracity of Information, Validity of Information, Ensuring Security, Overcoming Security, Compliance, Schemas, Consumers Change, Credential Management, That [person|team|group], Network, Exactly Once Delivery

The Unassuming Line: A Case Study

We've seen a few lines show up in the wild thus far

Internet!

Inter- & Intra- connections in our global courier enterprise

Spotlight: Arthur Lacte, https://thenounproject.com/turo/

Dataflow Line Anatomy 101

Let's dissect what this line typically represents

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

[Diagram labels: Script or Application, Data, Disparate Transport Mechanisms, Data, Script or Application]

Dataflow Line Anatomy 201

Sometimes that transport is just more lines

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

[Diagram labels: Script or Application, Data, Line Inception, Data, Script or Application]

Dataflow Line Anatomy 301

But those lines could also have components

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

Fig 2. Good Recursion Joke

NoSuchJokeException

footage not found

Apache NiFi: Key Features

Guaranteed delivery
Data buffering (backpressure, pressure release)
Prioritized queuing
Flow specific QoS (latency vs. throughput, loss tolerance)
Data provenance
Supports push and pull models
Recovery/recording a rolling log of fine-grained history
Visual command and control
Flow templates
Pluggable/multi-role security
Designed for extension
Clustering
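To make two of the features above concrete, here is a minimal sketch of a bounded queue that applies backpressure and serves items in priority order. It is an illustration only, not NiFi's implementation: the `Connection` class, the `backpressure_threshold` parameter, and the convention that a lower number means higher priority are all assumptions made for this example.

```python
import heapq
import itertools


class Connection:
    """Hypothetical bounded, prioritized queue between two processing steps."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Monotonic counter: keeps first-in, first-out order within one priority.
        self._order = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority=0):
        """Enqueue unless backpressure applies; return whether the item was accepted."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Dequeue the most urgent item (lowest priority number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
accepted = [
    conn.offer("routine scan", priority=5),
    conn.offer("delivery exception", priority=1),
    conn.offer("telemetry", priority=9),  # refused: the queue is already full
]
first = conn.poll()  # the most urgent item leaves first, regardless of arrival order
```

Refusing the third item is the simplest way to show backpressure here; a real system would more likely pause the upstream producer until the queue drains.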

Apache NiFi Subproject: MiNiFi

Let me get the key parts of NiFi close to where data begins and provide bidirectional communication

NiFi lives in the data center. Give it an enterprise server or a cluster of them.

MiNiFi lives as close as possible to where data is born and is a guest on that device or system.

Let's revisit our courier service from the perspective of NiFi

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Core Data Center at HQ, Server Cluster, Others, Storm / Spark / Flink / Apex, Kafka, Storm / Spark / Flink / Apex, On Delivery Routes, Trucks, Deliverers; overlaid with NiFi, MiNiFi, and Client Libraries labels]

Apache NiFi Managed Dataflow

[Diagram labels: SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE]

NiFi is based on Flow Based Programming (FBP)

| FBP Term | NiFi Term | Description |
|---|---|---|
| Information Packet | FlowFile | Each object moving through the system. |
| Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems. |
| Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates. |
| Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use. |
| Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components. |
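The FBP terms above can be sketched as a toy pipeline. This is a teaching aid under stated assumptions, not NiFi's Java API: the classes reuse the NiFi terms as names, but every method (`on_trigger`, `connect`, `run_until_idle`) and the two example functions are hypothetical, and Process Groups are left out for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Information Packet: an attribute map plus opaque content."""
    attributes: dict
    content: bytes = b""


class Connection:
    """Bounded Buffer (unbounded here for brevity): a queue between processors."""

    def __init__(self):
        self.queue = deque()


class Processor:
    """Black Box: takes one FlowFile from its inbound queue and emits a result."""

    def __init__(self, name, work):
        self.name = name
        self.work = work  # function mapping FlowFile -> FlowFile
        self.inbound = None
        self.outbound = None

    def on_trigger(self):
        if self.inbound is None or not self.inbound.queue:
            return False  # nothing to do this round
        result = self.work(self.inbound.queue.popleft())
        if self.outbound is not None:
            self.outbound.queue.append(result)
        return True


class FlowController:
    """Scheduler: knows how processors are wired and decides when they run."""

    def __init__(self, processors):
        self.processors = list(processors)

    def connect(self, source, destination):
        connection = Connection()
        source.outbound = connection
        destination.inbound = connection
        return connection

    def run_until_idle(self):
        # Trigger every processor each round; stop once no one did any work.
        while any([p.on_trigger() for p in self.processors]):
            pass


def uppercase(flowfile):
    return FlowFile(flowfile.attributes, flowfile.content.upper())


def tag(flowfile):
    return FlowFile({**flowfile.attributes, "tagged": "true"}, flowfile.content)


upper = Processor("Uppercase", uppercase)
tagger = Processor("Tag", tag)
controller = FlowController([upper, tagger])
controller.connect(upper, tagger)
upper.inbound = Connection()    # entry point of the flow
tagger.outbound = Connection()  # exit point of the flow

upper.inbound.queue.append(FlowFile({"filename": "a.txt"}, b"hello"))
controller.run_until_idle()
result = tagger.outbound.queue.popleft()
```

Because the processors only touch their own queues, they can run at different rates, which is the point of the Bounded Buffer row in the table.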

FlowFiles & Data Agnosticism

NiFi is data agnostic!

But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.

ISO 8601 - http://xkcd.com/1179/

Robustness principle: Be conservative in what you do, be liberal in what you accept from others
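One way to read the robustness principle alongside the ISO 8601 reference is: accept timestamps in several spellings, but always emit one canonical form. The sketch below assumes three accepted input styles and assumes that a timestamp without a zone means UTC; both choices, and the function names, are illustrative rather than anything NiFi prescribes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_liberally(text):
    """Accept several timestamp spellings; assume UTC when no zone is given."""
    text = text.strip()
    parsers = (
        datetime.fromisoformat,                               # 2010-10-10T23:26:07+00:00
        parsedate_to_datetime,                                # Sun, 10 Oct 2010 23:26:07 GMT
        lambda s: datetime.strptime(s, "%m/%d/%Y %H:%M:%S"),  # 10/10/2010 23:26:07
    )
    for parse in parsers:
        try:
            parsed = parse(text)
        except (ValueError, TypeError):
            continue  # this spelling did not match; try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unrecognized timestamp: {text!r}")


def emit_conservatively(moment):
    """Always write the same ISO 8601 form, in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


inputs = [
    "2010-10-10T19:26:07-04:00",
    "Sun, 10 Oct 2010 23:26:07 GMT",
    "10/10/2010 23:26:07",
]
outputs = [emit_conservatively(parse_liberally(text)) for text in inputs]
```

All three inputs describe the same instant, so downstream consumers only ever have to handle a single format.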

FlowFiles are like HTTP data

HTTP Data (Header):
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html

HTTP Data (Content):
Hello world!

FlowFile (Header):
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'

FlowFile (Content):
Binary Content *
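The header/content analogy can be sketched by splitting a raw HTTP response into an attribute map and an uninterpreted payload. The `FlowFile` class, the `from_http_response` helper, and the `http.status.line` attribute name are hypothetical and exist only for this illustration; the response is a shortened version of the one shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowFile:
    attributes: dict  # the "header": string keys mapped to string values
    content: bytes    # the "content": opaque bytes, never interpreted here


def from_http_response(raw):
    """Split a raw HTTP response into an attribute map and a content payload."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    attributes = {"http.status.line": lines[0]}  # hypothetical attribute name
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    return FlowFile(attributes, body)


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world!\n"
)
flowfile = from_http_response(raw)
```

Routing decisions can then be made from `flowfile.attributes` alone, without ever parsing the content, which is what lets a dataflow system stay data agnostic.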

Extension / Integration Points

| NiFi Term | Description |
|---|---|
| FlowFile Processor | Push/Pull behavior. Custom UI |
| Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.) |
| Controller Service | Used to enable reusable components / shared services throughout the flow |
| REST API | Allows clients to connect to pull information, change behavior, etc. |
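The division of labor between three of these extension points can be sketched in a few lines. This is a conceptual model, not NiFi's extension API (which is Java): `LookupService`, `EnrichProcessor`, `MetricsReportingTask`, the depot codes, and the region names are all invented for the example.

```python
class LookupService:
    """Stand-in for a controller service: one shared resource, many users."""

    def __init__(self, table):
        self._table = dict(table)
        self.lookups = 0

    def lookup(self, key):
        self.lookups += 1
        return self._table.get(key, "unknown")


class EnrichProcessor:
    """Stand-in for a processor that adds a region attribute via the shared service."""

    def __init__(self, service):
        self.service = service
        self.processed = 0

    def process(self, attributes):
        self.processed += 1
        return {**attributes, "region": self.service.lookup(attributes["depot"])}


class MetricsReportingTask:
    """Stand-in for a reporting task: pushes counters to an external sink."""

    def __init__(self, processors, sink):
        self.processors = processors
        self.sink = sink  # any callable; a real task would call a metrics system

    def report(self):
        self.sink({"processed": sum(p.processed for p in self.processors)})


regions = LookupService({"SJC": "us-west", "LHR": "eu-west"})
store_flow = EnrichProcessor(regions)  # both processors share one service instance
truck_flow = EnrichProcessor(regions)
enriched = [
    store_flow.process({"depot": "SJC"}),
    truck_flow.process({"depot": "LHR"}),
]

pushed = []
MetricsReportingTask([store_flow, truck_flow], pushed.append).report()
```

The shared service is configured once and reused by both processors, while the reporting task observes the flow from outside without sitting in the data path.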

[Diagram labels: OS/Host, JVM, Flow Controller, Web Server, Processor 1, Extension N, FlowFile Repository, Content Repository, Provenance Repository, Local Storage]