Apache NiFi 1.0 in Nutshell Koji Kawamura – Software Engineer Arti Wadhwani – Technical Support Engineer 2016 October 27

Jan 07, 2017

Hadoop Summit
Transcript
Page 1: Apache NiFi 1.0 in Nutshell

Apache NiFi 1.0 in Nutshell
Koji Kawamura – Software Engineer
Arti Wadhwani – Technical Support Engineer

2016 October 27

Page 2: Apache NiFi 1.0 in Nutshell

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Agenda

What’s NiFi

NiFi 1.0 Enhancements

NiFi on the edge

Common issues

What’s Next?

Page 4: Apache NiFi 1.0 in Nutshell

A Brief History

2006: NiagaraFiles (NiFi) was first conceived at the National Security Agency (NSA).

November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.

July 2015: NiFi reaches ASF top-level project status.

Page 5: Apache NiFi 1.0 in Nutshell

What’s Apache NiFi?

“NiFi is like digging irrigation ditches as the water flows, rather than building out a sprinkler system in advance.”

https://mail-archives.apache.org/mod_mbox/nifi-users/201604.mbox/%[email protected]%3E

Page 6: Apache NiFi 1.0 in Nutshell

NiFi is a tool for Data Flow Management

Page 7: Apache NiFi 1.0 in Nutshell

Simplistic View of DataFlows: Easy, Definitive

Dataflow: Acquire Data → Process and Analyze Data → Store Data

Page 8: Apache NiFi 1.0 in Nutshell

Realistic View of Dataflows: Complex, Convoluted

[Diagram: a dataflow connecting many Acquire Data sources, a Process and Analyze Data stage, and many Store Data destinations]

Page 9: Apache NiFi 1.0 in Nutshell

NiFi 1.0 has 170+ Processors, a 30% Increase from NiFi 0.7

Capabilities: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch

Protocols and formats: HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT

All Apache project logos are trademarks of the ASF and the respective projects.

Page 10: Apache NiFi 1.0 in Nutshell

Deeper Ecosystem Integration – New Processors

Processor: Description

Publish/ConsumeKafka: Two NARs, with Kafka 0.9 and 0.10 client libraries, respectively

JoltTransformJson: Manipulate JSON data on the fly, with a preview function

GenerateTableFetch: Incremental fetch + parallel fetch against source table partitions

PutHiveQL: Ingest to Hive tables

SelectHiveQL: Select from Hive tables

PutHiveStreaming: Ingest streaming data to Hive, leveraging the Hive streaming API

ConvertAvroToORC: Format conversion, Avro to ORC

Publish/ConsumeMQTT: MQTT is a popular protocol in the IoT world
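
As a rough illustration of what "incremental fetch + parallel fetch" could look like for GenerateTableFetch, the sketch below generates one SQL statement per page of newly added rows, so that the pages can be executed in parallel downstream. It is not NiFi's implementation: the function name, its parameters, and the LIMIT/OFFSET paging syntax are assumptions made for this example.

```python
def generate_page_queries(table, max_value_column, last_max, row_count, partition_size):
    """Return one paged SELECT per partition of the rows added since last_max.

    Hypothetical helper, not NiFi code: it only illustrates combining an
    incremental filter with page-wise fetches that can run in parallel.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    queries = []
    for offset in range(0, row_count, partition_size):
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {max_value_column} > {last_max} "
            f"ORDER BY {max_value_column} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries


if __name__ == "__main__":
    # Assumed scenario: 2,500 new rows since id 1000, fetched in pages of 1,000.
    for query in generate_page_queries("events", "id", 1000, 2500, 1000):
        print(query)
```

Each generated statement is independent of the others, which is what would let several fetch tasks work on different pages at the same time.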

Page 11: Apache NiFi 1.0 in Nutshell

Data Movement Management

SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE

Constrained, High-Latency, Localized Context

Hybrid – Cloud/On-Premise, Low-Latency, Global Context

Page 12: Apache NiFi 1.0 in Nutshell

Hortonworks DataFlow (HDF)

SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE

Constrained, High-latency, Localized context

Hybrid – cloud/on-premises, Low-latency, Global context

Page 13: Apache NiFi 1.0 in Nutshell

Detailed Break Down of Requirements

Flow Management

Req 1: Acquire data from the cloud instances of various wearable devices

Req 2: Move data from customer cloud instances to the on-premise instance

Req 3: Perform intelligent routing and filtering of data. The routing and filtering rules will often change at run-time.

Req 4: Deliver the data to various downstream systems. New downstream apps will keep appearing, and data should be fed to each one when it comes online.

Req 5: Parse the device data into a standardized format that downstream systems can understand

Req 6: Enrich the data with contextual information, including patient/customer info (age, gender, etc.)

Req 7: Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.

Req 8: Run an outlier detection model on the incoming streaming heart rate. If the score is above a certain threshold, alert on the heart rate.

Stream Processing & Analytics

Page 15: Apache NiFi 1.0 in Nutshell

NiFi 1.0: Modernized UI

Page 16: Apache NiFi 1.0 in Nutshell

Modernized UI – Complete Interface Redesign

Page 17: Apache NiFi 1.0 in Nutshell

Connect Components to design your data flow

Component: What for? (components listed from left to right)

Processor: Purpose-built processing unit, e.g. GetXXX, PutXXX
Input Port: Endpoint for receiving data between Process Groups (local/remote)
Output Port: Endpoint for exposing data between Process Groups (local/remote)
Process Group: A must-have for designing a well-structured data flow
Remote Process Group: Enables data transfer between NiFi deployments via Site-to-Site
Funnel: Bundles multiple relationships into one
Template: Shares part of a data flow
Label: Useful for visually grouping processors and adding descriptions

Page 18: Apache NiFi 1.0 in Nutshell

Data Provenance

Page 19: Apache NiFi 1.0 in Nutshell

NiFi 1.0: Multitenant Authorization

Page 20: Apache NiFi 1.0 in Nutshell

NiFi 0.x - Authorization Model

Previously had role-based authorization
– Dataflow Manager (DFM)
– Monitor
– Provenance
– Admin
– Proxy
– NiFi

Limitation: all-or-nothing model
– DFM can change everything, Monitor can change nothing
– Can’t give a user the ability to modify/view only certain components
– Would require standing up multiple NiFi instances

Page 21: Apache NiFi 1.0 in Nutshell

NiFi 1.0 - Authorization Model

NiFi 1.0 introduces a new delegated authorization model

Authorize each request based on user identity, action, and resource
– Example for user1 modifying properties on processor1:
  • User Identity: user1
  • Action: WRITE
  • Resource: processor1 (uuid)

If authorizer says resource not found, parent is checked… if parent isn’t found, parent’s parent is checked, and so on…
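
The request check and the parent fallback described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not NiFi's authorizer API: the `Resource` and `SimpleAuthorizer` classes, the policy dictionary layout, and the deny-by-default result when no policy is found anywhere are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Resource:
    """A component in the flow; parent is None for the root (hypothetical model)."""
    identifier: str
    parent: Optional["Resource"] = None


@dataclass
class SimpleAuthorizer:
    # Maps (resource identifier, action) to the set of allowed user identities.
    policies: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def authorize(self, user: str, action: str, resource: Resource) -> bool:
        current: Optional[Resource] = resource
        while current is not None:
            allowed = self.policies.get((current.identifier, action))
            if allowed is not None:
                # A policy exists at this level, so it decides the request.
                return user in allowed
            # "Resource not found": fall back to the parent.
            current = current.parent
        # Assumption: with no policy anywhere up the chain, deny.
        return False


if __name__ == "__main__":
    root = Resource("root-group")
    group1 = Resource("group1", parent=root)
    processor1 = Resource("processor1", parent=group1)

    authorizer = SimpleAuthorizer({("group1", "WRITE"): {"user1"}})
    print(authorizer.authorize("user1", "WRITE", processor1))
    print(authorizer.authorize("user2", "WRITE", processor1))
```

In this sketch processor1 has no policy of its own, so the WRITE request is decided by the policy on its parent group.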

Page 22: Apache NiFi 1.0 in Nutshell

NiFi 1.0 – NiFi Managed Authorizer vs. External Authorizer

Managed Authorizer
– File based persistence
  • Could be extended to other persistence mechanisms
– NiFi UI to manage policies
– NiFi controls authorization logic

External Authorizer
– Ranger integration
– Ranger UI to manage policies
– Ranger controls authorization logic

Page 23: Apache NiFi 1.0 in Nutshell

NiFi 1.0 – Managing Users

Clicking the new user icon allows the admin to create Users and Groups
– Individual Users can be grouped
– Groups can be assigned members

Clicking the edit user icon allows the admin to update a specific User/Group

Page 24: Apache NiFi 1.0 in Nutshell

NiFi 1.0 – UI Overview

Users Icon in Global Menu used to access Users/Groups

Lock Icon in Global Menu used to access Global policies

Page 25: Apache NiFi 1.0 in Nutshell

NiFi 1.0 – UI Overview

Lock Icon in palette used to access policies for currently selected component

Selection Context

Page 26: Apache NiFi 1.0 in Nutshell

NiFi 1.0 – Overriding Component Policies

Components inherit policies from the closest ancestor Process Group with policies defined

View and Modify policies are handled independently

Click Override to define a new policy, then add Users and Groups

New Users and Groups override the inherited policies (whitelisting)

Page 27: Apache NiFi 1.0 in Nutshell

NiFi 1.0 - Multi-Tenancy Example

Create a Group for Team 1 and a Group for Team 2
Give Team 1 view & modify for Process Group 1
Give Team 2 view & modify for Process Group 2
A user from Team 1 would see:

Can’t see the name of the group and can’t right-click to configure the group, but can enter the group

Page 28: Apache NiFi 1.0 in Nutshell

NiFi 1.0 – Revisions

Revision per component
Supports concurrent editing of different components without need for refreshing
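
One way to picture per-component revisions is as optimistic locking, where each component carries its own revision number. The sketch below assumes that scheme for illustration; the slides do not describe NiFi's actual revision handling, and `ComponentStore` and `StaleRevisionError` are hypothetical names.

```python
class StaleRevisionError(Exception):
    """Raised when an update is based on an outdated revision of a component."""


class ComponentStore:
    """Toy store that tracks one revision number per component."""

    def __init__(self):
        self._components = {}  # component id -> (revision, properties)

    def add(self, component_id, properties):
        self._components[component_id] = (0, dict(properties))

    def get(self, component_id):
        revision, properties = self._components[component_id]
        return revision, dict(properties)

    def update(self, component_id, client_revision, changes):
        """Apply changes only if the client saw the latest revision of this component."""
        revision, properties = self._components[component_id]
        if client_revision != revision:
            raise StaleRevisionError(
                f"{component_id}: client has revision {client_revision}, "
                f"current is {revision}"
            )
        properties.update(changes)
        self._components[component_id] = (revision + 1, properties)
        return revision + 1


if __name__ == "__main__":
    store = ComponentStore()
    store.add("processor1", {"name": "GetFile"})
    store.add("processor2", {"name": "PutFile"})

    # Two users edit different components; neither needs to refresh first.
    store.update("processor1", 0, {"name": "GetFile (edited)"})
    store.update("processor2", 0, {"name": "PutFile (edited)"})
```

Because the revision check is scoped to a single component, an edit to processor1 does not invalidate a pending edit to processor2; only a second edit to the same component from a stale revision is rejected.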

Page 29: Apache NiFi 1.0 in Nutshell

NiFi 1.0: Zero Master Clustering

Page 30: Apache NiFi 1.0 in Nutshell

NiFi 0.x: NCM (NiFi Cluster Manager)

[Diagram: an NCM in front of Node1 and Node2, with a node labeled Primary; an External Data Source delivers Chunks to the nodes]

– Distribution mechanism depends on data source
– Web UI: interact with NCM
– Other NiFi, Site-to-Site: get topology from NCM, then transfer data p2p

Page 31: Apache NiFi 1.0 in Nutshell

NiFi 1.0: ZMC (Zero Master Clustering)

[Diagram: Node1, Node2 and Node3 connected to ZooKeeper, with nodes labeled Coordinator and Primary; an External Data Source delivers Chunks to the nodes]

– Distribution mechanism depends on data source
– Web UI: interact with any node
– Other NiFi, Site-to-Site: get topology from one of the nodes, then transfer data p2p
– ZooKeeper elects the Cluster Coordinator and the Primary node
– Any node can fail
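
To make "any node can fail" concrete, here is a toy simulation of role election. It assumes the common ZooKeeper recipe in which each participant registers with an increasing sequence number and the live participant with the lowest number holds the role. The slides do not show NiFi's actual election logic, so `ToyElection` and the node names are purely illustrative, and no real ZooKeeper client is involved.

```python
class ToyElection:
    """Toy election for one role: the live node that registered first holds it."""

    def __init__(self):
        self._next_sequence = 0
        self._members = {}  # node name -> sequence number assigned on join

    def join(self, node):
        self._members[node] = self._next_sequence
        self._next_sequence += 1

    def fail(self, node):
        # Models the node's registration disappearing when it goes down.
        self._members.pop(node, None)

    def leader(self):
        if not self._members:
            return None
        return min(self._members, key=self._members.get)


if __name__ == "__main__":
    # One election per role, since the two roles are elected separately here.
    elections = {"Cluster Coordinator": ToyElection(), "Primary Node": ToyElection()}
    for election in elections.values():
        for node in ("node1", "node2", "node3"):
            election.join(node)

    for election in elections.values():
        election.fail("node1")  # the current holder of both roles goes down
    for role, election in elections.items():
        print(role, "->", election.leader())
```

The point of the sketch is that no node is special: when the current holder of a role fails, the role moves to another live node instead of the cluster losing its manager, as could happen with a single NCM.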

Page 32: Apache NiFi 1.0 in Nutshell

NiFi 1.0: And More!

Page 33: Apache NiFi 1.0 in Nutshell

Foundational Work for SDLC

Deterministic template export
– Deterministic ordering, template XML file
– Version control of the template
– Collaborative SDLC effort

Variable registry
– Phase one implementation
– In-memory variable registry
– The same key referenced in a template, mapped to different environment-specific values
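
The idea of one key in a template resolving to different values per environment can be sketched as below. The `${key}` reference style follows NiFi Expression Language, but the `InMemoryVariableRegistry` class, its `resolve` method, and the example keys and hosts are hypothetical and only illustrate the concept.

```python
import re

# Matches references such as ${db.host}; the allowed key characters are an assumption.
_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


class InMemoryVariableRegistry:
    """Toy registry holding the key/value pairs of a single environment."""

    def __init__(self, variables):
        self._variables = dict(variables)

    def resolve(self, text):
        """Replace every ${key} in text with this environment's value."""
        def substitute(match):
            key = match.group(1)
            if key not in self._variables:
                raise KeyError(f"undefined variable: {key}")
            return str(self._variables[key])

        return _REFERENCE.sub(substitute, text)


if __name__ == "__main__":
    # The same template property is deployed unchanged to both environments.
    template_property = "jdbc:mysql://${db.host}:3306/flows"

    dev = InMemoryVariableRegistry({"db.host": "localhost"})
    prod = InMemoryVariableRegistry({"db.host": "db.prod.example.com"})

    print(dev.resolve(template_property))
    print(prod.resolve(template_property))
```

Keeping environment-specific values out of the template is what allows the template file itself to be version controlled and promoted between environments without edits.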

Page 35: Apache NiFi 1.0 in Nutshell

Site-to-Site Refactoring – S2S HTTP(S) Protocol through Proxy Server

[Diagram: a NiFi JVM (REST API, Framework, Extension API with Proc, CS and Report Task, S2S API) connected to a second JVM that uses the S2S Client Libraries]

– Socket protocol: TCP
– HDF 2.0: HTTP(S) protocol, via an HTTP proxy

Page 37: Apache NiFi 1.0 in Nutshell

Edge Intelligence with Apache MiNiFi

Key Features

Guaranteed delivery
Data buffering
  ‒ Backpressure
  ‒ Pressure release
Prioritized queuing
Flow specific QoS
  ‒ Latency vs. throughput
  ‒ Loss tolerance
Data provenance
Recovery / recording a rolling log of fine-grained history
Designed for extension

Different from Apache NiFi
Design and Deploy
Warm re-deploys
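
Two of the listed features, backpressure and prioritized queuing, can be illustrated with a small bounded priority queue. This is a sketch under stated assumptions rather than NiFi or MiNiFi code: the class name, the object-count threshold, and the rule that a lower number means higher priority are all hypothetical. Pressure release (aging out old data) is not modeled.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy connection queue: refuses new items once a threshold is reached."""

    def __init__(self, backpressure_threshold):
        self._threshold = backpressure_threshold
        self._heap = []
        # The counter breaks ties so equal priorities leave in arrival order.
        self._counter = itertools.count()

    def is_full(self):
        return len(self._heap) >= self._threshold

    def offer(self, item, priority):
        """Queue the item, or return False to signal backpressure to the producer."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the highest-priority item (lowest number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    queue = BoundedPriorityQueue(backpressure_threshold=2)
    queue.offer("routine reading", priority=5)
    queue.offer("alert", priority=1)
    accepted = queue.offer("another reading", priority=5)
    print("accepted:", accepted)
    print("next out:", queue.poll())
```

Returning False from `offer` instead of dropping or blocking leaves the decision to the upstream component, which is one simple way to express backpressure.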

Page 38: Apache NiFi 1.0 in Nutshell

NiFi vs. MiNiFi Java Processor, Smaller Footprint ~40 MB

MiNiFi: NiFi Framework, Components

NiFi: NiFi Framework, User Interface, Components

Page 40: Apache NiFi 1.0 in Nutshell

Common issues

HBase Connection Issues - ClassNotFoundException
NiFi SSL issues
ExecuteSQL Processor issues
NiFi Content Repo full
PutKafka/GetKafka issues
Issues after enabling Kerberos
OutOfMemory Issues

Page 41: Apache NiFi 1.0 in Nutshell

Interesting Issues/Use Cases

TBD (need to add 2-3 interesting issues/use cases)

Page 42: Apache NiFi 1.0 in Nutshell

Best Practices

Debug Logging in case of Processor issues

NiFi Site-to-Site Practices

Core Properties tuning

JVM tuning

Understanding health via NiFi UI

Page 44: Apache NiFi 1.0 in Nutshell

What’s Next

NiFi

Framework extension
– Distributed data durability (HA data)
– Configuration management flows (SDLC)

Enhanced User Experience
– Template/Extension Registry
– Variable Registry

Deeper ecosystem integration

MiNiFi

Central Command and Control
Native Agent (GA)

https://cwiki.apache.org/confluence/display/NIFI/Product+requirements

Search for “NiFi product requirements”!

Page 45: Apache NiFi 1.0 in Nutshell

Thank You