Apache NiFi 1.0 in Nutshell Koji Kawamura – Software Engineer Arti Wadhwani – Technical Support Engineer 2016 October 27
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
What’s NiFi
NiFi 1.0 Enhancements
NiFi on the edge
Common issues
What’s Next?
A Brief History
2006: NiagaraFiles (NiFi) is first conceived at the National Security Agency (NSA).
November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.
July 2015: NiFi reaches ASF top-level project status.
What’s Apache NiFi?
“NiFi is like digging irrigation ditches as the water flows, rather than building out a sprinkler system in advance.”
https://mail-archives.apache.org/mod_mbox/nifi-users/201604.mbox/%[email protected]%3E
NiFi is a tool for Data Flow Management
Simplistic View of Dataflows: Easy, Definitive
Dataflow: Acquire Data → Process and Analyze Data → Store Data
Realistic View of Dataflows: Complex, Convoluted
[Diagram: a dataflow with four Acquire Data boxes, one Process and Analyze Data box, and five Store Data boxes]
NiFi 1.0 has 170+ Processors, 30% Increase from NiFi 0.7
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
HL7
FTP
UDP
XML
SFTP
HTTP
Syslog
HTML
Image
AMQP
MQTT
Fetch
All Apache project logos are trademarks of the ASF and the respective projects.
Deeper Ecosystem Integration – New Processors
Processor               Description
Publish/ConsumeKafka    Two NARs, with Kafka 0.9 and 0.10 client libraries respectively
JoltTransformJSON       Manipulate JSON data on the fly, with a preview function
GenerateTableFetch      Incremental fetch plus parallel fetch against source table partitions
PutHiveQL               Ingest into Hive tables
SelectHiveQL            Select from Hive tables
PutHiveStreaming        Ingest streaming data into Hive using the Hive Streaming API
ConvertAvroToORC        Format conversion, Avro to ORC
Publish/ConsumeMQTT     MQTT is a popular protocol in the IoT world
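The "incremental fetch plus parallel fetch" idea in the GenerateTableFetch row can be sketched roughly as below. This is an illustration only, not the processor's implementation: the function name `generate_fetch_queries`, its parameters, and the LIMIT/OFFSET paging syntax are assumptions made for the example.

```python
def generate_fetch_queries(table, max_value_column, last_max, row_count, page_size):
    """Build paged SELECT statements covering rows newer than last_max.

    Hypothetical helper: each returned statement is independent of the
    others, so a downstream step could run them in parallel.
    Values are interpolated directly for readability; real code should
    use bound parameters instead.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Incremental part: only rows beyond the last value seen so far.
    where = "" if last_max is None else f" WHERE {max_value_column} > {last_max}"
    queries = []
    # Parallel part: one statement per page of page_size rows.
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table}{where} "
            f"ORDER BY {max_value_column} LIMIT {page_size} OFFSET {offset}"
        )
    return queries


queries = generate_fetch_queries("events", "id", last_max=100, row_count=25, page_size=10)
for q in queries:
    print(q)
```

With 25 new rows and a page size of 10, the sketch yields three statements; after they complete, the caller would remember the new maximum of `id` for the next run.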
Data Movement Management
SOURCES | REGIONAL INFRASTRUCTURE | CORE INFRASTRUCTURE
Constrained, High-Latency, Localized Context
Hybrid – Cloud/On-Premise, Low-Latency, Global Context
Hortonworks DataFlow (HDF)
Constrained High-latency Localized context
Hybrid – cloud/on-premises Low-latency Global context
SOURCES REGIONAL INFRASTRUCTURE
CORE INFRASTRUCTURE
Flow Management
Detailed Breakdown of Requirements
Req 1: Acquire data from the cloud instances of various wearable devices
Req 2: Move data from customer cloud instances to the on-premises instance
Req 3: Perform intelligent routing and filtering of data. The routing and filtering rules will often change at run time.
Req 4: Deliver the data to various downstream systems. New downstream apps will keep appearing, and the data should be fed to each one when it comes online.
Req 5: Parse the device data into a standardized format that downstream systems can understand
Req 6: Enrich the data with contextual information, including patient/customer info (age, gender, etc.)
Req 7: Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
Req 8: Run an outlier detection model on the incoming streaming heart rate. If the score is above a certain threshold, alert on the heart rate.
Stream Processing & Analytics
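As a rough illustration of Req 3 and Req 7, routing rules can be held as plain data so that they can be changed while the flow keeps running. The sketch below is not NiFi code: the `Router` class, the record fields `device` and `resting_hr`, and the threshold of 100 are all hypothetical.

```python
class Router:
    """Hypothetical in-memory router: rules are plain data, so they can be
    added or removed while records keep flowing (Req 3)."""

    def __init__(self):
        self.rules = {}  # rule name -> predicate over a record (dict)

    def set_rule(self, name, predicate):
        self.rules[name] = predicate

    def remove_rule(self, name):
        self.rules.pop(name, None)

    def route(self, record):
        """Return the sorted names of all rules the record matches."""
        return sorted(name for name, pred in self.rules.items() if pred(record))


router = Router()
router.set_rule("device_a", lambda r: r["device"] == "device_a")
record = {"device": "device_a", "resting_hr": 104}
before = router.route(record)

# A new rule arrives at run time: flag resting heart rates above a threshold (Req 7).
router.set_rule("high_resting_hr", lambda r: r["resting_hr"] > 100)
after = router.route(record)
print(before, after)
```

The same record is routed differently before and after the rule change, without restarting anything.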
NiFi 1.0: Modernized UI
Modernized UI – Complete Interface Redesign
Connect Components to design your data flow
Components, from left to right:
Component               What for?
Processor               Purpose-built processing unit, e.g. GetXXX, PutXXX
Input Port              Endpoint for receiving data between Process Groups (local/remote)
Output Port             Endpoint for exposing data between Process Groups (local/remote)
Process Group           A must-have for designing a well-structured data flow
Remote Process Group    Enables data transfer between NiFi deployments via Site-to-Site
Funnel                  Bundles multiple relationships into one
Template                Shares part of a data flow
Label                   Useful for visually grouping processors and adding descriptions
Data Provenance
NiFi 1.0: Multitenant Authorization
NiFi 0.x - Authorization Model
Previously had role-based authorization
– Dataflow Manager (DFM)
– Monitor
– Provenance
– Admin
– Proxy
– NiFi
Limitation: all-or-nothing model
– DFM can change everything, Monitor can change nothing
– Can’t give a user the ability to modify/view only certain components
– Would require standing up multiple NiFi instances
NiFi 1.0 - Authorization Model
NiFi 1.0 introduces a new delegated authorization model
Authorize each request based on user identity, action, and resource
– Example for user1 modifying properties on processor1:
  • User Identity: user1
  • Action: WRITE
  • Resource: processor1 (uuid)
If the authorizer says the resource is not found, the parent is checked; if the parent isn’t found, the parent’s parent is checked, and so on
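The parent-lookup behaviour described above can be sketched as a walk up the component hierarchy. This is a toy model under stated assumptions, not NiFi's authorizer API: the `Authorizer` class, its `grant` and `authorize` methods, and the resource names `group1` and `root` are invented for the illustration, and "resource not found" is interpreted here as "no policy is defined for that resource and action".

```python
class Authorizer:
    """Toy policy store keyed by (resource, action) -> set of user identities."""

    def __init__(self, parents):
        self.parents = parents   # resource id -> parent resource id (None at the root)
        self.policies = {}       # (resource id, action) -> set of users

    def grant(self, resource, action, user):
        self.policies.setdefault((resource, action), set()).add(user)

    def authorize(self, user, action, resource):
        current = resource
        while current is not None:
            users = self.policies.get((current, action))
            if users is not None:
                # A policy is defined at this level, so it decides the request.
                return user in users
            # No policy found here: fall back to the parent.
            current = self.parents.get(current)
        # No policy anywhere up the chain: deny.
        return False


parents = {"processor1": "group1", "group1": "root", "root": None}
authorizer = Authorizer(parents)
authorizer.grant("group1", "WRITE", "user1")

print(authorizer.authorize("user1", "WRITE", "processor1"))  # policy found on group1
print(authorizer.authorize("user2", "WRITE", "processor1"))
```

In this sketch `processor1` has no policy of its own, so the request is decided by the policy on its parent `group1`.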
NiFi 1.0 – NiFi Managed Authorizer vs. External Authorizer
Managed Authorizer
– File-based persistence
  • Could be extended to other persistence mechanisms
– NiFi UI to manage policies
– NiFi controls authorization logic
External Authorizer
– Ranger integration
– Ranger UI to manage policies
– Ranger controls authorization logic
NiFi 1.0 – Managing Users
Clicking the new user icon allows the admin to create Users and Groups
– Individual Users can be grouped
– Groups can be assigned members
Clicking the edit user icon allows the admin to update a specific User/Group
NiFi 1.0 – UI Overview
Users icon in the Global Menu is used to access Users/Groups
Lock icon in the Global Menu is used to access Global policies
NiFi 1.0 – UI Overview
Lock icon in the palette is used to access policies for the currently selected component
Selection Context
NiFi 1.0 – Overriding Component Policies
Components inherit policies from the closest ancestor Process Group with policies defined
View and Modify policies are handled independently
Click Override to define a new policy, then add Users and Groups
New Users and Groups override the inherited policies (whitelisting)
NiFi 1.0 - Multi-Tenancy Example
Create a Group for Team 1 and a Group for Team 2
Give Team 1 view & modify for Process Group 1
Give Team 2 view & modify for Process Group 2
A user from Team 1 would see:
Can’t see the name of the group and can’t right-click to configure the group, but can enter the group
NiFi 1.0 – Revisions
Revision per component
Supports concurrent editing of different components without the need to refresh
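Per-component revisions behave much like optimistic locking scoped to a single component. The sketch below shows that general technique; it is not NiFi's internal code, and the `Flow` class, `StaleRevisionError`, and the component ids are hypothetical.

```python
class StaleRevisionError(Exception):
    """Raised when a client edits a component using an outdated revision."""


class Flow:
    def __init__(self):
        self.components = {}  # component id -> {"revision": int, "config": dict}

    def add(self, component_id, config):
        self.components[component_id] = {"revision": 0, "config": dict(config)}

    def update(self, component_id, client_revision, changes):
        """Apply changes only if the client saw the latest revision of this component."""
        component = self.components[component_id]
        if client_revision != component["revision"]:
            raise StaleRevisionError(
                f"{component_id}: client has revision {client_revision}, "
                f"current is {component['revision']}"
            )
        component["config"].update(changes)
        component["revision"] += 1
        return component["revision"]


flow = Flow()
flow.add("processor1", {"schedule": "1 sec"})
flow.add("processor2", {"schedule": "5 sec"})

# Two users edit different components; neither invalidates the other's revision.
print(flow.update("processor1", 0, {"schedule": "2 sec"}))
print(flow.update("processor2", 0, {"schedule": "10 sec"}))
```

A second edit of `processor1` that still carries revision 0 would raise `StaleRevisionError`, while edits to other components are unaffected.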
NiFi 1.0: Zero Master Clustering
NiFi 0.x: NCM (NiFi Cluster Manager)
[Diagram: an NCM in front of Node1 and Node2, one of which is the Primary node; an External Data Source delivers chunks to the nodes]
– Distribution mechanism depends on the data source
– Web UI: interact with the NCM
– Site-to-Site (other NiFi): get topology from the NCM, then transfer data p2p
NiFi 1.0: ZMC (Zero Master Clustering)
[Diagram: Node1, Node2 and Node3 with ZooKeeper; one node is the Cluster Coordinator and one is the Primary node; an External Data Source delivers chunks to the nodes]
– Distribution mechanism depends on the data source
– Web UI: interact with any node
– Site-to-Site (other NiFi): get topology from one of the nodes, then transfer data p2p
– ZooKeeper elects the Cluster Coordinator and the Primary node
– Any node can fail
NiFi 1.0: And More!
Foundational Work for SDLC
Deterministic template export
– Deterministic ordering, template XML file
– Version control of the template
– Collaborative SDLC effort
Variable registry
– Phase one implementation
– In-memory variable registry
– The same key referenced in a template is mapped to different environment-specific values
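The variable registry idea, one key in the template and a different value per environment, can be sketched as a simple substitution over an in-memory mapping. This is an assumption-laden illustration rather than NiFi's implementation: the `resolve` function, the key names `db.host` and `db.port`, and the example hosts are hypothetical.

```python
import re

# Matches ${key} references inside a property value.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(text, registry):
    """Replace each ${key} in text with registry[key]; unknown keys are left untouched."""
    return _PLACEHOLDER.sub(lambda m: registry.get(m.group(1), m.group(0)), text)


# The same template property is shared by every environment.
template_property = "jdbc:mysql://${db.host}:${db.port}/sales"

# Each environment supplies its own in-memory registry (hypothetical values).
dev = {"db.host": "localhost", "db.port": "3306"}
prod = {"db.host": "db.prod.example.com", "db.port": "3307"}

print(resolve(template_property, dev))
print(resolve(template_property, prod))
```

Because the template itself never changes, it can be kept under version control and promoted between environments as-is.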
Site-to-Site Refactoring – S2S HTTP(S) Protocol through Proxy Server
[Diagram: a NiFi JVM (REST API, Framework, Extension API with Proc, CS and Report Task, S2S API) and a second JVM using the S2S Client Libraries]
– Socket protocol: TCP
– HDF 2.0: HTTP(S) protocol, via an HTTP proxy
Edge Intelligence with Apache MiNiFi
Key Features
Guaranteed delivery
Data buffering
‒ Backpressure
‒ Pressure release
Prioritized queuing
Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
Data provenance
Recovery / recording a rolling log of fine-grained history
Designed for extension
Different from Apache NiFi
Design and Deploy
Warm re-deploys
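Two of the features listed above, backpressure and prioritized queuing, can be illustrated with a small bounded priority queue. This is a generic sketch, not MiNiFi's queue implementation: the `BoundedPriorityQueue` class and its `offer` and `poll` methods are hypothetical, and pressure release (ageing off old data) is not modelled.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    def __init__(self, max_items):
        self.max_items = max_items
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def offer(self, priority, item):
        """Try to enqueue; return False (backpressure) when the queue is full."""
        if len(self._heap) >= self.max_items:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Remove and return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BoundedPriorityQueue(max_items=2)
accepted = [
    queue.offer(5, "telemetry"),
    queue.offer(1, "alarm"),
    queue.offer(3, "status"),  # queue is full: the producer is told to back off
]
first = queue.poll()  # the highest-priority item leaves first
print(accepted, first)
```

The producer sees the refusal and can slow down or buffer locally, while the consumer always receives the most urgent item next.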
NiFi vs. MiNiFi Java Processor, Smaller Footprint ~40 MB
MiNiFi: NiFi Framework + Components
NiFi: NiFi Framework + User Interface + Components
Common issues
HBase connection issues - ClassNotFoundException
NiFi SSL issues
ExecuteSQL Processor issues
NiFi Content Repo full
PutKafka/GetKafka issues
Issues after enabling Kerberos
OutOfMemory issues
Interesting Issues/Use Cases
TBD (need to add 2-3 interesting issues/use cases)
Best Practices
Debug Logging in case of Processor issues
NiFi Site-to-Site Practices
Core Properties tuning
JVM tuning
Understanding health via NiFi UI
What’s Next
NiFi:
Framework extension
– Distributed data durability (HA data)
– Configuration management flows (SDLC)
Enhanced User Experience
– Template/Extension Registry
– Variable Registry
Deeper ecosystem integration
MiNiFi:
Central Command and Control
Native Agent (GA)
https://cwiki.apache.org/confluence/display/NIFI/Product+requirements
Search for “NiFi product requirements”
Thank You