Transcript

Apache NiFi: Better Analytics Demand Better Dataflow

Presented by: Joe Witt, Apache NiFi PPMC Member

@apachenifi

History

•  Developed at NSA for over eight years

•  Donated to the Apache Software Foundation Nov 2014

•  Undergoing incubation

•  Three ASF releases to date
    •  Closing in on 0.2.0 release

 

The problem space: Enterprise Dataflow

Automate the flow of data from any source

…to systems which extract meaning and insight

…and to those that store and make it available for users

Use Cases for NiFi

•  Remote sensor delivery

•  Inter-site / global distribution

•  Intra-site distribution

•  ‘Big Data’ ingest

•  Data Processing (enrichment, filtering, sanitization)  

The challenges we faced

•  Transport / Messaging was not enough

•  Needed to understand the big picture

•  Needed the ability to make *immediate* changes

•  Must maintain chain of custody for data

•  Rigorous security and compliance requirements

Why were transport and messaging not enough?

•  Demand for data access exceeded the resources available to transport it

•  Decoupling systems is about more than the connectivity

•  Message sizes ranged from bytes to gigabytes

•  Not all data is created equal

•  Needed precise security controls
    •  SSL and topic level authorization insufficient

 

Apache NiFi Foundational Concepts

1. The basic building blocks

2. Real-time Command and Control

3. The Power of Provenance

Flow File

HEADER: UUID, Name, Size, Entry Time

Attributes Map [[Key | Value]]

CONTENT

•  Types
    •  Events
    •  Objects
    •  Files
    •  Messages
    •  Media

•  Formats
    •  JSON
    •  Avro
    •  Text
    •  MP4
    •  Proprietary

•  Sizes
    •  Bytes to GBs
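The header, attributes map, and content layout above can be pictured as a small record type. This is only a sketch for illustration: the class and field names are assumptions made here and are not NiFi's actual Java API.

```python
from dataclasses import dataclass, field
from uuid import uuid4
import time


@dataclass(frozen=True)
class FlowFile:
    """Illustrative stand-in for the slide's layout, not NiFi's real class."""
    name: str
    content: bytes                                   # opaque payload, any format
    attributes: dict = field(default_factory=dict)   # key/value attributes map
    uuid: str = field(default_factory=lambda: str(uuid4()))
    entry_time: float = field(default_factory=time.time)

    @property
    def size(self) -> int:
        # Size is derived from the content rather than stored separately.
        return len(self.content)


# Hypothetical usage: a small JSON event with one attribute.
event = FlowFile(name="reading.json",
                 content=b'{"k": 1}',
                 attributes={"mime.type": "application/json"})
```

Keeping the content opaque is what lets the same record carry events, files, or media of any size.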

Flow File Processor

•  Routing
    •  Context
    •  Content

•  Transformation
    •  Enrich
    •  Obfuscate
    •  Filter
    •  Convert
    •  Analyze
    •  Split
    •  Aggregate

•  Mediation
    •  Push / Pull
    •  …
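Two of the processor roles above, routing on context and an obfuscating transformation, can be sketched as plain functions. The dictionary layout, function names, and relationship names are all hypothetical and chosen only for this example.

```python
def route_on_attribute(flowfile, attribute, routes, default="unmatched"):
    """Routing on context: pick a relationship name from an attribute value."""
    value = flowfile["attributes"].get(attribute)
    return routes.get(value, default)


def obfuscate(flowfile, secret):
    """Transformation: return a new flow file whose content masks `secret`."""
    masked = flowfile["content"].replace(secret, b"*" * len(secret))
    # Build a new dict so the incoming flow file is left untouched.
    return {**flowfile, "content": masked}


# Hypothetical usage.
ff = {"attributes": {"source": "sensor"}, "content": b"id=1234;temp=20"}
relationship = route_on_attribute(ff, "source", {"sensor": "to_ingest"})
cleaned = obfuscate(ff, b"1234")
```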

Connections

•  Queuing

•  Back Pressure

•  Expiration

•  Prioritize

•  Swapping
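A rough sketch of how queuing, back pressure, expiration, and prioritization could fit together in one connection follows. It is an assumption-laden toy, not NiFi's implementation: back pressure is modeled as refusing new items once a threshold is reached, swapping to disk is left out, and every name is hypothetical.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two processors (illustrative only)."""

    def __init__(self, back_pressure_threshold, expiration_seconds, priority_of):
        self.back_pressure_threshold = back_pressure_threshold
        self.expiration_seconds = expiration_seconds
        self.priority_of = priority_of      # lower value is served first
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order

    def is_full(self):
        return len(self._heap) >= self.back_pressure_threshold

    def offer(self, item, now):
        """Enqueue unless back pressure applies; returns True if accepted."""
        if self.is_full():
            return False
        entry = (self.priority_of(item), next(self._counter), now, item)
        heapq.heappush(self._heap, entry)
        return True

    def poll(self, now):
        """Return the highest-priority unexpired item, or None if none remain."""
        while self._heap:
            _, _, enqueued_at, item = heapq.heappop(self._heap)
            if now - enqueued_at <= self.expiration_seconds:
                return item
            # Expired items are silently dropped in this sketch.
        return None
```

Passing `now` explicitly keeps the sketch deterministic; a real queue would read the clock itself.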

Flow Controller

NiFi Architecture

NiFi Clustering Model

2. Real-time command and control

Tighten the feedback loop
•  Changes have consequences (good or bad)
•  And you see them as they occur

Continuous Improvement
•  Compare real-time vs. historical statistics
•  View data provenance
•  View content at any stage

Intuitive user experience
•  Visual programming
•  Logical flow graph

3. The Power of Provenance – Chain of custody for data

Latency Optimization
•  Intra process
•  Inter process
•  End-to-end

Compliance
•  Prove handling
•  Assess impact

Understanding
•  Step through time
•  View content
•  View context
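One way to picture a chain of custody is an append-only log of events that can be walked back from any piece of data to its origin. The sketch below assumes a simple parent link between flow files; the event labels, component names, and class names are illustrative and are not taken from NiFi's provenance repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_id: str
    event_type: str                  # labels such as "RECEIVE" are illustrative
    component: str                   # which step handled the data
    timestamp: float
    parent_id: Optional[str] = None  # set when this flow file was derived


class ProvenanceLog:
    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def lineage(self, flowfile_id):
        """Walk parent links back to the origin; return events oldest first."""
        chain, seen, current = [], set(), flowfile_id
        while current is not None and current not in seen:
            seen.add(current)
            events = [e for e in self._events if e.flowfile_id == current]
            chain = sorted(events, key=lambda e: e.timestamp) + chain
            parents = [e.parent_id for e in events if e.parent_id is not None]
            current = parents[0] if parents else None
        return chain


# Hypothetical usage: a parent is received, split, and the child is sent on.
log = ProvenanceLog()
log.record(ProvenanceEvent("parent", "RECEIVE", "ingest", 1.0))
log.record(ProvenanceEvent("child", "FORK", "split", 2.0, parent_id="parent"))
log.record(ProvenanceEvent("child", "SEND", "deliver", 3.0))
history = log.lineage("child")
```

With timestamps on every event, the same log also supports the latency questions above, since the gap between two events is the time spent in between.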

Demo

How fast is it and why?

•  Flow File Repo – Write Ahead Log

•  Content Repo

•  Add more partitions

•  Input/Output Streams

•  Copy on Write

•  Pass by Reference

•  Allow tradeoffs of latency vs throughput
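The copy-on-write and pass-by-reference points can be illustrated with a toy content store: flow files carry only a reference to stored content, so cloning is cheap, and a modification writes new content instead of changing the old. This is a sketch under those assumptions; the class and function names are hypothetical and say nothing about how NiFi lays content out on disk.

```python
class ContentRepository:
    """Toy store: content is written once and then referenced by claim id."""

    def __init__(self):
        self._claims = {}
        self._next_id = 0

    def write(self, data):
        claim_id = self._next_id
        self._next_id += 1
        self._claims[claim_id] = bytes(data)   # stored content is never mutated
        return claim_id

    def read(self, claim_id):
        return self._claims[claim_id]


def clone(flowfile):
    """Pass by reference: copy the attributes, share the content claim."""
    return {"attributes": dict(flowfile["attributes"]), "claim": flowfile["claim"]}


def modify_content(repo, flowfile, transform):
    """Copy on write: new content gets a new claim; the old claim is untouched."""
    new_claim = repo.write(transform(repo.read(flowfile["claim"])))
    return {"attributes": dict(flowfile["attributes"]), "claim": new_claim}


# Hypothetical usage.
repo = ContentRepository()
original = {"attributes": {"filename": "a.txt"}, "claim": repo.write(b"hello")}
copy = clone(original)                                  # no bytes copied
upper = modify_content(repo, copy, lambda b: b.upper()) # new claim written
```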

How does it deal with security?

-  User to System and System to System

-  Authentication (2-Way SSL, more coming…)

-  Authorization (pluggable)

-  Authorize a specific piece of data to a specific system

-  Data provenance
    -  Prove you have done the right thing
    -  Recover when you have not
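Authorizing a specific piece of data to a specific system can be pictured as a check that looks at both the destination and the data's own attributes. The sketch below models "pluggable" as an interchangeable callable; the attribute name `classification`, the system names, and the function itself are assumptions made for illustration and do not reflect NiFi's authorization interfaces.

```python
from typing import Callable, Dict, Set

# An authorizer decides whether `system` may receive data with `attributes`.
Authorizer = Callable[[str, Dict[str, str]], bool]


def attribute_authorizer(allowed: Dict[str, Set[str]],
                         attribute: str = "classification") -> Authorizer:
    """Build an authorizer from a per-system set of permitted attribute values."""
    def authorize(system: str, attributes: Dict[str, str]) -> bool:
        # Unknown systems and unlabeled data are denied by default.
        return attributes.get(attribute) in allowed.get(system, set())
    return authorize


# Hypothetical usage: "partner-a" may only receive data labeled "public".
authorize = attribute_authorizer({"partner-a": {"public"}})
may_send = authorize("partner-a", {"classification": "public"})
```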

How can I monitor this at runtime?

•  Web UI

•  Push API: Reporting Tasks (Ganglia, Graphite, etc…)

•  Pull API: REST API

What are the points of extension?

•  Flow File Processors

•  Advanced UI

•  Flow File Prioritizer

•  Reporting Tasks

•  Controller Services

•  Build Clients against our REST API

Status and direction for NiFi

Existing Strengths

•  Efficient use of each node
    -  100s of MB/s per node
    -  100Ks of transactions/s per node

•  Simple / Effective scaling model

•  Runtime Command and Control

•  Data Provenance

Roadmap Highlights

•  Distributed durability of data
    -  Maybe Kafka backed queues

•  High Availability Cluster Manager

•  Live / Rolling Upgrades

•  Provenance Query Language / Reporting

•  A complete user experience enabled by provenance

Learn more about Apache NiFi

Apache NiFi (incubating) site
http://nifi.incubator.apache.org

Subscribe to and collaborate at
dev@nifi.incubator.apache.org
users@nifi.incubator.apache.org

Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI

@apachenifi
