Data Ingestion & Distribution with Apache NiFi

Feb 21, 2017

Data & Analytics

Lev Brailovskiy
Transcript
Page 1: Data ingestion and distribution with apache NiFi

Data Ingestion & Distribution with Apache NiFi

Page 2: Data ingestion and distribution with apache NiFi

Agenda

Introduction to NiFi

Our use case for NiFi

Demo

Q&A

Page 3: Data ingestion and distribution with apache NiFi

Introduction to NiFi

Page 4: Data ingestion and distribution with apache NiFi

History & Facts

Created by: NSA

Incubating: 2014

Available: 2015

Main contributors: Hortonworks

Current Stable Version: 1.1.1

Delivery Guarantees: at least once

Out of Order Processing: no

Windowing: no

Back-pressure: yes

Latency: configurable

Resource Management: native

API: REST (GUI)

Page 5: Data ingestion and distribution with apache NiFi

Ecosystem

Stream Processing

Data Moving

Page 6: Data ingestion and distribution with apache NiFi

Architecture

Page 7: Data ingestion and distribution with apache NiFi

Flow Files

Basic Abstraction

● Pointer to content

● Content Attributes (key/value)

● Connection to provenance events
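
As a rough illustration of the three bullets above, the sketch below models a FlowFile in Python. It is not NiFi's Java API: the names `ContentClaim`, `FlowFile`, `flow_id`, and `with_attribute` are hypothetical, chosen only to show that a FlowFile carries a pointer to its content rather than the content itself.

```python
from dataclasses import dataclass, field, replace
from typing import Mapping, Tuple
from uuid import uuid4


@dataclass(frozen=True)
class ContentClaim:
    """Hypothetical pointer into a content store: where the bytes live, not the bytes."""
    container: str
    offset: int
    length: int


@dataclass(frozen=True)
class FlowFile:
    """Illustrative model only; field names are assumptions, not NiFi's classes."""
    claim: ContentClaim                                   # pointer to content
    attributes: Mapping[str, str] = field(default_factory=dict)  # key/value metadata
    provenance_ids: Tuple[str, ...] = ()                  # links to provenance events
    flow_id: str = field(default_factory=lambda: str(uuid4()))

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        # Attributes change without touching (or copying) the content itself.
        return replace(self, attributes={**self.attributes, key: value})


ff = FlowFile(claim=ContentClaim("container-1", 0, 10), attributes={"filename": "a.csv"})
tagged = ff.with_attribute("mime.type", "text/csv")
```

Here `tagged` shares the same `claim` object as `ff`, so adding an attribute costs nothing in terms of content storage.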

Page 8: Data ingestion and distribution with apache NiFi

Repositories

● FlowFile

● Content

● Provenance

● Immutable

● Copy-on-write
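
The "immutable" and "copy-on-write" bullets can be pictured with a toy in-memory store. This is a minimal sketch under stated assumptions: `ContentRepository`, `write`, `read`, and `modify` are hypothetical names, and the real repositories persist to disk rather than to a dictionary.

```python
from typing import Callable, Dict


class ContentRepository:
    """Toy in-memory store; content is never modified once written."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._next_id = 0

    def write(self, data: bytes) -> int:
        claim_id = self._next_id
        self._next_id += 1
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]

    def modify(self, claim_id: int, transform: Callable[[bytes], bytes]) -> int:
        # Copy-on-write: transformed bytes go to a new claim, the old claim stays intact.
        return self.write(transform(self.read(claim_id)))


repo = ContentRepository()
original = repo.write(b"hello")
updated = repo.modify(original, bytes.upper)
```

After the call to `modify`, both the original and the transformed content remain readable under their own claim ids.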

Page 9: Data ingestion and distribution with apache NiFi

Processor

Processors actually perform the work of data routing, transformation, or mediation between systems. Processors have access to attributes of a given FlowFile and its content stream. Processors can operate on zero or more FlowFiles in a given unit of work and either commit that work or roll back.
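
The commit-or-roll-back behaviour described above can be sketched as a small unit-of-work object. The names `Session`, `run_once`, and the string payloads are assumptions for illustration; this is not NiFi's session API, only the general pattern.

```python
from collections import deque
from typing import Callable, Deque, List


class Session:
    """Hypothetical unit of work: staged results become visible only on commit."""

    def __init__(self, inbox: Deque[str], outbox: List[str]) -> None:
        self._inbox = inbox
        self._outbox = outbox
        self._taken: List[str] = []
        self._staged: List[str] = []

    def get(self) -> str:
        item = self._inbox.popleft()
        self._taken.append(item)
        return item

    def transfer(self, item: str) -> None:
        self._staged.append(item)

    def commit(self) -> None:
        self._outbox.extend(self._staged)
        self._taken.clear()
        self._staged.clear()

    def rollback(self) -> None:
        # Put the inputs back in their original order and discard staged output.
        self._inbox.extendleft(reversed(self._taken))
        self._taken.clear()
        self._staged.clear()


def run_once(inbox: Deque[str], outbox: List[str], work: Callable[[str], str]) -> bool:
    """Process everything currently queued; all of it succeeds or none of it does."""
    session = Session(inbox, outbox)
    try:
        while inbox:
            session.transfer(work(session.get()))
        session.commit()
        return True
    except Exception:
        session.rollback()
        return False
```

If `work` raises for any item, nothing reaches `outbox` and every input is returned to `inbox`, so the same batch can be retried later.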

Page 10: Data ingestion and distribution with apache NiFi

Processor

● Basic Work Unit

● State

● Statistics

● Settings

● Input/Output

● Provenance

● Scheduling

● Logging (bulletins)

Page 11: Data ingestion and distribution with apache NiFi

Connection

Connections provide the actual linkage between processors. These act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enable back pressure.
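
To make the queue, prioritization, and upper-bound ideas concrete, here is a minimal sketch of a bounded priority queue. `Connection`, `offer`, `poll`, and `max_queued` are hypothetical names, and refusing an item is a simplification of back pressure, where the upstream component would stop being scheduled.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Connection:
    """Toy bounded priority queue between two processors (names are hypothetical)."""

    def __init__(self, max_queued: int, priority: Callable[[dict], int]) -> None:
        self.max_queued = max_queued
        self.priority = priority
        self._heap: List[Tuple[int, int, dict]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def is_full(self) -> bool:
        return len(self._heap) >= self.max_queued

    def offer(self, flowfile: dict) -> bool:
        # Back pressure (simplified): refuse new items once the upper bound is reached.
        if self.is_full():
            return False
        heapq.heappush(self._heap, (self.priority(flowfile), next(self._seq), flowfile))
        return True

    def poll(self) -> dict:
        # Lowest priority value is delivered first.
        return heapq.heappop(self._heap)[2]


conn = Connection(max_queued=2, priority=lambda f: f["prio"])
accepted = [
    conn.offer({"name": "low", "prio": 5}),
    conn.offer({"name": "high", "prio": 1}),
    conn.offer({"name": "extra", "prio": 3}),  # refused: queue is at its bound
]
first = conn.poll()
```

The producer sees the refusal and can slow down, while the consumer still receives the highest-priority item first.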

Page 12: Data ingestion and distribution with apache NiFi

Connection

● Queue

● Statistics

● Settings

● Prioritization

● Details

Page 13: Data ingestion and distribution with apache NiFi

Process Group

A Process Group is a specific set of processes and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow creation of entirely new components simply by composition of other components.
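
Composition is easiest to see when a group can itself be used wherever a single step is expected. The sketch below assumes steps are plain functions on strings; `ProcessGroup` and `Step` are hypothetical names, and real input and output ports are reduced here to a function's argument and return value.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], str]


class ProcessGroup:
    """Hypothetical composite: a chain of steps exposed through one input and one output."""

    def __init__(self, name: str, steps: Iterable[Step]) -> None:
        self.name = name
        self.steps: List[Step] = list(steps)

    def __call__(self, data: str) -> str:
        # Data enters at the "input port", passes through each step,
        # and leaves at the "output port".
        for step in self.steps:
            data = step(data)
        return data


clean = ProcessGroup("clean", [str.strip, str.lower])
# A group is used as a step inside another group.
ingest = ProcessGroup("ingest", [clean, lambda s: s.replace(" ", "_")])
result = ingest("  Hello NiFi ")  # "hello_nifi"
```

Because `ProcessGroup` is callable like any other step, groups nest to any depth without special handling.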

Page 14: Data ingestion and distribution with apache NiFi

Templates

Dataflows tend to be highly pattern-oriented, and while there are often many different ways to solve a problem, it helps greatly to be able to share those best practices. Templates allow subject matter experts to build and publish their flow designs and for others to benefit and collaborate on them.

● XML-based

● Reusable unit

● Versioning (with Git)

Page 15: Data ingestion and distribution with apache NiFi

Data Provenance

NiFi automatically records, indexes, and makes available provenance data as objects flow through the system, even across fan-in, fan-out, transformations, and more. This information becomes extremely critical in supporting compliance, troubleshooting, optimization, and other scenarios.
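
One way to picture provenance across fan-out is an append-only event log with parent links. This is a simplified sketch: the event type strings borrow names NiFi uses for provenance events, but `ProvenanceEvent`, `ProvenanceLog`, and the parent-link layout are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str                  # e.g. "RECEIVE", "FORK", "SEND"
    flowfile_id: str
    parent_id: Optional[str] = None  # set when this item was derived from another


class ProvenanceLog:
    """Append-only record of what happened to each item."""

    def __init__(self) -> None:
        self.events: List[ProvenanceEvent] = []

    def record(self, event_type: str, flowfile_id: str,
               parent_id: Optional[str] = None) -> None:
        self.events.append(ProvenanceEvent(event_type, flowfile_id, parent_id))

    def lineage(self, flowfile_id: str) -> List[str]:
        """Walk parent links from an item back to the original it came from."""
        parents = {e.flowfile_id: e.parent_id for e in self.events if e.parent_id}
        chain = [flowfile_id]
        while chain[-1] in parents:
            chain.append(parents[chain[-1]])
        return chain


log = ProvenanceLog()
log.record("RECEIVE", "a")
log.record("FORK", "a1", parent_id="a")  # fan-out: two children of "a"
log.record("FORK", "a2", parent_id="a")
log.record("SEND", "a1")
```

Calling `log.lineage("a1")` walks back from the sent item to the file that was originally received, which is the kind of question troubleshooting and compliance tend to ask.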

Page 16: Data ingestion and distribution with apache NiFi

Data Provenance

● Details

● Attributes

● Content

Page 17: Data ingestion and distribution with apache NiFi

Controller Service

A Controller Service allows developers to share functionality and state across the JVM in a clean and consistent manner.

● No scheduling

● No connections

● Used by Processors, Reporting Tasks, and other Controller Services
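
The idea of one shared service used by several components can be sketched as plain dependency injection. `LookupService` and `EnrichProcessor` are hypothetical names, and the `country` attribute is made up for the example; the point is only that the service has no schedule or connections of its own and its state is shared by every user.

```python
from typing import Dict, Optional


class LookupService:
    """Hypothetical shared service: one instance, many users, no scheduling of its own."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = dict(table)
        self.calls = 0  # shared state, visible to every component using the service

    def lookup(self, key: Optional[str], default: str) -> str:
        self.calls += 1
        return self._table.get(key, default)


class EnrichProcessor:
    """Hypothetical processor that delegates lookups to the shared service."""

    def __init__(self, service: LookupService, attribute: str) -> None:
        self.service = service
        self.attribute = attribute

    def process(self, attributes: Dict[str, str]) -> Dict[str, str]:
        value = self.service.lookup(attributes.get("country"), "unknown")
        return {**attributes, self.attribute: value}


service = LookupService({"IL": "Israel"})
first = EnrichProcessor(service, "country.name")
second = EnrichProcessor(service, "country.label")
known = first.process({"country": "IL"})
missing = second.process({"country": "XX"})
```

Both processors hold a reference to the same `service` object, so configuration and state live in one place.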

Page 18: Data ingestion and distribution with apache NiFi

Reporting Tasks

Reporting Tasks provide a capability for reporting status, statistics, metrics, and monitoring information to external services.

● ElasticsearchProvenanceReporter and DataDogReportingTask
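
A reporting task can be thought of as a function from a status snapshot to messages for an external system. The sketch below is an assumption-laden illustration: `QueueSizeReportingTask`, the `on_trigger` method, the metric naming scheme, and the snapshot format are all hypothetical, and the "external service" is just a callable.

```python
from typing import Callable, Dict


class QueueSizeReportingTask:
    """Hypothetical task: turns a status snapshot into metric lines for an external sink."""

    def __init__(self, sink: Callable[[str], None], prefix: str = "nifi") -> None:
        self.sink = sink
        self.prefix = prefix

    def on_trigger(self, status: Dict[str, int]) -> None:
        # A framework would call this on a schedule; in this sketch it is called by hand.
        for connection, queued in sorted(status.items()):
            self.sink(f"{self.prefix}.queued.{connection}={queued}")


sent = []
task = QueueSizeReportingTask(sink=sent.append)
task.on_trigger({"to_hdfs": 12, "from_sftp": 3})
```

Swapping `sent.append` for a client of a monitoring system would forward the same lines elsewhere without changing the task.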

Page 19: Data ingestion and distribution with apache NiFi

Extensibility

● Ready-to-use Maven template

● Well defined interface for each component

● Classloader Isolation (.nar files)

● Great documentation for developers

Page 20: Data ingestion and distribution with apache NiFi

Statistics

● 200+ built-in Processors

● 10+ built-in Controller Services

● 10+ built-in Reporting Tasks

Page 21: Data ingestion and distribution with apache NiFi

Introduction Summary

● Processor

● Connection

● Process Group

● Template

● Controller Service

● Reporting Task

Page 22: Data ingestion and distribution with apache NiFi

Our use case for NiFi

Page 23: Data ingestion and distribution with apache NiFi

What was before

● In-house built file collector

● Footprint of 10 servers

● Hard to manage, scale, extend

Page 24: Data ingestion and distribution with apache NiFi

DWH Real Time

Page 25: Data ingestion and distribution with apache NiFi

DWH Batch

Page 26: Data ingestion and distribution with apache NiFi

Reports Distribution

Page 27: Data ingestion and distribution with apache NiFi

Statistics

20 TB Data Ingested Daily

250K Files Ingested Daily

Near Real Time Data Availability (Minimum Interval: 1 min)

1 TB Data Distributed Reports

30K Files Exported Daily
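
For a sense of scale, the daily ingestion figures above can be turned into averages with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over 24 hours, neither of which the slides state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ingested_bytes = 20 * 10**12   # 20 TB per day, assuming decimal units
ingested_files = 250_000       # 250K files per day

avg_file_mb = ingested_bytes / ingested_files / 10**6
throughput_mb_s = ingested_bytes / SECONDS_PER_DAY / 10**6
files_per_second = ingested_files / SECONDS_PER_DAY

print(f"average file size: {avg_file_mb:.0f} MB")
print(f"sustained throughput: {throughput_mb_s:.1f} MB/s")
print(f"files per second: {files_per_second:.2f}")
```

Under those assumptions the averages come to roughly 80 MB per file, about 231.5 MB/s sustained, and just under 3 files per second; real traffic is likely burstier than that.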

Page 28: Data ingestion and distribution with apache NiFi

AWS - Hadoop Ingestion

Page 29: Data ingestion and distribution with apache NiFi

AWS - Hadoop Ingestion

Page 30: Data ingestion and distribution with apache NiFi

Kafka Reprocessing

Page 31: Data ingestion and distribution with apache NiFi

sFTP - HDFS Ingestion

Page 32: Data ingestion and distribution with apache NiFi

Let’s break something ;)

Page 33: Data ingestion and distribution with apache NiFi

Use Cases Summary● Web User Interface

● Configurable

● Scalable

● Easy to Manage

● Designed for Extension

Page 34: Data ingestion and distribution with apache NiFi

Q & A

Page 35: Data ingestion and distribution with apache NiFi

THANK YOU