Introduction to Apache NiFi And Storm

Apr 13, 2017

Jungtaek Lim
Transcript
Page 1: Introduction to Apache NiFi And Storm

Introduction to Apache NiFi & Storm

Jungtaek Lim

Page 2: Introduction to Apache NiFi And Storm

WHO AM I?

• Staff Software Engineer @ Hortonworks

• Remote worker

• Open source prosumer

• Committer of Jedis

• PMC member of Apache Storm

• Contributor of Apache (Spark, Zeppelin, Ambari, Calcite), Redis, and so on.

• Contact

[email protected]

• Twitter / LinkedIn / Github / Facebook

• @heartsavior

Page 3: Introduction to Apache NiFi And Storm

DATA IN MOTION IN HORTONWORKS DATAFLOW (HDF)

[Diagram: data moving between Sources, Regional Infrastructure, and Core Infrastructure. One end of the flow is labeled constrained, high-latency, localized context; the other is labeled hybrid (cloud/on-premises), low-latency, global context.]

Source: http://ko.hortonworks.com/products/data-center/hdf/

Page 4: Introduction to Apache NiFi And Storm

What is Apache NiFi?

Page 5: Introduction to Apache NiFi And Storm

An easy to use, powerful, and reliable system to process and distribute data.

Page 6: Introduction to Apache NiFi And Storm

History of Apache NiFi

Page 7: Introduction to Apache NiFi And Storm

• Created by the United States National Security Agency (NSA)

• Originally named NiagaraFiles

• In 2014 the NSA submitted the source code to the Apache Software Foundation via the NSA Technology Transfer Program; the project entered incubation in December 2014

• Development of Apache NiFi continued at Onyara, Inc., a startup company

• Became Apache Top-Level Project in July 2015

• Hortonworks acquired Onyara, Inc. in August 2015

Page 8: Introduction to Apache NiFi And Storm

Role of Apache NiFi

Page 9: Introduction to Apache NiFi And Storm

• Data acquisition and delivery

• Simple transformation and data routing

• Simple event processing

• End to end provenance

• Edge intelligence and bidirectional communication

Page 10: Introduction to Apache NiFi And Storm

NOT intended to REPLACE ‘distributed computation engines’

(a.k.a. stream processing frameworks)

Page 11: Introduction to Apache NiFi And Storm

Features of Apache NiFi

Page 12: Introduction to Apache NiFi And Storm

Highly configurable

• Loss tolerant vs guaranteed delivery

• Low latency vs high throughput

• Dynamic prioritization

• Flow can be modified at runtime

• Back pressure
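
To make the back pressure and prioritization bullets concrete, here is a minimal Python sketch of a bounded queue sitting between two processing steps. It is a toy model, not NiFi code: the class `BoundedPriorityQueue`, its `offer`/`poll` methods, and the priority numbers are all hypothetical names chosen for illustration. The idea it shows is that a queue with a size threshold can refuse new items until the consumer catches up, and can hand out the most urgent item first.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(order=True)
class QueuedItem:
    priority: int                       # lower value is served first
    seq: int                            # tie-breaker: arrival order
    payload: Any = field(compare=False)


class BoundedPriorityQueue:
    """Toy connection queue: a size threshold plus a prioritizer (hypothetical)."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._heap: List[QueuedItem] = []
        self._seq = 0

    def offer(self, payload: Any, priority: int) -> bool:
        """Return False (back pressure) once the threshold is reached."""
        if len(self._heap) >= self.max_items:
            return False
        heapq.heappush(self._heap, QueuedItem(priority, self._seq, payload))
        self._seq += 1
        return True

    def poll(self) -> Optional[Any]:
        """Hand the highest-priority item to the downstream consumer."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).payload


queue = BoundedPriorityQueue(max_items=2)
accepted = [queue.offer("bulk-log", priority=5),
            queue.offer("alert", priority=1),
            queue.offer("metrics", priority=3)]   # refused: queue is full
first = queue.poll()                              # the priority-1 item
retry = queue.offer("metrics", priority=3)        # room again after poll
```

In this sketch the producer is simply told to wait and retry; the threshold and the ordering rule are the two knobs that a configurable flow would expose.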

Page 13: Introduction to Apache NiFi And Storm

More…

• Designed for extension

• Build your own processors and more

• Secure

• SSL, SSH, HTTPS, encrypted content, etc...

• Multi-tenant authorization and internal authorization/policy management

• MiNiFi subproject

• Reduces footprint to ~40 MB

Page 14: Introduction to Apache NiFi And Storm

What is Apache Storm?

Page 15: Introduction to Apache NiFi And Storm

A free and open source distributed realtime computation system.

Page 16: Introduction to Apache NiFi And Storm

History of Apache Storm

Page 17: Introduction to Apache NiFi And Storm

Source: http://hortonworks.com/blog/brief-history-apache-storm/

Page 18: Introduction to Apache NiFi And Storm

Concepts of Apache Storm

Page 19: Introduction to Apache NiFi And Storm

• Spout: a source of streams in a topology

• Bolt: a processing component, which can also act as a sink

• Stream: an unbounded sequence of tuples, defined with a schema

• Stream grouping: defines how a stream should be partitioned among a bolt's tasks

• Topology: the logic of a realtime application, represented as a DAG
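
The concepts above can be sketched in plain Python. This is a toy simulation and does not use the Storm API: `sentence_spout`, `split_bolt`, `fields_grouping`, and `run_topology` are hypothetical names, and the CRC32 hash is an assumption made only to get a stable partitioning. It shows a spout feeding a bolt, and a grouping on one field that sends every tuple with the same value to the same downstream task.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterator, Tuple

Record = Tuple[str]          # a one-field tuple; the schema is ("word",)


def sentence_spout() -> Iterator[str]:
    """Spout stand-in: a finite list here, unbounded in a real stream."""
    yield from ["storm nifi storm", "nifi kafka", "storm"]


def split_bolt(sentence: str) -> Iterator[Record]:
    """Bolt stand-in: emits one ("word",) tuple per word."""
    for word in sentence.split():
        yield (word,)


def fields_grouping(record: Record, num_tasks: int) -> int:
    """Pick a task from a stable hash of the grouping field."""
    return zlib.crc32(record[0].encode("utf-8")) % num_tasks


def run_topology(num_count_tasks: int = 3) -> Dict[int, Counter]:
    """Wire spout -> split bolt -> counting bolt tasks (a tiny DAG)."""
    task_counts: Dict[int, Counter] = defaultdict(Counter)
    for sentence in sentence_spout():
        for record in split_bolt(sentence):
            task = fields_grouping(record, num_count_tasks)
            task_counts[task][record[0]] += 1   # counting bolt acts as the sink
    return dict(task_counts)


per_task = run_topology()
totals = sum(per_task.values(), Counter())
```

Because the task is chosen from the word itself, each word's running count lives in exactly one task, which is what makes per-key counting work without coordination between tasks.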

Page 20: Introduction to Apache NiFi And Storm

Core vs Trident

Page 21: Introduction to Apache NiFi And Storm

                    Core                      Trident
Computation Unit    Record (tuple)            Micro batch
Latency             Very low (sub-second)     High (up to batch size), similar to Spark Streaming
Delivery Guarantee  At least once             Exactly once
API                 Compositional             Declarative
Stateful Operator   Supported from v1.0.0     Core feature (exactly-once)
Windowing           Time (processing time, event time), Count; Tumbling window, Sliding window
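
The two window shapes named in the Windowing row can be illustrated with a short Python sketch over count-based windows. This is not Storm code: `tumbling_windows` and `sliding_windows` are hypothetical helpers, and as a simplification they emit only full windows.

```python
from typing import List, Sequence


def tumbling_windows(events: Sequence[int], size: int) -> List[List[int]]:
    """Non-overlapping windows of `size` events; a partial tail is dropped."""
    return [list(events[i:i + size])
            for i in range(0, len(events) - size + 1, size)]


def sliding_windows(events: Sequence[int], size: int, slide: int) -> List[List[int]]:
    """Windows of `size` events advancing by `slide`; they overlap when slide < size."""
    return [list(events[i:i + size])
            for i in range(0, len(events) - size + 1, slide)]


events = [1, 2, 3, 4, 5, 6]
tumbling = tumbling_windows(events, size=3)
sliding = sliding_windows(events, size=3, slide=1)
```

With six events and a window size of three, the tumbling variant places each event in exactly one window, while the sliding variant with a slide of one places most events in several windows.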

Page 22: Introduction to Apache NiFi And Storm

Features of Apache Storm

Page 23: Introduction to Apache NiFi And Storm

• Supports a number of connectors (17 connectors in the master branch)

• Automatic back-pressure

• Distributed Cache

• Flux (constructing a topology via YAML)

• Distributed Log Search

• Dynamic Worker Profiling

• Dynamic Log Levels

• Topology Event Inspector

• Resource Aware Scheduler

• SQL (Experimental)

Page 24: Introduction to Apache NiFi And Storm

Future of Apache Storm

Apache Storm 2.0 and beyond

Page 25: Introduction to Apache NiFi And Storm

• Clojure to Java translation

• Unified Stream API with exactly-once support

• Reworked metrics feature

• Apache Beam runner

• Streaming SQL with Apache Calcite

• And more…

• Performance

• Usability

Page 26: Introduction to Apache NiFi And Storm

THANKS!

Any questions?

Page 27: Introduction to Apache NiFi And Storm

Appendix A. Apache NiFi

Page 28: Introduction to Apache NiFi And Storm
Page 29: Introduction to Apache NiFi And Storm

NiFi EvaluateJsonPath / RouteOnAttribute configuration

Page 30: Introduction to Apache NiFi And Storm

NiFi PutHDFS / PublishKafka configuration

Page 31: Introduction to Apache NiFi And Storm

NiFi Queue options – Status History

Page 32: Introduction to Apache NiFi And Storm

NiFi Queue options – List queue

Page 33: Introduction to Apache NiFi And Storm

NiFi Data Provenance

Page 34: Introduction to Apache NiFi And Storm

Appendix B. Apache Storm

Page 35: Introduction to Apache NiFi And Storm

Distributed Log Search

Page 36: Introduction to Apache NiFi And Storm

Dynamic Worker Profiling

Page 37: Introduction to Apache NiFi And Storm

Dynamic Log Levels

Page 38: Introduction to Apache NiFi And Storm

Topology Event Inspector

Page 39: Introduction to Apache NiFi And Storm

Resource Aware Scheduler

Source: Resource Aware Scheduling in Apache Storm, Hadoop Summit San Jose 2016