Joey Echeverria | April 13, 2015
Integrating Event Streams and File Data with Apache Flume and Apache NiFi

Feb 13, 2017

Page 1: Integrating Event Streams and File Data with Apache Flume and ...

Joey Echeverria | April 13, 2015

Integrating Event Streams and File Data with Apache Flume and Apache NiFi

Page 2: Integrating Event Streams and File Data with Apache Flume and ...

Data integration

Page 3: Integrating Event Streams and File Data with Apache Flume and ...

Data integration

•  Multiple data sources

Page 4: Integrating Event Streams and File Data with Apache Flume and ...

Data integration

•  Multiple data sources •  Questions

Page 5: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

Page 6: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

•  Unique sources

Page 7: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

•  Unique sources –  Format

Page 8: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

•  Unique sources –  Format –  Schema

Page 9: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

•  Unique sources –  Format –  Schema –  Protocol

Page 10: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

•  Unique sources –  Format –  Schema –  Protocol –  Batchiness

Page 11: Integrating Event Streams and File Data with Apache Flume and ...

Challenges

•  Unique sources –  Format –  Schema –  Protocol –  Batchiness

•  Big data

Page 12: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

Page 13: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Insofar as anything with Apache Hadoop can be called “traditional”

Page 14: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Identify source class

Page 15: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Identify source class –  Event streams

Page 16: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Identify source class –  Event streams –  Database tables

Page 17: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Identify source class –  Event streams –  Database tables –  Files

Page 18: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Map class to system

Page 19: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Map class to system –  Event streams → Apache Flume

Page 20: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Map class to system –  Event streams → Apache Flume –  Database tables → Apache Sqoop

Page 21: Integrating Event Streams and File Data with Apache Flume and ...

Traditional (Hadoop) approach

•  Map class to system –  Event streams → Apache Flume –  Database tables → Apache Sqoop –  Files → hdfs dfs -put?
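
The mapping on this slide can be sketched as a small lookup table. This is only an illustration: the `SourceClass` enum and the `ingest_tool` function are hypothetical names, and the tool strings simply restate the slide.

```python
from enum import Enum


class SourceClass(Enum):
    """The three source classes named on the slide."""
    EVENT_STREAM = "event stream"
    DATABASE_TABLE = "database table"
    FILE = "file"


# Tool names restate the slide; the file entry keeps the slide's question mark.
INGEST_TOOL = {
    SourceClass.EVENT_STREAM: "Apache Flume",
    SourceClass.DATABASE_TABLE: "Apache Sqoop",
    SourceClass.FILE: "hdfs dfs -put?",
}


def ingest_tool(source_class: SourceClass) -> str:
    """Return the ingest tool the slide assigns to a source class."""
    return INGEST_TOOL[source_class]
```

Each class of source ends up with its own tool, which is the situation the later "Drawbacks" slides describe.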

Page 22: Integrating Event Streams and File Data with Apache Flume and ...

Integrate in the repository

Page 23: Integrating Event Streams and File Data with Apache Flume and ...

Integrate in the repository

•  Ingest raw data

Page 24: Integrating Event Streams and File Data with Apache Flume and ...

Integrate in the repository

•  Ingest raw data –  Raw database tables?

Page 25: Integrating Event Streams and File Data with Apache Flume and ...

Integrate in the repository

•  Ingest raw data –  Raw database tables? –  Raw events?

Page 26: Integrating Event Streams and File Data with Apache Flume and ...

Integrate in the repository

•  Ingest raw data –  Raw database tables? –  Raw events?

•  MapReduce jobs for ETL

Page 27: Integrating Event Streams and File Data with Apache Flume and ...

Use case

Page 28: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Completely contrived for this presentation, but maybe you really want to do this

Page 29: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Data sources

Page 30: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Data sources –  Twitter fire hose

Page 31: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Data sources –  Twitter fire hose*

*1%

Page 32: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Data sources –  Twitter fire hose* –  My tweet archive

*1%

Page 33: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Data sources –  Twitter fire hose* –  My tweet archive

•  Goal

*1%

Page 34: Integrating Event Streams and File Data with Apache Flume and ...

Use case

•  Data sources –  Twitter fire hose* –  My tweet archive

•  Goal –  Identify the user most similar to me

*1%
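
The slides do not say how "most similar" is measured. As one possible reading, here is a minimal sketch that scores users by Jaccard similarity over the words they tweet. The function names and the sample data in the usage note are made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Intersection size divided by union size (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def words(tweets):
    """Lower-cased set of whitespace-separated tokens across a user's tweets."""
    return {w.lower() for t in tweets for w in t.split()}


def most_similar_user(my_tweets, stream_tweets_by_user):
    """Return the user whose word set has the highest Jaccard score against mine.

    Assumes `stream_tweets_by_user` is a non-empty dict of user -> list of tweets.
    """
    mine = words(my_tweets)
    return max(
        stream_tweets_by_user,
        key=lambda user: jaccard(mine, words(stream_tweets_by_user[user])),
    )
```

For example, `most_similar_user(["flume and nifi"], {"alice": ["nifi and flume rock"], "bob": ["cats are great"]})` picks `"alice"`. Here `my_tweets` would come from the tweet archive (file data) and the dictionary from the sampled stream, so both sources must land in the same place first.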

Page 35: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

Page 36: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

[Diagram: Twitter, Tweet Archive, HDFS]

Page 37: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

[Diagram: Twitter, Tweet Archive, Flume, HDFS]

Page 38: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

[Diagram: Twitter, Tweet Archive, Flume (Twitter Source), HDFS]

Page 39: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

[Diagram: Twitter, Tweet Archive, Flume (Twitter Source, Channel), HDFS]

Page 40: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

[Diagram: Twitter → Flume (Twitter Source → Channel → HDFS Sink) → HDFS; Tweet Archive]

Page 41: Integrating Event Streams and File Data with Apache Flume and ...

(Mostly) traditional solution

[Diagram: Twitter → Flume (Twitter Source → Channel → HDFS Sink) → HDFS; Tweet Archive → Kite CLI → HDFS]

Page 42: Integrating Event Streams and File Data with Apache Flume and ...

Demo

Page 43: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

Page 44: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

•  Two ingest systems

Page 45: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

•  Two ingest systems –  Separate monitoring

Page 46: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

•  Two ingest systems –  Separate monitoring –  Separate failure modes

Page 47: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

•  Two ingest systems –  Distinct monitoring –  Distinct failure modes –  Distinct debugging

Page 48: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

•  Two ingest systems –  Distinct monitoring –  Distinct failure modes –  Distinct debugging

•  Manual integration

Page 49: Integrating Event Streams and File Data with Apache Flume and ...

Drawbacks

•  Two ingest systems –  Distinct monitoring –  Distinct failure modes –  Distinct debugging

•  Manual integration –  Kite CLI with cron

Page 50: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache NiFi

Page 51: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache NiFi

Page 52: Integrating Event Streams and File Data with Apache Flume and ...

Bounded context

Page 53: Integrating Event Streams and File Data with Apache Flume and ...

Bounded context

•  You control all the parts

Page 54: Integrating Event Streams and File Data with Apache Flume and ...

Bounded context

•  You control all the parts –  Protocols

Page 55: Integrating Event Streams and File Data with Apache Flume and ...

Bounded context

•  You control all the parts –  Protocols –  Schemas

Page 56: Integrating Event Streams and File Data with Apache Flume and ...

Bounded context

•  You control all the parts –  Protocols –  Schemas –  Formats

Page 57: Integrating Event Streams and File Data with Apache Flume and ...

Bounded context

•  You control all the parts –  Protocols –  Schemas –  Formats –  Changes

Page 58: Integrating Event Streams and File Data with Apache Flume and ...

NiFi strengths

Page 59: Integrating Event Streams and File Data with Apache Flume and ...

NiFi strengths

•  Generic data flow

Page 60: Integrating Event Streams and File Data with Apache Flume and ...

NiFi strengths

•  Generic data flow •  Built-in editor/monitor

Page 61: Integrating Event Streams and File Data with Apache Flume and ...

NiFi strengths

•  Generic data flow •  Built-in editor/monitor •  Varying object size

Page 62: Integrating Event Streams and File Data with Apache Flume and ...

NiFi strengths

•  Generic data flow •  Built-in editor/monitor •  Varying object size •  Traditional sources

Page 63: Integrating Event Streams and File Data with Apache Flume and ...

NiFi strengths

•  Generic data flow •  Built-in editor/monitor •  Varying object size •  Traditional sources

–  Files, FTP, SFTP, HTTP, etc.

Page 64: Integrating Event Streams and File Data with Apache Flume and ...

NiFi limitations

Page 65: Integrating Event Streams and File Data with Apache Flume and ...

NiFi limitations

•  Streaming sources

Page 66: Integrating Event Streams and File Data with Apache Flume and ...

NiFi limitations

•  Streaming sources –  ListenHttp

Page 67: Integrating Event Streams and File Data with Apache Flume and ...

NiFi limitations

•  Streaming sources –  ListenHttp –  ListenUdp

Page 68: Integrating Event Streams and File Data with Apache Flume and ...

NiFi limitations

•  Streaming sources –  ListenHttp –  ListenUdp –  GetKafka

Page 69: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache Flume

Page 70: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache Flume

•  Streaming from the start

Page 71: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache Flume

•  Streaming from the start •  Rich set of sources/sinks

Page 72: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache Flume

•  Streaming from the start •  Rich set of sources/sinks

–  Apache Avro, Apache Thrift, Twitter, NetCat, Syslog

Page 73: Integrating Event Streams and File Data with Apache Flume and ...

Enter Apache Flume

•  Streaming from the start •  Rich set of sources/sinks

–  Apache Avro, Apache Thrift, Twitter, NetCat, Syslog –  HDFS, IRC, HBase, Kite

Page 74: Integrating Event Streams and File Data with Apache Flume and ...

Cake

Page 75: Integrating Event Streams and File Data with Apache Flume and ...

Cake

•  NiFi combines ingest contexts

Page 76: Integrating Event Streams and File Data with Apache Flume and ...

Cake

•  NiFi combines ingest contexts •  Flume requires static stream configuration

Page 77: Integrating Event Streams and File Data with Apache Flume and ...

Cake

•  NiFi combines ingest contexts •  Flume requires static stream configuration •  I want both

Page 78: Integrating Event Streams and File Data with Apache Flume and ...

Flume architecture

[Diagram: Source → Channel → Sink]
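
The source, channel, sink pipeline on this slide can be modeled in a few lines. This is a toy in-memory model, not Flume's Java API: the `Channel` class and the `run_source` and `run_sink` functions are hypothetical, and events are represented as plain dicts with `headers` and `body`.

```python
from collections import deque


class Channel:
    """Buffers events between a source and a sink (modeled as a FIFO queue)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest buffered event, or None when the channel is empty."""
        return self._events.popleft() if self._events else None


def run_source(lines, channel):
    """Source: turn each input line into an event and put it on the channel."""
    for line in lines:
        channel.put({"headers": {}, "body": line.encode("utf-8")})


def run_sink(channel, out):
    """Sink: drain the channel and deliver each event body to `out`."""
    while (event := channel.take()) is not None:
        out.append(event["body"])
```

The channel decouples the two ends: the source can keep producing while the sink drains at its own pace.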

Page 79: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

Page 80: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

•  Source/Sink

Page 81: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

•  Source/Sink •  Event

Page 82: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

•  Source/Sink •  Event •  Channel

Page 83: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

•  Source/Sink → Processor

Page 84: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

•  Source/Sink → Processor •  Event → FlowFile

Page 85: Integrating Event Streams and File Data with Apache Flume and ...

Flume → NiFi

•  Source/Sink → Processor •  Event → FlowFile •  Channel → FlowFile Queue/Connection
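
The Event → FlowFile row of this mapping can be sketched as a pair of conversion functions. The two dataclasses are simplified stand-ins rather than the Flume or NiFi Java types, and mapping headers to attributes and body to content is an assumption about how the concepts line up, not something the slides spell out.

```python
from dataclasses import dataclass, field


@dataclass
class FlumeEvent:
    """Stand-in for a Flume event: string headers plus a byte body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


@dataclass
class FlowFile:
    """Stand-in for a NiFi FlowFile: string attributes plus byte content."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


def event_to_flowfile(event: FlumeEvent) -> FlowFile:
    # Copy the dict so later changes to the event do not leak into the FlowFile.
    return FlowFile(attributes=dict(event.headers), content=event.body)


def flowfile_to_event(flowfile: FlowFile) -> FlumeEvent:
    # The reverse direction, as a sink-side wrapper would need.
    return FlumeEvent(headers=dict(flowfile.attributes), body=flowfile.content)
```

Under this assumption the conversion is lossless in both directions, which is what makes wrapping both sources and sinks plausible.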

Page 86: Integrating Event Streams and File Data with Apache Flume and ...

Solution

Page 87: Integrating Event Streams and File Data with Apache Flume and ...

Solution

•  NiFi processors to run Flume sources/sinks

Page 88: Integrating Event Streams and File Data with Apache Flume and ...

Solution

•  NiFi processors to run Flume sources/sinks •  Prototype

Page 89: Integrating Event Streams and File Data with Apache Flume and ...

Solution

•  NiFi processors to run Flume sources/sinks •  Prototype •  http://bit.ly/flume-processors
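
One way to picture a processor that runs a Flume source is sketched below. It is a conceptual model only and does not reflect how the linked prototype is implemented. Every name here (`ShimChannel`, `FlumeSourceProcessor`, `ListSource`, `poll`, `on_trigger`) is hypothetical.

```python
from collections import deque


class ShimChannel:
    """Stands in for a Flume channel; the wrapped source writes events here."""

    def __init__(self):
        self.events = deque()

    def put(self, event):
        self.events.append(event)


class FlumeSourceProcessor:
    """Hypothetical processor that hosts a Flume-style source."""

    def __init__(self, source):
        self.channel = ShimChannel()
        self.source = source  # assumed: any object with a poll(channel) method

    def on_trigger(self):
        """One scheduling tick: poll the source, then emit what it produced."""
        self.source.poll(self.channel)
        flowfiles = []
        while self.channel.events:
            headers, body = self.channel.events.popleft()
            flowfiles.append({"attributes": dict(headers), "content": body})
        return flowfiles


class ListSource:
    """Toy source that emits one queued message per poll."""

    def __init__(self, messages):
        self.messages = deque(messages)

    def poll(self, channel):
        if self.messages:
            body = self.messages.popleft().encode("utf-8")
            channel.put(({"origin": "toy"}, body))
```

The idea is that the source still believes it is writing to a channel, while the surrounding flow only ever sees FlowFile-like objects.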

Page 90: Integrating Event Streams and File Data with Apache Flume and ...

Demo

Page 91: Integrating Event Streams and File Data with Apache Flume and ...

Summary

Page 92: Integrating Event Streams and File Data with Apache Flume and ...

Summary

•  Integrating data is challenging

Page 93: Integrating Event Streams and File Data with Apache Flume and ...

Summary

•  Integrating data is challenging •  Managing multiple systems adds complexity

Page 94: Integrating Event Streams and File Data with Apache Flume and ...

Summary

•  Integrating data is challenging •  Managing multiple systems adds complexity •  NiFi supports generic data flow

Page 95: Integrating Event Streams and File Data with Apache Flume and ...

Summary

•  Integrating data is challenging •  Managing multiple systems adds complexity •  NiFi supports generic data flow •  NiFi can be extended to solve new use cases

Page 96: Integrating Event Streams and File Data with Apache Flume and ...

Joey  Echeverria  [email protected]  @fwiffo  

Page 97: Integrating Event Streams and File Data with Apache Flume and ...

Big Data Meets IT Ops