Kafka & Hadoop Gwen Shapira / Software Engineer
Kafka and Hadoop at LinkedIn Meetup

Jul 14, 2015

Page 1: Kafka and Hadoop at LinkedIn Meetup

Kafka & Hadoop

Gwen Shapira / Software Engineer

Page 2: Kafka and Hadoop at LinkedIn Meetup

About Me

• 15 years of moving data around
• Formerly consultant
• Now Cloudera Engineer:
  – Sqoop Committer
  – Kafka
  – Flume

©2014 Cloudera, Inc. All rights reserved.

Page 3: Kafka and Hadoop at LinkedIn Meetup

There’s a book on that!

Page 4: Kafka and Hadoop at LinkedIn Meetup

We are also blogging

Page 5: Kafka and Hadoop at LinkedIn Meetup

Getting Data from Kafka to Hadoop

There are only bad options.

It's about finding the best one.

Page 6: Kafka and Hadoop at LinkedIn Meetup

Batch

Page 7: Kafka and Hadoop at LinkedIn Meetup

Camus

Page 8: Kafka and Hadoop at LinkedIn Meetup

Camus

[Diagram: Camus job flow, with steps labeled A through I. Components shown: Kafka, ZooKeeper, Setup (topic offsets), three Tasks writing in-process Avro files, audit counts, Clean Up, HDFS, other systems, and processes.]
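One plausible reading of the Camus diagram is: look up where the last run stopped, let one task per partition pull everything up to the current end, write the results, then record the new offsets. The sketch below illustrates that loop in plain Python. It is not Camus code; `run_batch`, the in-memory `topic` dict, and the `committed` dict are hypothetical stand-ins for the MapReduce job, Kafka, and the offset store.

```python
from typing import Dict, List


def run_batch(topic: Dict[int, List[str]],
              committed: Dict[int, int]) -> Dict[int, List[str]]:
    """Pull every message past the committed offset, one 'task' per partition.

    topic     -- hypothetical stand-in for Kafka: partition id -> message log
    committed -- hypothetical offset store: partition id -> next offset to read
    """
    output: Dict[int, List[str]] = {}
    for partition, log in topic.items():
        start = committed.get(partition, 0)  # setup: where the last run stopped
        end = len(log)                       # setup: latest offset at job start
        output[partition] = log[start:end]   # task: pull the range and "write" it
        committed[partition] = end           # clean up: record the new offset
    return output
```

Running it twice against the same `committed` dict returns only the messages that arrived between the two runs, which is the property a scheduled batch import depends on.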

Page 9: Kafka and Hadoop at LinkedIn Meetup

Sqoop2

[Diagram: Sqoop2 architecture]
• Client
• Engine (web server, REST API, repository, MapReduce)
• From (RDBMS, HDFS, Hive, HBase)
• To (RDBMS, HDFS, HBase, Hive, Kafka)

Page 10: Kafka and Hadoop at LinkedIn Meetup

NiFi!

Page 11: Kafka and Hadoop at LinkedIn Meetup

HiveKa = Hive + Kafka

[Diagram: HiveKa read path]
• Hive storage handler
• KafkaInputFormat.getSplits(): get topic, partitions, and offsets from Kafka
• MapReduce: set up mappers
• Mappers: KafkaRecordReader gets data from Kafka
• Avro SerDe
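The split-planning step in the diagram (get topic, partitions, and offsets, then set up mappers) can be sketched as follows. This is an illustration in Python, not HiveKa's Java implementation; `KafkaSplit` and `get_splits` are hypothetical names, and the offset ranges are passed in rather than fetched from a broker.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class KafkaSplit:
    """Hypothetical description of the work handed to one mapper."""
    topic: str
    partition: int
    start_offset: int  # inclusive
    end_offset: int    # exclusive


def get_splits(topic: str,
               offsets: Dict[int, Tuple[int, int]]) -> List[KafkaSplit]:
    """Return one split per non-empty partition, ordered by partition id.

    offsets maps partition id -> (start, end); in a real input format these
    would come from Kafka, here they are supplied by the caller.
    """
    return [
        KafkaSplit(topic, partition, start, end)
        for partition, (start, end) in sorted(offsets.items())
        if end > start  # nothing to read: no mapper needed
    ]
```

Each split would then be handed to a record reader that fetches only its own offset range.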

Page 12: Kafka and Hadoop at LinkedIn Meetup

Page 13: Kafka and Hadoop at LinkedIn Meetup

Page 14: Kafka and Hadoop at LinkedIn Meetup

Streaming

Page 15: Kafka and Hadoop at LinkedIn Meetup

Flume + Kafka = Flafka

Page 16: Kafka and Hadoop at LinkedIn Meetup

How does it work?

Flume Agent
• Sources: Twitter, logs, webserver, Kafka…
• Interceptors: mask, re-format, validate…
• Selectors: DR, critical
• Channels: memory, file, Kafka
• Sinks: HDFS, HBase, Solr, Kafka
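The agent pipeline on this slide (source, interceptor, selector, channel, sink) can be modeled in a few lines. The sketch below is a toy model in Python, not Flume's API or configuration format; every name in it (`mask_interceptor`, `selector`, `new_channels`, `run_agent`, and the `user` and `priority` fields) is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Optional

Event = Dict[str, str]


def mask_interceptor(event: Event) -> Optional[Event]:
    """Validate and mask: drop events with no body, hide the 'user' field."""
    if not event.get("body"):
        return None
    masked = dict(event)
    if "user" in masked:
        masked["user"] = "***"
    return masked


def selector(event: Event) -> str:
    """Pick a channel name from a hypothetical 'priority' field."""
    return "critical" if event.get("priority") == "high" else "default"


def new_channels() -> Dict[str, Deque[Event]]:
    """In-memory queues standing in for memory, file, or Kafka channels."""
    return {"default": deque(), "critical": deque()}


def run_agent(source: Iterable[Event],
              channels: Dict[str, Deque[Event]],
              sink: Callable[[str, Event], None]) -> None:
    """Move events from the source into channels, then drain them to the sink."""
    for raw in source:
        event = mask_interceptor(raw)
        if event is None:
            continue  # rejected by the interceptor
        channels[selector(event)].append(event)
    for name, channel in channels.items():
        while channel:
            sink(name, channel.popleft())
```

The channel is the only stage that buffers, which is why swapping it for a durable one changes the delivery guarantees without touching sources or sinks.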

Page 17: Kafka and Hadoop at LinkedIn Meetup

But I just want to get data from Kafka to HBase / HDFS

Page 18: Kafka and Hadoop at LinkedIn Meetup

Kafka Channel

Flume Agent
• Channels: Kafka!
• Sinks: HDFS, HBase, Solr

Page 19: Kafka and Hadoop at LinkedIn Meetup

Kafka Channel

Flume Agent
• Sources: Twitter, logs, webserver, Kafka…
• Interceptors: mask, re-format, validate…
• Selectors: DR, critical
• Channels: memory, file, Kafka

Page 20: Kafka and Hadoop at LinkedIn Meetup

Spark Streaming

[Diagram: three panels (pre-first batch, first batch, second batch). In each, a source feeds raw input into a DStream of RDDs, and the RDDs pass through Filter, Count, Print in a single pass.]
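The Filter, Count, Print pass over each batch can be sketched without a cluster. The code below is plain Python, not the Spark Streaming API; `process_stream` is a hypothetical name, and each inner list stands in for one batch (one RDD in the DStream).

```python
from typing import Callable, Iterable, List


def process_stream(batches: Iterable[List[str]],
                   predicate: Callable[[str], bool]) -> List[int]:
    """Apply Filter -> Count -> Print to every batch, one pass per batch."""
    counts: List[int] = []
    for batch in batches:                                # one batch ~ one RDD
        kept = [record for record in batch if predicate(record)]  # Filter
        count = len(kept)                                # Count
        print(count)                                     # Print
        counts.append(count)
    return counts
```

An empty batch still produces a count of zero, mirroring a micro-batch interval in which nothing arrived.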

Page 21: Kafka and Hadoop at LinkedIn Meetup

Storm

[Diagram: word-count topology]
• Spout layer: spouts read from the source
• Fan-out layer 1: split-words bolts
• Shuffle
• Layer 2: count bolts
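The word-count topology can be simulated in a single process. The sketch below is plain Python, not Storm's API; `split_bolt` and `run_topology` are hypothetical names, and the routing step uses a stable hash of the word so that every occurrence of a word reaches the same counter, which is an assumption about what the slide's shuffle step has to guarantee.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def split_bolt(sentence: str) -> List[str]:
    """Layer 1: fan a sentence out into lower-cased words."""
    return sentence.lower().split()


def run_topology(sentences: Iterable[str],
                 num_counters: int = 3) -> List[Counter]:
    """Feed sentences through split bolts into num_counters count bolts."""
    counters: List[Counter] = [Counter() for _ in range(num_counters)]
    for sentence in sentences:                # spout layer emits sentences
        for word in split_bolt(sentence):
            # Routing: a stable hash keeps each word on a single counter.
            target = zlib.crc32(word.encode("utf-8")) % num_counters
            counters[target][word] += 1       # layer 2: count
    return counters
```

Because each word lives on exactly one counter, the per-bolt counts can be merged by simple addition.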

Page 22: Kafka and Hadoop at LinkedIn Meetup

Retro Thoughts

Page 23: Kafka and Hadoop at LinkedIn Meetup

Schema

• Data often has schema
• At least it should
• Kafka is unaware – which is good
• Need capability to figure out schema for events
• Without including it in every event
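One common way to let consumers figure out an event's schema without shipping the schema in every event is to prefix each message with a small schema id and keep the schemas in a shared lookup. The sketch below shows the idea in Python. It is an assumption about how this could be done, not something the slides specify: the `SCHEMAS` dict, the 4-byte id prefix, and the JSON payload are all hypothetical choices.

```python
import json
import struct
from typing import Dict, Tuple

# Hypothetical schema lookup: schema id -> ordered field names.
SCHEMAS: Dict[int, Tuple[str, ...]] = {1: ("user", "action")}


def encode(schema_id: int, event: Dict[str, str]) -> bytes:
    """Serialize field values only, prefixed with a 4-byte big-endian schema id."""
    values = [event[name] for name in SCHEMAS[schema_id]]
    return struct.pack(">I", schema_id) + json.dumps(values).encode("utf-8")


def decode(message: bytes) -> Dict[str, str]:
    """Read the schema id, look up the field names, and rebuild the event."""
    (schema_id,) = struct.unpack(">I", message[:4])
    values = json.loads(message[4:].decode("utf-8"))
    return dict(zip(SCHEMAS[schema_id], values))
```

The field names never travel with the message, so Kafka stays unaware of the schema while producers and consumers agree on it through the lookup.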

Page 24: Kafka and Hadoop at LinkedIn Meetup

Kafka in Cloudera Manager

Page 25: Kafka and Hadoop at LinkedIn Meetup

Questions?