Kafka & Hadoop Gwen Shapira / Software Engineer

Jul 14, 2015


alex lefur
Transcript
Page 1: Kafkameetup shapira-141016070340-conversion-gate01

Kafka & Hadoop

Gwen Shapira / Software Engineer

Page 2: Kafkameetup shapira-141016070340-conversion-gate01

2©2014 Cloudera, Inc. All rights reserved.

About Me

• 15 years of moving data around

• Formerly consultant

• Now Cloudera Engineer:

– Flume

– Sqoop

– Kafka

Page 3: Kafkameetup shapira-141016070340-conversion-gate01


There’s a book on that!

Page 4: Kafkameetup shapira-141016070340-conversion-gate01


We are also blogging

Page 5: Kafkameetup shapira-141016070340-conversion-gate01


Getting Data from Kafka to Hadoop

There are only bad options.

It's about finding the best one.


Page 6: Kafkameetup shapira-141016070340-conversion-gate01


Camus

Page 7: Kafkameetup shapira-141016070340-conversion-gate01


Camus

[Diagram: Camus data flow, steps A through I. Labels: ZooKeeper, Kafka, Setup, Topic Offsets, Processes, Task (×3), In-process Avro Files (×2), Audit Counts, Clean Up, HDFS, Other Systems.]

Page 8: Kafkameetup shapira-141016070340-conversion-gate01


Missing in Action

• Kafka has no MR layer

– InputFormat, OutputFormat, Utils…

• Sqoop is a generic batch ingest framework

– Why no Kafka?

Page 9: Kafkameetup shapira-141016070340-conversion-gate01


Flume + Kafka = Flafka

Page 10: Kafkameetup shapira-141016070340-conversion-gate01


Flume Agent: How does it work?

• Sources: Twitter, logs, webserver, Kafka…

• Interceptors: Mask, re-format, validate…

• Selectors: DR, critical

• Channels: Memory, file

• Sinks: HDFS, HBase, Solr, Kafka
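The five stages of the agent (sources, interceptors, selectors, channels, sinks) can be pictured with a toy model in plain Python. This is only a sketch of the data flow, not Flume's API: `mask_interceptor`, `select_channels`, `run_agent`, and the event fields are all hypothetical names made up for illustration.

```python
from collections import deque

def mask_interceptor(event):
    """Interceptor: return a copy of the event with a sensitive field masked."""
    masked = dict(event)
    if "user" in masked:
        masked["user"] = "***"
    return masked

def select_channels(event):
    """Selector: critical events go to both channels, everything else to memory only."""
    return ["file", "memory"] if event.get("critical") else ["memory"]

def run_agent(source_events, interceptors, selector, channels, sinks):
    """Move events from the source through interceptors and the selector into
    channels, then drain every channel into the sink attached to it."""
    for event in source_events:
        for intercept in interceptors:
            event = intercept(event)
        for name in selector(event):
            channels[name].append(event)
    for name, channel in channels.items():
        while channel:
            sinks[name].append(channel.popleft())

# Channels buffer events; the lists stand in for sinks such as HDFS or HBase.
channels = {"memory": deque(), "file": deque()}
sinks = {"memory": [], "file": []}
events = [
    {"user": "alice", "msg": "login"},
    {"user": "bob", "msg": "payment failed", "critical": True},
]
run_agent(events, [mask_interceptor], select_channels, channels, sinks)
```

In this sketch the channel is the only place an event waits between source and sink, which is why the choice between a memory channel and a file channel decides what survives an agent restart.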

Page 11: Kafkameetup shapira-141016070340-conversion-gate01


But I just want to get data from Kafka to HBase / HDFS


Page 12: Kafkameetup shapira-141016070340-conversion-gate01


Flume Agent

Kafka!

• Channels: Kafka Channel

• Sinks: HDFS, HBase, Solr

Page 13: Kafkameetup shapira-141016070340-conversion-gate01


Spark Streaming

[Diagram: a Source feeds a Raw Input DStream, which gains one RDD per batch (pre-first batch, first batch, second batch). Each batch goes through Filter, Count and Print in a single pass.]
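The micro-batch idea in the diagram can be imitated in plain Python. This is a toy model rather than Spark's API: the hypothetical `micro_batches` helper cuts the stream by record count, whereas Spark Streaming cuts it by time interval, and the "ERROR" filter is an assumption chosen only to have something to count.

```python
def micro_batches(records, batch_size):
    """Cut a stream of records into small batches (stand-ins for the RDDs of a DStream)."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def process_batch(batch):
    """Filter and count in a single pass over the batch, then print the result."""
    count = sum(1 for line in batch if "ERROR" in line)
    print(f"errors in batch: {count}")
    return count

stream = ["INFO start", "ERROR disk", "ERROR net", "INFO ok", "ERROR cpu"]
counts = [process_batch(batch) for batch in micro_batches(stream, batch_size=2)]
```

The same `process_batch` logic runs unchanged on every batch, which is the property that lets batch-style code be reused on a stream.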

Page 14: Kafkameetup shapira-141016070340-conversion-gate01


Storm

[Diagram: Storm topology. Spout layer: two Spouts fed by a Source. Fan out to Layer 1: four "Split words" bolts. Shuffle to Layer 2: three "Count" bolts.]
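The word-count topology in the diagram (spout, split bolts, count bolts) can be sketched in plain Python. This is not Storm's API, and every function name here is hypothetical. One assumption to flag: the slide labels the step between the layers "Shuffle", while this sketch routes each word by a hash so that all copies of a word reach the same counter, which is closer to what Storm calls fields grouping.

```python
from collections import Counter
from zlib import crc32

def spout():
    """Spout: emit raw sentences (a stand-in for reading from Kafka)."""
    yield "kafka to hadoop"
    yield "kafka to hbase"

def split_bolt(sentence):
    """Layer 1 bolt: split a sentence into words."""
    return sentence.split()

def route(word, num_counters):
    """Pick a counter task by hashing the word, so equal words land on the same task."""
    return crc32(word.encode("utf-8")) % num_counters

def run_topology(num_counters=3):
    """Layer 2: each counter task keeps its own partial word counts."""
    counters = [Counter() for _ in range(num_counters)]
    for sentence in spout():
        for word in split_bolt(sentence):
            counters[route(word, num_counters)][word] += 1
    return counters

counters = run_topology()
totals = sum(counters, Counter())  # merge the partial counts for inspection
```

With purely random routing, each counter would hold only a partial count per word and a further merge step would be needed to get totals.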

Page 15: Kafkameetup shapira-141016070340-conversion-gate01


Retro Thoughts

Page 16: Kafkameetup shapira-141016070340-conversion-gate01


Schema

• Data often has schema

• At least it should

• Kafka is unaware – which is good

• Need capability to figure out schema for events

• Without including it in every event
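One common way to let consumers figure out an event's schema without shipping the schema in every event is to prefix each message with a short schema ID and look the schema up elsewhere. The sketch below illustrates that idea only; it is not necessarily what the talk proposed. The in-memory `SCHEMAS` registry, the 4-byte ID prefix, and the JSON payload are all assumptions made for illustration.

```python
import json
import struct

# Hypothetical in-memory registry: schema ID -> schema definition.
SCHEMAS = {1: {"name": "PageView", "fields": ["user", "url"]}}

def encode(schema_id, record):
    """Prefix the payload with a 4-byte schema ID instead of embedding the schema."""
    fields = SCHEMAS[schema_id]["fields"]
    payload = json.dumps([record[f] for f in fields]).encode("utf-8")
    return struct.pack(">I", schema_id) + payload

def decode(message):
    """Read the schema ID, look the schema up, and rebuild the record."""
    (schema_id,) = struct.unpack(">I", message[:4])
    fields = SCHEMAS[schema_id]["fields"]
    values = json.loads(message[4:].decode("utf-8"))
    return dict(zip(fields, values))

message = encode(1, {"user": "alice", "url": "/home"})
record = decode(message)
```

Kafka itself still sees only opaque bytes here; the schema knowledge lives entirely in the producers, the consumers, and the registry they share.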

Page 17: Kafkameetup shapira-141016070340-conversion-gate01


Kafka in Cloudera Manager

Page 18: Kafkameetup shapira-141016070340-conversion-gate01


Visit us at Booth #305

BOOK SIGNINGS THEATER SESSIONS

TECHNICAL DEMOS GIVEAWAYS