Data Pipelines Made Simple with Apache Kafka

Apr 05, 2017

Transcript
Page 1: Data Pipelines Made Simple with Apache Kafka

Data Pipelines Made Simple With Apache Kafka
Ewen Cheslack-Postava
Engineer, Apache Kafka Committer

Page 2: Data Pipelines Made Simple with Apache Kafka

Attend the whole series!

What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing

Monitoring and Alerting Apache Kafka with Confluent Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product

Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent

Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent

Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent

https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/

Page 3: Data Pipelines Made Simple with Apache Kafka

The Challenge: Streaming Data Pipelines

Page 4: Data Pipelines Made Simple with Apache Kafka

Simplifying Streaming Data Pipelines with Apache Kafka

Page 5: Data Pipelines Made Simple with Apache Kafka

Kafka Connect

Page 6: Data Pipelines Made Simple with Apache Kafka

Streaming ETL

Page 7: Data Pipelines Made Simple with Apache Kafka

Single Message Transforms for Kafka Connect

Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Store lineage
• Remove unnecessary columns

Modify events going out of Kafka:
• Route high priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match destination
• Remove unnecessary columns
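
As a rough illustration of the bullets above, the following Python sketch treats a single message transform as a plain function that takes one record and returns one modified record. The dictionary-based record and the helper names (`mask_fields`, `drop_fields`, `tag`) are hypothetical and are not part of the Kafka Connect API, which is written in Java and driven by connector configuration.

```python
from typing import Any, Dict, Iterable

# Hypothetical stand-in for the value of a Connect record.
Record = Dict[str, Any]


def mask_fields(record: Record, fields: Iterable[str]) -> Record:
    """Replace sensitive fields with an empty value of the same type."""
    masked = dict(record)  # copy so the input record is left untouched
    for name in fields:
        if name in masked:
            # type(value)() gives "" for str, 0 for int, 0.0 for float
            masked[name] = type(masked[name])()
    return masked


def drop_fields(record: Record, fields: Iterable[str]) -> Record:
    """Remove unnecessary columns from the record."""
    unwanted = set(fields)
    return {k: v for k, v in record.items() if k not in unwanted}


def tag(record: Record, field: str, value: Any) -> Record:
    """Add an identifier or lineage tag to the record."""
    tagged = dict(record)
    tagged[field] = value
    return tagged


event = {"user": "alice", "ssn": "123-45-6789", "debug": "trace-1"}
cleaned = tag(drop_fields(mask_fields(event, ["ssn"]), ["debug"]), "source", "crm")
```

Each helper looks only at the record it is given, which is the property that lets this kind of change run inline during import or export.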

Page 8: Data Pipelines Made Simple with Apache Kafka

Where Single Message Transforms Fit In

Page 9: Data Pipelines Made Simple with Apache Kafka

Built-in Transformations

• InsertField – Add a field using either static data or record metadata
• ReplaceField – Filter or rename fields
• MaskField – Replace field with valid null value for the type (0, empty string, etc.)
• ValueToKey – Set the key to one of the value’s fields
• HoistField – Wrap the entire event as a single field inside a Struct or a Map
• ExtractField – Extract a specific field from Struct and Map and include only this field in results
• SetSchemaMetadata – Modify the schema name or version
• TimestampRouter – Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps
• RegexRouter – Modify the topic of a record based on original topic, replacement string and a regular expression
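
To show what timestamp-based routing amounts to, here is a minimal Python sketch of the idea behind TimestampRouter: build a new topic name from the original topic and the record timestamp. The function name `timestamp_route` and its default format strings are assumptions chosen for illustration; the real transform is configured through connector properties and uses Java date patterns rather than `strftime`.

```python
from datetime import datetime, timezone


def timestamp_route(
    topic: str,
    timestamp_ms: int,
    topic_format: str = "${topic}-${timestamp}",  # assumed placeholder syntax
    timestamp_format: str = "%Y%m%d",             # strftime pattern, not a Java pattern
) -> str:
    """Return a topic name derived from the original topic and record timestamp."""
    when = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    stamp = when.strftime(timestamp_format)
    return topic_format.replace("${topic}", topic).replace("${timestamp}", stamp)


# 1490227200000 ms is 2017-03-23 00:00:00 UTC
routed = timestamp_route("orders", 1490227200000)
```

A sink that maps topics to tables or indexes would then write this record to a per-day destination such as `orders-20170323`.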

Page 10: Data Pipelines Made Simple with Apache Kafka

Configuring Single Message Transforms

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
transforms=MakeMap,InsertSource
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source
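
The effect of this two-step chain on one line read from the file can be modeled in a few lines of Python. This is only a sketch of the semantics under simplifying assumptions: records are plain Python values, and `hoist_field`, `insert_static_field`, and `apply_chain` are hypothetical helpers, not Kafka Connect classes.

```python
from typing import Any, Callable, Dict, List

Transform = Callable[[Any], Any]


def hoist_field(field: str) -> Transform:
    """Model of HoistField: wrap the whole value in a map under `field`."""
    def apply(value: Any) -> Dict[str, Any]:
        return {field: value}
    return apply


def insert_static_field(field: str, static_value: Any) -> Transform:
    """Model of InsertField with static data: add one constant field."""
    def apply(value: Dict[str, Any]) -> Dict[str, Any]:
        updated = dict(value)
        updated[field] = static_value
        return updated
    return apply


def apply_chain(value: Any, transforms: List[Transform]) -> Any:
    """Apply transforms in the order they are listed, one message at a time."""
    for transform in transforms:
        value = transform(value)
    return value


# Mirrors transforms=MakeMap,InsertSource from the configuration above.
chain = [
    hoist_field("line"),
    insert_static_field("data_source", "test-file-source"),
]
result = apply_chain("hello world", chain)
```

Order matters here: the line must first be hoisted into a map before a second field can be added alongside it.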

Page 11: Data Pipelines Made Simple with Apache Kafka

Why only single messages?

• Delivery guarantees!
  • Always provide at least once semantics
  • For supported connectors, provide exactly once semantics
• No additional complication: transformations happen inline with import/export

Page 12: Data Pipelines Made Simple with Apache Kafka

When should I use each tool?

Kafka Connect & Single Message Transforms
• Simple, message at a time
• Transformation can be performed inline
• Transformation does not interact with external systems

Kafka Streams
• Complex transformations including
  • Aggregations
  • Windowing
  • Joins
• Transformed data stored back in Kafka, enabling reuse
• Write, deploy, and monitor a Java application
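
The difference between the two columns can be sketched in Python. A message-at-a-time change needs only the current event, while a windowed aggregation has to remember earlier events, which is the kind of work the slide assigns to Kafka Streams. The event shape and the function names below are hypothetical; this is not the Kafka Streams API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int]  # (key, timestamp in milliseconds), assumed shape


def uppercase_key(event: Event) -> Event:
    """Message-at-a-time change: depends only on the current event."""
    key, timestamp_ms = event
    return (key.upper(), timestamp_ms)


def tumbling_window_counts(
    events: Iterable[Event], window_ms: int
) -> Dict[Tuple[str, int], int]:
    """Stateful aggregation: count events per key and fixed-size window."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, timestamp_ms in events:
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(key, window_start)] += 1  # state carried across messages
    return dict(counts)


events = [("a", 0), ("a", 30_000), ("b", 45_000), ("a", 61_000)]
counts = tumbling_window_counts(events, 60_000)
```

The first function could run inline in a connector; the second needs state that outlives any single message.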

Page 13: Data Pipelines Made Simple with Apache Kafka

Conclusion

Single Message Transforms in Kafka Connect
• Lightweight transformation of individual messages
• Configuration-only data pipelines
• Pluggable, with lots of built-in transformations

Page 14: Data Pipelines Made Simple with Apache Kafka

Attend the whole series!

https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/

Page 15: Data Pipelines Made Simple with Apache Kafka

Get Started with Apache Kafka Today!

https://www.confluent.io/downloads/

THE place to start with Apache Kafka!

Thoroughly tested and quality assured

More extensible developer experience

Easy upgrade path to Confluent Enterprise

Page 16: Data Pipelines Made Simple with Apache Kafka

Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
