Apache Kafka DC Replicating DB Binary Logs to Kafka Mark Bittmann 7 April 2016

Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka

Jan 16, 2017



Mark Bittmann
Transcript
Page 1: Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka

Apache Kafka DC
Replicating DB Binary Logs to Kafka

Mark Bittmann, 7 April 2016

Page 2

Agenda

Meetup Intro

Tech Overview: Kafka and Binary Logs (binlogs)

Change Data Capture Overview

Demo: binlogs -> Maxwell -> Kafka -> HDFS/Spark/Zeppelin + Elastic

Page 3

About Me

• Data Scientist who leans toward Computer Scientist

• Lead Data Scientist, Stackspace.io and b23.io

• PMC Member & Committer, Apache Metron (incubating)

• Contributed to Apache Spark, MLlib

• @_mbittmann_

Page 4

All businesses are data businesses.

Page 5

Tech Overview: Kafka

Page 6

Page 7

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.

• Fast

• Scalable

• Durable

http://kafka.apache.org/documentation.html
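To make the "distributed commit log" framing concrete, here is a minimal single-process sketch of the core abstraction: an append-only sequence in which every message receives a monotonically increasing offset, and readers pull from whatever offset they choose without removing anything. The `CommitLog` class and its methods are hypothetical illustrations, not Kafka's API; real Kafka adds partitioning, replication, and on-disk segments on top of this idea.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitLog:
    """Hypothetical in-memory stand-in for a single Kafka partition."""
    _messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message to the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs starting at offset.

        Reading never deletes anything: once written, the log is immutable,
        so any number of readers can replay it from any offset.
        """
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self._messages))
        return [(i, self._messages[i]) for i in range(offset, end)]


log = CommitLog()
for payload in (b"a", b"b", b"c"):
    log.append(payload)

print(log.read(1))  # [(1, b'b'), (2, b'c')]
```

Because consumption is just "read from offset N", a slow reader and a fast reader can share the same log without interfering with each other.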

Page 8

Design Features

• Distributed => cluster-centric design offers strong durability and fault-tolerance guarantees

• Partitioned => messages spread over a cluster of machines for streams that might exceed capacity of a single machine

• Replicated => messages persisted on disk and replicated within the cluster to prevent data loss

http://kafka.apache.org/documentation.html
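The "partitioned" feature typically means that messages sharing a key land in the same partition, which preserves per-key ordering while spreading load across machines. The sketch below assumes a hash-mod-N scheme and uses `zlib.crc32` purely as a stand-in: Kafka's default Java partitioner hashes keyed messages with murmur2, so the partition numbers here will not match a real cluster. `partition_for` and `distribute` are hypothetical helpers.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(messages: List[Tuple[str, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Group (key, value) pairs by partition, keeping send order within each partition."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append(value)
    return dict(partitions)


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]
layout = distribute(events, num_partitions=3)

# Every event for user-1 sits in one partition, in the order it was sent.
print(layout)
```

Ordering is only guaranteed within a partition, which is why choosing the key (for binlog events, usually the table's primary key) matters.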

Page 9

Topics

http://kafka.apache.org/documentation.html

Page 10

Producers/Consumers

Consumer groups for queue semantics

http://kafka.apache.org/documentation.html
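Within a consumer group, each partition is consumed by exactly one member, so the group behaves like a queue; separate groups each receive every message, which gives publish-subscribe. The round-robin assignment below is a simplified sketch of that idea, not Kafka's actual assignors or group-coordination protocol; `assign_partitions` and the member names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, members: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over group members round-robin; each partition gets one owner."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)  # deterministic order so every member computes the same plan
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for partition in range(num_partitions):
        owner = ordered[partition % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Group "indexer" has two members, so they split six partitions (queue semantics).
print(assign_partitions(6, ["indexer-a", "indexer-b"]))
# {'indexer-a': [0, 2, 4], 'indexer-b': [1, 3, 5]}

# Group "archiver" is independent: its single member reads all six partitions.
print(assign_partitions(6, ["archiver-a"]))
# {'archiver-a': [0, 1, 2, 3, 4, 5]}
```

One consequence worth noting: adding more members than partitions leaves the extra members idle, so partition count caps a group's parallelism.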

Page 11

https://martin.kleppmann.com/2015/05/27/logs-for-data-infrastructure.html

The power of Kafka lies within what you build around it.

Page 12

Quick Kafka demo

Page 13

Tech Overview: Binary Logs

Page 14

The binary log contains a record of all changes to the databases, both data and structure.

https://mariadb.com/kb/en/mariadb/binary-log/

Page 15

Typical Usage: Replication

http://www.cnblogs.com/fangwenyu/archive/2012/09/03/2669419.html
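Replication works because a replica that applies the same ordered stream of row changes ends up in the same state as the primary. The sketch below replays insert/update/delete events into an in-memory table keyed by primary key. The event shape (a `type` field plus a `data` dict containing an integer `id` column) is an assumption loosely modeled on the row-based event shown later in this deck, and `apply_events` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def apply_events(events: Iterable[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Replay row-change events in order to rebuild a table keyed by primary key."""
    table: Dict[int, Dict[str, Any]] = {}
    for event in events:
        row = event["data"]
        pk = row["id"]  # assumes every table has an integer "id" primary key
        if event["type"] in ("insert", "update"):
            table[pk] = dict(row)  # store the full after-image of the row
        elif event["type"] == "delete":
            table.pop(pk, None)
        else:
            raise ValueError(f"unknown event type: {event['type']}")
    return table


binlog = [
    {"type": "insert", "data": {"id": 1, "name": "alice"}},
    {"type": "insert", "data": {"id": 2, "name": "bob"}},
    {"type": "update", "data": {"id": 1, "name": "alicia"}},
    {"type": "delete", "data": {"id": 2, "name": "bob"}},
]

replica = apply_events(binlog)
print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

Order is essential here: applying the update before the first insert, or the delete before the insert of row 2, would produce a different table.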

Page 16

What does a binary log look like?

It looks like binary.

Page 17

ROW-based binlog

{"database":"bintest","table":"mytable","type":"delete","ts":1459958130,"xid":14261,"commit":true,"data":{"some_blob":"AMgyGQr/","some_text":"text object","id":98,"some_bool":0,"uuid":"fcb3a514-fc0f-11e5-841c-60f81dc2691c","some_value":0,"ts":"2016-04-06"}}
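The JSON above is a decoded, human-readable rendering of a single row-based event, and its field names match the format Maxwell (used in the demo) emits: database, table, change type, event timestamp in Unix seconds, transaction ID, commit flag, and the row's column values. A minimal sketch of reading it with Python's standard `json` module; the `describe` helper is hypothetical.

```python
import json
from datetime import datetime, timezone

raw = (
    '{"database":"bintest","table":"mytable","type":"delete","ts":1459958130,'
    '"xid":14261,"commit":true,"data":{"some_blob":"AMgyGQr/","some_text":"text object",'
    '"id":98,"some_bool":0,"uuid":"fcb3a514-fc0f-11e5-841c-60f81dc2691c",'
    '"some_value":0,"ts":"2016-04-06"}}'
)


def describe(event_json: str) -> str:
    """Summarize a row-change event as 'TYPE db.table id=... at <UTC time>'."""
    event = json.loads(event_json)
    # The top-level "ts" is the event time in Unix seconds;
    # the "ts" inside "data" is just an ordinary column of the row.
    when = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return (
        f'{event["type"].upper()} {event["database"]}.{event["table"]} '
        f'id={event["data"]["id"]} at {when.isoformat()}'
    )


print(describe(raw))  # DELETE bintest.mytable id=98 at 2016-04-06T15:55:30+00:00
```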

Page 18

Implementations

• MySQL/MariaDB/Aurora/Percona: binlog

• Oracle: GoldenGate

• PostgreSQL: logical decoding

• MongoDB: oplog

• CouchDB: changes feed

Page 19

Quick binlog demo

Page 20

A database binary log looks a whole lot like a commit log.

Page 21

Change Data Capture

Page 22

A change in data means something happened, and when something happens, many applications might want to know about it.

Page 23

Take a snapshot of your database.

Page 24

Your database snapshot is out of date by the time it is done snapshotting.
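A common remedy is to record the change-log position at the instant the snapshot is taken, then replay only the changes logged from that position onward, so the copy converges on the live table. The sketch below simulates this in memory. `SourceDB`, `apply`, `catch_up`, and the integer positions (standing in for a binlog file/offset pair or a Kafka offset) are all hypothetical, and a real system must also capture the snapshot and its position atomically.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Event = Tuple[int, str, Row]  # (log position, change type, row after-image)


def apply(state: Dict[int, Row], change: str, row: Row) -> None:
    """Apply one row change to a table keyed by the (assumed) "id" column."""
    if change == "delete":
        state.pop(row["id"], None)
    else:  # "insert" or "update": store the row's after-image
        state[row["id"]] = dict(row)


class SourceDB:
    """Hypothetical database whose every write also lands in an ordered change log."""

    def __init__(self) -> None:
        self.table: Dict[int, Row] = {}
        self.log: List[Event] = []

    def write(self, change: str, row: Row) -> None:
        self.log.append((len(self.log), change, row))
        apply(self.table, change, row)

    def snapshot(self) -> Tuple[Dict[int, Row], int]:
        """Copy the table together with the log position it is consistent with."""
        return {pk: dict(r) for pk, r in self.table.items()}, len(self.log)


def catch_up(snapshot: Dict[int, Row], position: int, log: List[Event]) -> Dict[int, Row]:
    """Bring a snapshot up to date by replaying changes logged at or after `position`."""
    state = {pk: dict(r) for pk, r in snapshot.items()}
    for pos, change, row in log:
        if pos >= position:
            apply(state, change, row)
    return state


db = SourceDB()
db.write("insert", {"id": 1, "qty": 5})
snap, pos = db.snapshot()                # consistent as of log position 1
db.write("update", {"id": 1, "qty": 7})  # ...but writes keep arriving
db.write("insert", {"id": 2, "qty": 1})

print(snap)                                     # stale: {1: {'id': 1, 'qty': 5}}
print(catch_up(snap, pos, db.log) == db.table)  # True
```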

Page 25

https://martin.kleppmann.com/2015/04/23/bottled-water-real-time-postgresql-kafka.html

Page 26

https://martin.kleppmann.com/2015/05/27/logs-for-data-infrastructure.html

Page 27

https://martin.kleppmann.com/2015/05/27/logs-for-data-infrastructure.html

Page 28

“Stupid data engineer, ain't no way I'm changing the web app.”

– all the developers

Page 29

https://martin.kleppmann.com/2015/04/23/bottled-water-real-time-postgresql-kafka.html

Page 30

beta.stackspace.io

Demo

• Database: MySQL

• DB Client: Custom Python

• Binlog Replicator: Maxwell by Zendesk

• Data Stacks (AWS): Kafka/ZooKeeper, Spark/YARN/HDFS/Zeppelin, Elasticsearch/Kibana, StreamSets
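In the demo pipeline, Maxwell publishes row events to Kafka, and downstream consumers load them into HDFS and Elasticsearch. The sketch below shows one plausible shape for the Elasticsearch leg: turning a Maxwell-style event into the newline-delimited action/document lines of Elasticsearch's bulk API (recent versions, without mapping types). The `<database>-<table>` index naming, the use of the row's `id` as document ID, and `to_bulk_lines` itself are hypothetical choices; the actual demo (StreamSets, custom Python) may have mapped events differently, and no Kafka or Elasticsearch client is used here.

```python
import json
from typing import Any, Dict, List


def to_bulk_lines(event: Dict[str, Any]) -> List[str]:
    """Translate one Maxwell-style row event into Elasticsearch bulk-API lines.

    Assumes every table has an "id" primary-key column; the index name
    "<database>-<table>" is a hypothetical naming scheme for this sketch.
    """
    index = f'{event["database"]}-{event["table"]}'
    doc_id = str(event["data"]["id"])
    if event["type"] == "delete":
        # Delete actions carry no document body.
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]
    # Inserts and updates both (re)index the full row, so replaying an event is idempotent.
    action = json.dumps({"index": {"_index": index, "_id": doc_id}})
    return [action, json.dumps(event["data"], sort_keys=True)]


events = [
    {"database": "bintest", "table": "mytable", "type": "insert",
     "data": {"id": 98, "some_text": "text object"}},
    {"database": "bintest", "table": "mytable", "type": "delete",
     "data": {"id": 98, "some_text": "text object"}},
]

# The bulk API expects newline-delimited JSON ending with a trailing newline.
body = "\n".join(line for e in events for line in to_bulk_lines(e)) + "\n"
print(body)
```

Keying documents by primary key is what lets the search index mirror the database: a second delete or a replayed insert lands on the same document rather than creating duplicates.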