Top Banner
When One Data Center is not Enough Guozhang Wang Strata San Jose, 2016 Building large-scale stream infrastructure across multiple data centers with Apache Kafka
35

Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

Apr 21, 2017

Download

Engineering

Guozhang Wang
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

When One Data Center is not Enough

Guozhang Wang Strata San Jose, 2016

Building large-scale stream infrastructure across multiple data centers with Apache Kafka

Page 2: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

2

• Why across Data Centers?

• Design patterns for Multi-DC

• Kafka for Multi-DC

• Conclusion

Agenda

Page 3: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

3

Why across Data Centers?

Page 4: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

4

Why across Data Centers

• Catastrophic / expected failures

• Routine maintenance

• Geo-locality (Example: CDNs)

Page 5: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

5

Why NOT across Data Centers

• Low bandwidth (10Mbps - 1Gbps)

• High latency (50ms - 450ms)

• Much More $$$

Page 6: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

6

Why NOT across Data Centers

• … is hard and expensive

Page 7: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

7

Why NOT across Data Centers

• … is hard and expensive

• … with real-time writes? Harder

Page 8: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

8

Why NOT across Data Centers

• … is hard and expensive

• … with real-time writes? Harder

• … consistently? Oh My!

Page 9: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

9

Consistency

• Weak

• Eventual

• StrongLatency Guarantee

Page 10: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

10

Weak No Consistency

• Now you see my writes, now you don’t

• Best effort only, data can be stale

• Examples: think of “caches”, VoIP

Page 11: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

11

Eventual Consistency

• You will see my writes, … eventually

• May need to resolve conflicts (manually)

• Examples: think of “emails”, SMTP

Page 12: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

12

Strong Consistency

• You get what you write, for sure

• External > Sequential > Causal (Session)

• Examples: RDBMS, file systems

Page 13: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

13

• LAN: consistency over latency

• WAN: latency over consistency

Latency vs. Consistency

Page 14: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

14

• Why across Data Centers?

• Design patterns for Multi-DC

• Kafka for Multi-DC

• Conclusion

Agenda

Page 15: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

15

Option I: Don’t do it

• Bunkerize the single data center

• Expect data loss at failures

• Examples: ??

Page 16: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

16

Option II: Primary with Hot Standby

• Failover to hot standby (maybe inconsistent)

• Window of data loss at failures

• Examples: MySQL binlog

Page 17: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

17

Option III: Active-Active

• Accepts writes in multi-DC

• Resolve conflicts (strong / week consistency)

• Examples: Amazon DynamoDB (vector clock) Google Spanner (2PC), Mesa (Paxos)

Page 18: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

18

Ordering is the Key!

Page 19: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

19

Ordering is Key

• Vector clocks: partial ordering

• Paxos, 2PC: global ordering

• Log shipping: logical ordering (per-partition)

Page 20: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Page 21: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

21

Apache Kafka

• A distributed messaging system

..that store messages as a log!

Page 22: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

22

Store Messages as a Log

4 5 5 7 8 9 10 11 12...

Producer Write

Consumer1 Reads (offset 7)

Consumer2 Reads (offset 10)

Messages

3

Page 23: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

23

Partition the Log across Machines

Topic 1

Topic 2

Partitions

Producers

Producers

Consumers

Consumers

Brokers

Page 24: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

24

ACK mode Latency On Failures

“no" no network delay some data loss

“leader" 1 network roundtrip a few data loss

“all" ~2 network roundtrips no data loss

Configurable ISR Commits

Page 25: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

25

• Why across Data Centers?

• Design patterns for Multi-DC

• Kafka for Multi-DC

• Conclusion

Agenda

Page 26: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

26

Option I: Active-Passive Replication

Kafka local

producers

consumer consumer

DC 1

MirrorMaker

DC 2

Kafka replica

Page 27: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

27

Option I: Active-Passive Replication

• Async- replication across DC

• May lose data on failover

• Example: ETL to data warehouse / HDFS

Kafka local

producers

consumer consumer

DC 1

MirrorMaker

DC 2

Kafka replica

Page 28: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

28

Option II: Active-Active Replication

Kafka local

Kafka aggregate

Kafka aggregate

producers producers

consumer consumer

MirrorMakerKafka local

on DC1 failure

DC 1 DC 2

Page 29: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

29

Option II: Active-Active Replication

• Global view on agg. cluster

• Require offsets to resume

• Example: store materialization, index updates

Kafka local

Kafka agg

Kafka agg

producers producers

consumer consumer

MirrorMakerKafka local

on DC1 failure

DC 1 DC 2

Page 30: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

30

• Offsets not identical between Kafka clusters• Duplicates during failover• Partition selection may be different

• Solutions• Resume from log end offset (suitable for real-time apps)• Resume from a timestamp (ListOffsets, offset index: KIP-33)

Caveats: offsets across DCs

Page 31: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

31

Option III: Deploy across DCs

Kafka

producers producers

consumer consumer

DC 1 DC 2

Page 32: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

32

Option III: Deploy across DCs

• Multi-tenancy support• Security (0.9)

• Quota Management (0.9)

• Latency optimization• Rack-aware partition assignment (0.10)

• Read affinity (future?)

Kafka

producers producers

consumer consumer

DC 1 DC 2

Page 33: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

33

• Same region: essentially same network• asymmetric partitioning is rare, low latency• Need at least 3 DCs for Zookeeper

• Reserved instance to reduce churns• EIP for external clients, private IPs for internal communication• Reserved instance, local storage

Example: EC2 multi-AZ Deployment

Page 34: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

34

Take-aways• Multi-DC: trade-off between latency and consistency

• Kafka: replicated log streams for multihoming

Page 35: Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

Thank youGuozhang | [email protected] | @guozhangwang

Meet Confluent in booth #838

Confluent University ~ Kafka training ~ confluent.io/training

Join the Stream Data Hackathon Apr 25, SFkafka-summit.org/hackathon/

Download Apache Kafka & Confluent Platform

confluent.io/download