Reliable RT processing @ Spotify

Pablo Barrera <[email protected]>

Jfokus, February 5, 2014


Spotify

• the right music for every moment
• over 6 million paying customers
• over 24 million active users each month
• over 20 million songs
• over 1.5 billion playlists created so far
• available in 55 markets

i/o tribe

responsible for building the awesome infrastructure that supports the Spotify experience


Our goal

this looks easy

That was easy!


but we have a problem...

Naïve approach (tm)

[Diagram: SYSLOG FILE → SCP → LOG ARCHIVER → CURL → HDFS PROXY → HADOOP]

Scalability

[Diagram: the SCP and CURL stages highlighted, each annotated with the loop for(;;) { (file) }]


We have a problem...

thousands of servers
several data centres
millions of users

10 TB each day!

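As a rough sense of scale, 10 TB each day can be turned into an average sustained rate. The sketch below assumes decimal terabytes (10^12 bytes) and a perfectly even load over the day; neither assumption comes from the slides, and real traffic will peak well above the average.

```python
from typing import Tuple

# Back-of-the-envelope: average rate implied by 10 TB/day.
# Assumptions (not from the slides): 1 TB = 10**12 bytes, load spread evenly.
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(bytes_per_day: float) -> Tuple[float, float]:
    """Return (megabytes per second, gigabits per second)."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e9


mb_s, gbit_s = average_rate(10 * 10**12)
print(f"{mb_s:.0f} MB/s, {gbit_s:.2f} Gbit/s")  # prints "116 MB/s, 0.93 Gbit/s"
```

Even as an average this is close to a full gigabit link, which is why fast data transfer and low cpu overhead matter for the delivery system.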

Our needs

• reliable delivery
• fast data transfer
• per-service subscription
• low cpu overhead


Other options

• active mq/rabbit mq
• flume/flume-ng
• others: scribe, chukwa, bookkeeper


Apache Kafka

distributed pub/sub system


Kafka coolness

• at least once read
• O(1)
• network bounded


Kafka architecture

[Diagram: KAFKA PRODUCER → KAFKA BROKER (TOPIC A, TOPIC B, TOPIC C, TOPIC D, TOPIC E) → KAFKA CONSUMER]

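The producer, broker, topic and consumer roles can be illustrated with a small in-memory model. This is a toy sketch only, not the Kafka client API: the class names `ToyBroker` and `ToyConsumer` and the message contents are made up for illustration. It assumes one append-only log per topic and consumers that remember their own read position.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def publish(self, topic, message):
        self._logs[topic].append(message)

    def fetch(self, topic, offset):
        """Return all messages in `topic` from position `offset` onwards."""
        return self._logs[topic][offset:]


class ToyConsumer:
    """Subscribes to one topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self.broker, self.topic, self.offset = broker, topic, 0

    def poll(self):
        messages = self.broker.fetch(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = ToyBroker()
broker.publish("TOPIC A", "play started")   # hypothetical payloads
broker.publish("TOPIC B", "login")
broker.publish("TOPIC A", "play ended")

consumer_a = ToyConsumer(broker, "TOPIC A")
print(consumer_a.poll())  # only TOPIC A messages, in publish order
print(consumer_a.poll())  # nothing new since the last poll
```

Because each consumer only reads the topics it asked for, this shape maps naturally onto the per-service subscription need listed earlier.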

Cons

• no reliability
• no replication
• manual tuning


Spotify <3 Kafka

running in production!

Kafka at Spotify

• key component of our log delivery system
• kafka 0.7.1
• java 7


Custom extensions

• end-to-end reliable delivery
• compression/encryption service


End-to-end reliable delivery

[Diagram, built up over seven slides: a Service on a production server writes syslog files → KAFKA SYSLOG PRODUCER (with Checkpoint) → KAFKA BROKER → KAFKA SYSLOG CONSUMER → HADOOP, with an ACK returned from the consumer side]

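One way to read the ACK and Checkpoint labels in the diagram is that the producer only moves its checkpoint forward once a batch has been acknowledged, and resends from the checkpoint otherwise. The sketch below illustrates that idea under stated assumptions; it is not Spotify's implementation. `CheckpointingProducer`, `flaky_send` and the sample lines are hypothetical, and a list stands in for the syslog file.

```python
class CheckpointingProducer:
    """Sends log lines in order and only advances its checkpoint on ACK."""

    def __init__(self, lines, send):
        self.lines = lines      # stand-in for the syslog file contents
        self.send = send        # callable(batch) -> True if the batch was ACKed
        self.checkpoint = 0     # index of the first line not yet ACKed

    def run_once(self, batch_size=2):
        """Try to deliver one batch; return True if it was ACKed."""
        batch = self.lines[self.checkpoint:self.checkpoint + batch_size]
        if not batch:
            return False
        if self.send(batch):
            self.checkpoint += len(batch)   # safe to move on
            return True
        return False                        # keep the checkpoint, retry later


delivered = []
attempts = {"count": 0}


def flaky_send(batch):
    """Hypothetical transport: data arrives, but the first ACK is lost."""
    attempts["count"] += 1
    delivered.extend(batch)
    return attempts["count"] != 1


producer = CheckpointingProducer(["l1", "l2", "l3"], flaky_send)
while producer.checkpoint < len(producer.lines):
    producer.run_once()

print(delivered)  # ['l1', 'l2', 'l1', 'l2', 'l3']
```

The first batch is delivered twice because its ACK was lost. Nothing is dropped, but the receiving side has to tolerate duplicates, which is the usual trade-off of at-least-once delivery.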

is that all?

Piece of cake

right?

Cross-site problems

[Diagram: Kafka Producer, Kafka Broker, Zookeeper, Kafka Consumer, Hadoop]


TCP window

• TCP parameters for big latency
• linux TCP scaling algorithm

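The reason latency matters for TCP tuning is that a single connection can have at most one window of unacknowledged data in flight, so its throughput is capped at window size divided by round-trip time. The sketch below works through that arithmetic. The 100 ms round trip and 1 Gbit/s link are hypothetical values chosen for illustration, not figures from the talk; 65,535 bytes is the largest TCP window without the window scaling option.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes / rtt_s


def bandwidth_delay_product(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bits_per_s / 8 * rtt_s


RTT = 0.100               # hypothetical 100 ms cross-site round trip
LINK = 1e9                # hypothetical 1 Gbit/s link
UNSCALED_WINDOW = 65535   # largest window without TCP window scaling

cap_mbit = max_throughput_bytes_per_s(UNSCALED_WINDOW, RTT) * 8 / 1e6
needed_mb = bandwidth_delay_product(LINK, RTT) / 1e6
print(f"cap with unscaled window: {cap_mbit:.1f} Mbit/s")  # 5.2 Mbit/s
print(f"window needed to fill link: {needed_mb:.1f} MB")   # 12.5 MB
```

Under these assumed numbers an unscaled window leaves the link more than 99% idle, which is why window scaling and buffer sizes need attention on high-latency paths.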

IPSEC

• linux IPSEC + firewall is slow
• major drop in throughput
• cannot tweak it at app level


[Diagram, built up over five slides: the same delivery pipeline (Service and syslog files on a production server, KAFKA SYSLOG PRODUCER with Checkpoint, KAFKA BROKER, KAFKA SYSLOG CONSUMER, HADOOP, ACK), now with a KAFKA SYSLOG ENCRYPTION component and a Compressed label]


Garbage collector

• 50% of performance drop
• 25% of cpu time
• young generation tuning

[Figure: % of time spent doing Full GC before tuning. Axes: Time (minutes), 0 to 14; Time spent on Full GC (%), 0 to 100]

[Figure: % of time spent doing Full GC after tuning. Axes: Time (minutes), 0 to 1000; Time spent on Full GC (%), 0 to 100]


Hadoop replication factor

• stochastic failure mode
• no real ack from Hadoop
• files open for a long time


Apache Storm

distributed computation framework


Storm

• abstractions: topology, bolt, stream, tuple, grouping
• great community
• ack + retries
• but not for reliable apps
  • use Hadoop instead

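The "ack + retries" idea can be sketched in a few lines: every message is either acked by its handler or failed and replayed. This is plain Python rather than the Storm API, and the function name `process_with_retries`, the `max_attempts` limit and the decision to drop a message after the last attempt are assumptions made for the illustration.

```python
from collections import deque


def process_with_retries(messages, handler, max_attempts=3):
    """Replay each failed message until it is acked or attempts run out.

    `handler(msg)` returns True to ack the message and False to fail it.
    Returns (acked, dropped) lists.
    """
    pending = deque((msg, 0) for msg in messages)
    acked, dropped = [], []
    while pending:
        msg, failures = pending.popleft()
        if handler(msg):
            acked.append(msg)
        elif failures + 1 >= max_attempts:
            dropped.append(msg)                  # give up on this message
        else:
            pending.append((msg, failures + 1))  # replay it later
    return acked, dropped


seen = {}


def handler(msg):
    """Hypothetical handler: 'b' fails once, 'c' always fails."""
    seen[msg] = seen.get(msg, 0) + 1
    if msg == "c":
        return False
    return not (msg == "b" and seen[msg] == 1)


acked, dropped = process_with_retries(["a", "b", "c"], handler)
print(acked, dropped)  # ['a', 'b'] ['c']
```

In this sketch a retried message is processed after messages that arrived later, and a message that keeps failing is eventually lost. Both effects are reasons to be careful about relying on retries alone when results must be exact.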

Kafka integration

• reliable data for reporting
• low latency data for RT


[Diagram, built up over three slides: the delivery pipeline (Service and syslog files on a production server, KAFKA SYSLOG PRODUCER with Checkpoint, KAFKA BROKER, KAFKA SYSLOG CONSUMER, ACK) extended with a STORM component and a Retries label]


RT apps


Storm


February 5, 2014

Thanks!

Pablo Barrera <[email protected]>

Want to join the band? spotify.com/jobs