Playlists at Spotify: A massively scalable storage system
Marcus Better, Software Engineer, [email protected]
Jan 22, 2018

Page 2: Playlists at Spotify

Spotify by the Numbers

‣ 75 million active users
  – 20 million paying subscribers
‣ 30 million songs
‣ 1.5 billion playlists created
‣ 6 000 servers in 4 data centres
‣ Available in 58 markets

Page 3: Playlists at Spotify

Architecture overview
● 400+ loosely coupled services
● Backend is mostly Java
● Storage options:
  – Cassandra
  – PostgreSQL
  – Sparkey (our own open-source database for static data sets)
● 120+ Cassandra clusters

Page 4: Playlists at Spotify

Playlists

Page 5: Playlists at Spotify

Requirements
● Over 1 billion lists
● > 100k reqs/s
● Collaborative editing
● Concurrent changes
● Offline editing

Page 6: Playlists at Spotify

Version control
● Playlists as versioned objects
● Store all changes
● Changes are immutable!

Diagram (revision history, oldest first):
ROOT
→ 1,2bfd16: ADD i=0, items=[A,B,C] (List: A, B, C)
→ 2,f7a9ba: MOV from=1, to=0, len=1 (List: B, A, C)
→ 3,def87a: REM from=0, len=1 (List: A, C), the head revision
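Because every change is kept, any version of the list can be rebuilt by replaying the changes in order. A minimal Python sketch, assuming a hypothetical tuple encoding of ADD/MOV/REM and assuming that MOV's `to` is an index into the list after the moved items are taken out (the slide doesn't say; both readings agree on this example):

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tuple encoding of the slide's change operations:
#   ("ADD", i, items)    insert `items` at index i
#   ("MOV", frm, to, n)  move n items starting at index `frm` to index `to`
#   ("REM", frm, n)      remove n items starting at index `frm`
Change = tuple

def apply_change(items: List[str], change: Change) -> List[str]:
    """Return a new list; the input is never modified, mirroring immutable changes."""
    kind = change[0]
    if kind == "ADD":
        _, i, new_items = change
        return items[:i] + list(new_items) + items[i:]
    if kind == "MOV":
        _, frm, to, n = change
        moved = items[frm:frm + n]
        rest = items[:frm] + items[frm + n:]
        # Assumption: `to` indexes the list after the moved items are taken out.
        return rest[:to] + moved + rest[to:]
    if kind == "REM":
        _, frm, n = change
        return items[:frm] + items[frm + n:]
    raise ValueError(f"unknown change type: {kind!r}")

def replay(history: Sequence[Tuple[str, Change]]) -> Dict[str, List[str]]:
    """Apply changes oldest first, recording the list as it looks at each revision."""
    snapshots: Dict[str, List[str]] = {}
    current: List[str] = []
    for revision, change in history:
        current = apply_change(current, change)
        snapshots[revision] = current
    return snapshots

history = [
    ("1,2bfd16", ("ADD", 0, ["A", "B", "C"])),
    ("2,f7a9ba", ("MOV", 1, 0, 1)),
    ("3,def87a", ("REM", 0, 1)),
]

for revision, items in replay(history).items():
    print(revision, items)
```

Running it prints the three lists from the diagram: A, B, C, then B, A, C, then A, C. Since each step builds a fresh list, earlier snapshots stay valid after later changes are applied.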

Page 7: Playlists at Spotify

Branches

Diagram: ROOT → 1,2bfd16, which has two children, 2,81ahcd and 2,f7a9ba. Two heads!

● Concurrent updates lead to branching
● These will be automatically merged by the system
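Detecting this situation amounts to finding revisions that no other revision builds on. A minimal Python sketch, assuming the revision graph is held as a hypothetical child-to-parent map (the slides don't show how parent links are stored); `fork_point` finds the revision both branches share, which is where a merge would start from:

```python
from typing import Dict, List, Optional, Set

def find_heads(parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """A head is a revision that no other revision names as its parent."""
    has_child = {p for p in parent_of.values() if p is not None}
    return {rev for rev in parent_of if rev not in has_child}

def lineage(rev: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Walk parent links from `rev` back to ROOT."""
    chain: List[str] = []
    current: Optional[str] = rev
    while current is not None:
        chain.append(current)
        current = parent_of[current]
    return chain

def fork_point(a: str, b: str, parent_of: Dict[str, Optional[str]]) -> Optional[str]:
    """Nearest revision that both branches descend from."""
    seen = set(lineage(a, parent_of))
    for rev in lineage(b, parent_of):
        if rev in seen:
            return rev
    return None

# Revision graph from the slide: two edits made concurrently on top of 1,2bfd16.
parent_of: Dict[str, Optional[str]] = {
    "ROOT": None,
    "1,2bfd16": "ROOT",
    "2,81ahcd": "1,2bfd16",
    "2,f7a9ba": "1,2bfd16",
}

heads = find_heads(parent_of)
print(sorted(heads))
if len(heads) > 1:
    a, b = sorted(heads)
    print("merge needed; branches fork at", fork_point(a, b, parent_of))
```

This reports the two heads 2,81ahcd and 2,f7a9ba and their fork point 1,2bfd16.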

Page 8: Playlists at Spotify

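Merging

Diagram: ROOT → 1,2bfd16 → concurrent revisions 2,81ahcd and 2,f7a9ba, followed by merge revisions 3,39acc and 3,8a0ba.

● Concurrent updates lead to branching
● These will be automatically merged by the system

Operations on the two branches: ADD i=5, [A] and REM i=2, len=3
Operations in the merge revisions: ADD i=2, [A] and REM i=2, len=3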
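The operations on this slide fit an operational-transformation reading: each branch's change is rewritten so it can be applied on top of the other branch, and both orders reach the same list. The slides don't say how Spotify transforms operations, so the following is a minimal Python sketch of that technique for the ADD-versus-REM case only, with hypothetical `Add`/`Rem` types and an assumed five-item starting list:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Add:
    i: int                   # insert position
    items: Tuple[str, ...]

@dataclass(frozen=True)
class Rem:
    i: int                   # first index to remove
    length: int

Op = Union[Add, Rem]

def apply(lst: List[str], op: Op) -> List[str]:
    if isinstance(op, Add):
        return lst[:op.i] + list(op.items) + lst[op.i:]
    return lst[:op.i] + lst[op.i + op.length:]

def apply_all(lst: List[str], ops: List[Op]) -> List[str]:
    for op in ops:
        lst = apply(lst, op)
    return lst

def transform(op: Op, against: Op) -> List[Op]:
    """Rewrite `op` so it can run after the concurrent `against` has been applied."""
    if isinstance(op, Add) and isinstance(against, Rem):
        end = against.i + against.length
        if op.i <= against.i:
            return [op]                                    # before the removed range
        if op.i >= end:
            return [Add(op.i - against.length, op.items)]  # shift left past the gap
        return [Add(against.i, op.items)]                  # inside it: insert at the gap
    if isinstance(op, Rem) and isinstance(against, Add):
        k, end = len(against.items), op.i + op.length
        if against.i <= op.i:
            return [Rem(op.i + k, op.length)]              # insertion pushed range right
        if against.i >= end:
            return [op]                                    # insertion after the range
        # Insertion landed inside the range: remove the parts on either side of it,
        # the later part first so the earlier index stays valid.
        return [Rem(against.i + k, end - against.i), Rem(op.i, against.i - op.i)]
    raise NotImplementedError("only ADD-vs-REM is sketched here")

base = ["a", "b", "c", "d", "e"]
left, right = Add(5, ("A",)), Rem(2, 3)   # the two concurrent branch changes

via_left = apply_all(apply(base, left), transform(right, left))    # REM i=2, len=3
via_right = apply_all(apply(base, right), transform(left, right))  # ADD i=2, [A]
print(via_left, via_right)
```

With the slide's numbers, the REM is unaffected (the ADD lands after the removed range) and the ADD moves from i=5 to i=2 (three items before it were removed), which matches the two merge operations shown. Both paths end at a, b, A. Concurrent ADD/ADD and REM/REM pairs would need their own rules, which this sketch leaves out.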

Page 9: Playlists at Spotify

Playlist data model

Diagram: the revision history from Page 6 (ROOT → 1,2bfd16 → 2,f7a9ba → 3,def87a, the head revision; current list A, C).

Typical requests:
● “Give me all changes since rev 2”
● “Give me the latest snapshot of the playlist”

Page 10: Playlists at Spotify

Playlist changes
● Column family playlist_change stores changes
● Row key = playlist ID
● Column name = revision ID

Example row, key spotify:user:mbetter:playlist:1234:
  1,2bfd16 = ADD i=0, [A,B,C]
  2,f7a9ba = MOV from=1, to=0, len=1
  3,def87a = REM from=0, len=1
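With one row per playlist, “Give me all changes since rev 2” only touches a single row. Cassandra keeps a row's columns sorted by the column comparator, so this can be a single-row slice; the in-memory Python sketch below sorts explicitly instead, using a hypothetical dict stand-in for the column family and assuming revisions are ordered by the number before the comma:

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the playlist_change column family:
# row key = playlist ID, column name = revision ID, column value = the change.
playlist_change: Dict[str, Dict[str, str]] = {
    "spotify:user:mbetter:playlist:1234": {
        "1,2bfd16": "ADD i=0, [A,B,C]",
        "2,f7a9ba": "MOV from=1, to=0, len=1",
        "3,def87a": "REM from=0, len=1",
    }
}

def revision_number(revision_id: str) -> int:
    # Assumption: the number before the comma is the revision counter.
    return int(revision_id.split(",", 1)[0])

def changes_since(playlist_id: str, since: int) -> List[Tuple[str, str]]:
    """Return (revision ID, change) pairs newer than revision `since`, oldest first."""
    row = playlist_change.get(playlist_id, {})
    columns = sorted(row.items(), key=lambda col: revision_number(col[0]))
    return [(rev, change) for rev, change in columns if revision_number(rev) > since]

print(changes_since("spotify:user:mbetter:playlist:1234", since=2))
```

This returns only the 3,def87a change. When concurrent branches leave two revisions with the same number (as on Page 7), ordering by the number alone is not enough, and a real implementation would need a tie-break.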

Page 11: Playlists at Spotify

Head pointers
● Column family playlist_head stores head pointers

Example row, key spotify:user:mbetter:playlist:1234:
  3,def87a = <empty>
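Storing the head revision as a column name with an empty value lets one row hold several heads at once, which the branching case on Page 7 requires. A minimal Python sketch of one plausible way to maintain these pointers, using a hypothetical `record_revision` helper over an in-memory dict; the slides don't describe the actual update logic:

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for playlist_head:
# row key = playlist ID, column name = head revision ID, column value left empty.
playlist_head: Dict[str, Dict[str, bytes]] = {}

def record_revision(playlist_id: str, parent: str, revision: str) -> None:
    """Move the head pointer from `parent` to the newly written `revision`."""
    heads = playlist_head.setdefault(playlist_id, {})
    heads.pop(parent, None)   # the parent is no longer a head (no-op if already gone)
    heads[revision] = b""     # the column name carries the data; the value stays empty

def heads_of(playlist_id: str) -> List[str]:
    return sorted(playlist_head.get(playlist_id, {}))

pid = "spotify:user:mbetter:playlist:1234"
record_revision(pid, "ROOT", "1,2bfd16")
# Two clients edit revision 1 concurrently, as on the Branches slide:
record_revision(pid, "1,2bfd16", "2,81ahcd")
record_revision(pid, "1,2bfd16", "2,f7a9ba")
print(heads_of(pid))
```

After the two concurrent writes the row holds both 2,81ahcd and 2,f7a9ba, which is the signal that a merge is needed.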

Page 12: Playlists at Spotify

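Snapshot cache
● playlist_change works well for syncing
● Not so well for fetching new playlists (every change since ROOT would have to be replayed)
● Solution: a snapshot cache holding the current list for each playlist

Example row, key spotify:user:mbetter:playlist:1234:
  snapshot = [A, C]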
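The read path this slide implies is a cache in front of the change log: answer from the stored snapshot, and replay changes only when no snapshot exists. A minimal Python sketch with hypothetical names (`SnapshotCache`, `rebuild_from_changes`); the slides don't say how or when Spotify refreshes the stored snapshot, so the `invalidate` hook is an assumption:

```python
from typing import Callable, Dict, List

class SnapshotCache:
    """Read-through cache: serve a stored snapshot, rebuilding it from changes on a miss."""

    def __init__(self, rebuild: Callable[[str], List[str]]) -> None:
        self._rebuild = rebuild
        self._snapshots: Dict[str, List[str]] = {}  # stand-in for the snapshot column family
        self.rebuilds = 0

    def get(self, playlist_id: str) -> List[str]:
        snapshot = self._snapshots.get(playlist_id)
        if snapshot is None:
            snapshot = self._rebuild(playlist_id)
            self.rebuilds += 1
            self._snapshots[playlist_id] = snapshot
        return list(snapshot)  # hand out a copy so callers can't mutate the cache

    def invalidate(self, playlist_id: str) -> None:
        """Drop the cached snapshot, e.g. after a new change is written."""
        self._snapshots.pop(playlist_id, None)

# Hypothetical change log for one playlist, in the ADD/MOV/REM form from the slides.
CHANGES = {
    "spotify:user:mbetter:playlist:1234": [
        ("ADD", 0, ["A", "B", "C"]),
        ("MOV", 1, 0, 1),
        ("REM", 0, 1),
    ]
}

def rebuild_from_changes(playlist_id: str) -> List[str]:
    """Replay every change from the start: the slow path the cache avoids."""
    items: List[str] = []
    for kind, *args in CHANGES.get(playlist_id, []):
        if kind == "ADD":
            i, new_items = args
            items = items[:i] + list(new_items) + items[i:]
        elif kind == "MOV":
            frm, to, n = args
            moved, rest = items[frm:frm + n], items[:frm] + items[frm + n:]
            items = rest[:to] + moved + rest[to:]
        elif kind == "REM":
            frm, n = args
            items = items[:frm] + items[frm + n:]
    return items

cache = SnapshotCache(rebuild_from_changes)
pid = "spotify:user:mbetter:playlist:1234"
print(cache.get(pid))   # rebuilt from the change log on the first read
print(cache.get(pid))   # served from the cache
print(cache.rebuilds)
```

Both reads return A, C, but the change log is replayed only once.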

Page 13: Playlists at Spotify

Full data model

playlist_snapshot, row key playlist:1234:
  snapshot = [A, C]

playlist_change, row key playlist:1234:
  1,2bfd16 = ADD i=0, [A,B,C]
  2,f7a9ba = MOV from=1, to=0, len=1
  3,def87a = REM from=0, len=1

playlist_head, row key playlist:1234:
  3,def87a = <empty>
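Taken together, the three column families serve the two typical requests listed on Page 9. A minimal Python sketch of that routing, assuming a hypothetical `fetch` entry point and that a client without a local copy sends no revision: syncing clients get a slice of playlist_change, new readers get playlist_snapshot, and both learn the current head from playlist_head.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory copies of the three column families for one playlist.
PID = "playlist:1234"
playlist_snapshot: Dict[str, List[str]] = {PID: ["A", "C"]}
playlist_change: Dict[str, Dict[str, str]] = {
    PID: {
        "1,2bfd16": "ADD i=0, [A,B,C]",
        "2,f7a9ba": "MOV from=1, to=0, len=1",
        "3,def87a": "REM from=0, len=1",
    }
}
playlist_head: Dict[str, Dict[str, bytes]] = {PID: {"3,def87a": b""}}

def rev_no(revision_id: str) -> int:
    # Assumption: the number before the comma orders revisions.
    return int(revision_id.split(",", 1)[0])

def fetch(playlist_id: str, client_rev: Optional[int]) -> Dict[str, object]:
    """Serve a sync (client already has some revision) or a first fetch (it has none)."""
    heads = sorted(playlist_head.get(playlist_id, {}))
    if client_rev is None:
        # First fetch: one read from the snapshot cache instead of replaying history.
        return {"heads": heads, "snapshot": playlist_snapshot.get(playlist_id, [])}
    # Sync: send only the changes the client hasn't seen yet.
    row = playlist_change.get(playlist_id, {})
    newer = sorted((r for r in row if rev_no(r) > client_rev), key=rev_no)
    return {"heads": heads, "changes": [(r, row[r]) for r in newer]}

print(fetch(PID, client_rev=None))
print(fetch(PID, client_rev=2))
```

A new reader receives the snapshot [A, C]; a client already at revision 2 receives just the 3,def87a change. Both responses name 3,def87a as the head.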

Page 14: Playlists at Spotify

The playlist cluster
‣ 90 Cassandra nodes
‣ 18 service hosts
‣ Uses FusionIO solid-state drives
‣ 30 TB of data
‣ 1.5 billion playlists
‣ 170k reqs/s at peak globally
‣ 50 playlists created every second
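A rough back-of-envelope reading of these figures, assuming decimal units (1 TB = 10^12 bytes), load spread evenly across hosts, and a creation rate sustained around the clock. The slide doesn't say whether the 30 TB includes replicas, so the per-playlist figure is an upper bound on the average unreplicated size:

```latex
\begin{aligned}
\frac{30\ \text{TB}}{1.5 \times 10^{9}\ \text{playlists}}
  &= \frac{30 \times 10^{12}\ \text{B}}{1.5 \times 10^{9}}
   = 2 \times 10^{4}\ \text{B} \approx 20\ \text{kB per playlist} \\
\frac{170\,000\ \text{req/s}}{18\ \text{service hosts}}
  &\approx 9\,400\ \text{req/s per host at peak} \\
50\ \text{playlists/s} \times 86\,400\ \text{s/day}
  &= 4\,320\,000 \approx 4.3\ \text{million new playlists per day}
\end{aligned}
```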

Page 15: Playlists at Spotify

Pain points (ouch!)
‣ Repairs
‣ JVM garbage collection
‣ Tombstones
‣ Bulk ingestion

Page 16: Playlists at Spotify

Open source from Spotify
Get yours on spotify.github.io!
– Cassandra Reaper: automates repairs
– Cassandra Ops Tools
– hdfs2cass: bulk load data into Cassandra
– Heroic: time series database backed by Cassandra

Other contributions:
– Date-tiered compaction strategy (DTCS)

Page 17: Playlists at Spotify

Thank you!
Questions?

We're hiring! https://www.spotify.com/jobs
Twitter: @SpotifyEng