Real-time Location Based Social Discovery using MongoDB Fredrik Björk Director of Engineering MongoSV, Dec 4th 2012

Dec 18, 2014


Fredrik Björk

The slides from my MongoSV 2012 presentation
Transcript
Page 1: Real-time Location Based Social Discovery using MongoDB


Fredrik Björk, Director of Engineering

MongoSV, Dec 4th 2012


What is Banjo?

• The most powerful location-based mobile technology, bringing you the moments you would otherwise miss

• Aggregates geotagged posts from Facebook, Twitter, Instagram and Foursquare in real time


Stats

• Launched June 2011
• 3 million users
• Social graph of 400 million profiles
• 50 billion connections
• ~200 geo posts created per second


Why MongoDB?

• Developer friendly
• Easy to maintain and scale
• Automatic failover
• Rapid prototyping of features
• Good fit for consuming, storing and presenting JSON data
• Geospatial features out of the box


Infrastructure

• ~160 EC2 instances (75% MongoDB, 25% Redis)
• SSD drives for low latency
• App servers (Sinatra & Rails) hosted on Heroku
• mongos with authentication running on dedicated servers


Geotagged posts

• Consumed as JSON from social network APIs via streaming, polling and real-time callbacks

• Exposed as JSON via REST APIs to the Banjo iOS and Android apps
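To make the ingestion step concrete, here is a minimal Ruby sketch that maps one tweet payload into the post shape used in the schema slides. It assumes Twitter REST API v1.1 fields (`id_str`, `user.screen_name`, `text`, and `coordinates` as a GeoJSON point, which lists longitude before latitude); `tweet_to_post` and `TWITTER_PROVIDER` are hypothetical names, not Banjo's code, and the sample payload is abbreviated.

```ruby
require 'json'

# Hypothetical constant: Twitter's provider prefix, following the
# "Facebook: 1, Twitter: 2" convention from the schema slides.
TWITTER_PROVIDER = 2

# Sketch: convert a tweet JSON string into a post document.
# Assumes v1.1 fields; "coordinates" is GeoJSON, i.e. [lng, lat].
def tweet_to_post(json)
  tweet = JSON.parse(json)
  geo = tweet['coordinates']
  return nil unless geo && geo['type'] == 'Point' # skip posts without a point

  lng, lat = geo['coordinates']
  {
    '_id' => "#{TWITTER_PROVIDER}:#{tweet['id_str']}",
    'username' => tweet.dig('user', 'screen_name'),
    'text' => tweet['text'],
    'coordinates' => [lat, lng] # stored latitude first, as on the slides
  }
end

sample = {
  'id_str' => '262989592561606656',
  'user' => { 'screen_name' => 'fbjork' },
  'text' => 'Will give a presentation at #MongoSV',
  'coordinates' => { 'type' => 'Point', 'coordinates' => [-122.438212, 37.784234] }
}.to_json

post = tweet_to_post(sample)
puts post['_id']      # prints 2:262989592561606656
p post['coordinates'] # prints [37.784234, -122.438212]
```

Reading `id_str` rather than the numeric `id` avoids precision loss in JSON clients that only have double-precision numbers.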


Schema design


https://twitter.com/fbjork/status/262989592561606656


> db.posts.find({ _id: '2:262989592561606656' })

{ _id: "2:262989592561606656", username: "fbjork", text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv", ... }


• _id combines the provider (Facebook: 1, Twitter: 2, etc.) and the provider's post id, separated by a colon, to keep ids unique across networks
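A composite _id like this is easy to build and to split back into its parts. A minimal Ruby sketch, assuming a numeric provider prefix and a single colon separator as in the example above; `PROVIDERS`, `parse_post_id` and `build_post_id` are hypothetical, and only the Facebook and Twitter codes come from the slide.

```ruby
# Hypothetical lookup table; only the Facebook and Twitter codes come from the slide.
PROVIDERS = { 1 => :facebook, 2 => :twitter }.freeze

# Split "provider:post_id" on the first colon only.
def parse_post_id(id)
  provider, post_id = id.split(':', 2)
  raise ArgumentError, "malformed id: #{id}" if post_id.nil? || post_id.empty?

  [PROVIDERS.fetch(Integer(provider)), post_id]
end

# Compose "provider:post_id" from a provider symbol and the provider's id.
def build_post_id(provider, post_id)
  code = PROVIDERS.key(provider)
  raise ArgumentError, "unknown provider: #{provider}" if code.nil?

  "#{code}:#{post_id}"
end

p parse_post_id('2:262989592561606656') # prints [:twitter, "262989592561606656"]
```

Splitting on the first colon only keeps the post id intact even if a provider's ids ever contained a colon.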


• Coordinates are stored in an array as [latitude, longitude] (MongoDB's documentation recommends longitude-first order; for flat 2d queries either order works as long as it is used consistently)

{ _id: "2:262989592561606656", username: "fbjork", text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv", coordinates: [37.784234, -122.438212], ... }
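Since a 2d index is flat, each coordinate only has to fall inside the index bounds, which in MongoDB default to -180 (inclusive) through 180 (exclusive) on both axes. A small Ruby guard, assuming the latitude-first order used on these slides; `coordinates_field` is a hypothetical helper.

```ruby
# Sketch: build the coordinates array in [latitude, longitude] order and
# reject values that a default 2d index, or geography, would refuse.
def coordinates_field(lat, lng)
  lat = Float(lat)
  lng = Float(lng)
  raise ArgumentError, "latitude out of range: #{lat}" unless (-90.0..90.0).cover?(lat)
  # Default 2d bounds are [-180, 180), so a longitude of exactly 180 is rejected.
  raise ArgumentError, "longitude out of range: #{lng}" unless lng >= -180.0 && lng < 180.0

  [lat, lng]
end

p coordinates_field('37.784234', '-122.438212') # prints [37.784234, -122.438212]
```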


• Friend ids are stored in an array (friend_ids)

{ _id: "2:262989592561606656", username: "fbjork", text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv", coordinates: [37.784234, -122.438212], friend_ids: [8816792, 10324882, 2006261, ...] }


Geospatial indexing

• Create the geo index:

> db.posts.ensureIndex( { coordinates: '2d' } )


Find nearby posts in Miami:

> db.posts.find( { coordinates: { $near: [25.792627,-80.226142] } } )

{ _id: "2:809438082", coordinates: [25.792610,-80.226100], username: "Rebecca_Boorsma", text: "I love Miami!", ... }

{ _id: "2:1234567", coordinates: [25.781324,-80.431423], username: "foo", text: "Another day, another dollar", ... }
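With a 2d index, $near orders results by flat (Euclidean) distance in coordinate units rather than by great-circle distance. The Ruby sketch below reproduces only that ordering, in memory, for the two Miami posts above; `planar_distance` and `near` are hypothetical helpers, and the sketch says nothing about how the index locates candidates.

```ruby
# Planar distance in coordinate units, as a flat 2d index measures it.
def planar_distance(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Sketch: return posts sorted nearest-first, mimicking $near's ordering.
def near(posts, point)
  posts.sort_by { |post| planar_distance(post[:coordinates], point) }
end

posts = [
  { _id: '2:1234567', coordinates: [25.781324, -80.431423], username: 'foo' },
  { _id: '2:809438082', coordinates: [25.792610, -80.226100], username: 'Rebecca_Boorsma' }
]

near(posts, [25.792627, -80.226142]).each { |post| puts post[:username] }
# prints Rebecca_Boorsma, then foo
```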


Find friend posts globally:

> db.posts.find({ friend_ids: { $in: [2006261] } })

{ _id: "2:10248172", username: "fbjork", friend_ids: [8816792, 10324882, 2006261, ...], ... }
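$in on an array field matches a document when any array element equals any listed value. When the index being used does not cover friend_ids, that check walks each candidate's array, which is the cost discussed later in the deck. A minimal in-memory Ruby sketch of the matching rule, with the hypothetical helper `in_match?` and sample ids taken from the slide.

```ruby
require 'set'

# Sketch of $in semantics on an array field: match if the document's
# array shares at least one value with the wanted list. Without an
# index on the field, this is a linear walk over each array.
def in_match?(doc_array, wanted)
  wanted_set = wanted.to_set
  doc_array.any? { |id| wanted_set.include?(id) }
end

posts = [
  { _id: '2:10248172', username: 'fbjork', friend_ids: [8816792, 10324882, 2006261] },
  { _id: '2:1234567', username: 'foo', friend_ids: [1000, 1001] }
]

matches = posts.select { |post| in_match?(post[:friend_ids], [2006261]) }
puts matches.map { |post| post[:username] }.inspect # prints ["fbjork"]
```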


Find friend posts in a location:

> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] }, friend_ids: { $in: [2006261] } })

{ _id: "2:10248172", username: "fbjork", friend_ids: [8816792, 10324882, 2006261, ...], ... }


Compound geo indexes

• Create a compound index on coordinates and friend_ids:

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )


• Index creation fails for compound indexes that include large arrays

• Geospatial index keys have a size limit of 1000 bytes

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )

Error: Key too large to index


Geospatial query performance

• Do we need a compound index at all?
• The geospatial index is usually restrictive enough on its own
• Problem: array traversal (using $in) is CPU hungry for large arrays
• Solution: pre-sharded array fields
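A rough way to see the trade-off: if the geo index returns a bounded set of nearby candidates, the friend filter costs about the total length of those candidates' friend arrays. The Ruby sketch below counts array elements visited; the candidate data is made up, and the linear scan models the array walk described above rather than MongoDB's matcher internals.

```ruby
# Sketch: filter geo candidates by friend id, counting array elements
# visited to show why long friend arrays make this step CPU-bound.
def filter_candidates(candidates, friend_id)
  visited = 0
  hits = candidates.select do |post|
    post[:friend_ids].any? do |id|
      visited += 1
      id == friend_id
    end
  end
  [hits, visited]
end

# Hypothetical candidates: one post with a small friend list, one with a
# very large follower list that does not contain the id we want.
candidates = [
  { _id: 'a', friend_ids: [1, 2, 3] },
  { _id: 'b', friend_ids: (10_000...110_000).to_a }
]

hits, visited = filter_candidates(candidates, 3)
puts hits.map { |post| post[:_id] }.inspect # prints ["a"]
puts visited                                # prints 100003
```

One large array dominates the cost, which is what partitioning it into smaller fields is meant to address.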


Pre-sharded array fields

• When dealing with large arrays, e.g. @BarackObama's follower ids

• Partition array fields using pre-sharding
• shard = Hash(key) MOD shard_count
• Keep array sizes in the low hundreds
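The partitioning rule can be written as a few Ruby helpers. This is a sketch assuming CRC32 as the hash (the choice made in shard_example.rb in this deck) and the friends_N field names used there; `shard_count_for`, `shard_for` and `presharded_fields`, plus the target of 200 ids per field, are hypothetical.

```ruby
require 'zlib'

# Sketch: pick a shard count that keeps each partition near a target size.
# The target of 200 is a hypothetical reading of "low hundreds".
def shard_count_for(array_size, target: 200)
  [(array_size.to_f / target).ceil, 1].max
end

# shard = Hash(key) MOD shard_count, using CRC32 of the decimal id.
def shard_for(id, shard_count)
  Zlib.crc32(id.to_s) % shard_count
end

# Split one large id array into friends_0 .. friends_{n-1} fields.
def presharded_fields(ids, shard_count)
  fields = Array.new(shard_count) { |i| ["friends_#{i}", []] }.to_h
  ids.each { |id| fields["friends_#{shard_for(id, shard_count)}"] << id }
  fields
end

follower_ids = (1..1_000).to_a
count = shard_count_for(follower_ids.size) # 5 fields for 1,000 ids
fields = presharded_fields(follower_ids, count)
fields.each { |name, ids| puts "#{name}: #{ids.size} ids" }
```

Because an id's field depends on the shard count, changing the count means rewriting the friends_N fields of every affected document, so it pays to pick the count with some headroom.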


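{ friends_0: [1000, 1002, 1006], friends_1: [1004], friends_2: [1001, 1003, 1005] }

# shard_example.rb
require 'zlib'

SHARDS = 3
friend_ids = [1000, 1001, 1002, 1003, 1004, 1005, 1006]
# Print the shard each friend id hashes to: CRC32 of the id string, mod SHARDS
friend_ids.each { |f| puts Zlib.crc32(f.to_s) % SHARDS }
# Output (one per line): 0, 2, 0, 2, 1, 2, 0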


Find friend posts using pre-sharding of the friend arrays (friend 1000 hashes to shard 0, so only friends_0 is searched):

> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] }, friends_0: { $in: [1000] } })

{ friends_0: [1000, 1002, 1006], friends_1: [1004], friends_2: [1001, 1003, 1005] }
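On the query side, the application must apply the same hash to know which friends_N field holds a given friend. A Ruby sketch assuming CRC32 and the friends_N naming from shard_example.rb; `friend_selector` and `selectors_by_field` are hypothetical, and sending one query per field when looking up several friends is an assumption rather than something the slides prescribe.

```ruby
require 'zlib'

# Same rule as shard_example.rb: CRC32 of the id string, mod the shard count.
def shard_for(id, shard_count)
  Zlib.crc32(id.to_s) % shard_count
end

# Build the non-geo part of the query for a single friend id.
def friend_selector(friend_id, shard_count)
  { "friends_#{shard_for(friend_id, shard_count)}" => { '$in' => [friend_id] } }
end

# For several friends, group ids by the field they live in; each group
# could then be sent as its own query alongside the $near clause.
def selectors_by_field(friend_ids, shard_count)
  friend_ids.group_by { |id| "friends_#{shard_for(id, shard_count)}" }
            .map { |field, ids| { field => { '$in' => ids } } }
end

near = { 'coordinates' => { '$near' => [25.792627, -80.226142] } }
query = near.merge(friend_selector(1000, 3))
p query
```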


Capped collections

• Good fit for storing a feed of posts for a period of time

• Eliminates the need to expire old posts
• Documents can't grow
• Documents can't be deleted
• Resizing collections is painful
• Can't be sharded
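The first two bullets come down to first-in, first-out behaviour: once a capped collection is full, new inserts overwrite the oldest documents, so old posts age out without a cleanup job. A minimal Ruby model, with a hypothetical `CappedFeed` class; it caps by document count for simplicity, whereas a real capped collection is bounded by a byte size (optionally plus a document limit).

```ruby
# Sketch: a fixed-size FIFO feed modelling capped-collection behaviour.
# Real capped collections cap by bytes (optionally also by count);
# this model caps by document count only.
class CappedFeed
  def initialize(max_docs)
    @max_docs = max_docs
    @docs = []
  end

  # Inserting into a full feed silently evicts the oldest document.
  def insert(doc)
    @docs << doc
    @docs.shift while @docs.size > @max_docs
    self
  end

  # Documents come back in insertion order, oldest first.
  def to_a
    @docs.dup
  end

  # Deliberately no delete method, mirroring the restriction above.
end

feed = CappedFeed.new(3)
%w[p1 p2 p3 p4].each { |id| feed.insert(id) }
p feed.to_a # prints ["p2", "p3", "p4"]
```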


TTL collections

• We switched to TTL collections with MongoDB 2.2

• Deleting and growing documents is now possible

• Easier to change expiration times
• Can be sharded (not by geo)
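A TTL index makes MongoDB delete a document once its indexed date field is older than the index's `expireAfterSeconds` value, so the expiry window is a single index option. The Ruby predicate below models only that rule, assuming a hypothetical `created_at` field; on the server, the background task that removes expired documents runs every 60 seconds, so documents can outlive their deadline briefly.

```ruby
# Sketch of the TTL expiry rule: a document is eligible for removal once
# created_at + expire_after_seconds has passed. The created_at field name
# is hypothetical; MongoDB applies the rule to the date field the TTL
# index covers.
def expired?(doc, expire_after_seconds, now: Time.now)
  created_at = doc[:created_at]
  return false unless created_at.is_a?(Time) # non-date values never expire

  created_at + expire_after_seconds <= now
end

now = Time.at(1_354_600_000)
posts = [
  { _id: '2:1', created_at: now - 7_200 }, # two hours old
  { _id: '2:2', created_at: now - 600 },   # ten minutes old
  { _id: '2:3' }                           # no date field
]

live = posts.reject { |post| expired?(post, 3_600, now: now) }
p live.map { |post| post[:_id] } # prints ["2:2", "2:3"]
```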


Questions


Thank you!

Available: iPhone and Android

[email protected]
@fbjork