2 App Designs Using MongoDB
May 12, 2015
Kyle Banker
@hwaet
What we'll cover:
Brief intro
Feed reader
Website analytics
Questions
Intro
MongoDB
Rich data model
Replication and automatic failover
Sharding
Use cases
Social networking and geolocation
E-commerce
CMS
Analytics
Production deployments: http://is.gd/U0VkG6
Demo
Feed reader
Four collections
Users
Feeds
Entries
Buckets
Users
{
  _id: ObjectId("4e316f236ce9cca7ef17d59d"),
  username: 'kbanker',
  feeds: [
    { _id: ObjectId("4e316f236ce9cca7ef17d54c"), name: "GigaOM" },
    { _id: ObjectId("4e316f236ce9cca7ef17d54d"), name: "Salon.com" },
    { _id: ObjectId("4e316f236ce9cca7ef17d54e"), name: "Foo News" }
  ],
  latest: new Date(2011, 7, 25)
}
Index
db.users.ensureIndex( { username: 1 }, { unique: true } )
Feeds
{
  _id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
  url: 'http://news.foo.com/feed.xml/',
  name: 'Foo News',
  subscriber_count: 2,
  latest: new Date(2011, 7, 25)
}
Index
db.feeds.ensureIndex( { url: 1 }, { unique: true } )
Adding a feed subscription
// Upsert
db.feeds.update(
  { url: 'http://news.foo.com/feed.xml/', name: 'Foo News' },
  { $inc: { subscriber_count: 1 } },
  true
)
Adding a feed subscription
// Add this feed to the user's feed list
db.users.update(
  { _id: ObjectId("4e316f236ce9cca7ef17d59d") },
  { $addToSet: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Foo News' } } }
)
Removing a feed subscription
// Remove this feed from the user's feed list
db.users.update(
  { _id: ObjectId("4e316f236ce9cca7ef17d59d") },
  { $pull: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Foo News' } } }
)
Removing a feed subscription
// Decrement the feed's subscriber count
db.feeds.update(
  { url: 'http://news.foo.com/feed.xml/' },
  { $inc: { subscriber_count: -1 } }
)
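Subscribing and unsubscribing each touch two collections: the feed's counter and the user's embedded feed list. A minimal sketch of a helper that builds both update specifications in one place follows. The function name `subscriptionOps` and its return shape are hypothetical, not from the slides; only the operators and field names come from the slides above.

```javascript
// Hypothetical helper (not part of the original deck): builds the query and
// update documents for both halves of a subscribe or unsubscribe action.
function subscriptionOps(userId, feed, subscribe) {
  const listOperator = subscribe ? '$addToSet' : '$pull';
  return {
    // On subscribe the slides upsert on url + name so a new feed document
    // is created with both fields; on unsubscribe the url alone is enough.
    feedQuery: subscribe ? { url: feed.url, name: feed.name } : { url: feed.url },
    feedUpdate: { $inc: { subscriber_count: subscribe ? 1 : -1 } },
    upsert: subscribe,
    userQuery: { _id: userId },
    userUpdate: { [listOperator]: { feeds: { _id: feed._id, name: feed.name } } }
  };
}

// Usage in the mongo shell (assumed, mirrors the slides):
//   var ops = subscriptionOps(userId, feed, true);
//   db.feeds.update(ops.feedQuery, ops.feedUpdate, ops.upsert);
//   db.users.update(ops.userQuery, ops.userUpdate);
```

The two writes are not atomic as a pair, so an application would probably want to tolerate a counter that is briefly out of step with the user lists.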
Entries (populate in background)
{
  _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"),
  feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
  title: 'Important person to resign',
  body: 'A person deemed very important has decided...',
  reads: 5,
  date: new Date(2011, 7, 27)
}
What we need now
Populate personal feeds (buckets)
Avoid lots of expensive queries
Record what's been read
Without bucketing
// Naive query runs every time
db.entries.find(
  { feed_id: { $in: user_feed_ids } }
).sort({date: 1}).limit(25)
With bucketing
// A bit smarter: only runs once
entries = db.entries.find(
  { date: { $gt: user_latest },
    feed_id: { $in: user_feed_ids } }
).sort({date: 1})
Index
db.entries.ensureIndex( { date: 1, feed_id: 1 } )
bucket = {
  _id: ObjectId("4e3185c26ce9cca7ef17d552"),
  user_id: ObjectId("4e316f236ce9cca7ef17d59d"),
  date: new Date(2011, 7, 27),
  n: 100,
  entries: [
    { _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"),
      feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
      title: 'Important person to resign',
      body: 'A person deemed very important has decided...',
      date: new Date(2011, 7, 27),
      read: false },
    { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8"),
      feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
      title: 'Panda bear waves hello',
      body: 'A panda bear at the local zoo...',
      date: new Date(2011, 7, 27),
      read: false }
  ]
}
db.buckets.insert( buckets )
db.users.update(
  { _id: ObjectId("4e316f236ce9cca7ef17d59d") },
  { $set: { latest: latest_entry_date } }
)
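The bucketing step sits between the "only runs once" query and the insert: new entries are grouped into fixed-size bucket documents, and the newest entry date is kept so the user's `latest` field can be advanced. A rough sketch of that grouping follows. The function name `buildBuckets`, the `bucketSize` parameter, and the choices that `n` counts the entries in a bucket and that a bucket's `date` is its newest entry's date are all assumptions; the slides do not spell these out.

```javascript
// Hypothetical sketch: group a user's new entries into bucket documents.
// Assumptions (not stated in the slides): `n` is the number of entries in the
// bucket, and the bucket's `date` is the date of its newest entry.
function buildBuckets(userId, entries, bucketSize) {
  // Oldest first, matching sort({date: 1}) on the entries query.
  const sorted = entries.slice().sort((a, b) => a.date - b.date);
  const buckets = [];
  for (let i = 0; i < sorted.length; i += bucketSize) {
    const chunk = sorted.slice(i, i + bucketSize);
    buckets.push({
      user_id: userId,
      date: chunk[chunk.length - 1].date,
      n: chunk.length,
      // Copy each entry into the bucket with a per-user read flag.
      entries: chunk.map(e => ({
        _id: e._id, feed_id: e.feed_id, title: e.title,
        body: e.body, date: e.date, read: false
      }))
    });
  }
  // Newest entry date seen, or null when there was nothing new.
  const latest = sorted.length ? sorted[sorted.length - 1].date : null;
  return { buckets: buckets, latest: latest };
}
```

The returned `buckets` array could then be passed to `db.buckets.insert`, and `latest` used as the `latest_entry_date` in the user update, skipping both when `latest` is null.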
Viewing a personal feed
// Newest
db.buckets.find(
  { user_id: ObjectId("4e316f236ce9cca7ef17d59d") }
).sort({date: -1}).limit(1)
Viewing a personal feed
// Next newest (that's how we paginate)
db.buckets.find(
  { user_id: ObjectId("4e316f236ce9cca7ef17d59d"),
    date: { $lt: previous_reader_date } }
).sort({date: -1}).limit(1)
Index
db.buckets.ensureIndex( { user_id: 1, date: -1 } )
Marking a feed item
// Flag the entry as read inside the user's bucket
db.buckets.update(
  { user_id: ObjectId("4e316f236ce9cca7ef17d59d"),
    'entries._id': ObjectId("4e316b8a6ce9cca7ef17d5c8") },
  { $set: { 'entries.$.read': true } }
)
Marking a feed item
// Increment the global read counter on the entry
db.entries.update(
  { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8") },
  { $inc: { reads: 1 } }
)
Sharding note:
Buckets collection is eminently shardable
Shard key: { user_id: 1, date: 1 }
Website analytics
Challenges:
Real-time reporting.
Efficient use of space.
Easily wipe unneeded data.
Techniques
Pre-aggregate the data.
Pre-construct document structure.
Store ephemeral data in a separate database.
Two collections:
Each collection gets its own database.
Collection names are time-scoped.
Clean, fast removal of old data.
// Collections holding totals for each day, stored
// in a database per month
days_2011_5
days_2011_6
days_2011_7
...
// Totals for each month...
months_2011_1_4
months_2011_5_8
months_2011_9_12
...
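Because the collection names are time-scoped, the application has to derive the right name from a timestamp before every write. One possible way to do that is sketched below. The function names `daysCollection` and `monthsCollection` are hypothetical, and the sketch assumes months are grouped in fixed blocks of four (1-4, 5-8, 9-12) as the listing suggests. Database naming is left out.

```javascript
// Hypothetical helpers: derive time-scoped collection names from a Date.
// Uses local-time getters; JavaScript months are 0-based, so add 1.
function daysCollection(date) {
  return 'days_' + date.getFullYear() + '_' + (date.getMonth() + 1);
}

// Assumes months are grouped in blocks of four: 1-4, 5-8, 9-12.
function monthsCollection(date) {
  const month = date.getMonth() + 1;
  const start = Math.floor((month - 1) / 4) * 4 + 1;
  return 'months_' + date.getFullYear() + '_' + start + '_' + (start + 3);
}

// Example: May 1, 2011
const may1 = new Date(2011, 4, 1);
console.log(daysCollection(may1));    // days_2011_5
console.log(monthsCollection(may1));  // months_2011_5_8
```

Dropping old data then amounts to dropping whichever collections (or databases) fall outside the retention window, rather than deleting individual documents.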
Hours and minutes
{
  _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"), day: '2011-5-1' },
  total: 2820,
  hrs: {
    0: 500,
    1: 700,
    2: 450,
    3: 343,
    // ... 4-23 go here
  },
  // Minutes are rolling. This gives real-time
  // numbers for the last hour. So when you increment
  // minute n, you need to $set minute n-1 to 0.
  mins: {
    1: 12,
    2: 10,
    3: 5,
    4: 34
    // ... 5-60 go here
  }
}
Updating day document...
// Update 'days' collection at 5:37 p.m. on 2011-5-1
// Might want to queue increments so that $inc is greater
// than 1 for each write...
id = { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"), day: '2011-5-1' };
update = {
  $inc: { total: 1, 'hrs.17': 1, 'mins.37': 1 },
  $set: { 'mins.36': 0 }
};
// Update the collection for May 2011; the compound key lives in _id
db.days_2011_5.update( { _id: id }, update );
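The hard-coded field paths in the update can be derived from the hit's timestamp. A minimal sketch follows, assuming the document is matched on its compound `_id`, that minute keys run 0-59 as returned by `getMinutes()`, and that minute 0 wraps around to reset minute 59. The function name `dayUpdate` is hypothetical.

```javascript
// Hypothetical helper: build the query and update documents for one page hit.
// Assumes minute keys 0-59 and that minute 0 wraps to reset minute 59.
function dayUpdate(uri, date) {
  const hour = date.getHours();
  const minute = date.getMinutes();
  const previousMinute = (minute + 59) % 60;
  // Day string in the slide's format, e.g. '2011-5-1' (no zero padding).
  const day = date.getFullYear() + '-' + (date.getMonth() + 1) + '-' + date.getDate();
  return {
    query: { _id: { uri: uri, day: day } },
    update: {
      // Bump the day total plus the matching hour and minute counters.
      $inc: { total: 1, ['hrs.' + hour]: 1, ['mins.' + minute]: 1 },
      // Reset the previous minute, following the rule stated on the slide.
      $set: { ['mins.' + previousMinute]: 0 }
    }
  };
}
```

For a hit at 5:37 p.m. on 2011-5-1 this produces the same `$inc` and `$set` paths as the literal update shown above.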
Months and days
{
  _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"), month: '2011-5' },
  total: 34173,
  months: {
    1: 4000,
    2: 4300,
    3: 4200,
    4: 5000,
    5: 5100,
    6: 5700,
    7: 5873
    // ... 8-12 go here
  }
}
Updating month document...
// Update 'months' collection at 5:37 p.m. on 2011-5-1
query = { _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"), month: '2011-5' } };
update = { $inc: { total: 1, 'months.5': 1 } };
// Update collection containing months 5-8
db.months_2011_5_8.update( query, update );
Reporting
Must provide data at multiple resolutions (second, minute, etc.).
We have the raw materials for that.
Application assembles the data intelligently.
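As one illustration of what "assembling" could look like, the sketch below reads a pre-aggregated day document and produces an hourly series and a range total on the application side. The function names `hourlySeries` and `sumHours` are hypothetical, and the sketch assumes missing hour keys mean zero hits.

```javascript
// Hypothetical reporting helpers over a pre-aggregated day document.
// Assumes a missing hour key means zero hits for that hour.
function hourlySeries(dayDoc) {
  const series = [];
  for (let h = 0; h < 24; h++) {
    series.push((dayDoc.hrs && dayDoc.hrs[h]) || 0);
  }
  return series;
}

// Total hits between fromHour and toHour, both inclusive.
function sumHours(dayDoc, fromHour, toHour) {
  return hourlySeries(dayDoc)
    .slice(fromHour, toHour + 1)
    .reduce((sum, hits) => sum + hits, 0);
}

// Example with the hour counts from the day document shown earlier.
const sampleDay = { total: 2820, hrs: { 0: 500, 1: 700, 2: 450, 3: 343 } };
console.log(sumHours(sampleDay, 0, 3));  // 1993
```

Coarser resolutions (day, month) would read the corresponding fields of the month documents in the same way, so no query-time aggregation is needed.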
Summary
Feed Reader
Rich documents
Incremental modifiers
Bucketing strategy
Website Analytics
Pre-aggregate data
Time-scoped databases and collections
http://www.10gen.com/presentations
http://manning.com/banker
Thank you