Using NoSQL with Yo' SQL
Post on 11-Nov-2014
2174 Views
Preview:
DESCRIPTION
Transcript
Using NoSQL with Yo’ SQLSupplementing your app with a slice of MongoDB
Rich ThornettDribbble
Thursday, June 9, 2011
DribbbleWhat are you working on?
Show and tell for creatives via screenshots
Thursday, June 9, 2011
Your Father's WebappDribbble is a typical web application:
Ruby on Rails + Relational Database
We <3 PostgreSQL
But for certain tasks ...
Thursday, June 9, 2011
Alternative Values
More flexible data structures
Easier horizontal scaling
log | scale | optimize | aggregate | cache
Thursday, June 9, 2011
NoSQLNo == Not Only(but sounds a bit stronger, no?)
• No: Fixed table schemas• No: Joins• Yes: Scale horizontally
ExamplesMemcached, Redis, CouchDB, Cassandra, MongoDB ...
Thursday, June 9, 2011
Exploring MongoDB• Persistent data store• Powerful query language (closest to RDBMs)• Broad feature set• Great community and documentation
Utility belt that fits us?
Thursday, June 9, 2011
What is MongoDB?A document-oriented NoSQL database
Collections & Documentsv.
Tables & Rows
Thursday, June 9, 2011
What's a document?Our old friend JavaScript
{_id: ObjectId("4ddfe31db6bc16ab615e573d"),description: "This is a BSON document",embedded_doc: {description: "I belong to my parent document"
},tags: ['can', 'haz', 'arrays']
}
Documents are BSON (binary encoded JSON)
Thursday, June 9, 2011
Embedded DocumentsAvoid joins for "belongs to" associations
{_id: ObjectId("4ddfe31db6bc16ab615e573d"),description: "This is a BSON document",embedded_doc: {description: "I belong to my parent document"
},tags: ['can', 'haz', 'arrays']
})
Thursday, June 9, 2011
Arrays
{_id: ObjectId("4ddfe31db6bc16ab615e573d"),description: "This is a BSON document",embedded_doc: {description: "I belong to my parent document"
},tags: ['can', 'haz', 'arrays']
})
Avoid joins for "tiny relations"
thing tagsthing_taggings
Relational Cruft
Thursday, June 9, 2011
Googley“With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed
(read parallel execution of) queries/updates/inserts/deletes.”
--Markus Gattolhttp://www.markus-gattol.name/ws/mongodb.html
Thursday, June 9, 2011
Replica Sets
• Read Scaling
• Data Redundancy
• Automated Failover
• Maintenance
• Disaster Recovery
Automate the storing of multiple copies of data
Thursday, June 9, 2011
Dude, who sharded?Relax, not you.
YouSpecify a shard key for a collection
MongoPartitions the collection across machines
ApplicationBlissfully unaware (mostly :)
Auto-sharding
Thursday, June 9, 2011
CoSQL
Cachin
gAnalytics
LoggingScali
ng
Flexibility
MongoDB
MIND THE APP
WEBAPP
RDBMS
Thursday, June 9, 2011
Ads
Let's Mongo!
• Orthogonal to primary app
• Few joins
• Integrity not critical
Thursday, June 9, 2011
From the Console
db.ads.insert({advertiser_id: 1,type: 'text',url: 'http://dribbbler-on-the-roof.com',copy: 'Watch me!',runs: [{start: new Date(2011, 4, 7),end: new Date(2011, 4, 14)
}],created_at: new Date()
})
Create a text ad
But there are drivers for all major languages
Thursday, June 9, 2011
QueryingQuery by match
db.ads.find({advertiser_id: 1})
Paging active ads// Page 2 of text ads running this monthdb.ads.find({ type: 'text',runs: {$elemMatch: {start: {$lte: new Date(2011, 4, 10)},end: {$gte: new Date(2011, 4, 10)}
}}
}).sort({created_at: -1}).skip(15).limit(15)Thursday, June 9, 2011
Advanced Queries$gt$lt$gte$lte$all$exists
$size$type$elemMatch$not$where
$mod$ne$in$nin$nor$or
http://www.mongodb.org/display/DOCS/Advanced+Queries
count | distinct | groupGroup does not work across shards, use map/reduce instead.
Thursday, June 9, 2011
Polymorphism// Banner ad has additional fieldsdb.ads.insert({
advertiser_id: 1,type: 'banner',url: 'http://dribbble-me-this.com',copy: 'Buy me!',runs: [],image_file_name: 'ad.png',image_content_type: 'image/png',image_file_size: '33333'
})
Easy inheritance. Document has whatever fields it needs.
Single | Multiple | Joinedtable inheritance all present difficulties
No DB changes to create new subclasses in MongoThursday, June 9, 2011
Logging
• Scale and query horizontally
• Add fields on the fly
• Writes: Fast, asynchronous, atomic
Thursday, June 9, 2011
Volume Logging
• Ad impressions
• Screenshot views
• Profile views
Fast, asynchronous writes and sharding FTW!
Thursday, June 9, 2011
Real-time Analyticsdb.trends.update( {date: "2011-04-10 13:00"}, // search criteria { $inc: { // increment 'user.simplebits.likes_received': 1, 'country.us.likes_received': 1, 'city.boston.likes_received': 1 } }, true // upsert)
What people and locations are trending this hour?
upsert: Update document (if present) or insert it$inc: Increment field by amount (if present) or set to amount
Thursday, June 9, 2011
Flex Benefits
• Add/nest new fields to measure with ease
• Atomic upsert with $incReplaces two-step, transactional find-and-update/create
• Live, cached aggregation
Thursday, June 9, 2011
Scouting
Thursday, June 9, 2011
db.users.insert( { name: 'Dan Cederholm',
available: true,skills: ['html', 'css', 'illustration', 'icon design'] }
)
Design a Designer
Thursday, June 9, 2011
db.users.ensureIndex({location: '2d'})db.users.insert( { name: 'Dan Cederholm',
// Salem longitude/latitudelocation: [-70.8972222, 42.5194444],available: true,skills: ['html', 'css', 'illustration', 'icon design'] }
)
Geospatial Indexing
Thursday, June 9, 2011
Search by Location
// Find users in the Boston area who:// are available for work// have expertise in HTML and icon designdb.users.find({ location: {$near: boston, $maxDistance: .7234842}, available: true, skills: {$all: ['html', 'icon design']}})
Within area// $maxDistance: Find users in Boston area (w/in 50 miles)db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})
Within area, matching criteria
boston = [-71.0602778, 42.3583333] // long/lat
Thursday, June 9, 2011
Search PowerFlexible Documents
+Rich Query Language
+Geospatial Indexing
Thursday, June 9, 2011
Stats
Thursday, June 9, 2011
Unique Views
unique = remote_ip address / DAY
a.k.a visitors per day
Thursday, June 9, 2011
CollectionsInput and output
MapReturns 0..N key/value pairs per document
ReduceAggregates values per key
Aggregate by key => GROUP BY in SQL
Map/Reducehttp://www.mongodb.org/display/DOCS/MapReduce
Thursday, June 9, 2011
StrategyTwo-pass map/reduce to calculate unique visitors
Pass 1GROUP BY: profile, visitorCOUNT: visits per visitor per profile
Pass 2GROUP BY: profileCOUNT: visitors
Thursday, June 9, 2011
Profile View Data
// Profile 1{profile_id: 1, remote_ip: '127.0.0.1'}{profile_id: 1, remote_ip: '127.0.0.1'}{profile_id: 1, remote_ip: '127.0.0.2'}
// Profile 2{profile_id: 2, remote_ip: '127.0.0.4'}{profile_id: 2, remote_ip: '127.0.0.4'}
Visits on a given day
Thursday, June 9, 2011
Pass 1: Map Function
map = function() { var key = {
profile_id: this.profile_id,remote_ip: this.remote_ip
};
emit(key, {count: 1});}
Count visits per remote_ip per profileKEY = profile, remote_ip
Thursday, June 9, 2011
Reduce Function
reduce = function(key, values) { var count = 0;
values.forEach(function(v) { count += v.count; });
return {count: count};}
Counts(occurrences of key)
Thursday, June 9, 2011
Pass 1: Run Map/Reduce
db.profile_views.mapReduce(map, reduce, {out: 'profile_views_by_visitor'})
// Results: Unique visitors per profiledb.profile_views_by_visitor.find(){ "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } }{ "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } }{ "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 1 } }
Count visits per remote_ip per profile
Thursday, June 9, 2011
Pass 2: Map/Reduce
map = function() { emit(this._id.profile_id, {count: 1});}
Count visitors per profileKEY = profile_id
Thursday, June 9, 2011
Pass 2: Results
// Same reduce function as beforedb.profile_views_by_visitor.mapReduce(map, reduce, {out: 'profile_views_unique'})
// Resultsdb.profile_views_unique.find(){ "_id" : 1, "value" : { "count" : 2 } }{ "_id" : 2, "value" : { "count" : 1 } }
Count visitors per profile
Thursday, June 9, 2011
Map/Deduce
Large data sets, you get:• Horizontal scaling• Parallel processing across cluster
Can be clunkier than GROUP BY in SQL. But ...
JavaScript functions offers flexibility/power
Thursday, June 9, 2011
ActivitySELECT * FROM everything;
Too many tables to JOIN or UNIONThursday, June 9, 2011
Relational solutionDenormalized events table as activity log.
Column | Type | ------------------------+-----------------------------+ id | integer | event_type | character varying(255) | subject_type | character varying(255) | actor_type | character varying(255) | secondary_subject_type | character varying(255) | subject_id | integer | actor_id | integer | secondary_subject_id | integer | recipient_id | integer | secondary_recipient_id | integer | created_at | timestamp without time zone |
We use James Golick’s timeline_fu gem for Rails:https://github.com/jamesgolick/timeline_fu
Thursday, June 9, 2011
DirectionIncoming Activity
(recipients)Generated Activity
(actors)
Thursday, June 9, 2011
ComplicationsMultiple recipients• Subscribe to comments for a shot• Twitter-style @ mentions in comments
Confusing names• Generic names make queries and view logic hard to follow
N+1• Each event may require several lookups to get actor, subject, etc
Thursday, June 9, 2011
Events in Mongo
{ event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, recipients: [], // Multiple recipients secondary_recipient_id: 3, created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"}
Comment on a Screenshot containing an @ mentionScreenshot owner and @user should be recipients.
Mongo version of our timeline_events table
Thursday, June 9, 2011
Mongo Event v.2
{ event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, recipients: [1, 2], recipients: [
{user_id: 2, reason: 'screenshot owner'},{user_id: 3, reason: 'mention'}
], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"}
Why is a user a recipient?
Thursday, June 9, 2011
Mongo Event v.3
{ event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1 user_id: 1, comment_id 999, screenshot_id: 555, recipients: [
{user_id: 2, reason: 'screenshot owner'},{user_id: 3, reason: 'mention'}
], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"}
Meaningful names
Thursday, June 9, 2011
Mongo Event v.4
{ event_type: "created", subject_type: "Comment", user_id: 1, comment_id: 999, screenshot_id: 999, user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"}, comment: {id: 999, text: "Great shot!”}, screenshot: {id: 555, title: "Shot heard around the world"}, recipients: [
{user_id: 2, reason: 'screenshot owner'},{user_id: 3, reason: 'mention'}
], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"}
Denormalize to eliminate N+1s in view
Thursday, June 9, 2011
Denormalizing?You're giving up RDBMs benefits to optimize.
Optimize your optimizations.
Document flexibility:Data structures can mirror the view
Thursday, June 9, 2011
Caching
• Grabs free memory as needed; no configured cache size• Relies on OS to reclaim memory (LRU)
http://www.mongodb.org/display/DOCS/Caching
MongoDB uses memory-mapped files
Thursday, June 9, 2011
Replace Redis/Memcached?
FREQUENTLY accessed items LIKELY in memory
Good enough for you?One less moving part.
Thursday, June 9, 2011
Cache Namespaces
// Clear collection to expiredb.ads_cache.remove()
'ad_1''ad_2''ad_3'
Memcached keys are flatNo simple way to expire all
Collection
can serve as an expirable namespace
Thursday, June 9, 2011
Time to Mongo?Versatility?
Data structure flexibility worth more than joins?
Easier horizontal scaling?
http://www.mongodb.org
log | scale | optimize | aggregate | cache
Thursday, June 9, 2011
Cheers!
Rich Thornett
Dribbblehttp://dribbble.com
@frogandcode
Thursday, June 9, 2011
top related