Using NoSQL with Yo' SQL

Post on 11-Nov-2014

DESCRIPTION

Supplementing a relational database application with MongoDB.

Transcript

Using NoSQL with Yo’ SQL
Supplementing your app with a slice of MongoDB

Rich Thornett
Dribbble

Thursday, June 9, 2011

Dribbble
What are you working on?

Show and tell for creatives via screenshots

Your Father's Webapp
Dribbble is a typical web application:

Ruby on Rails + Relational Database

We <3 PostgreSQL

But for certain tasks ...

Alternative Values

More flexible data structures

Easier horizontal scaling

log | scale | optimize | aggregate | cache

NoSQL
No == Not Only (but sounds a bit stronger, no?)

• No: Fixed table schemas
• No: Joins
• Yes: Scale horizontally

Examples
Memcached, Redis, CouchDB, Cassandra, MongoDB ...

Exploring MongoDB

• Persistent data store
• Powerful query language (closest to RDBMS)
• Broad feature set
• Great community and documentation

Utility belt that fits us?

What is MongoDB?
A document-oriented NoSQL database

Collections & Documents
v.
Tables & Rows

What's a document?
Our old friend JavaScript

{
  _id: ObjectId("4ddfe31db6bc16ab615e573d"),
  description: "This is a BSON document",
  embedded_doc: {
    description: "I belong to my parent document"
  },
  tags: ['can', 'haz', 'arrays']
}

Documents are BSON (binary encoded JSON)

Embedded Documents
Avoid joins for "belongs to" associations

{
  _id: ObjectId("4ddfe31db6bc16ab615e573d"),
  description: "This is a BSON document",
  embedded_doc: {
    description: "I belong to my parent document"
  },
  tags: ['can', 'haz', 'arrays']
}

Arrays
Avoid joins for "tiny relations"

{
  _id: ObjectId("4ddfe31db6bc16ab615e573d"),
  description: "This is a BSON document",
  embedded_doc: {
    description: "I belong to my parent document"
  },
  tags: ['can', 'haz', 'arrays']
}

Relational cruft: the thing, tags, and thing_taggings tables

Googley

“With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.”

--Markus Gattol
http://www.markus-gattol.name/ws/mongodb.html

Replica Sets

• Read Scaling

• Data Redundancy

• Automated Failover

• Maintenance

• Disaster Recovery

Automate the storing of multiple copies of data

Dude, who sharded?
Relax, not you.

Auto-sharding

You: Specify a shard key for a collection
Mongo: Partitions the collection across machines
Application: Blissfully unaware (mostly :)
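To make that division of labor concrete, here is a toy sketch in plain JavaScript of routing a document by a range of its shard key. It is not MongoDB's implementation: the chunk boundaries, the shard names, and the `shardFor` helper are all assumptions made for illustration.

```javascript
// Toy range-based router: which shard holds a given shard-key value?
// The chunk boundaries and shard names below are hypothetical.
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shard0' },
  { min: 1000, max: 2000, shard: 'shard1' },
  { min: 2000, max: Infinity, shard: 'shard2' },
];

// Each chunk covers the half-open range [min, max) of the shard key.
function shardFor(shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

console.log(shardFor(42));   // shard0
console.log(shardFor(1000)); // shard1
console.log(shardFor(5000)); // shard2
```

The application only ever supplies the key; the lookup table lives with the database, which is why the app can stay (mostly) unaware.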

CoSQL

[Diagram labels: Caching, Analytics, Logging, Scaling, Flexibility, MongoDB, WEBAPP, RDBMS. Caption: MIND THE APP]

Ads

Let's Mongo!

• Orthogonal to primary app

• Few joins

• Integrity not critical

From the Console

Create a text ad

db.ads.insert({
  advertiser_id: 1,
  type: 'text',
  url: 'http://dribbbler-on-the-roof.com',
  copy: 'Watch me!',
  runs: [{
    // JavaScript months are zero-based, so 4 is May
    start: new Date(2011, 4, 7),
    end: new Date(2011, 4, 14)
  }],
  created_at: new Date()
})

But there are drivers for all major languages

Querying

Query by match

db.ads.find({advertiser_id: 1})

Paging active ads

// Page 2 of text ads running this month
db.ads.find({
  type: 'text',
  runs: {
    $elemMatch: {
      start: {$lte: new Date(2011, 4, 10)},
      end: {$gte: new Date(2011, 4, 10)}
    }
  }
}).sort({created_at: -1}).skip(15).limit(15)
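The skip and limit values in the paging query follow directly from the page number. A minimal sketch, assuming 1-based page numbers and 15 ads per page (the `pageWindow` helper is hypothetical, not part of MongoDB):

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit values.
function pageWindow(page, perPage) {
  return { skip: (page - 1) * perPage, limit: perPage };
}

const { skip, limit } = pageWindow(2, 15);
console.log(skip, limit); // 15 15
```

Page 2 therefore skips the first 15 documents and returns the next 15, matching `.skip(15).limit(15)` above.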

Advanced Queries
http://www.mongodb.org/display/DOCS/Advanced+Queries

$gt  $lt  $gte  $lte  $all  $exists
$size  $type  $elemMatch  $not  $where
$mod  $ne  $in  $nin  $nor  $or

count | distinct | group
Group does not work across shards; use map/reduce instead.

Polymorphism

// Banner ad has additional fields
db.ads.insert({
  advertiser_id: 1,
  type: 'banner',
  url: 'http://dribbble-me-this.com',
  copy: 'Buy me!',
  runs: [],
  image_file_name: 'ad.png',
  image_content_type: 'image/png',
  image_file_size: '33333'
})

Easy inheritance. Document has whatever fields it needs.

Single | Multiple | Joined table inheritance all present difficulties

No DB changes to create new subclasses in Mongo

Logging

• Scale and query horizontally

• Add fields on the fly

• Writes: Fast, asynchronous, atomic

Volume Logging

• Ad impressions

• Screenshot views

• Profile views

Fast, asynchronous writes and sharding FTW!

Real-time Analytics

What people and locations are trending this hour?

db.trends.update(
  {date: "2011-04-10 13:00"}, // search criteria
  {
    $inc: { // increment
      'user.simplebits.likes_received': 1,
      'country.us.likes_received': 1,
      'city.boston.likes_received': 1
    }
  },
  true // upsert
)

upsert: Update document (if present) or insert it
$inc: Increment field by amount (if present) or set to amount
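The combined effect of upsert and $inc can be mimicked in plain JavaScript. This is only a sketch of the semantics described above, run against an in-memory Map rather than MongoDB, and it is not atomic; the `upsertInc` helper and the `trends` Map are assumptions for illustration.

```javascript
// In-memory stand-in for one collection, keyed by the search criterion (date).
const trends = new Map();

// Mimics update(criteria, {$inc: ...}, true): create the document if it is
// missing, then add each amount to its field (a missing field starts at 0).
function upsertInc(collection, date, increments) {
  if (!collection.has(date)) collection.set(date, { date });
  const doc = collection.get(date);
  for (const [path, amount] of Object.entries(increments)) {
    const keys = path.split('.');
    const last = keys.pop();
    // Walk (and create) the nested documents named by the dotted path.
    let node = doc;
    for (const key of keys) {
      if (!(key in node)) node[key] = {};
      node = node[key];
    }
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

upsertInc(trends, '2011-04-10 13:00', { 'user.simplebits.likes_received': 1 });
const hour = upsertInc(trends, '2011-04-10 13:00', {
  'user.simplebits.likes_received': 1,
  'city.boston.likes_received': 1,
});
console.log(hour.user.simplebits.likes_received); // 2
console.log(hour.city.boston.likes_received);     // 1
```

The first call inserts the hourly document; the second finds it, bumps the existing counter, and creates the new nested one.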

Flex Benefits

• Add/nest new fields to measure with ease

• Atomic upsert with $inc
  Replaces two-step, transactional find-and-update/create

• Live, cached aggregation

Scouting

Design a Designer

db.users.insert({
  name: 'Dan Cederholm',
  available: true,
  skills: ['html', 'css', 'illustration', 'icon design']
})

Geospatial Indexing

db.users.ensureIndex({location: '2d'})
db.users.insert({
  name: 'Dan Cederholm',
  // Salem longitude/latitude
  location: [-70.8972222, 42.5194444],
  available: true,
  skills: ['html', 'css', 'illustration', 'icon design']
})

Search by Location

boston = [-71.0602778, 42.3583333] // long/lat

Within area

// $maxDistance: Find users in Boston area (w/in 50 miles)
db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})

Within area, matching criteria

// Find users in the Boston area who:
//   are available for work
//   have expertise in HTML and icon design
db.users.find({
  location: {$near: boston, $maxDistance: 0.7234842},
  available: true,
  skills: {$all: ['html', 'icon design']}
})
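The $maxDistance value of 0.7234842 is consistent with converting 50 miles into degrees at roughly 69.11 miles per degree. That conversion factor is inferred from the numbers on the slide, not stated in the talk, so treat the sketch below as an assumption (`milesToDegrees` is a hypothetical helper).

```javascript
// Assumption: one degree spans roughly 69.11 miles.
const MILES_PER_DEGREE = 69.11;

function milesToDegrees(miles) {
  return miles / MILES_PER_DEGREE;
}

console.log(milesToDegrees(50).toFixed(4)); // 0.7235
```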

Search Power

Flexible Documents
+ Rich Query Language
+ Geospatial Indexing

Stats

Unique Views

unique = remote_ip address / DAY

a.k.a. visitors per day

Map/Reduce
http://www.mongodb.org/display/DOCS/MapReduce

Collections: Input and output
Map: Returns 0..N key/value pairs per document
Reduce: Aggregates values per key

Aggregate by key => GROUP BY in SQL

Strategy
Two-pass map/reduce to calculate unique visitors

Pass 1
GROUP BY: profile, visitor
COUNT: visits per visitor per profile

Pass 2
GROUP BY: profile
COUNT: visitors

Profile View Data

Visits on a given day

// Profile 1
{profile_id: 1, remote_ip: '127.0.0.1'}
{profile_id: 1, remote_ip: '127.0.0.1'}
{profile_id: 1, remote_ip: '127.0.0.2'}

// Profile 2
{profile_id: 2, remote_ip: '127.0.0.4'}
{profile_id: 2, remote_ip: '127.0.0.4'}

Pass 1: Map Function

Count visits per remote_ip per profile
KEY = profile, remote_ip

map = function() {
  var key = {
    profile_id: this.profile_id,
    remote_ip: this.remote_ip
  };

  emit(key, {count: 1});
}

Reduce Function

Counts (occurrences of key)

reduce = function(key, values) {
  var count = 0;

  values.forEach(function(v) {
    count += v.count;
  });

  return {count: count};
}

Pass 1: Run Map/Reduce

Count visits per remote_ip per profile

db.profile_views.mapReduce(map, reduce, {out: 'profile_views_by_visitor'})

// Results: Visits per visitor per profile
db.profile_views_by_visitor.find()
{ "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } }
{ "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } }
{ "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 2 } }

Pass 2: Map/Reduce

Count visitors per profile
KEY = profile_id

map = function() {
  emit(this._id.profile_id, {count: 1});
}

Pass 2: Results

Count visitors per profile

// Same reduce function as before
db.profile_views_by_visitor.mapReduce(map, reduce, {out: 'profile_views_unique'})

// Results
db.profile_views_unique.find()
{ "_id" : 1, "value" : { "count" : 2 } }
{ "_id" : 2, "value" : { "count" : 1 } }
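The two passes can be traced end to end without a database. The sketch below is a plain-JavaScript stand-in for mapReduce, not MongoDB's API: the `mapReduce` helper is hypothetical, and `emit` is passed to the map function as an argument instead of being a global as it is in the mongo shell. The input is the sample profile view data from the earlier slide.

```javascript
// Minimal in-memory stand-in for mapReduce (illustration only).
// Unlike MongoDB, emit is passed to map as an argument rather than being global.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));
  // Like MongoDB, skip reduce when a key has a single emitted value.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const profileViews = [
  { profile_id: 1, remote_ip: '127.0.0.1' },
  { profile_id: 1, remote_ip: '127.0.0.1' },
  { profile_id: 1, remote_ip: '127.0.0.2' },
  { profile_id: 2, remote_ip: '127.0.0.4' },
  { profile_id: 2, remote_ip: '127.0.0.4' },
];

function reduce(key, values) {
  let count = 0;
  values.forEach((v) => { count += v.count; });
  return { count };
}

// Pass 1: visits per visitor per profile.
const byVisitor = mapReduce(profileViews, function (emit) {
  emit({ profile_id: this.profile_id, remote_ip: this.remote_ip }, { count: 1 });
}, reduce);

// Pass 2: one emit per (profile, visitor) pair gives visitors per profile.
const unique = mapReduce(byVisitor, function (emit) {
  emit(this._id.profile_id, { count: 1 });
}, reduce);

console.log(JSON.stringify(unique));
// [{"_id":1,"value":{"count":2}},{"_id":2,"value":{"count":1}}]
```

Pass 2 ignores the visit counts from pass 1 entirely; it counts rows, which is why repeat visits from one address collapse into a single visitor.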

Map/Deduce

Large data sets, you get:
• Horizontal scaling
• Parallel processing across cluster

Can be clunkier than GROUP BY in SQL. But ...

JavaScript functions offer flexibility/power

Activity

SELECT * FROM everything;

Too many tables to JOIN or UNION

Relational solution
Denormalized events table as activity log.

         Column         |            Type
------------------------+-----------------------------
 id                     | integer
 event_type             | character varying(255)
 subject_type           | character varying(255)
 actor_type             | character varying(255)
 secondary_subject_type | character varying(255)
 subject_id             | integer
 actor_id               | integer
 secondary_subject_id   | integer
 recipient_id           | integer
 secondary_recipient_id | integer
 created_at             | timestamp without time zone

We use James Golick’s timeline_fu gem for Rails:
https://github.com/jamesgolick/timeline_fu

Direction

Incoming Activity (recipients)
Generated Activity (actors)

Complications

Multiple recipients
• Subscribe to comments for a shot
• Twitter-style @ mentions in comments

Confusing names
• Generic names make queries and view logic hard to follow

N+1
• Each event may require several lookups to get actor, subject, etc.

Events in Mongo

Mongo version of our timeline_events table

{
  event_type: "created",
  subject_type: "Comment",
  actor_type: "User",
  subject_id: 999,
  actor_id: 1,
  recipients: [], // Multiple recipients
  secondary_recipient_id: 3,
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Comment on a Screenshot containing an @ mention
Screenshot owner and @user should be recipients.

Mongo Event v.2

Why is a user a recipient?

{
  event_type: "created",
  subject_type: "Comment",
  actor_type: "User",
  subject_id: 999,
  actor_id: 1,
  // recipients: [1, 2],   (plain IDs, replaced by the array below)
  recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Mongo Event v.3

Meaningful names

{
  event_type: "created",
  subject_type: "Comment",
  actor_type: "User",
  subject_id: 999,
  actor_id: 1,
  user_id: 1,
  comment_id: 999,
  screenshot_id: 555,
  recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Mongo Event v.4

Denormalize to eliminate N+1s in view

{
  event_type: "created",
  subject_type: "Comment",
  user_id: 1,
  comment_id: 999,
  screenshot_id: 555,
  user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"},
  comment: {id: 999, text: "Great shot!"},
  screenshot: {id: 555, title: "Shot heard around the world"},
  recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}
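With recipients embedded in each event, an incoming-activity feed reduces to a match on one nested field. In the mongo shell that would likely be a dot-notation query such as `db.events.find({'recipients.user_id': 2})`, where the `events` collection name is an assumption. The same filter in plain JavaScript, over made-up sample events (`incomingFor` is a hypothetical helper):

```javascript
// Two sample events shaped like the v.4 document (trimmed; values are made up).
const events = [
  { event_type: 'created', comment_id: 999,
    recipients: [{ user_id: 2, reason: 'screenshot owner' },
                 { user_id: 3, reason: 'mention' }] },
  { event_type: 'created', comment_id: 1000,
    recipients: [{ user_id: 3, reason: 'screenshot owner' }] },
];

// Incoming activity: events where the user appears among the recipients.
function incomingFor(userId) {
  return events.filter((e) => e.recipients.some((r) => r.user_id === userId));
}

console.log(incomingFor(2).length); // 1
console.log(incomingFor(3).length); // 2
```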

Denormalizing?
You're giving up RDBMS benefits to optimize.

Optimize your optimizations.

Document flexibility:
Data structures can mirror the view

Caching
http://www.mongodb.org/display/DOCS/Caching

MongoDB uses memory-mapped files

• Grabs free memory as needed; no configured cache size
• Relies on OS to reclaim memory (LRU)

Replace Redis/Memcached?

FREQUENTLY accessed items LIKELY in memory

Good enough for you?
One less moving part.

Cache Namespaces

Memcached keys are flat: 'ad_1', 'ad_2', 'ad_3'
No simple way to expire all

A collection can serve as an expirable namespace

// Clear collection to expire
db.ads_cache.remove({})
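The namespace idea can be illustrated without a database. In this toy sketch each namespace is a separate Map, standing in for one collection, so expiring a whole group of keys is a single operation. The `NamespacedCache` class, its method names, and the `users_cache` namespace are hypothetical.

```javascript
// Toy cache with one Map per namespace, standing in for one collection each.
// The class and method names are hypothetical.
class NamespacedCache {
  constructor() {
    this.namespaces = new Map();
  }
  set(namespace, key, value) {
    if (!this.namespaces.has(namespace)) this.namespaces.set(namespace, new Map());
    this.namespaces.get(namespace).set(key, value);
  }
  get(namespace, key) {
    const entries = this.namespaces.get(namespace);
    return entries ? entries.get(key) : undefined;
  }
  // Expire every entry in a namespace at once, like clearing a collection.
  clear(namespace) {
    this.namespaces.delete(namespace);
  }
}

const cache = new NamespacedCache();
cache.set('ads_cache', 'ad_1', '<a>Watch me!</a>');
cache.set('users_cache', 'user_1', 'simplebits');
cache.clear('ads_cache');
console.log(cache.get('ads_cache', 'ad_1'));     // undefined
console.log(cache.get('users_cache', 'user_1')); // simplebits
```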

Time to Mongo?

Versatility?

Data structure flexibility worth more than joins?

Easier horizontal scaling?

http://www.mongodb.org

log | scale | optimize | aggregate | cache

Cheers!

Rich Thornett

Dribbble
http://dribbble.com

@frogandcode
