Page 1: MongoDB London 2013: Data Modeling Examples from the Real World presented by Alvin Richards, 10gen

Technical Director, 10gen

@jonnyeight [email protected] alvinonmongodb.com

Alvin Richards

#MongoDBdays

Schema Design: 4 Real World Use Cases


Agenda

• Why is schema design important

• 4 Real World Schemas
  – Inbox
  – History
  – Indexed Attributes
  – Multiple Identities

• Conclusions


Why is Schema Design important?

• Largest factor for a performant system

• Schema design with MongoDB is different

• RDBMS – "What answers do I have?"

• MongoDB – "What question will I have?"


#1 - Message Inbox


Let’s get Social


Sending Messages

?


Design Goals

• Efficiently send new messages to recipients

• Efficiently read inbox


Reading my Inbox

?


3 Approaches (there are more)

• Fan out on Read

• Fan out on Write

• Fan out on Write with Bucketing


Fan out on read

// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = {
    from: "Joe",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )


Fan out on read – Send Message

Shard 1 Shard 2 Shard 3

Send Message


Fan out on read – Inbox Read

Shard 1 Shard 2 Shard 3

Read Inbox


Considerations

• 1 document per message sent

• Multiple recipients in an array key

• Reading an inbox is finding all messages with my own name in the recipient field

• Requires scatter-gather on sharded cluster

• Then a lot of random IO on a shard to find everything
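
The scatter-gather cost can be illustrated without a cluster. The following is a minimal in-memory sketch, not MongoDB code: shards are modelled as plain arrays, and `shardFor` is a hypothetical toy hash standing in for routing on the `from` shard key.

```javascript
// Sketch only: shards are plain arrays; shardFor() is a hypothetical
// stand-in for MongoDB's routing on the shard key { from: 1 }.
const SHARD_COUNT = 3;
const shards = Array.from({ length: SHARD_COUNT }, () => []);

function shardFor(key) {
  // Toy hash: sum of character codes modulo the shard count.
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

// Fan out on read: one document per message, stored on the sender's shard.
function send(msg) {
  shards[shardFor(msg.from)].push(msg);
}

// The recipient is not the shard key, so every shard must be asked.
function readInbox(user) {
  let shardsQueried = 0;
  const found = [];
  for (const shard of shards) {
    shardsQueried += 1;
    for (const msg of shard) {
      if (msg.to.includes(user)) found.push(msg);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first
  return { messages: found, shardsQueried };
}

send({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(1000), message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: new Date(2000), message: "Lunch?" });
const bobInbox = readInbox("Bob");
console.log(bobInbox.shardsQueried, bobInbox.messages.map((m) => m.message));
```

Writes are cheap (one document per message), but every inbox read touches all shards regardless of how many messages it finds.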


Fan out on write

// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

msg = {
    from: "Joe",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message: one document per recipient
for ( recipient in msg.to ) {
    // for...in yields array indexes, so look up the name
    msg.recipient = msg.to[recipient];
    // save() may have set _id on the previous pass; clear it to get a new document
    delete msg._id;
    db.inbox.save( msg );
}

// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )


Fan out on write – Send Message

Shard 1 Shard 2 Shard 3

Send Message


Fan out on write – Read Inbox

Shard 1 Shard 2 Shard 3

Read Inbox


Considerations

• 1 document per recipient

• Reading my inbox is just finding all of the messages with me as the recipient

• Can shard on recipient, so inbox reads hit one shard

• But still lots of random IO on the shard
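
To make the trade-off concrete, here is a minimal in-memory sketch of fan out on write. It is not MongoDB code: `inboxShards` and the toy hash `shardOf` are hypothetical stand-ins for a cluster sharded on `recipient`.

```javascript
// Sketch only: shards are plain arrays and shardOf() is a toy stand-in
// for routing on a shard key that starts with "recipient".
const shardCount = 3;
const inboxShards = Array.from({ length: shardCount }, () => []);

function shardOf(recipient) {
  let sum = 0;
  for (const ch of recipient) sum += ch.charCodeAt(0);
  return sum % shardCount;
}

// Fan out on write: store one copy of the message per recipient.
function sendMessage(msg) {
  for (const recipient of msg.to) {
    inboxShards[shardOf(recipient)].push({ ...msg, recipient });
  }
}

// Reading an inbox is routed to the single shard that owns the recipient.
function readInboxOf(user) {
  return inboxShards[shardOf(user)]
    .filter((doc) => doc.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
}

sendMessage({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(1000), message: "Hi!" });
sendMessage({ from: "Jane", to: ["Bob"], sent: new Date(2000), message: "Lunch?" });
const totalDocs = inboxShards.reduce((n, shard) => n + shard.length, 0);
const bobsInbox = readInboxOf("Bob");
console.log(totalDocs, bobsInbox.map((m) => m.message));
```

Two messages with three recipients in total produce three stored documents: the write cost grows with the number of recipients, while each read stays on one shard.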


Fan out on write with buckets

• Each “inbox” document is an array of messages

• Append a message onto “inbox” of recipient

• Bucket inbox documents so there are not too many messages per document

• Can shard on recipient, so inbox reads hit one shard

• 1 or 2 documents to read the whole inbox


Fan out on write – with buckets

// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
    from: "Joe",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
for ( recipient in msg.to ) {
    // Atomically bump the recipient's message counter and read the new value
    count = db.users.findAndModify( {
        query: { user_name: msg.to[recipient] },
        update: { "$inc": { "msg_count": 1 } },
        upsert: true,
        new: true
    } ).msg_count;

    // Up to 50 messages per bucket
    sequence = Math.floor( count / 50 );

    db.inbox.update(
        { owner: msg.to[recipient], sequence: sequence },
        { $push: { "messages": msg } },
        { upsert: true } );
}

// Read my inbox
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
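
The bucket number is plain integer arithmetic on the per-user counter. The sketch below, which assumes the bucket size of 50 from the slide and uses a hypothetical helper `bucketFor`, shows how 120 consecutive messages would be spread over buckets.

```javascript
// Bucket size of 50 comes from the slide; bucketFor() is a hypothetical helper.
const BUCKET_SIZE = 50;

function bucketFor(msgCount) {
  return Math.floor(msgCount / BUCKET_SIZE);
}

// msg_count is incremented before it is read, so the first message sees 1.
const buckets = {};
for (let count = 1; count <= 120; count++) {
  const seq = bucketFor(count);
  buckets[seq] = (buckets[seq] || 0) + 1;
}
console.log(buckets);
```

Because the counter starts at 1, bucket 0 ends up with 49 messages and later buckets with 50; computing the bucket from `count - 1` would make every bucket the same size.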


Bucketed fan out on write - Send

Shard 1 Shard 2 Shard 3

Send Message


Bucketed fan out on write - Read

Shard 1 Shard 2 Shard 3

Read Inbox


#2 – History


Design Goals

• Need to retain a limited amount of history e.g.
  – Hours, Days, Weeks
  – May be legislative requirement (e.g. HIPAA, SOX, DPA)

• Need to query efficiently by
  – match
  – ranges


3 Approaches (there are more)

• Bucket by Number of messages

• Fixed size Array

• Bucket by Date + TTL Collections


Inbox – Bucket by # messages

db.inbox.find()
{
    owner: "Joe",
    sequence: 25,
    messages: [
        { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" },
        …
    ]
}

// Query with a date range
db.inbox.find( { owner: "friend1", messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04…") } } } } )

// Remove elements based on a date
db.inbox.update( { owner: "friend1" }, { $pull: { messages: { sent: { $gte: ISODate("2013-04-04…") } } } } )


Considerations

• Shrinking documents, space can be reclaimed with
  – db.runCommand ( { compact: '<collection>' } )

• Removing the document after the last element in the array has been removed
  – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
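
The second consideration can be sketched in plain JavaScript. This is an in-memory emulation, not MongoDB code: `pullSentOnOrAfter` mimics a `$pull` with a `$gte` date condition on one bucket, `dropEmptyBuckets` is a hypothetical clean-up step, and the cutoff date and sample messages are made up for illustration.

```javascript
// Emulates { $pull: { messages: { sent: { $gte: cutoff } } } } on one bucket.
function pullSentOnOrAfter(bucket, cutoff) {
  const kept = bucket.messages.filter((m) => !(m.sent >= cutoff));
  return { ...bucket, messages: kept };
}

// Buckets left with an empty array are candidates for deletion.
function dropEmptyBuckets(buckets) {
  return buckets.filter((b) => b.messages.length > 0);
}

// Hypothetical cutoff and data, chosen only for illustration.
const cutoff = new Date("2013-04-04T00:00:00Z");
const inbox = [
  { owner: "friend1", sequence: 0, messages: [{ sent: new Date("2013-04-05T10:00:00Z") }] },
  { owner: "friend1", sequence: 1, messages: [
    { sent: new Date("2013-03-01T09:59:42Z") },
    { sent: new Date("2013-04-06T08:00:00Z") },
  ] },
];
const remaining = dropEmptyBuckets(inbox.map((b) => pullSentOnOrAfter(b, cutoff)));
console.log(remaining.map((b) => [b.sequence, b.messages.length]));
```

Bucket 0 loses its only message and is dropped, while bucket 1 keeps the one message sent before the cutoff.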


Maintain the latest – Fixed Size Array

msg = {
    from: "Your Boss",
    to: [ "Bob" ],
    sent: new Date(),
    message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],
                           $sort: { sent: 1 },
                           $slice: -50 } } } )
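
The effect of combining `$each`, `$sort` and `$slice` can be emulated in plain JavaScript. This is a sketch of the semantics as described on the slide, not MongoDB code, and `pushSortSlice` is a hypothetical helper name.

```javascript
// Plain-JavaScript emulation of the update above, not MongoDB code:
// append, sort ascending by `sent`, then keep only the last `limit` items.
function pushSortSlice(messages, newMessages, limit) {
  const merged = messages.concat(newMessages);
  merged.sort((a, b) => a.sent - b.sent);
  return merged.slice(-limit);
}

let latest = [];
for (let i = 1; i <= 60; i++) {
  const msg = { sent: new Date(i * 1000), message: "msg " + i };
  latest = pushSortSlice(latest, [msg], 50);
}
console.log(latest.length, latest[0].message, latest[latest.length - 1].message);
```

After 60 pushes only the 50 most recent messages remain, so the array never grows past the chosen size.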


Considerations

• Need to compute the size of the array based on retention period
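
As a rough worked example of that sizing, assuming a steady message rate (the figures below are hypothetical and `arraySizeFor` is not part of any API):

```javascript
// Hypothetical sizing helper: expected messages over the retention period.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

// Example: about 7 messages a day, kept for one week.
const sliceSize = arraySizeFor(7, 7);
console.log(sliceSize);
```

The result would be used as the magnitude of the `$slice` value. A bursty sender can exceed the estimate, in which case messages fall out of the array before the retention period ends.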


TTL Collections

// messages: one doc per user per day
db.inbox.findOne()
{
    _id: 1,
    to: "Joe",
    sequence: ISODate("2013-02-04T00:00:00.392Z"),
    messages: [ ]
}

// Auto expires data after 31536000 seconds = 1 year
db.inbox.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
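
The expiry arithmetic can be checked in plain JavaScript. `isExpired` is a hypothetical helper that mirrors the rule "indexed date plus expireAfterSeconds"; the dates are illustrative.

```javascript
// One (non-leap) year in seconds, matching the value on the slide.
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31536000

// Hypothetical helper: is a document with this date past its expiry time?
function isExpired(sequence, now, expireAfterSeconds) {
  return now.getTime() - sequence.getTime() >= expireAfterSeconds * 1000;
}

const bucketDate = new Date("2013-02-04T00:00:00Z");
const beforeCutoff = isExpired(bucketDate, new Date("2014-02-03T23:59:59Z"), SECONDS_PER_YEAR);
const afterCutoff = isExpired(bucketDate, new Date("2014-02-04T00:00:00Z"), SECONDS_PER_YEAR);
console.log(SECONDS_PER_YEAR, beforeCutoff, afterCutoff);
```

In MongoDB the deletion is done by a periodic background task, so an expired document may remain visible for a short time after the cutoff.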


#3 – Indexed Attributes


Design Goal

• Application needs to store a variable number of attributes e.g.
  – User defined Form
  – Meta Data tags

• Queries needed
  – Equality
  – Range based

• Need to be efficient, regardless of the number of attributes


2 Approaches (there are more)

• Attributes as a Sub-Document

• Attributes as Objects in an Array


Attributes as a Sub-Document

db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("2013-03-01T09:59:42.689Z") } } )

db.files.insert( { _id: "local.1", attr: { type: "text", size: 128 } } )

db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("2013-04-01T18:13:42.689Z") } } )

// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )


Considerations

• Each attribute needs an Index

• Each time you extend, you add an index

• Lots and lots of indexes


Attributes as Objects in Array

db.files.insert( { _id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("2013-03-01T09:59:42.689Z") } ] } )

db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } )

db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )

db.files.ensureIndex( { attr: 1 } )
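
Moving from the sub-document form to the array form is a mechanical reshaping. A minimal sketch in plain JavaScript, where `toAttributeArray` is a hypothetical helper and the sample document is taken from the slides:

```javascript
// Turn { type: "text", size: 128 } into [ { type: "text" }, { size: 128 } ].
function toAttributeArray(attr) {
  return Object.entries(attr).map(([key, value]) => ({ [key]: value }));
}

const file = { _id: "local.1", attr: { type: "text", size: 128 } };
const converted = { _id: file._id, attr: toAttributeArray(file.attr) };
console.log(JSON.stringify(converted));
```

Each attribute becomes its own single-key object, so one index on `attr` covers attributes that are added later without creating further indexes.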


Queries

// Range queries
db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )

db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple condition – Only the first predicate on the query can use the Index
// ensure that this is the most selective.
// Index Intersection will allow multiple indexes, see SERVER-3071
db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                         { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )

// Each $or can use an index
db.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                        { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )


#4 – Multiple Identities


Design Goal

• Ability to look up by a number of different identities e.g.
  – Username
  – Email address
  – FB Handle
  – LinkedIn URL


2 Approaches (there are more)

• Identifiers in a single document

• Separate Identifiers from Content


Single Document by User

db.users.findOne()
{
    _id: "joe",
    email: "[email protected]",
    fb: "joe.smith",      // facebook
    li: "joe.e.smith",    // linkedin
    other: {…}
}

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )


Read by _id (shard key)

Shard 1 Shard 2 Shard 3

find( { _id: "joe"} )


Read by email (non-shard key)

Shard 1 Shard 2 Shard 3

find( { email: "[email protected]" } )


Considerations

• Lookup by shard key is routed to 1 shard

• Lookup by other identifier is scatter gathered across all shards

• Secondary keys cannot have a unique index


Document per Identity

// Create unique index
db.identities.ensureIndex( { identifier: 1 }, { unique: true } )

// Create a document for each of the user's identities
db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier: { email: "[email protected]" }, user: "1200-42" } )
db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier: 1 } )

// Create unique index
db.users.ensureIndex( { _id: 1 }, { unique: true } )

// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )


Read requires 2 reads

Shard 1 Shard 2 Shard 3

db.identities.find({"identifier" : { "hndl" : "joe" }})

db.users.find( { _id: "1200-42"} )


Solution

• Lookup to Identities is a routed query

• Lookup to Users is a routed query

• Unique indexes available
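
The two-read lookup and the uniqueness guarantee can be sketched with in-memory maps. This is an illustration, not MongoDB code: the `Map` objects stand in for the two collections, `addIdentity` and `findUser` are hypothetical helpers, and the `name` field is made up.

```javascript
// In-memory stand-ins for the identities and users collections.
const identities = new Map();
const users = new Map([["1200-42", { _id: "1200-42", name: "Joe" }]]);

// Mimics the unique index on "identifier": a duplicate is rejected.
function addIdentity(identifier, userId) {
  const key = JSON.stringify(identifier);
  if (identities.has(key)) throw new Error("duplicate identifier: " + key);
  identities.set(key, userId);
}

// Read 1 resolves the identifier to a user id; read 2 fetches the user.
function findUser(identifier) {
  const userId = identities.get(JSON.stringify(identifier));
  return userId === undefined ? null : (users.get(userId) || null);
}

addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");
const byHandle = findUser({ hndl: "joe" });
const byLinkedIn = findUser({ li: "joe.e.smith" });
console.log(byHandle._id, byLinkedIn._id, findUser({ hndl: "nobody" }));
```

Both lookups are keyed reads, which corresponds to each query being routed to a single shard by its shard key.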


Conclusion


Summary

• Multiple ways to model a domain problem

• Understand the key use cases of your app

• Balance between ease of query vs. ease of write

• Random IO should be avoided


Technical Director, 10gen

@jonnyeight [email protected] alvinonmongodb.com

Alvin Richards

#MongoDBdays

Thank You

