Schema Design: Four Real-World Use Cases. Mike Friedman, Perl Engineer & Evangelist, 10gen. #MongoDBdays

Data Modeling Examples from the Real World

Nov 27, 2014

Transcript
Page 1: Data Modeling Examples from the Real World

Perl Engineer & Evangelist, 10gen

Mike Friedman

#MongoDBdays

Schema Design: Four Real-World Use Cases

Page 2: Data Modeling Examples from the Real World


Agenda

• Why is schema design important

• 4 Real World Schemas
  – Inbox
  – History
  – Indexed Attributes
  – Multiple Identities

• Conclusions

Page 3: Data Modeling Examples from the Real World

Why is Schema Design important?

• Largest factor for a performant system

• Schema design with MongoDB is different

• RDBMS – "What answers do I have?"

• MongoDB – "What question will I have?"

Page 4: Data Modeling Examples from the Real World

#1 - Message Inbox

Page 5: Data Modeling Examples from the Real World

Let’s get Social

Page 6: Data Modeling Examples from the Real World

Sending Messages

?

Page 7: Data Modeling Examples from the Real World

Design Goals

• Efficiently send new messages to recipients

• Efficiently read inbox

Page 8: Data Modeling Examples from the Real World

Reading my Inbox

?

Page 9: Data Modeling Examples from the Real World

3 Approaches (there are more)

• Fan out on Read

• Fan out on Write

• Fan out on Write with Bucketing

Page 10: Data Modeling Examples from the Real World

Fan out on read

// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )

Page 11: Data Modeling Examples from the Real World

Fan out on read – I/O

[Diagram: Send Message; Shard 1, Shard 2, Shard 3]

Page 12: Data Modeling Examples from the Real World

Fan out on read – I/O

[Diagram: Read Inbox and Send Message; Shard 1, Shard 2, Shard 3]

Page 13: Data Modeling Examples from the Real World

Considerations

• Write: One document per message sent

• Read: Find all messages with my own name in the recipient field

• Read: Requires scatter-gather on sharded cluster

• A lot of random I/O on a shard to find everything
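
To make the read cost above concrete, here is a minimal in-memory sketch in plain JavaScript, with no MongoDB involved. The three arrays standing in for shards, the toy `shardFor` routing function and the helper names are all assumptions made for illustration; a real cluster routes by shard-key ranges, not by this hash.

```javascript
// Hypothetical in-memory stand-in for a 3-shard cluster (illustration only).
const shards = [[], [], []];

// Assumed toy routing function: picks a shard from the shard-key value.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % shards.length;
}

// Fan out on read: one document per message, stored on the sender's shard.
function send(msg) {
  shards[shardFor(msg.from)].push(msg);
  return 1; // documents written
}

// An inbox read cannot be routed by "from", so every shard is queried.
function readInbox(user) {
  let shardsQueried = 0;
  const found = [];
  for (const shard of shards) {
    shardsQueried += 1;
    for (const msg of shard) {
      if (msg.to.includes(user)) found.push(msg);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first
  return { messages: found, shardsQueried: shardsQueried };
}

const written = send({ from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: 2, message: "Lunch?" });
const bobInbox = readInbox("Bob");
```

In this sketch a send always costs one write, while a read touches all three stand-in shards regardless of where the matching messages live.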

Page 14: Data Modeling Examples from the Real World

Fan out on write

// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message: insert one copy per recipient, so that each copy gets its own _id
msg.to.forEach( function( recipient ) {
  db.inbox.insert( { from: msg.from, to: msg.to, sent: msg.sent, message: msg.message, recipient: recipient } )
} )

// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )

Page 15: Data Modeling Examples from the Real World

Fan out on write – I/O

[Diagram: Send Message; Shard 1, Shard 2, Shard 3]

Page 16: Data Modeling Examples from the Real World

Fan out on write – I/O

[Diagram: Read Inbox and Send Message; Shard 1, Shard 2, Shard 3]

Page 17: Data Modeling Examples from the Real World

Considerations

• Write: One document per recipient

• Read: Find all of the messages with me as the recipient

• Can shard on recipient, so inbox reads hit one shard

• But still lots of random I/O on the shard
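
The trade-off in these considerations can be sketched in plain JavaScript without a database. The stand-in shards, the toy `shardOf` routing function and the helper names below are assumptions for illustration only.

```javascript
// Hypothetical in-memory stand-in for a 3-shard cluster (illustration only).
const inboxShards = [[], [], []];

// Assumed toy routing function keyed on the recipient (the shard key).
function shardOf(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % inboxShards.length;
}

// Fan out on write: one copy of the message per recipient,
// each copy routed to the shard that owns that recipient.
function sendFanOut(msg) {
  let docsWritten = 0;
  for (const recipient of msg.to) {
    const copy = Object.assign({}, msg, { recipient: recipient });
    inboxShards[shardOf(recipient)].push(copy);
    docsWritten += 1;
  }
  return docsWritten;
}

// An inbox read is routed to exactly one shard.
function readInboxRouted(user) {
  const messages = inboxShards[shardOf(user)]
    .filter((m) => m.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { messages: messages, shardsQueried: 1 };
}

const docsForHi = sendFanOut({ from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
const janeInbox = readInboxRouted("Jane");
```

Here the write cost grows with the number of recipients, and in exchange each inbox read stays on a single stand-in shard.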

Page 18: Data Modeling Examples from the Real World

Fan out on write with buckets

// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

Page 19: Data Modeling Examples from the Real World

Fan out on write with buckets

// Send a message
for ( recipient in msg.to ) {
  // Atomically increment the recipient's message count and read the new value
  count = db.users.findAndModify( {
    query: { user_name: msg.to[recipient] },
    update: { "$inc": { "msg_count": 1 } },
    upsert: true,
    new: true
  } ).msg_count;

  // 50 messages per bucket
  sequence = Math.floor( count / 50 );

  db.inbox.update(
    { owner: msg.to[recipient], sequence: sequence },
    { $push: { "messages": msg } },
    { upsert: true }
  );
}

// Read my inbox
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )

Page 20: Data Modeling Examples from the Real World

Fan out on write with buckets

• Each “inbox” document is an array of messages

• Append a message onto “inbox” of recipient

• Bucket inboxes so there are not too many messages per document

• Can shard on recipient, so inbox reads hit one shard

• 1 or 2 documents to read the whole inbox
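
The bucketing arithmetic can be traced with a small in-memory sketch. The two `Map` objects standing in for the users and inbox collections, and the helper names `deliver` and `readBuckets`, are assumptions for illustration; only the bucket size of 50 and the `Math.floor(count / 50)` rule come from the slides.

```javascript
const BUCKET_SIZE = 50;     // same divisor as Math.floor(count / 50) on the slide
const counters = new Map(); // user_name -> msg_count (stand-in for the users collection)
const buckets = new Map();  // "owner/sequence" -> bucket document (stand-in for inbox)

// Mimics the findAndModify ($inc, new: true) followed by the upserting $push.
function deliver(owner, msg) {
  const count = (counters.get(owner) || 0) + 1;
  counters.set(owner, count);
  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = owner + "/" + sequence;
  if (!buckets.has(key)) {
    buckets.set(key, { owner: owner, sequence: sequence, messages: [] }); // upsert
  }
  buckets.get(key).messages.push(msg); // $push
  return sequence;
}

// Mimics find({ owner }).sort({ sequence: -1 }).limit(n).
function readBuckets(owner, limit) {
  return Array.from(buckets.values())
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

for (let i = 1; i <= 120; i++) {
  deliver("Bob", { from: "Joe", message: "msg " + i });
}
const latest = readBuckets("Bob", 2);
```

After 120 deliveries the two newest buckets hold 21 and 50 messages, so two documents cover the most recent 71 messages. Because the counter is incremented before the division, the very first bucket in this sketch holds 49 messages, not 50.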

Page 21: Data Modeling Examples from the Real World

Fan out on write with buckets – I/O

[Diagram: Send Message; Shard 1, Shard 2, Shard 3]

Page 22: Data Modeling Examples from the Real World

Fan out on write with buckets – I/O

[Diagram: Read Inbox and Send Message; Shard 1, Shard 2, Shard 3]

Page 23: Data Modeling Examples from the Real World

#2 – History

Page 24: Data Modeling Examples from the Real World
Page 25: Data Modeling Examples from the Real World

Design Goals

• Need to retain a limited amount of history, e.g.
  – Hours, Days, Weeks
  – May be a legislative requirement (e.g. HIPAA, SOX, DPA)

• Need to query efficiently by
  – match
  – ranges

Page 26: Data Modeling Examples from the Real World

3 Approaches (there are more)

• Bucket by Number of messages

• Fixed size array

• Bucket by date + TTL collections

Page 27: Data Modeling Examples from the Real World

Bucket by number of messages

db.inbox.find()
{
  owner: "Joe",
  sequence: 25,
  messages: [
    { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" },
    …
  ]
}

// Query with a date range
db.inbox.find( { owner: "friend1", messages: { $elemMatch: { sent: { $gte: ISODate("…") } } } } )

// Remove elements based on a date
db.inbox.update( { owner: "friend1" }, { $pull: { messages: { sent: { $gte: ISODate("…") } } } } )

Page 28: Data Modeling Examples from the Real World

Considerations

• Shrinking documents; space can be reclaimed with
  – db.runCommand( { compact: '<collection>' } )

• Removing the document after the last element in the array has been removed
  – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
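
The second consideration, cleaning up buckets whose array has become empty, can be sketched in plain JavaScript. The sample documents, the numeric `sent` values and the `pruneBuckets` helper are hypothetical, and the sketch assumes retention means dropping messages older than a cutoff.

```javascript
// Stand-in bucket documents; "sent" is a plain number here for brevity.
const bucketDocs = [
  { owner: "friend1", sequence: 0, messages: [{ sent: 10 }, { sent: 20 }] },
  { owner: "friend1", sequence: 1, messages: [{ sent: 30 }, { sent: 40 }] },
  { owner: "friend2", sequence: 0, messages: [{ sent: 5 }] },
];

// Mimics a $pull with a predicate on one owner's buckets, then drops
// that owner's documents whose messages array is now empty.
function pruneBuckets(docs, owner, shouldRemove) {
  for (const doc of docs) {
    if (doc.owner === owner) {
      doc.messages = doc.messages.filter((m) => !shouldRemove(m));
    }
  }
  return docs.filter((doc) => doc.owner !== owner || doc.messages.length > 0);
}

// Assumed cutoff: remove everything sent before 25.
const remaining = pruneBuckets(bucketDocs, "friend1", (m) => m.sent < 25);
```

In this sketch bucket 0 of `friend1` loses both messages and is removed, while bucket 1 and the other owner's bucket are left untouched.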

Page 29: Data Modeling Examples from the Real World

Fixed Size Array

msg = {
  from: "Your Boss",
  to: [ "Bob" ],
  sent: new Date(),
  message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
  { _id: 1 },
  { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } }
)

Page 30: Data Modeling Examples from the Real World

Considerations

• Need to compute the size of the array based on retention period
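
One way to do that computation is shown below as a rough sketch. The retention period and message rate are made-up inputs, and `sliceSizeFor`, `cappedPush` and `pushCapped` are hypothetical helper names; only the `$each` / `$sort` / `$slice` update shape comes from the slides.

```javascript
// Array size = retention period x expected peak message rate (assumed inputs).
function sliceSizeFor(retentionDays, peakMessagesPerDay) {
  return Math.ceil(retentionDays * peakMessagesPerDay);
}

// Builds the update document in the shape used on the Fixed Size Array slide.
function cappedPush(msg, keep) {
  return {
    $push: { messages: { $each: [msg], $sort: { sent: 1 }, $slice: -keep } },
  };
}

// In-memory equivalent of that update: append, sort by sent, keep the last N.
function pushCapped(messages, msg, keep) {
  return messages
    .concat([msg])
    .sort((a, b) => a.sent - b.sent)
    .slice(-keep);
}

const keep = sliceSizeFor(7, 10); // 7 days at 10 messages per day
const update = cappedPush({ sent: 4, message: "CALL ME NOW!" }, keep);
const capped = pushCapped([{ sent: 1 }, { sent: 2 }, { sent: 3 }], { sent: 4 }, 3);
```

With these assumed inputs the array keeps 70 entries. A sizing like this only approximates a time-based retention, since a burst of messages pushes older entries out early.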

Page 31: Data Modeling Examples from the Real World

TTL Collections

// messages: one doc per user per day
db.messages.findOne()
{
  _id: 1,
  to: "Joe",
  sequence: ISODate("2013-02-04T00:00:00.392Z"),
  messages: [ ]
}

// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
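
The two numbers involved here, the expiry in seconds and the per-day bucket date, can be derived as in the sketch below. The helper names `ttlSeconds` and `dayBucket` are hypothetical, and truncating to UTC midnight is an assumption about how a per-day `sequence` value might be chosen.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Retention period expressed as the expireAfterSeconds value.
function ttlSeconds(days) {
  return days * SECONDS_PER_DAY;
}

// Truncates a timestamp to UTC midnight, giving one bucket key per day.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

const oneYear = ttlSeconds(365);
const bucket = dayBucket(new Date("2013-02-04T15:30:00Z"));
```

Here `oneYear` is 31536000, matching the value on the slide, and every message sent on 4 February 2013 (UTC) maps to the same bucket date.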

Page 32: Data Modeling Examples from the Real World

#3 – Indexed Attributes

Page 33: Data Modeling Examples from the Real World

Design Goal

• Application needs to store a variable number of attributes, e.g.
  – User defined Form
  – Meta Data tags

• Queries needed
  – Equality
  – Range based

• Need to be efficient, regardless of the number of attributes

Page 34: Data Modeling Examples from the Real World

2 Approaches (there are more)

• Attributes as Embedded Document

• Attributes as Objects in an Array

Page 35: Data Modeling Examples from the Real World

Attributes as a Sub-Document

db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("...") } } )
db.files.insert( { _id: "local.1", attr: { type: "text", size: 128 } } )
db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("...") } } )

// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )

Page 36: Data Modeling Examples from the Real World

Considerations

• Each attribute needs an Index

• Each time you extend, you add an index

• Lots and lots of indexes
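
How quickly the index count grows can be seen by listing the index key a sub-document schema would need for each distinct attribute. The sketch below is plain JavaScript over sample documents modelled on the slide (with the `created` dates replaced by a placeholder string); `indexSpecsFor` is a hypothetical helper.

```javascript
// Sample documents in the sub-document shape; dates are placeholders.
const files = [
  { _id: "local.0", attr: { type: "text", size: 64, created: "placeholder" } },
  { _id: "local.1", attr: { type: "text", size: 128 } },
  { _id: "mongod", attr: { type: "binary", size: 256, created: "placeholder" } },
];

// Returns one single-field index specification per distinct attribute name.
function indexSpecsFor(docs) {
  const names = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc.attr)) names.add("attr." + key);
  }
  return Array.from(names)
    .sort()
    .map((name) => ({ [name]: 1 }));
}

const specs = indexSpecsFor(files);
```

Three distinct attributes already mean three index specifications in this sketch, and each new user-defined attribute would add another.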

Page 37: Data Modeling Examples from the Real World

Attributes as Objects in Array

db.files.insert( { _id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("...") } ] } )
db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } )
db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("...") } ] } )

db.files.ensureIndex( { attr: 1 } )

Page 38: Data Modeling Examples from the Real World

Considerations

• Only one index needed on attr

• Can support range queries, etc.

• Index can be used only once per query
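
To illustrate the array layout, the sketch below converts a sub-document of attributes into the one-object-per-attribute form and matches against it in memory. `attrsToArray` and `hasAttr` are hypothetical helpers; the in-memory match only mirrors the idea of an equality or range condition on one attribute and says nothing about how the server uses the index.

```javascript
// { type: "text", size: 64 }  ->  [ { type: "text" }, { size: 64 } ]
function attrsToArray(attr) {
  return Object.keys(attr).map((key) => ({ [key]: attr[key] }));
}

// True if the document has an attribute `key` whose value satisfies `predicate`.
function hasAttr(doc, key, predicate) {
  return doc.attr.some((entry) => key in entry && predicate(entry[key]));
}

const file = { _id: "local.1", attr: attrsToArray({ type: "text", size: 128 }) };

const isText = hasAttr(file, "type", (v) => v === "text");        // equality
const inRange = hasAttr(file, "size", (v) => v > 64 && v <= 16384); // range
const hasCreated = hasAttr(file, "created", () => true);          // attribute absent
```

Because every attribute lives in the same `attr` array, adding a new attribute changes the data only, not the set of indexes.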

Page 39: Data Modeling Examples from the Real World

#4 – Multiple Identities

Page 40: Data Modeling Examples from the Real World

Design Goal

• Ability to look up by a number of different identities, e.g.
  – Username
  – Email address
  – FB Handle
  – LinkedIn URL

Page 41: Data Modeling Examples from the Real World

2 Approaches (there are more)

• Identifiers in a single document

• Separate Identifiers from Content

Page 42: Data Modeling Examples from the Real World

Single Document by User

db.users.findOne()
{
  _id: "joe",
  email: "[email protected]",
  fb: "joe.smith",   // facebook
  li: "joe.e.smith", // linkedin
  other: {…}
}

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )

Page 43: Data Modeling Examples from the Real World

Read by _id (shard key)

Shard 1 Shard 2 Shard 3

find( { _id: "joe"} )

Page 44: Data Modeling Examples from the Real World

Read by email (non-shard key)

Shard 1 Shard 2 Shard 3

find( { email: "[email protected]" } )

Page 45: Data Modeling Examples from the Real World

Considerations

• Lookup by shard key is routed to 1 shard

• Lookup by other identifier is scatter gathered across all shards

• Secondary keys cannot have a unique index

Page 46: Data Modeling Examples from the Real World

Document per Identity

// Create unique index
db.identities.ensureIndex( { identifier: 1 }, { unique: true } )

// Create a document for each of the user's identifiers
db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier: { email: "[email protected]" }, user: "1200-42" } )
db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
sh.shardCollection( "mydb.identities", { identifier: 1 } )

// Create unique index
db.users.ensureIndex( { _id: 1 }, { unique: true } )

// Shard collection by _id
sh.shardCollection( "mydb.users", { _id: 1 } )

Page 47: Data Modeling Examples from the Real World

Read requires 2 reads

Shard 1 Shard 2 Shard 3

db.identities.find({"identifier" : { "hndl" : "joe" }})

db.users.find( { _id: "1200-42"} )

Page 48: Data Modeling Examples from the Real World

Considerations

• Lookup to Identities is a routed query

• Lookup to Users is a routed query

• Unique indexes available

• Must do two queries per lookup
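
The two-step lookup and the uniqueness guarantee can be sketched in memory as follows. The two `Map` objects standing in for the identities and users collections, the sample user record and the helper names are assumptions for illustration; the user id "1200-42" and the identifier shapes follow the slides.

```javascript
const identities = new Map(); // serialized identifier -> user id
const users = new Map();      // user id -> user document

// Registers an identifier; the Map key plays the role of the unique index.
function addIdentity(identifier, userId) {
  const key = JSON.stringify(identifier);
  if (identities.has(key)) throw new Error("duplicate identifier: " + key);
  identities.set(key, userId);
}

// Read 1 resolves the identifier to a user id, read 2 fetches the user.
function findUser(identifier) {
  const userId = identities.get(JSON.stringify(identifier));
  if (userId === undefined) return null;
  return users.get(userId) || null;
}

users.set("1200-42", { _id: "1200-42", name: "Joe" }); // hypothetical user document
addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");

const byHandle = findUser({ hndl: "joe" });
const byLinkedIn = findUser({ li: "joe.e.smith" });
```

Both lookups resolve to the same user through two keyed reads each, and registering `{ hndl: "joe" }` a second time would throw, mirroring what a unique index on `identifier` is meant to prevent.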

Page 49: Data Modeling Examples from the Real World

Conclusion

Page 50: Data Modeling Examples from the Real World

Summary

• Multiple ways to model a domain problem

• Understand the key use cases of your app

• Balance between ease of query vs. ease of write

• Random I/O should be avoided

Page 51: Data Modeling Examples from the Real World

Perl Engineer & Evangelist, 10gen

Mike Friedman

#MongoDBdays

Thank You

Page 52: Data Modeling Examples from the Real World

Next Sessions at 3:40

5th Floor:

West Side Ballroom 3&4: Advanced Replication Internals

West Side Ballroom 1&2: Building a High-Performance Distributed Task Queue on MongoDB

Juilliard Complex: WhiteBoard Q&A

Lyceum Complex: Ask the Experts

7th Floor:

Empire Complex: Managing a Maturing MongoDB Ecosystem

SoHo Complex: MongoDB Indexing Constraints and Creative Schemas