Emily Stolfo
#mongodbdays
Schema Design
Ruby Engineer/Evangelist, 10gen
@EmStolfo
Tuesday, January 29, 13
Agenda
• Working with documents
• Common patterns
• Evolving a Schema
• Queries and Indexes
Terminology
RDBMS ➜ MongoDB
Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
Working with Documents
Documents
Provide flexibility and performance
Example Schema (MongoDB)
[Diagram: example MongoDB schema, annotated to show Embedding and Linking]
Relational Schema Design
Focuses on data storage
Document Schema Design
Focuses on data use
Schema Design Considerations
• What is a priority?
  – High consistency
  – High read performance
  – High write performance
• How does the application access and manipulate data?
  – Read/Write Ratio
  – Types of Queries / Updates
  – Data life-cycle and growth
  – Analytics (Map Reduce, Aggregation)
Tools for Data Access
• Flexible Schemas
• Embedded data structures
• Secondary Indexes
• Multi-Key Indexes
• Aggregation Framework
  – Pipeline operators: $project, $match, $limit, $skip, $sort, $group, $unwind
• No Joins
Data Manipulation
• Conditional Query Operators
  – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
  – Vector: $in, $nin, $all, $size
• Atomic Update Operators
  – Scalar: $inc, $set, $unset
  – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
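The atomic update operators can be read as small in-place edits to a single document. The following is a minimal in-memory sketch in plain JavaScript, not MongoDB's implementation: `applyUpdate` is a hypothetical helper that imitates `$inc`, `$set`, `$push`, and `$addToSet` on one object, assuming top-level field names only. The `copies`, `tags`, and `formats` fields are made up for illustration.

```javascript
// Hypothetical helper: imitates a few update operators on a plain object.
// Assumption: only top-level field names, no dotted paths.
function applyUpdate(doc, update) {
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // missing field starts at 0
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    (doc[field] = doc[field] || []).push(value); // always appends
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const arr = (doc[field] = doc[field] || []);
    if (!arr.includes(value)) arr.push(value); // appends only if absent
  }
  return doc;
}

// Illustrative document; the field names are assumptions.
const book = { _id: "123456789", copies: 1, tags: ["mongodb"], formats: ["ebook"] };

applyUpdate(book, {
  $inc: { copies: 2 },
  $set: { language: "English" },
  $push: { tags: "nosql" },
  $addToSet: { formats: "ebook" }, // already present, so nothing is added
});

console.log(book);
```

Each operator touches only the fields it names, which is why an update does not need to read and rewrite the whole document in application code.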
Schema Design Example
Library Management Application
• Patrons
• Books
• Authors
• Publishers
Example: One to One Relations
patron = {
  _id: "joe",
  name: "Joe Bookreader"
}
address = {
  patron_id: "joe",
  street: "123 Fake St.",
  city: "Faketon",
  state: "MA",
  zip: 12345
}
Modeling Patrons
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}
One to One Relations
• “Contains” relationships are often embedded.
• Document provides a holistic representation of objects with embedded entities.
• Optimized read performance.
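To make the read-performance point concrete, here is a small sketch in plain JavaScript that uses in-memory arrays as stand-ins for collections (an assumption for illustration; no MongoDB driver is involved). With a separate address document, showing a patron's city takes two lookups; with the embedded form it takes one. The helper names `cityLinked` and `cityEmbedded` are hypothetical.

```javascript
// In-memory stand-ins for collections (illustrative only).
const patronsLinked = [{ _id: "joe", name: "Joe Bookreader" }];
const addresses = [
  { patron_id: "joe", street: "123 Fake St.", city: "Faketon", state: "MA", zip: 12345 },
];
const patronsEmbedded = [
  {
    _id: "joe",
    name: "Joe Bookreader",
    address: { street: "123 Fake St.", city: "Faketon", state: "MA", zip: 12345 },
  },
];

// Linked: two lookups, joined by the application.
function cityLinked(patronId) {
  const patron = patronsLinked.find((p) => p._id === patronId);
  const address = addresses.find((a) => a.patron_id === patron._id);
  return address.city;
}

// Embedded: one lookup returns the patron and the address together.
function cityEmbedded(patronId) {
  return patronsEmbedded.find((p) => p._id === patronId).address.city;
}

console.log(cityLinked("joe"), cityEmbedded("joe"));
```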
Examples: One to Many Relations
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  addresses: [
    {street: "1 Vernon St.", city: "Newton", state: "MA", …},
    {street: "52 Main St.", city: "Boston", state: "MA", …}
  ]
}
Patrons with many addresses
Example 2: Publishers and Books
One to Many Relations
Publishers and Books relation
• Publishers put out many books
• Books have one publisher
MongoDB: The Definitive Guide
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
Book Data
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
Book Model with Embedded Publisher
publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
Book Model with Normalized Publisher
publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
Link with Publisher _id as a Reference
publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA",
  books: [ "123456789", ... ]
}
book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
Link with Book _ids as a Reference
Where do you put the reference?
• Reference to single publisher on books
  – Use when items have unbounded growth (unlimited # of books)
• Array of books in publisher document
  – Optimal when many means a handful of items
  – Use when there is a bound on potential growth
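The two placements answer the same question ("which books does this publisher have?") in different ways. The sketch below contrasts them using plain in-memory arrays as stand-ins for collections; the helper names and the second book's assignment to this publisher are assumptions made for illustration.

```javascript
// Option 1: each book holds a reference to its publisher.
// The publisher document never grows, however many books exist.
const books = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  // Hypothetical second book, assigned to the same publisher for illustration.
  { _id: "987654321", title: "A Hypothetical Second Book", publisher_id: "oreilly" },
];

function booksByReference(publisherId) {
  return books.filter((b) => b.publisher_id === publisherId).map((b) => b.title);
}

// Option 2: the publisher holds an array of book ids.
// The publisher document grows by one entry per book.
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", books: ["123456789", "987654321"] },
];

function booksByArray(publisherId) {
  const publisher = publishers.find((p) => p._id === publisherId);
  return publisher.books.map((id) => books.find((b) => b._id === id).title);
}

console.log(booksByReference("oreilly"));
console.log(booksByArray("oreilly"));
```

Both return the same titles; the difference is which document has to change, and grow, when a new book is added.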
Example 3: Books and Patrons
One to Many Relations
Books and Patrons
• Book can be checked out by one Patron at a time
• Patrons can check out many books (but not 1000s)
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... }
}
book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  ...
}
Modeling Checkouts
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
    { _id: "123456789", checked_out: "2012-10-15" },
    { _id: "987654321", checked_out: "2012-09-12" },
    ...
  ]
}
Modeling Checkouts
De-normalization
Provides data locality
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
    {
      _id: "123456789",
      title: "MongoDB: The Definitive Guide",
      authors: [ "Kristina Chodorow", "Mike Dirolf" ],
      checked_out: ISODate("2012-10-15")
    },
    {
      _id: "987654321",
      title: "MongoDB: The Scaling Adventure",
      ...
    },
    ...
  ]
}
Modeling Checkouts - de-normalized
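One way to picture the de-normalized model is as a checkout step that copies a few book fields into the patron document. The sketch below does this in plain JavaScript on in-memory objects; `checkOut` is a hypothetical helper, not part of any MongoDB API. The trade-off is that the copied title and authors can go stale if the book document changes later.

```javascript
// Hypothetical helper: check a book out by copying selected fields
// into the patron document (de-normalization).
function checkOut(patron, book, date) {
  patron.checked_out = patron.checked_out || [];
  patron.checked_out.push({
    _id: book._id,
    title: book.title,
    authors: [...book.authors], // copy, so later edits to the book do not leak in
    checked_out: date,
  });
  return patron;
}

const patron = { _id: "joe", name: "Joe Bookreader" };
const book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: ["Kristina Chodorow", "Mike Dirolf"],
};

checkOut(patron, book, new Date("2012-10-15"));

// Listing Joe's loans now needs only the patron document.
const titles = patron.checked_out.map((loan) => loan.title);
console.log(titles);
```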
Referencing vs. Embedding
• Embedding is a bit like pre-joining data
• Document level operations are easy for the server to handle
• Embed when the “many” objects always appear with (viewed in the context of) their parents.
• Reference when you need more flexibility
How does your application access and manipulate data?
Example: Many to Many Relations
book = {
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany"
}
Books and Authors
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany"
}
Relation stored in Book document
book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "Cincinnati",
  books: [
    { book_id: 123456789, title: "MongoDB: The Definitive Guide" }
  ]
}
Relation stored in Author document
book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "kchodorow", "mdirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York",
  books: [ 123456789, ... ]
}
author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany",
  books: [ 123456789, ... ]
}
Relation stored in both documents
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
db.books.find( { "authors.name": "Kristina Chodorow" } )
Where do you put the reference?
Think about common queries
Where do you put the reference?
Think about indexes
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
db.books.createIndex( { "authors.name": 1 } )
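Conceptually, an index on `authors.name` over an array field holds one entry per array element (the Multi-Key Indexes listed under Tools for Data Access). The sketch below imitates that idea with a JavaScript `Map`. It illustrates the concept only and is not how MongoDB stores indexes; `buildAuthorNameIndex`, the numeric `_id` values, and the second book are hypothetical.

```javascript
// Hypothetical helper: build a lookup from author name to book ids,
// with one entry per element of each book's authors array.
function buildAuthorNameIndex(books) {
  const index = new Map();
  for (const book of books) {
    for (const author of book.authors) {
      if (!index.has(author.name)) index.set(author.name, []);
      index.get(author.name).push(book._id);
    }
  }
  return index;
}

const books = [
  {
    _id: 1,
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" },
    ],
  },
  {
    _id: 2,
    title: "A Hypothetical Second Book", // made up for illustration
    authors: [{ _id: "kchodorow", name: "Kristina Chodorow" }],
  },
];

const authorNameIndex = buildAuthorNameIndex(books);

// A lookup by author name no longer has to scan every book.
console.log(authorNameIndex.get("Kristina Chodorow"));
console.log(authorNameIndex.get("Mike Dirolf"));
```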
Example: Trees
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  category: "MongoDB"
}
category = { _id: "MongoDB", parent: "Databases" }
category = { _id: "Databases", parent: "Programming" }
Parent References
book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
category = { _id: "MongoDB", children: [ 123456789, … ] }
category = { _id: "Databases", children: [ "MongoDB", "Postgres" ] }
category = { _id: "Programming", children: [ "Databases", "Languages" ] }
Child References
Modeling Trees
• Parent References
  - Each node is stored as a document
  - Contains the id of the parent
• Child References
  - Each node contains ids of its children
  - Can support graphs (multiple parents / children)
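With parent references, finding every ancestor of a node means following one `parent` link per level. The sketch below walks the category tree in plain JavaScript, using a `Map` as a stand-in for the categories collection. The root entry with `parent: null` and the `ancestorsOf` helper are assumptions added for illustration.

```javascript
// In-memory stand-in for the categories collection, keyed by _id.
const categories = new Map([
  ["MongoDB", { _id: "MongoDB", parent: "Databases" }],
  ["Databases", { _id: "Databases", parent: "Programming" }],
  ["Programming", { _id: "Programming", parent: null }], // assumed root
]);

// Hypothetical helper: follow parent links upward, one lookup per level.
function ancestorsOf(id) {
  const path = [];
  let node = categories.get(id);
  while (node && node.parent) {
    path.push(node.parent);
    node = categories.get(node.parent);
  }
  return path; // nearest ancestor first
}

console.log(ancestorsOf("MongoDB"));
```

Against a real collection each step of the loop would be a separate query, so the cost grows with the depth of the tree.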
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  categories: [ "Programming", "Databases", "MongoDB" ]
}
book = {
  title: "MySQL: The Definitive Guide",
  authors: [ "Michael Kofler" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  parent: "MySQL",
  ancestors: [ "Programming", "Databases", "MySQL" ]
}
Array of Ancestors
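Storing the full ancestor path on each book turns "everything under this category" into a single membership test. The sketch below shows this on in-memory objects in plain JavaScript; it assumes both books use an `ancestors` field, and `booksUnder` is a hypothetical helper. In the shell, the rough equivalent would be a query such as `db.books.find({ ancestors: "Databases" })`.

```javascript
// In-memory stand-in for the books collection.
// Assumption: both books store their path in an `ancestors` array.
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    ancestors: ["Programming", "Databases", "MongoDB"],
  },
  {
    title: "MySQL: The Definitive Guide",
    ancestors: ["Programming", "Databases", "MySQL"],
  },
];

// Hypothetical helper: every book anywhere below the given category.
function booksUnder(category) {
  return books
    .filter((b) => b.ancestors.includes(category))
    .map((b) => b.title);
}

console.log(booksUnder("Databases"));
console.log(booksUnder("MySQL"));
```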
Example: Single Table Inheritance
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  kind: "loanable",
  locations: [ ... ],
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
Single Table Inheritance
Example: Queues
db.loans.insert({
  _id: 123456789,
  book_id: 987654321,
  pending: false,
  approved: false,
  priority: 3
})
// Find the highest priority request and mark as pending approval
request = db.loans.findAndModify({
  query: { pending: false, book_id: 987654321 },
  sort: { priority: -1 },
  update: { $set: { pending: true, started: new ISODate() } }
})
Update highest priority request
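The queue pattern relies on finding and updating one document in a single step, so two workers cannot claim the same request. The sketch below imitates the selection logic on an in-memory array in plain JavaScript. It does not reproduce the server's atomicity; `claimNext` is a hypothetical helper and the extra loan records are made up. Like `findAndModify` with default options, it returns the document as it was before the update.

```javascript
// In-memory stand-in for the loans collection (illustrative records).
const loans = [
  { _id: 1, book_id: 987654321, pending: false, approved: false, priority: 3 },
  { _id: 2, book_id: 987654321, pending: false, approved: false, priority: 5 },
  { _id: 3, book_id: 111111111, pending: false, approved: false, priority: 9 },
];

// Hypothetical helper: claim the highest-priority open request for a book.
function claimNext(bookId) {
  const candidates = loans
    .filter((l) => !l.pending && l.book_id === bookId) // query
    .sort((a, b) => b.priority - a.priority); // sort: { priority: -1 }
  const request = candidates[0];
  if (!request) return null; // nothing left to claim
  const before = { ...request }; // snapshot of the pre-update document
  request.pending = true; // update: { $set: { pending: true, ... } }
  request.started = new Date();
  return before;
}

const first = claimNext(987654321);
const second = claimNext(987654321);
const third = claimNext(987654321);
console.log(first._id, second._id, third);
```

Requests for the book are handed out in priority order, and a request that is already pending is never returned twice.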
Summary
• Schema design is different in MongoDB
• Basic data design principles apply
• Focus on how application accesses and manipulates data
• Evolve schema to meet changing requirements
• Application-level logic is important!
Emily Stolfo
#mongodbdays
Thank You
Ruby Engineer/Evangelist, 10gen
@EmStolfo