Emily Stolfo

#mongodbdays

Schema Design

Ruby Engineer/Evangelist, 10gen

@EmStolfo

Tuesday, January 29, 13

Agenda

• Working with documents
• Common patterns
• Evolving a Schema
• Queries and Indexes

Terminology

RDBMS ➜ MongoDB

Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference

Working with Documents

Documents provide flexibility and performance

Example Schema (MongoDB)

[Diagram not preserved; slide labels: Embedding, Linking]

Relational Schema Design focuses on data storage

Document Schema Design focuses on data use

Schema Design Considerations

• What is a priority?
  – High consistency
  – High read performance
  – High write performance
• How does the application access and manipulate data?
  – Read/Write Ratio
  – Types of Queries / Updates
  – Data life-cycle and growth
  – Analytics (Map Reduce, Aggregation)

Tools for Data Access

• Flexible Schemas
• Embedded data structures
• Secondary Indexes
• Multi-Key Indexes
• Aggregation Framework
  – Pipeline operators: $project, $match, $limit, $skip, $sort, $group, $unwind
• No Joins

Data Manipulation

• Conditional Query Operators
  – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
  – Vector: $in, $nin, $all, $size
• Atomic Update Operators
  – Scalar: $inc, $set, $unset
  – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
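To make the update operators concrete, here is a minimal in-memory sketch of what a few of them do to a document. It is plain JavaScript, not the MongoDB API: `applyUpdate` and the `visits`, `tags`, `branches`, and `temp` fields are hypothetical names chosen for illustration. It handles only top-level fields with scalar values and does not model atomicity.

```javascript
// Hypothetical helper: applies a subset of update operators to a copy of doc.
// Only top-level fields and scalar values are handled in this sketch.
function applyUpdate(doc, update) {
  const result = { ...doc };
  // $inc: add to a numeric field, treating a missing field as 0
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  // $set: overwrite (or create) a field
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  // $unset: remove a field
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field];
  }
  // $push: append to an array
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value];
  }
  // $addToSet: append only if the value is not already present
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    result[field] = current.includes(value) ? current : [...current, value];
  }
  return result;
}

const before = { _id: "joe", visits: 1, tags: ["member"], branches: ["main"], temp: true };
const after = applyUpdate(before, {
  $inc: { visits: 1 },
  $unset: { temp: "" },
  $push: { tags: "student" },
  $addToSet: { branches: "main" }
});
console.log(after);
```

In this sketch the `$addToSet` leaves `branches` unchanged because "main" is already present, while `$push` always appends.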

Schema Design Example

Library Management Application

• Patrons
• Books
• Authors
• Publishers

Example: One to One Relations

patron = {
  _id: "joe",
  name: "Joe Bookreader"
}

address = {
  patron_id: "joe",
  street: "123 Fake St.",
  city: "Faketon",
  state: "MA",
  zip: 12345
}

Modeling Patrons

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}

One to One Relations

• “Contains” relationships are often embedded.
• Document provides a holistic representation of objects with embedded entities.
• Optimized read performance.
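The read-performance point can be illustrated without a database. The following is a minimal sketch in plain JavaScript, assuming in-memory Maps stand in for collections. The `fetchOne` helper and the `lookups` counter are hypothetical and exist only to count how many documents each model has to read.

```javascript
// In-memory stand-ins for collections (not a MongoDB API).
const patronsLinked = new Map([
  ["joe", { _id: "joe", name: "Joe Bookreader" }]
]);
const addresses = new Map([
  ["joe", { patron_id: "joe", street: "123 Fake St.", city: "Faketon", state: "MA", zip: 12345 }]
]);
const patronsEmbedded = new Map([
  ["joe", {
    _id: "joe",
    name: "Joe Bookreader",
    address: { street: "123 Fake St.", city: "Faketon", state: "MA", zip: 12345 }
  }]
]);

// Hypothetical helper that counts every document read.
let lookups = 0;
function fetchOne(collection, key) {
  lookups += 1;
  return collection.get(key);
}

// Separate address document: two reads.
function cityLinked(id) {
  const patron = fetchOne(patronsLinked, id);
  const address = fetchOne(addresses, patron._id);
  return address.city;
}

// Embedded address: one read.
function cityEmbedded(id) {
  return fetchOne(patronsEmbedded, id).address.city;
}

lookups = 0;
const linkedCity = cityLinked("joe");
const linkedLookups = lookups;

lookups = 0;
const embeddedCity = cityEmbedded("joe");
const embeddedLookups = lookups;

console.log(linkedCity, linkedLookups);
console.log(embeddedCity, embeddedLookups);
```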

Examples: One to Many Relations

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  addresses: [
    { street: "1 Vernon St.", city: "Newton", state: "MA", … },
    { street: "52 Main St.", city: "Boston", state: "MA", … }
  ]
}

Patrons with many addresses

Example 2: Publishers and Books

One to Many Relations

Publishers and Books relation

• Publishers put out many books
• Books have one publisher

MongoDB: The Definitive Guide
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA

Book Data

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}

Book Model with Embedded Publisher

publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

Book Model with Normalized Publisher

publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}

Link with Publisher _id as a Reference

publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA",
  books: [ "123456789", ... ]
}

book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

Link with Book _ids as a Reference

Where do you put the reference?

• Reference to single publisher on books
  – Use when items have unbounded growth (unlimited # of books)
• Array of books in publisher document
  – Optimal when many means a handful of items
  – Use when there is a bound on potential growth
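A small sketch of the first option, with the reference stored on each book. It uses plain JavaScript arrays in place of collections; the second book is a made-up entry added only so there is more than one result, and `booksByPublisher` is a hypothetical helper that mimics a query on `publisher_id`.

```javascript
// Publisher document stays the same size no matter how many books exist.
const publisher = { _id: "oreilly", name: "O’Reilly Media", founded: "1980", location: "CA" };

// Each book carries a reference to its single publisher.
const books = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: "000000001", title: "Another Book (hypothetical)", publisher_id: "oreilly" }
];

// Hypothetical helper: in-memory equivalent of querying books by publisher_id.
function booksByPublisher(publisherId) {
  return books.filter((book) => book.publisher_id === publisherId);
}

const publisherBefore = JSON.stringify(publisher);

// Adding a book touches only the books collection.
books.push({ _id: "000000002", title: "Yet Another Book (hypothetical)", publisher_id: "oreilly" });

const oreillyTitles = booksByPublisher(publisher._id).map((book) => book.title);
const publisherUnchanged = JSON.stringify(publisher) === publisherBefore;

console.log(oreillyTitles, publisherUnchanged);
```

With the array-in-publisher option, the same insert would also have to grow the publisher document, which is why the slide recommends it only when growth is bounded.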

Example 3: Books and Patrons

One to Many Relations

Books and Patrons

• Book can be checked out by one Patron at a time
• Patrons can check out many books (but not 1000s)

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... }
}

book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  ...
}

Modeling Checkouts

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
    { _id: "123456789", checked_out: "2012-10-15" },
    { _id: "987654321", checked_out: "2012-09-12" },
    ...
  ]
}

Modeling Checkouts

De-normalization provides data locality

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
    {
      _id: "123456789",
      title: "MongoDB: The Definitive Guide",
      authors: [ "Kristina Chodorow", "Mike Dirolf" ],
      checked_out: ISODate("2012-10-15")
    },
    {
      _id: "987654321",
      title: "MongoDB: The Scaling Adventure",
      ...
    },
    ...
  ]
}

Modeling Checkouts - de-normalized

Referencing vs. Embedding

• Embedding is a bit like pre-joining data
• Document level operations are easy for the server to handle
• Embed when the “many” objects always appear with (viewed in the context of) their parents.
• Reference when you need more flexibility

How does your application access and manipulate data?
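One way to see the "pre-joined" trade-off is to look at what a de-normalized copy costs on write. This is a sketch in plain JavaScript under the assumption that book titles are copied into each patron's `checked_out` array; `titlesCheckedOut` and `renameBook` are hypothetical helpers, not MongoDB calls.

```javascript
const books = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide" }
];

// De-normalized: the title is copied into the patron document.
const patrons = [
  {
    _id: "joe",
    name: "Joe Bookreader",
    checked_out: [{ _id: "123456789", title: "MongoDB: The Definitive Guide" }]
  }
];

// Reading is cheap: everything needed is already in the patron document.
function titlesCheckedOut(patron) {
  return patron.checked_out.map((entry) => entry.title);
}

// Writing costs more: the application must update every embedded copy too.
// Returns the number of places that were changed.
function renameBook(bookId, newTitle) {
  let touched = 0;
  for (const book of books) {
    if (book._id === bookId) {
      book.title = newTitle;
      touched += 1;
    }
  }
  for (const patron of patrons) {
    for (const entry of patron.checked_out) {
      if (entry._id === bookId) {
        entry.title = newTitle;
        touched += 1;
      }
    }
  }
  return touched;
}

const titlesBefore = titlesCheckedOut(patrons[0]);
const placesTouched = renameBook("123456789", "MongoDB: The Definitive Guide, 2nd Edition");
const titlesAfter = titlesCheckedOut(patrons[0]);

console.log(titlesBefore, placesTouched, titlesAfter);
```

The new title string is invented for the demo. If the embedded data rarely changes, as with a book title, this extra write work is usually small compared with the read savings.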

Example: Many to Many Relations

book = {
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York" }

author = { _id: "mdirolf", name: "Mike Dirolf", hometown: "Albany" }

Books and Authors

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York" }

author = { _id: "mdirolf", name: "Mike Dirolf", hometown: "Albany" }

Relation stored in Book document

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "Cincinnati",
  books: [
    { book_id: 123456789, title: "MongoDB: The Definitive Guide" }
  ]
}

Relation stored in Author document

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "kchodorow", "mdirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York",
  books: [ 123456789, ... ]
}

author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany",
  books: [ 123456789, ... ]
}

Relation stored in both documents

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York" }

db.books.find( { "authors.name": "Kristina Chodorow" } )

Where do you put the reference? Think about common queries

Where do you put the reference? Think about indexes

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York" }

db.books.createIndex( { "authors.name": 1 } )
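Conceptually, an index on a field inside an array holds one entry per array element, so a book with two authors is reachable under both names. The sketch below models that idea with a plain JavaScript Map. It is an illustration only, not how MongoDB stores indexes internally; `buildAuthorNameIndex` and the second book are hypothetical.

```javascript
const books = [
  {
    _id: 1,
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" }
    ]
  },
  {
    _id: 2,
    title: "Second Book (hypothetical)",
    authors: [{ _id: "kchodorow", name: "Kristina Chodorow" }]
  }
];

// Hypothetical helper: one index entry per (author name, book) pair.
function buildAuthorNameIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const author of doc.authors) {
      if (!index.has(author.name)) {
        index.set(author.name, []);
      }
      index.get(author.name).push(doc._id);
    }
  }
  return index;
}

const authorNameIndex = buildAuthorNameIndex(books);

// Lookup by name goes straight to the matching book ids, without scanning every book.
const kristinaBookIds = authorNameIndex.get("Kristina Chodorow");
const mikeBookIds = authorNameIndex.get("Mike Dirolf");

console.log(kristinaBookIds, mikeBookIds);
```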

Example: Trees

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  category: "MongoDB"
}

category = { _id: "MongoDB", parent: "Databases" }
category = { _id: "Databases", parent: "Programming" }

Parent References

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

category = { _id: "MongoDB", children: [ 123456789, … ] }
category = { _id: "Databases", children: [ "MongoDB", "Postgres" ] }
category = { _id: "Programming", children: [ "Databases", "Languages" ] }

Child References

Modeling Trees

• Parent References
  - Each node is stored as a document
  - Contains the id of the parent
• Child References
  - Each node contains ids of its children
  - Can support graphs (multiple parents / child)
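With parent references, finding all ancestors of a node means following one reference per level. A minimal sketch in plain JavaScript, assuming a Map stands in for the categories collection and that the root category ("Programming") simply has no `parent` field; `ancestorsOf` is a hypothetical helper.

```javascript
// In-memory stand-in for the categories collection.
const categories = new Map([
  ["MongoDB", { _id: "MongoDB", parent: "Databases" }],
  ["Databases", { _id: "Databases", parent: "Programming" }],
  ["Programming", { _id: "Programming" }] // assumed root: no parent field
]);

// Hypothetical helper: walk up the tree, one lookup per level.
function ancestorsOf(id) {
  const path = [];
  let node = categories.get(id);
  while (node && node.parent !== undefined) {
    path.push(node.parent);
    node = categories.get(node.parent);
  }
  return path;
}

const mongoAncestors = ancestorsOf("MongoDB");
const rootAncestors = ancestorsOf("Programming");

console.log(mongoAncestors, rootAncestors);
```

The number of lookups grows with the depth of the tree, which is the cost that storing a full list of ancestors on each node avoids.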

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  categories: [ "Programming", "Databases", "MongoDB" ]
}

book = {
  title: "MySQL: The Definitive Guide",
  authors: [ "Michael Kofler" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  parent: "MySQL",
  ancestors: [ "Programming", "Databases", "MySQL" ]
}

Array of Ancestors
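Storing the full ancestor list on each document turns "everything under this category" into a single membership test. The sketch below shows this in plain JavaScript; for simplicity it assumes both books use an `ancestors` field, and `booksUnder` is a hypothetical helper that plays the role of a query matching one value inside the array.

```javascript
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    ancestors: ["Programming", "Databases", "MongoDB"]
  },
  {
    title: "MySQL: The Definitive Guide",
    ancestors: ["Programming", "Databases", "MySQL"]
  }
];

// Hypothetical helper: a book is under a category if the category
// appears anywhere in its ancestors array.
function booksUnder(category) {
  return books
    .filter((book) => book.ancestors.includes(category))
    .map((book) => book.title);
}

const databaseBooks = booksUnder("Databases");
const mysqlBooks = booksUnder("MySQL");

console.log(databaseBooks, mysqlBooks);
```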

Example: Single Table Inheritance

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  kind: "loanable",
  locations: [ ... ],
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}

Single Table Inheritance

Example: Queues

db.loans.insert({
  _id: 123456789,
  book_id: 987654321,
  pending: false,
  approved: false,
  priority: 3
})

// Find the highest priority request and mark as pending approval
request = db.loans.findAndModify({
  query: { pending: false, book_id: 987654321 },
  sort: { priority: -1 },
  update: { $set: { pending: true, started: new ISODate() } }
})

Update highest priority request
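The queue pattern relies on selecting and updating a request in one step so that two workers cannot claim the same one. The following sketch imitates the select-sort-update sequence on an in-memory array in plain JavaScript. `claimNextRequest` is a hypothetical helper, the extra loan records are made up, and the sketch does not model the atomicity that the database provides. Like `findAndModify` by default, it returns the document as it looked before the update.

```javascript
// In-memory stand-in for the loans collection (extra records are illustrative).
const loans = [
  { _id: 1, book_id: 987654321, pending: false, approved: false, priority: 3 },
  { _id: 2, book_id: 987654321, pending: false, approved: false, priority: 5 },
  { _id: 3, book_id: 111111111, pending: false, approved: false, priority: 9 }
];

// Hypothetical helper: claim the highest-priority non-pending request for a book.
function claimNextRequest(bookId, now) {
  const candidates = loans
    .filter((loan) => loan.pending === false && loan.book_id === bookId)
    .sort((a, b) => b.priority - a.priority); // highest priority first
  if (candidates.length === 0) {
    return null; // nothing left to claim
  }
  const chosen = candidates[0];
  const original = { ...chosen }; // snapshot before the update
  chosen.pending = true;
  chosen.started = now;
  return original;
}

const first = claimNextRequest(987654321, new Date());
const second = claimNextRequest(987654321, new Date());
const third = claimNextRequest(987654321, new Date());

console.log(first, second, third);
```

Each call claims a different request, and once both requests for the book are pending the helper returns `null`.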

Summary

• Schema design is different in MongoDB
• Basic data design principles apply
• Focus on how application accesses and manipulates data
• Evolve schema to meet changing requirements
• Application-level logic is important!

Thank You