Emily Stolfo #mongodbdays Schema Design Ruby Engineer/Evangelist, 10gen @EmStolfo Tuesday, January 29, 13
Page 1: Schema Design

Emily Stolfo

#mongodbdays

Schema Design

Ruby Engineer/Evangelist, 10gen

@EmStolfo

Tuesday, January 29, 13

Page 2: Schema Design

Agenda

• Working with documents
• Common patterns
• Evolving a Schema
• Queries and Indexes

Page 3: Schema Design

Terminology

RDBMS ➜ MongoDB

Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference

Page 4: Schema Design

Working with Documents


Page 5: Schema Design

Documents

Provide flexibility and performance

Page 6: Schema Design

Example Schema (MongoDB)


Page 7: Schema Design

Embedding

Example Schema (MongoDB)


Page 8: Schema Design

Embedding

Linking

Example Schema (MongoDB)


Page 9: Schema Design

Relational Schema Design

Focuses on data storage

Page 10: Schema Design

Document Schema Design

Focuses on data use

Page 11: Schema Design

Schema Design Considerations

• What is a priority?
  – High consistency
  – High read performance
  – High write performance
• How does the application access and manipulate data?
  – Read/Write Ratio
  – Types of Queries / Updates
  – Data life-cycle and growth
  – Analytics (Map Reduce, Aggregation)

Page 12: Schema Design

Tools for Data Access

• Flexible Schemas
• Embedded data structures
• Secondary Indexes
• Multi-Key Indexes
• Aggregation Framework
  – Pipeline operators: $project, $match, $limit, $skip, $sort, $group, $unwind
• No Joins

Page 13: Schema Design

Data Manipulation

• Conditional Query Operators
  – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
  – Vector: $in, $nin, $all, $size
• Atomic Update Operators
  – Scalar: $inc, $set, $unset
  – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
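The difference between two of the array operators listed above, $push and $addToSet, can be sketched without a database. This is a plain-JavaScript emulation, not MongoDB's implementation; the helper names `push` and `addToSet` and the `tags` field are assumptions made for illustration.

```javascript
// Emulation only: these helpers mimic the observable effect of the
// $push and $addToSet update operators on an array field.
function push(doc, field, value) {
  const current = doc[field] || [];
  return { ...doc, [field]: [...current, value] }; // always appends
}

function addToSet(doc, field, value) {
  const current = doc[field] || [];
  // appends only if the value is not already present (simple values only)
  return current.includes(value) ? doc : { ...doc, [field]: [...current, value] };
}

const taggedBook = { _id: "123456789", tags: ["mongodb"] }; // "tags" is hypothetical
const pushed = push(taggedBook, "tags", "mongodb");
const added = addToSet(taggedBook, "tags", "mongodb");

console.log(pushed.tags.length, added.tags.length); // 2 1
```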

Page 14: Schema Design

Schema Design Example


Page 15: Schema Design

Library Management Application

• Patrons
• Books
• Authors
• Publishers

Page 16: Schema Design

Example: One to One Relations

Page 17: Schema Design

Modeling Patrons

// Linked: the address lives in its own document and points back to the patron
patron = {
  _id: "joe",
  name: "Joe Bookreader"
}

address = {
  patron_id: "joe",
  street: "123 Fake St.",
  city: "Faketon",
  state: "MA",
  zip: 12345
}

// Embedded: the address is stored inside the patron document
patron = {
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}

Page 18: Schema Design

One to One Relations

• “Contains” relationships are often embedded.
• Document provides a holistic representation of objects with embedded entities.
• Optimized read performance.
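The read-performance point can be illustrated by counting lookups. In this sketch, `Map` objects stand in for collections and the `lookup` helper is hypothetical; it is not driver code, only a way to show that the embedded form answers the question with one fetch.

```javascript
// In-memory stand-ins for collections (assumption: no database involved).
const patrons = new Map([["joe", { _id: "joe", name: "Joe Bookreader" }]]);
const addresses = new Map([["joe", { patron_id: "joe", city: "Faketon" }]]);
const patronsEmbedded = new Map([
  ["joe", { _id: "joe", name: "Joe Bookreader", address: { city: "Faketon" } }],
]);

let lookups = 0;
function lookup(collection, key) {
  lookups += 1; // count every fetch by key
  return collection.get(key);
}

// Linked model: fetch the patron, then fetch the address.
function cityLinked(id) {
  const patron = lookup(patrons, id);
  return lookup(addresses, patron._id).city;
}

// Embedded model: the address arrives with the patron.
function cityEmbedded(id) {
  return lookup(patronsEmbedded, id).address.city;
}

lookups = 0;
const linkedCity = cityLinked("joe");
const linkedLookups = lookups;

lookups = 0;
const embeddedCity = cityEmbedded("joe");
const embeddedLookups = lookups;

console.log(linkedLookups, embeddedLookups); // 2 1
```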

Page 19: Schema Design

Examples: One to Many Relations

Page 20: Schema Design

Patrons with many addresses

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  addresses: [
    { street: "1 Vernon St.", city: "Newton", state: "MA", … },
    { street: "52 Main St.", city: "Boston", state: "MA", … }
  ]
}

Page 21: Schema Design

One to Many Relations

Example 2: Publishers and Books

Page 22: Schema Design

Publishers and Books relation

• Publishers put out many books
• Books have one publisher

Page 23: Schema Design

Book Data

MongoDB: The Definitive Guide
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA

Page 24: Schema Design

Book Model with Embedded Publisher

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}

Page 25: Schema Design

Book Model with Normalized Publisher

publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

Page 26: Schema Design

Link with Publisher _id as a Reference

publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}

Page 27: Schema Design

Link with Book _ids as a Reference

publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA",
  books: [ "123456789", ... ]
}

book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

Page 28: Schema Design

Where do you put the reference?

• Reference to single publisher on books
  – Use when items have unbounded growth (unlimited # of books)
• Array of books in publisher document
  – Optimal when many means a handful of items
  – Use when there is a bound on potential growth
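The two placements can be compared with a small in-memory sketch. A plain array stands in for the books collection, and `booksByReference` and `booksByArray` are hypothetical helper names, not MongoDB calls. Giving the second book the same publisher is also an assumption for the sake of the example.

```javascript
// Plain array standing in for the books collection (no database involved).
const catalog = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: "987654321", title: "MongoDB: The Scaling Adventure", publisher_id: "oreilly" },
];

// Option 1: each book references its single publisher.
// The publisher document never grows, however many books exist.
function booksByReference(publisherId) {
  return catalog.filter((book) => book.publisher_id === publisherId);
}

// Option 2: the publisher holds an array of book _ids.
// Reasonable only while that array stays small and bounded.
const oreilly = { _id: "oreilly", name: "O’Reilly Media", books: ["123456789", "987654321"] };

function booksByArray(publisher) {
  return publisher.books.map((id) => catalog.find((book) => book._id === id));
}

console.log(booksByReference("oreilly").map((book) => book.title));
console.log(booksByArray(oreilly).map((book) => book.title));
```

Both calls list the same two titles; what differs is which document has to change when a book is added.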

Page 29: Schema Design

One to Many Relations

Example 3: Books and Patrons

Page 30: Schema Design

Books and Patrons

• Book can be checked out by one Patron at a time
• Patrons can check out many books (but not 1000s)

Page 31: Schema Design

Modeling Checkouts

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... }
}

book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  ...
}

Page 32: Schema Design

Modeling Checkouts

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
    { _id: "123456789", checked_out: ISODate("2012-10-15") },
    { _id: "987654321", checked_out: ISODate("2012-09-12") },
    ...
  ]
}

Page 33: Schema Design

De-normalization

Provides data locality

Page 34: Schema Design

Modeling Checkouts - de-normalized

patron = {
  _id: "joe",
  name: "Joe Bookreader",
  join_date: ISODate("2011-10-15"),
  address: { ... },
  checked_out: [
    {
      _id: "123456789",
      title: "MongoDB: The Definitive Guide",
      authors: [ "Kristina Chodorow", "Mike Dirolf" ],
      checked_out: ISODate("2012-10-15")
    },
    {
      _id: "987654321",
      title: "MongoDB: The Scaling Adventure",
      ...
    },
    ...
  ]
}
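A minimal sketch of how an application might build the de-normalized entry at checkout time. The `checkOut` helper is hypothetical and works on plain objects; in a real application the same data would be written with an update on the patron document.

```javascript
// Hypothetical helper: returns a copy of the patron with the book's
// frequently displayed fields copied into the checked_out array.
function checkOut(patron, book, when) {
  const entry = {
    _id: book._id,
    title: book.title,
    authors: book.authors,
    checked_out: when,
  };
  return { ...patron, checked_out: [...(patron.checked_out || []), entry] };
}

const guide = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: ["Kristina Chodorow", "Mike Dirolf"],
};
const joe = { _id: "joe", name: "Joe Bookreader" };

const joeAfter = checkOut(joe, guide, new Date("2012-10-15"));

// Listing Joe's checkouts now needs only the patron document.
console.log(joeAfter.checked_out.map((entry) => entry.title));
```

The trade-off is that the copied title and authors are duplicates: if the book document changes, the application has to decide whether to update the copies as well.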

Page 35: Schema Design

Referencing vs. Embedding

• Embedding is a bit like pre-joining data
• Document level operations are easy for the server to handle
• Embed when the “many” objects always appear with (viewed in the context of) their parents.
• Reference when you need more flexibility

How does your application access and manipulate data?

Page 36: Schema Design

Example: Many to Many Relations

Page 37: Schema Design

Books and Authors

book = {
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}

author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany"
}

Page 38: Schema Design

Relation stored in Book document

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}

author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany"
}

Page 39: Schema Design

Relation stored in Author document

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York",
  books: [
    { book_id: 123456789, title: "MongoDB: The Definitive Guide" }
  ]
}

Page 40: Schema Design

Relation stored in both documents

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "kchodorow", "mdirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York",
  books: [ 123456789, ... ]
}

author = {
  _id: "mdirolf",
  name: "Mike Dirolf",
  hometown: "Albany",
  books: [ 123456789, ... ]
}
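When the relation lives in both documents, the application has to keep the two sides in step. A rough sketch on plain objects follows; `linkAuthor` is a hypothetical helper, and against a database its two steps would be two separate document writes.

```javascript
// Hypothetical helper: record authorship on both sides of the relation.
// Each side is only changed if the link is not already present.
function linkAuthor(book, author) {
  if (!book.authors.includes(author._id)) book.authors.push(author._id);
  if (!author.books.includes(book._id)) author.books.push(book._id);
}

const definitiveGuide = { _id: 123456789, title: "MongoDB: The Definitive Guide", authors: [] };
const kchodorow = { _id: "kchodorow", name: "Kristina Chodorow", books: [] };
const mdirolf = { _id: "mdirolf", name: "Mike Dirolf", books: [] };

linkAuthor(definitiveGuide, kchodorow);
linkAuthor(definitiveGuide, mdirolf);
linkAuthor(definitiveGuide, kchodorow); // repeated call changes nothing

console.log(definitiveGuide.authors); // [ 'kchodorow', 'mdirolf' ]
console.log(kchodorow.books, mdirolf.books);
```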

Page 41: Schema Design

Where do you put the reference?

Think about common queries

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}

// Dotted field names must be quoted
db.books.find( { "authors.name": "Kristina Chodorow" } )
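What a dotted path such as "authors.name" means when `authors` is an array can be sketched in plain JavaScript. This is a rough emulation of the matching idea only, under the assumption that any array element may satisfy the condition; `valuesAt` and `matches` are hypothetical names, not part of MongoDB.

```javascript
// Collect every value reachable by following the path parts.
// When a step lands on an array, each element is followed in turn.
function valuesAt(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, parts));
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[parts[0]], parts.slice(1));
}

// A document matches if any reachable value equals the expected one.
function matches(doc, path, expected) {
  return valuesAt(doc, path.split(".")).includes(expected);
}

const bookWithAuthors = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" },
  ],
};

console.log(matches(bookWithAuthors, "authors.name", "Kristina Chodorow")); // true
console.log(matches(bookWithAuthors, "authors.name", "Joe Bookreader")); // false
```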

Page 42: Schema Design

Where do you put the reference?

Think about indexes

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}

// Dotted field names must be quoted
db.books.createIndex( { "authors.name": 1 } )

Page 43: Schema Design

Example: Trees

Page 44: Schema Design

Parent References

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  category: "MongoDB"
}

category = { _id: "MongoDB", parent: "Databases" }
category = { _id: "Databases", parent: "Programming" }

Page 45: Schema Design

Child References

book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

category = { _id: "MongoDB", children: [ 123456789, … ] }
category = { _id: "Databases", children: [ "MongoDB", "Postgres" ] }
category = { _id: "Programming", children: [ "Databases", "Languages" ] }

Page 46: Schema Design

Modeling Trees

• Parent References
  - Each node is stored as a document
  - Contains the id of the parent
• Child References
  - Each node contains ids of its children
  - Can support graphs (multiple parents per child)
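With parent references, finding all ancestors of a node means following one link at a time. A small in-memory sketch follows; the `Map` stands in for the categories collection, `ancestorsOf` is a hypothetical helper, and the root document for "Programming" is an assumption.

```javascript
// Categories keyed by _id; each node stores only the id of its parent.
const categoryNodes = new Map([
  ["MongoDB", { _id: "MongoDB", parent: "Databases" }],
  ["Databases", { _id: "Databases", parent: "Programming" }],
  ["Programming", { _id: "Programming" }], // assumed root, has no parent
]);

// Walk upwards from a node, collecting ancestors from the root down.
// Every loop iteration corresponds to one extra lookup.
function ancestorsOf(id) {
  const path = [];
  let node = categoryNodes.get(id);
  while (node && node.parent) {
    path.unshift(node.parent);
    node = categoryNodes.get(node.parent);
  }
  return path;
}

console.log(ancestorsOf("MongoDB")); // [ 'Programming', 'Databases' ]
```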

Page 47: Schema Design

Array of Ancestors

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  categories: [ "Programming", "Databases", "MongoDB" ]
}

book = {
  title: "MySQL: The Definitive Guide",
  authors: [ "Michael Kofler" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  parent: "MySQL",
  ancestors: [ "Programming", "Databases", "MySQL" ]
}
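Storing the full ancestor path on each book turns "everything under this category" into a single membership test. The sketch below uses a plain array in place of the collection and a hypothetical `booksUnder` helper; using the field name `ancestors` on both books is an assumption made to keep the example uniform.

```javascript
// Each book carries its complete category path.
const shelf = [
  { title: "MongoDB: The Definitive Guide", ancestors: ["Programming", "Databases", "MongoDB"] },
  { title: "MySQL: The Definitive Guide", ancestors: ["Programming", "Databases", "MySQL"] },
];

// All books at or below a category: one pass, no tree walking.
function booksUnder(category) {
  return shelf
    .filter((book) => book.ancestors.includes(category))
    .map((book) => book.title);
}

console.log(booksUnder("Databases")); // both titles
console.log(booksUnder("MySQL")); // only the MySQL title
```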

Page 48: Schema Design

Example: Single Table Inheritance

Page 49: Schema Design

Single Table Inheritance

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  kind: "loanable",
  locations: [ ... ],
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
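One way an application might use the `kind` field is to branch on it when reading documents from the shared collection. In this sketch only `kind: "loanable"` comes from the slide; the "reference" kind, its `shelf` field, the location value, and the `describe` helper are all hypothetical.

```javascript
// Documents of different kinds share one collection; "kind" tells them apart.
const items = [
  { title: "MongoDB: The Definitive Guide", kind: "loanable", locations: ["Main"] },
  { title: "Library Catalogue", kind: "reference", shelf: "R-12" }, // hypothetical kind
];

// The application, not the database, decides which fields each kind has.
function describe(item) {
  switch (item.kind) {
    case "loanable":
      return `${item.title}: available at ${item.locations.join(", ")}`;
    case "reference":
      return `${item.title}: read on site, shelf ${item.shelf}`;
    default:
      return `${item.title}: unknown kind`;
  }
}

items.forEach((item) => console.log(describe(item)));
```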

Page 50: Schema Design

Example: Queues

Page 51: Schema Design

Update highest priority request

db.loans.insert({
  _id: 123456789,
  book_id: 987654321,
  pending: false,
  approved: false,
  priority: 3
})

// Find the highest priority request and mark as pending approval
request = db.loans.findAndModify({
  query: { pending: false, book_id: 987654321 },
  sort: { priority: -1 },
  update: { $set: { pending: true, started: new ISODate() } }
})
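The queue pattern above can be walked through with an in-memory emulation. This is not the server's implementation: the `claimNext` helper and the sample `_id` and `priority` values are hypothetical, and a plain array stands in for the loans collection. What the server adds is that the find and the update happen as one atomic step on the chosen document.

```javascript
// Pending loan requests; a plain array stands in for db.loans.
const loanRequests = [
  { _id: 1, book_id: 987654321, pending: false, approved: false, priority: 3 },
  { _id: 2, book_id: 987654321, pending: false, approved: false, priority: 5 },
  { _id: 3, book_id: 111111111, pending: false, approved: false, priority: 9 },
];

// Pick the highest-priority unclaimed request for a book and mark it pending.
// Returns the request as it looked before the change, or null if none is left.
function claimNext(bookId, now) {
  const candidates = loanRequests
    .filter((loan) => loan.pending === false && loan.book_id === bookId)
    .sort((a, b) => b.priority - a.priority);
  const request = candidates[0];
  if (!request) return null;
  const before = { ...request };
  request.pending = true;
  request.started = now;
  return before;
}

const firstClaim = claimNext(987654321, new Date());
const secondClaim = claimNext(987654321, new Date());
const thirdClaim = claimNext(987654321, new Date());

console.log(firstClaim._id, secondClaim._id, thirdClaim); // 2 1 null
```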

Page 52: Schema Design

Summary

• Schema design is different in MongoDB
• Basic data design principles apply
• Focus on how application accesses and manipulates data
• Evolve schema to meet changing requirements
• Application-level logic is important!

Page 53: Schema Design

Emily Stolfo

#mongodbdays

Thank You

Ruby Engineer/Evangelist, 10gen

@EmStolfo
