Jumpstart: Schema Design Buzz Moschetti Enterprise Architect, MongoDB [email protected] @buzzmoschetti

Dev Jumpstart: Schema Design Best Practices

Jul 25, 2015

Transcript
Page 1: Dev Jumpstart: Schema Design Best Practices

Jumpstart: Schema Design

Buzz Moschetti
Enterprise Architect, MongoDB

[email protected]

@buzzmoschetti

Page 2: Dev Jumpstart: Schema Design Best Practices

Theme #1: Great Schema Design involves much more than the database

• Easily understood structures
• Harmonized with software
• Acknowledging legacy issues

Page 3: Dev Jumpstart: Schema Design Best Practices

Theme #2: Today’s solutions need to accommodate tomorrow’s needs

• End of “Requirements Complete”
• Ability to economically scale
• Shorter solution lifecycles

Page 4: Dev Jumpstart: Schema Design Best Practices

Theme #3: MongoDB offers you choice

Page 5: Dev Jumpstart: Schema Design Best Practices

Terminology

RDBMS       MongoDB
Database    Database
Table       Collection
Index       Index
Row         Document
Join        Embedding & Linking

Page 6: Dev Jumpstart: Schema Design Best Practices

What is a Document?

{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirold", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  thumbnail: BinData(0, "AREhMQ=="),
  publisher: {
    name: "O’Reilly Media",
    founded: 1980,
    locations: ["CA", "NY"]
  }
}

Page 7: Dev Jumpstart: Schema Design Best Practices

Documents Map to Language Constructs

// Java: maps
DBObject query = new BasicDBObject("publisher.founded", 1980);
Map m = (Map) collection.findOne(query);
Date pubDate = (Date) m.get("published_date"); // java.util.Date

// JavaScript: objects
m = collection.findOne({ "publisher.founded": 1980 });
pubDate = m.published_date; // ISODate
year = pubDate.getUTCFullYear();

# Python: dictionaries
m = coll.find_one({ "publisher.founded": 1980 })
pubDate = m["published_date"]  # datetime.datetime
year = pubDate.year

Page 8: Dev Jumpstart: Schema Design Best Practices

Traditional Schema Design

• Static, Uniform Scalar Data
• Rectangles
• Low-level, physical representation

Page 9: Dev Jumpstart: Schema Design Best Practices

Document Schema Design

• Flexible, Rich Shapes
• Objects
• Higher-level, business representation

Page 10: Dev Jumpstart: Schema Design Best Practices

Schema Design By Example

Page 11: Dev Jumpstart: Schema Design Best Practices

Library Management Application

• Patrons/Users
• Books
• Authors
• Publishers

Page 12: Dev Jumpstart: Schema Design Best Practices

Question: What is a Patron’s Address?

Page 13: Dev Jumpstart: Schema Design Best Practices

A Patron and their Address

> patron = db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  favoriteGenres: ["mystery", "programming"]
}

> address = db.addresses.find({ _id: "joe" })
{
  _id: "joe",
  street: "123 Fake St.",
  city: "Faketon",
  state: "MA",
  zip: 12345
}

Page 14: Dev Jumpstart: Schema Design Best Practices

A Patron and their Address

> patron = db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  favoriteGenres: ["mystery", "programming"],
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}

Page 15: Dev Jumpstart: Schema Design Best Practices

Projection: Return only what you need

> patron = db.patrons.find({ _id: "joe" }, { "_id": 0, "address": 1 })
{
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}

> patron = db.patrons.find({ _id: "joe" }, { "_id": 0, "name": 1, "address.state": 1 })
{
  name: "Joe Bookreader",
  address: { state: "MA" }
}

Page 16: Dev Jumpstart: Schema Design Best Practices

One-to-One Relationships

• “Belongs to” relationships are often embedded
• Holistic representation of entities with their embedded attributes and relationships
• Great read performance

Most important:
• Keeps simple things simple
• Frees up time to tackle harder schema design issues

Page 17: Dev Jumpstart: Schema Design Best Practices

Question: What are a Patron’s Addresses?

Page 18: Dev Jumpstart: Schema Design Best Practices

A Patron and their Addresses

> patron = db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    { street: "1 Vernon St.", city: "Newton", … },
    { street: "52 Main St.", city: "Boston", … }
  ]
}

Page 19: Dev Jumpstart: Schema Design Best Practices

A Patron and their Addresses

> patron = db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    { street: "1 Vernon St.", city: "Newton", … },
    { street: "52 Main St.", city: "Boston", … }
  ]
}

> patron = db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon", … }
}

Page 20: Dev Jumpstart: Schema Design Best Practices

Migration Options

• Migrate all documents when the schema changes
• Migrate on demand
  – As we pull up a patron’s document, we make the change
  – Any patrons that never come into the library never get updated
• Leave it alone
  – The code layer knows about both address and addresses
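The "migrate on demand" option can be sketched in plain JavaScript. This is a minimal illustration, not the presenter's code: `migratePatron` is a hypothetical helper, and the documents are in-memory objects standing in for what a query would return.

```javascript
// Hypothetical helper: upgrade a patron document to the new shape the
// first time the application reads it ("migrate on demand").
function migratePatron(patron) {
  if (patron.addresses === undefined && patron.address !== undefined) {
    patron.addresses = [patron.address]; // wrap the single address in an array
    delete patron.address;               // drop the old field
    return true;                         // caller should write the document back
  }
  return false;                          // already in the new shape, or no address
}

// In-memory stand-ins for documents read from the patrons collection.
const joe = {
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon" }
};
const bob = {
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [{ street: "1 Vernon St.", city: "Newton" }]
};

const joeChanged = migratePatron(joe); // old shape: gets converted
const bobChanged = migratePatron(bob); // new shape: left untouched
```

When the helper reports a change, the application would presumably persist it, for example with an update that uses `$set` for `addresses` and `$unset` for `address`. Patrons whose documents are never read simply stay in the old shape.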

Page 21: Dev Jumpstart: Schema Design Best Practices

The Utility of Substructure

Map d = (Map) collection.findOne(new BasicDBObject("_id", "bob"));

Map addr = (Map) d.get("address");
if (addr == null) {
    // No single address: fall back to the first entry of the addresses list
    List<Map> addrl = (List<Map>) d.get("addresses");
    addr = addrl.get(0);
}

doSomethingWithOneAddress(addr);

/**
 * If later we add "region" to the address substructure, none of the
 * queries have to change! Another value will appear in the Map (or not,
 * and that can be interrogated) and be processed.
 **/

Page 22: Dev Jumpstart: Schema Design Best Practices

Question: Who is the publisher of this book?

Page 23: Dev Jumpstart: Schema Design Best Practices

Book

• MongoDB: The Definitive Guide
• By Kristina Chodorow and Mike Dirolf
• Published: 9/24/2010
• Pages: 216
• Language: English
• Publisher: O’Reilly Media, CA

Page 24: Dev Jumpstart: Schema Design Best Practices

Book with embedded Publisher

> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: ["Kristina Chodorow", "Mike Dirolf"],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: 1980,
    locations: ["CA", "NY"]
  }
}

Page 25: Dev Jumpstart: Schema Design Best Practices

Don’t Forget the Substructure!

> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [
    { first: "Kristina", last: "Chodorow" },
    { first: "Mike", last: "Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: 1980,
    locations: ["CA", "NY"]
  }
}

Page 26: Dev Jumpstart: Schema Design Best Practices

Book with embedded Publisher

• Optimized for read performance of Books
• We accept data duplication
• An index on “publisher.name” provides:
  – Efficient lookup of all books for a given publisher name
  – Efficient way to find all publisher names (distinct)

Page 27: Dev Jumpstart: Schema Design Best Practices

Publishers as a Separate Entity

> publishers = db.publishers.find()
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: 1980,
  locations: ["CA", "NY"]
}
{
  _id: "penguin",
  name: "Penguin",
  founded: 1983,
  locations: ["IL"]
}

Page 28: Dev Jumpstart: Schema Design Best Practices

Book with Linked Publisher

> book = db.books.findOne({ _id: "123" })
{
  _id: "123",
  publisher_id: "oreilly",
  title: "MongoDB: The Definitive Guide",
  …
}

> db.publishers.find({ _id: book.publisher_id })
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: 1980,
  locations: ["CA", "NY"]
}

Page 29: Dev Jumpstart: Schema Design Best Practices

Books with Linked Publisher

var m = {};
db.books.find({ /* criteria */ }).forEach(function(r) {
  m[r.publisher_id] = 1; // Capture publisher ID
});

uniqueIDs = Object.keys(m);

cursor = db.publishers.find({ "_id": { "$in": uniqueIDs } });
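The same two-step lookup can be traced end to end with in-memory data. The sketch below is only an illustration under stated assumptions: the arrays stand in for the `books` and `publishers` collections, and the second and third book titles are invented sample data.

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
const books = [
  { _id: "123", title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: "456", title: "Another Book", publisher_id: "oreilly" },
  { _id: "789", title: "A Third Book", publisher_id: "penguin" }
];
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media" },
  { _id: "penguin", name: "Penguin" }
];

// Step 1: collect each distinct publisher ID once.
const uniqueIDs = [...new Set(books.map(b => b.publisher_id))];

// Step 2: fetch all needed publishers in one pass
// (this plays the role of the single $in query).
const byId = new Map(
  publishers
    .filter(p => uniqueIDs.includes(p._id))
    .map(p => [p._id, p])
);

// Step 3: attach the publisher name to each book in application code.
const joined = books.map(b => ({
  title: b.title,
  publisher: byId.get(b.publisher_id).name
}));
```

The point of the pattern is that the number of publisher lookups depends on the number of distinct publishers, not on the number of books.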

Page 30: Dev Jumpstart: Schema Design Best Practices

Question: What are all the books a publisher has published?

Page 31: Dev Jumpstart: Schema Design Best Practices

Publisher with linked Books

> publisher = db.publishers.findOne({ _id: "oreilly" })
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: 1980,
  locations: ["CA", "NY"],
  books: ["123", "456", "789", "10112", …]
}

> books = db.books.find({ _id: { $in: publisher.books } })

NOT RECOMMENDED

Page 32: Dev Jumpstart: Schema Design Best Practices

Question: Who are the authors of a given book?

Page 33: Dev Jumpstart: Schema Design Best Practices

Books with linked Authors

> book = db.books.findOne({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  …
  authors: [
    { _id: "X12", first: "Kristina", last: "Chodorow" },
    { _id: "Y45", first: "Mike", last: "Dirolf" }
  ]
}

> a2 = book.authors.map(function(r) { return r._id; });
> authors = db.authors.find({ _id: { $in: a2 } })
{ _id: "X12", name: { first: "Kristina", last: "Chodorow" }, hometown: … }
{ _id: "Y45", name: { first: "Mike", last: "Dirolf" }, hometown: … }

Page 34: Dev Jumpstart: Schema Design Best Practices

Question: What are all the books an author has written?

Page 35: Dev Jumpstart: Schema Design Best Practices

Double Link Books and Authors

> authors = db.authors.find({ _id: "X12" })
{
  _id: "X12",
  name: { first: "Kristina", last: "Chodorow" },
  hometown: "Cincinnati",
  books: [ { id: "123", title: "MongoDB: The Definitive Guide" } ]
}

> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  …
  authors: [
    { _id: "X12", first: "Kristina", last: "Chodorow" },
    { _id: "Y45", first: "Mike", last: "Dirolf" }
  ]
}

Page 36: Dev Jumpstart: Schema Design Best Practices

…or Use Indexes

> book = db.books.find({ _id: "123" })
{
  authors: [
    { _id: "X12", first: "Kristina", last: "Chodorow" },
    { _id: "Y45", first: "Mike", last: "Dirolf" }
  ]
}

> db.books.ensureIndex({ "authors._id": 1 });

> db.books.find({ "authors._id": "X12" }).explain();
{
  "cursor" : "BtreeCursor authors._id_1",
  …
  "millis" : 0,
}
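To make the semantics of the `"authors._id"` query concrete: it matches a book when any element of its `authors` array has that `_id`, so no back-link from authors to books is needed. The sketch below mimics that matching on in-memory objects; `booksByAuthor` and the second book are hypothetical, and in the database the index is what would let this avoid scanning every document.

```javascript
// In-memory stand-ins for documents in the books collection.
const books = [
  {
    _id: "123",
    authors: [
      { _id: "X12", first: "Kristina", last: "Chodorow" },
      { _id: "Y45", first: "Mike", last: "Dirolf" }
    ]
  },
  {
    _id: "456", // hypothetical second book with a single author
    authors: [{ _id: "Y45", first: "Mike", last: "Dirolf" }]
  }
];

// Hypothetical helper: a book matches if ANY embedded author has the ID,
// which mirrors how a dotted-path query treats an array of subdocuments.
function booksByAuthor(allBooks, authorId) {
  return allBooks
    .filter(b => b.authors.some(a => a._id === authorId))
    .map(b => b._id);
}

const kristinasBooks = booksByAuthor(books, "X12");
const mikesBooks = booksByAuthor(books, "Y45");
```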

Page 37: Dev Jumpstart: Schema Design Best Practices

Embedding vs. Linking

• Embedding
  – Terrific for read performance
    • Webapp “front pages” and pre-aggregated material
    • Complex structures
  – Inserts might be slower than linking
  – Data integrity needs to be managed
• Linking
  – Flexible
  – Data integrity is built-in
  – Work is done during reads
    • But not necessarily more work than RDBMS
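The data-integrity trade-off can be seen in a small in-memory comparison. This is a sketch under stated assumptions: the publisher rename is a made-up scenario, and the arrays stand in for collections.

```javascript
// Embedded: each book carries its own copy of the publisher data.
const embeddedBooks = [
  { _id: "123", publisher: { name: "O'Reilly Media" } },
  { _id: "456", publisher: { name: "O'Reilly Media" } }
];

// Linked: books hold only an ID; the publisher lives in one place.
const linkedBooks = [
  { _id: "123", publisher_id: "oreilly" },
  { _id: "456", publisher_id: "oreilly" }
];
const publishers = { oreilly: { name: "O'Reilly Media" } };

// Hypothetical change: the publisher is renamed.
const newName = "O'Reilly";

// Embedded: every copy has to be found and rewritten by the application.
let copiesTouched = 0;
for (const book of embeddedBooks) {
  if (book.publisher.name === "O'Reilly Media") {
    book.publisher.name = newName;
    copiesTouched++;
  }
}

// Linked: a single write, but every read needs a second lookup.
publishers.oreilly.name = newName;
const linkedNames = linkedBooks.map(b => publishers[b.publisher_id].name);
```

Embedding pays at write time (two copies touched here, potentially many more in practice), while linking pays at read time with the extra lookup.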

Page 38: Dev Jumpstart: Schema Design Best Practices

Question: What are the personalized attributes for each author?

Page 39: Dev Jumpstart: Schema Design Best Practices

Assign Dynamic Structure to a Known Name

> db.authors.find()
{
  _id: "X12",
  name: { first: "Kristina", last: "Chodorow" },
  personalData: {
    favoritePets: ["bird", "dog"],
    awards: [
      { name: "Hugo", when: 1983 },
      { name: "SSFX", when: 1992 }
    ]
  }
}
{
  _id: "Y45",
  name: { first: "Mike", last: "Dirolf" },
  personalData: {
    dob: ISODate("1970-04-05")
  }
}
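Because the contents of `personalData` vary per author, application code has to interrogate it rather than assume a shape. A minimal sketch, where `awardNames` is a hypothetical helper and the objects mirror the documents on this slide (the date field is omitted for brevity):

```javascript
// In-memory copies of the two author documents.
const kristina = {
  _id: "X12",
  name: { first: "Kristina", last: "Chodorow" },
  personalData: {
    favoritePets: ["bird", "dog"],
    awards: [
      { name: "Hugo", when: 1983 },
      { name: "SSFX", when: 1992 }
    ]
  }
};
const mike = {
  _id: "Y45",
  name: { first: "Mike", last: "Dirolf" },
  personalData: {}
};

// Hypothetical helper: read an optional field under the known name,
// falling back to an empty list when the author has no awards recorded.
function awardNames(author) {
  const personal = author.personalData || {};
  return (personal.awards || []).map(a => a.name);
}

const kristinaAwards = awardNames(kristina);
const mikeAwards = awardNames(mike);
```

The known name (`personalData`) gives code a stable place to look, while what sits underneath it is free to differ from document to document.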

Page 40: Dev Jumpstart: Schema Design Best Practices

Polymorphism: Worth an Extra Slide

> db.events.find()
{ type: "click", ts: ISODate("2015-03-03T12:34:56.789Z"), data: { x: 123, y: 625, adId: "AE23A" } }
{ type: "click", ts: ISODate("2015-03-03T12:35:01.003Z"), data: { x: 456, y: 611, adId: "FA213" } }
{ type: "view",  ts: ISODate("2015-03-03T12:35:04.102Z"), data: { scn: 2, reset: false, … } }
{ type: "click", ts: ISODate("2015-03-03T12:35:05.312Z"), data: { x: 23, y: 32, adId: "BB512" } }
{ type: "close", ts: ISODate("2015-03-03T12:35:08.774Z"), data: { scn: 2, … } }
{ type: "click", ts: ISODate("2015-03-03T12:35:10.114Z"), data: { x: 881, y: 913, adId: "F430" } }
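One way application code might consume such a collection is to dispatch on the `type` field and let each handler interpret `data` in its own way. The sketch below is an assumption about usage, not something shown in the talk; the handler table and `describe` are hypothetical, and timestamps are left out.

```javascript
// In-memory stand-ins for a few event documents.
const events = [
  { type: "click", data: { x: 123, y: 625, adId: "AE23A" } },
  { type: "view", data: { scn: 2, reset: false } },
  { type: "close", data: { scn: 2 } }
];

// Hypothetical handler table: one function per known event type.
const handlers = new Map([
  ["click", d => `click on ad ${d.adId} at (${d.x}, ${d.y})`],
  ["view", d => `view with scn=${d.scn}`]
]);

// Dispatch on the type field; unknown types are reported, not dropped.
function describe(event) {
  const handler = handlers.get(event.type);
  return handler ? handler(event.data) : `unhandled event type: ${event.type}`;
}

const lines = events.map(describe);
```

New event types can be stored without touching the collection; only the handler table needs a new entry when the application starts caring about them.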

Page 41: Dev Jumpstart: Schema Design Best Practices

Question: What are all the books about databases?

Page 42: Dev Jumpstart: Schema Design Best Practices

Categories as an Array

> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  categories: ["MongoDB", "Databases", "Programming"]
}

> db.books.find({ categories: "Databases" })

Page 43: Dev Jumpstart: Schema Design Best Practices

Categories as a Path

> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  category: "Programming/Databases/MongoDB"
}

> db.books.find({ category: /^Programming\/Databases\// })
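The path query is a prefix match: everything filed under `Programming/Databases/` qualifies, whatever follows. A small sketch of that behavior on in-memory objects, where the second and third books are invented sample data:

```javascript
// In-memory stand-ins for book documents with a category path.
const books = [
  { _id: "123", category: "Programming/Databases/MongoDB" },
  { _id: "456", category: "Programming/Languages/Java" },  // hypothetical
  { _id: "789", category: "Cooking/Databases/Recipes" }     // hypothetical
];

// Anchored regular expression: the path must START with this prefix.
const prefix = /^Programming\/Databases\//;

const matches = books
  .filter(b => prefix.test(b.category))
  .map(b => b._id);
```

Anchoring with `^` matters: without it, the third book would not match this particular pattern either, but a looser pattern such as `/Databases/` would match it, and unanchored patterns generally cannot use an index on `category` as efficiently as a prefix expression can.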

Page 44: Dev Jumpstart: Schema Design Best Practices

Summary

• Schema design is different in MongoDB
  – But basic data design principles stay the same
• Focus on how an application accesses/manipulates data
• Seek out and capture belongs-to 1:1 relationships
• Use substructure to better align to code objects
• Be polymorphic!
• Evolve the schema to meet requirements as they change

Page 45: Dev Jumpstart: Schema Design Best Practices

Questions & Answers