FEBRUARY 15, 2018 | BELL HARBOR #MDBlocal Advanced Schema Design Patterns
{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "[email protected]"
}
Who Am I?
The "Gang of Four":
A design pattern systematically names, explains,
and evaluates an important and recurring design
in object-oriented systems
MongoDB systems can also be built using their own
patterns
Pattern
• 10 years with the document model
• Use of a common methodology and vocabulary when designing schemas for MongoDB
• Ability to model schemas using building blocks
• Less art and more methodology
Why this Talk?
Ensure:
• Good performance
• Scalability
despite constraints:
• Hardware
  • RAM faster than disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database Server
  • Maximum size for a document
  • Atomicity of a write
• Data set
  • Size of data
Why do we Create Models?
However, don't over-design!
WMDB -
World Movie Database
Any events, characters and entities depicted in this presentation are fictional.
Any resemblance or similarity to reality is entirely coincidental
WMDB -
World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
Our mission, should we decide to accept it, is to
fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do
it without training, WMDB will disavow any
knowledge of our actions.
This tape will self-destruct in five seconds. Good
luck!
Mission Possible
• Frequency of Access
  • Subset ✔️
  • Approximation ✔️
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket
  • Outlier
• Representation
  • Attribute ✔️
  • Schema Versioning ✔️
  • Document Versioning
  • Tree
  • Polymorphism
  • Pre-Allocation
Patterns by Category
{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }
...
Issue #1: Big Documents, Many Fields and Many Indexes
Pattern #1: Attribute
{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}
Problem:
• Lots of similar fields
• The fields share a common characteristic, and you want to search across them together
• Fields present in only a small subset of documents
Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals
Attribute Pattern
Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index:
  { "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
  { descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
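As a minimal sketch of the "field pairs in an array" idea, the plain JavaScript below folds the release_<location> fields of the Dunkirk document into a releases array. The names releases, location and date are taken from the index shown on the slide; the helper name toAttributePattern is hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helper: fold every release_<location> field into a
// "releases" array of { location, date } pairs.
function toAttributePattern(movie) {
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith("release_")) {
      releases.push({ location: key.slice("release_".length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const dunkirk = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
};

const reshaped = toAttributePattern(dunkirk);
console.log(JSON.stringify(reshaped, null, 2));

// In the mongo shell, one compound index would then cover every location:
// db.movies.createIndex({ "releases.location": 1, "releases.date": 1 })
```

With this shape, adding a new country or festival means pushing one more array element rather than adding a field and an index.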
Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set doesn’t fit in RAM
In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
• …
Pattern #2: Subset
Problem:
• There is a 1-N or N-N relationship, and only a few of the related documents always need to be shown
• Only infrequently do you need to pull all of the dependent documents
Use cases:
• Main actors of a movie
• List of reviews or comments
Subset Pattern
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
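One way to picture the subset is the sketch below, which copies only the 20 best reviews into the movie document and leaves the full list in its own collection. It assumes "top" means highest rating; the field names top_reviews, reviewer and rating and the helper embedTopReviews are hypothetical.

```javascript
// Hypothetical helper: embed only the `limit` best-rated reviews in the
// movie document; the complete list stays in a separate reviews collection.
function embedTopReviews(movie, allReviews, limit = 20) {
  const top = [...allReviews] // copy, so the source list is not reordered
    .sort((a, b) => b.rating - a.rating)
    .slice(0, limit);
  return { ...movie, top_reviews: top };
}

// 50 made-up reviews with ratings cycling from 0 to 9.
const allReviews = Array.from({ length: 50 }, (_, i) => ({
  reviewer: `moviegoer_${i}`,
  rating: i % 10,
}));

const moviePage = embedTopReviews({ title: "Dunkirk" }, allReviews);
console.log(moviePage.top_reviews.length);
```

The duplicated reviews have to be refreshed when the source changes, which is the consistency trade-off discussed later in the talk.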
Question:
• Which new MongoDB 3.6 feature will allow me to notify an application if the name of an actor is changed?
Quiz A
Subset Pattern
• CPU is on fire!
Issue #3: Lots of CPU Usage
{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: … caused by repeated calculations
For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t mess with your source, you can recreate the rollups
Pattern #3: Computed
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes
  • Example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
Computed Pattern - Solution
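A rough sketch of storing the result instead of recomputing it: each new screening updates the totals kept on the movie document, and the totals can be rebuilt from the source screenings at any time. The fields viewings, viewers and revenues come from the slide; the screening shape ({ viewers, revenue }) and both helper names are assumptions made for illustration.

```javascript
// Hypothetical helper: fold one screening into the stored totals.
function applyScreening(movie, screening) {
  return {
    ...movie,
    viewings: (movie.viewings || 0) + 1,
    viewers: (movie.viewers || 0) + screening.viewers,
    revenues: (movie.revenues || 0) + screening.revenue,
  };
}

// Hypothetical helper: rebuild the totals from the untouched source data.
function recomputeTotals(movie, screenings) {
  const reset = { ...movie, viewings: 0, viewers: 0, revenues: 0 };
  return screenings.reduce((acc, s) => applyScreening(acc, s), reset);
}

const screenings = [
  { viewers: 120, revenue: 1500 },
  { viewers: 80, revenue: 1000 },
  { viewers: 200, revenue: 2600 },
];

const movie = recomputeTotals({ title: "The Shape of Water" }, screenings);
console.log(movie);

// In the mongo shell, the per-write version could be a single $inc, e.g.:
// db.movies.updateOne({ title: "The Shape of Water" },
//   { $inc: { viewings: 1, viewers: 120, revenues: 1500 } })
```

Reads then fetch the stored sums directly, so the cost of the computation is paid once per write rather than once per read.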
Question:
• Which Relational Database feature is typically used to mimic the computed pattern?
Quiz B
Computed Pattern
Issue #4: Lots of Writes
Issue #4: … for non-critical data
• Only increment once in X iterations
• Increment by X
Pattern #4: Approximation
Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep an exact count
• No one gives a damn if the number is exact
Use cases:
• Population of a country
• Web site visits
Approximation Pattern
Solution:
• Fewer, stronger writes
Benefits:
• Fewer writes, reducing contention on some documents
Approximation Pattern - Solution
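"Only increment once in X iterations, increment by X" can be sketched as follows. This version decides at random whether to write, which is one possible reading of the slide (a local counter flushed every X events would be another); makeApproximateCounter and writeIncrement are hypothetical names, and writeIncrement stands in for the actual database update.

```javascript
// Hypothetical helper: on average, write once every `step` events,
// and add `step` to the stored value each time a write happens.
function makeApproximateCounter(step, writeIncrement, random = Math.random) {
  return function recordEvent() {
    if (random() < 1 / step) {
      writeIncrement(step); // e.g. an $inc by `step` in the database
      return true;
    }
    return false;
  };
}

let storedVisits = 0;
let writes = 0;
const recordVisit = makeApproximateCounter(100, (n) => {
  storedVisits += n;
  writes += 1;
});

// 100,000 visits should cost roughly 1,000 writes instead of 100,000.
for (let i = 0; i < 100000; i++) recordVisit();
console.log({ storedVisits, writes });
```

The stored value is only statistically close to the true count, which is acceptable for the use cases listed above (web site visits, population).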
• Keeping track of the schema version of a document
Issue #5: Need to change the list of fields in the documents
Add a field to track the schema version number, per document
Does not have to exist for version 1
Pattern #5: Schema Versioning
Problem:
• Updating the schema of a database is:
  • Not atomic
  • A long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern - Solution
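A minimal sketch of how an application might use the version field: documents are upgraded lazily, when they are read or next modified, by running the migrations between their version and the current one. The field name schema_version, the migrations table and upgradeToCurrent are all hypothetical; the example migration (version 1 to 2) folds release_<location> fields into a releases array.

```javascript
const CURRENT_SCHEMA_VERSION = 2;

// Hypothetical migration table: key N upgrades a document from version N to N + 1.
const migrations = {
  1: (doc) => {
    const upgraded = { releases: [] };
    for (const [key, value] of Object.entries(doc)) {
      if (key.startsWith("release_")) {
        upgraded.releases.push({
          location: key.slice("release_".length),
          date: value,
        });
      } else {
        upgraded[key] = value;
      }
    }
    upgraded.schema_version = 2; // set last so an explicit 1 is overwritten
    return upgraded;
  },
};

// A missing schema_version means version 1, as stated on the slide.
function upgradeToCurrent(doc) {
  let current = doc;
  let version = current.schema_version ?? 1;
  while (version < CURRENT_SCHEMA_VERSION) {
    current = migrations[version](current);
    version = current.schema_version;
  }
  return current;
}

const oldDoc = { title: "Dunkirk", release_USA: "2017/07/23" };
console.log(upgradeToCurrent(oldDoc));
```

Documents already at the current version pass through unchanged, so old and new shapes can coexist in the same collection while the application keeps running.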
Back to reality
• How duplication is handled
  A. Update both source and target in real time
  B. Update target from source at regular intervals. Examples:
    • Most popular items => update nightly
    • Revenues from a movie => update every hour
    • Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency
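Option B above (update target from source at regular intervals) could look roughly like the batch job below, which copies the current actor names from the source list into the names duplicated inside movie documents. The shapes used here (cast, actor_id, _id, name) and the helper refreshDuplicatedNames are assumptions for illustration only.

```javascript
// Hypothetical batch job: refresh actor names duplicated in movie documents
// from the source list of actors. Returns how many entries were changed.
function refreshDuplicatedNames(movies, actors) {
  const nameById = new Map(actors.map((a) => [a._id, a.name]));
  let changed = 0;
  for (const movie of movies) {
    for (const entry of movie.cast) {
      const name = nameById.get(entry.actor_id);
      if (name !== undefined && name !== entry.name) {
        entry.name = name; // update the duplicate in place
        changed += 1;
      }
    }
  }
  return changed;
}

const actors = [
  { _id: 1, name: "New Name" },
  { _id: 2, name: "Unchanged Name" },
];
const movies = [
  {
    title: "Dunkirk",
    cast: [
      { actor_id: 1, name: "Old Name" },
      { actor_id: 2, name: "Unchanged Name" },
    ],
  },
];

const changedCount = refreshDuplicatedNames(movies, actors);
console.log(changedCount);
```

How often such a job runs (nightly, hourly) is the staleness the application agrees to tolerate for that piece of duplicated data.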
What our Patterns did for us
Problem                          Pattern
Messy and Large Documents        Attribute
Too much RAM                     Subset
Too much CPU                     Computed
Too many disk accesses           Approximation
No downtime to upgrade schema    Schema Versioning
• Bucket
  • Grouping documents together, to have fewer documents
• Document Versioning
  • Tracking of content changes in a document
• Outlier
  • Avoid letting a few documents drive the design and impact performance for all
• Extended Reference
• Tree(s)
• Polymorphism
• Pre-allocation
Other Patterns
A. Simple grouping from tables to collections is not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance
Takeaways
A full design example for a given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
• …
References for complete Solutions
• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming online course at MongoDB University:
  • https://university.mongodb.com
  • Data Modeling
How Can I Learn More About Schema Design?
Question:
• Which Pattern is used in the following document?
{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "[email protected]"
}
Quiz C
Which Pattern is used
Thank you for using MongoDB!