Except where otherwise noted, this work is licensed under: http://creativecommons.org/licenses/by/3.0/
Taking care about your schema in MongoDB’s schemaless world
Alessandro Palumbo
[email protected] http://it.linkedin.com/in/alessandropalumbo/
http://www.byte-code.com
Alessandro Palumbo - [email protected] - http://www.byte-code.com
MongoDB
from "humongous" (“huge; enormous”)
NoSQL
Open source
Document-oriented: JSON-style documents
JSON-style documents
{
  "_id" : "6c85fa4c-fa64-44e2-89c9-e5eb7f306ed7",
  "code" : "CRS0001",
  "name" : "Test",
  "description" : "Test description",
  "active" : true,
  "scheduledDate" : {
    "from" : ISODate("2013-09-12T00:00:00.000Z"),
    "to" : ISODate("2013-10-31T00:00:00.000Z")
  },
  "version" : NumberLong(1)
}
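For readers who work from a driver rather than the mongo shell, the same document can be sketched as a Python dict. This assumes the shell's ISODate corresponds to a timezone-aware datetime and NumberLong to a plain int. With PyMongo, a small int like 1 would be stored as a 32-bit integer unless wrapped in bson.int64.Int64, and that detail is not covered in the original. The helper make_course is hypothetical.

```python
import uuid
from datetime import datetime, timezone

def make_course(code: str, name: str, description: str,
                start: datetime, end: datetime) -> dict:
    """Hypothetical helper building a document shaped like the sample above."""
    return {
        "_id": str(uuid.uuid4()),   # surrogate key stored as a UUID string
        "code": code,               # natural (business) key
        "name": name,
        "description": description,
        "active": True,
        "scheduledDate": {          # embedded sub-document
            "from": start,          # ISODate(...) in the shell
            "to": end,
        },
        "version": 1,               # NumberLong(1) in the shell
    }

course = make_course(
    "CRS0001", "Test", "Test description",
    datetime(2013, 9, 12, tzinfo=timezone.utc),
    datetime(2013, 10, 31, tzinfo=timezone.utc),
)
print(course["scheduledDate"]["to"].isoformat())  # 2013-10-31T00:00:00+00:00
```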
DON’T BE RELATIONAL
No joins: we can embed
No full transactions: document-level transactions
No schema: is it really an issue?
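Because atomicity is per document, embedding related data lets a single update change several fields at once with no risk of a half-applied write. Below is a minimal sketch that assumes a hypothetical course document with an embedded seats counter and reservations array. The function only builds the filter and update documents using the standard $gt, $ne, $inc and $push operators. Those documents could then be passed to a driver call such as PyMongo's update_one, which is not executed here.

```python
def reserve_seat_update(course_id: str, user: str) -> tuple:
    """Build a (filter, update) pair that reserves one seat for `user`.

    Both modifications touch the same document, so MongoDB applies them
    atomically: the counter cannot drift from the reservations list.
    """
    query = {
        "_id": course_id,
        "seats": {"$gt": 0},             # match only if a seat is left
        "reservations": {"$ne": user},   # and the user is not booked yet
    }
    update = {
        "$inc": {"seats": -1},
        "$push": {"reservations": user},
    }
    return query, update

query, update = reserve_seat_update("6c85fa4c-fa64-44e2-89c9-e5eb7f306ed7", "alice")
# With PyMongo (not run here): result = courses.update_one(query, update)
# A modified_count of 0 means no seat was left or alice was already booked.
```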
DESIGN
  Design for query
  Embedded data vs references
  Dynamic schema vs static languages
FRIENDLY FIRE (AKA RTFM)
  Avoid natural keys as identifiers
PERFORMANCE
  Preallocate fields?
  Tuning updates and inserts
  Document moving slows you
FRIENDLY FIRE
AVOID NATURAL KEYS AS IDENTIFIERS
All collections have an index on the _id field that exists by default. If _id is not provided, the driver or the mongod will create an _id field with an ObjectId value.
Add a unique index on the natural key instead: sometimes the application domain evolves in unexpected ways, and the _id of a document cannot be changed once it has been inserted.
Remember that if sharding is enabled, a unique index must include the full shard key as a prefix.
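Here is a minimal in-memory sketch of this advice, with no MongoDB server involved. Each document gets a surrogate _id (a UUID string, as in the sample document), and a unique index protects the natural key code. TinyCollection, DuplicateKeyError and change_natural_key are hypothetical stand-ins. With PyMongo, the index itself would be created with collection.create_index("code", unique=True).

```python
import uuid

class DuplicateKeyError(Exception):
    """Raised when a write would violate the unique index (hypothetical)."""

class TinyCollection:
    """In-memory stand-in for a collection with a unique index on one field."""

    def __init__(self, unique_field: str):
        self.unique_field = unique_field
        self.docs = {}    # _id -> document
        self.index = {}   # natural key value -> _id (the "unique index")

    def insert(self, doc: dict) -> str:
        doc = dict(doc)
        doc.setdefault("_id", str(uuid.uuid4()))  # surrogate key, never changes
        key = doc[self.unique_field]
        if key in self.index:
            raise DuplicateKeyError(f"{self.unique_field}={key!r} already exists")
        self.docs[doc["_id"]] = doc
        self.index[key] = doc["_id"]
        return doc["_id"]

    def change_natural_key(self, _id: str, new_key) -> None:
        """The natural key may evolve; _id, and references to it, stay stable."""
        doc = self.docs[_id]
        if new_key in self.index and self.index[new_key] != _id:
            raise DuplicateKeyError(f"{self.unique_field}={new_key!r} already exists")
        del self.index[doc[self.unique_field]]
        doc[self.unique_field] = new_key
        self.index[new_key] = _id

courses = TinyCollection("code")
cid = courses.insert({"code": "CRS0001", "name": "Test"})
courses.change_natural_key(cid, "2013-CRS0001")  # the domain evolved
try:
    courses.insert({"code": "2013-CRS0001", "name": "Clash"})
except DuplicateKeyError as exc:
    print("rejected:", exc)  # rejected: code='2013-CRS0001' already exists
```

Had code been used as _id, the rename would have meant deleting and reinserting the document and fixing every reference to it.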
DESIGN
DESIGN FOR QUERY
Document design serves the queries that will exist in the application.
Reference or embed documents: “denormalized” is not always a bad word.
Your document design will affect which operations are safe and which are not, since writes are atomic only at the document level.
EMBEDDED DATA VS REFERENCES
Embedded data models allow applications to store related pieces of information in the same database record.
The maximum BSON document size is 16 megabytes, and embedding may lead to performance issues if not used correctly.
Usually there is a “contains” relationship between the embedding and the embedded object.
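Below is a minimal sketch of the “contains” relationship, assuming a hypothetical course that embeds its lessons. One read returns everything, and a guard keeps the document well under the 16 MB limit. The size check uses the length of the JSON encoding only as a rough stand-in for the real BSON size. With PyMongo, len(bson.encode(doc)) would give the exact figure. The helper names and the 80% headroom are arbitrary choices made for illustration.

```python
import json

MAX_BSON_SIZE = 16 * 1024 * 1024  # 16 MB document limit

def approx_size(doc: dict) -> int:
    """Rough size estimate: UTF-8 JSON bytes, not the exact BSON size."""
    return len(json.dumps(doc).encode("utf-8"))

def add_lesson(course: dict, lesson: dict) -> None:
    """Embed a lesson in its course, refusing to get close to the size limit."""
    candidate = dict(course, lessons=course.get("lessons", []) + [lesson])
    if approx_size(candidate) > MAX_BSON_SIZE * 0.8:  # arbitrary 20% headroom
        raise ValueError("too much embedded data: consider references instead")
    course["lessons"] = candidate["lessons"]

course = {"_id": "6c85fa4c-fa64-44e2-89c9-e5eb7f306ed7", "code": "CRS0001"}
add_lesson(course, {"title": "Introduction", "minutes": 60})
add_lesson(course, {"title": "Schema design", "minutes": 90})
# A single lookup by _id now returns the course together with its lessons.
print([lesson["title"] for lesson in course["lessons"]])
```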
EMBEDDED DATA VS REFERENCES
Normalized data models describe relationships using references between documents.
No referential integrity is supported: a reference could point to a nonexistent object.
References provide more flexibility than embedding, but remember that client-side applications will have to look up referenced objects with multiple queries.
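The cost of references can be sketched with two plain dicts standing in for collections. No server is involved, and the collection names and the resolve_teacher helper are hypothetical. Reading a course and its teacher takes two lookups, the equivalent of two find queries. Because nothing enforces referential integrity, the application must also handle a reference that points nowhere.

```python
teachers = {
    "t1": {"_id": "t1", "name": "Ada"},
}
courses = {
    "c1": {"_id": "c1", "code": "CRS0001", "teacher_id": "t1"},
    "c2": {"_id": "c2", "code": "CRS0002", "teacher_id": "t9"},  # dangling
}

def resolve_teacher(course_id: str) -> tuple:
    """Follow a manual reference: one lookup for the course, one for the teacher."""
    course = courses[course_id]                   # query 1
    teacher = teachers.get(course["teacher_id"])  # query 2: may find nothing
    # MongoDB does not stop "t9" from being stored: the referenced document
    # may never have existed or may have been deleted, so None is possible.
    return course, teacher

print(resolve_teacher("c1")[1]["name"])  # Ada
print(resolve_teacher("c2")[1])          # None
```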
DYNAMIC SCHEMA VS STATIC LANGUAGES
Why use a dynamic schema if we are not using a dynamic programming language?
Inheritance is not only a matter of hierarchy; it can also be a matter of composition.
Composition is the key to introducing a dynamic schema in a statically typed programming language.
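Below is a minimal sketch of composition in a typed model. Python dataclasses with type annotations stand in for a static language such as Java, and all class names are hypothetical. Instead of a hierarchy with one subclass per document variant, a Course holds a list of optional components. Each component contributes its own embedded fields, so documents in the same collection can have different shapes while every piece of code stays typed.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Protocol

class Component(Protocol):
    """Optional data that knows the document field it is stored under."""
    key: str

@dataclass
class OnlineDelivery:
    url: str
    key: str = "online"

@dataclass
class Classroom:
    building: str
    seats: int
    key: str = "classroom"

@dataclass
class Course:
    code: str
    name: str
    components: List[Component] = field(default_factory=list)

    def to_document(self) -> dict:
        """Base fields plus one embedded sub-document per component."""
        doc = {"code": self.code, "name": self.name}
        for comp in self.components:
            data = asdict(comp)
            data.pop("key")        # the key names the field; it is not data
            doc[comp.key] = data
        return doc

web = Course("CRS0001", "Test", [OnlineDelivery("https://example.com/crs0001")])
lab = Course("CRS0002", "Lab", [Classroom("B", 30)])
print(web.to_document())
print(lab.to_document())
```

Adding a new kind of course data means writing one more small component class, not another branch in a class hierarchy.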
PERFORMANCE
DOCUMENT MOVING SLOWS YOU
MongoDB handles the space allocation of a record taking a padding factor into account.
When an updated document no longer fits in its record space, it will be moved.
Dynamic schema is the leading cause of document moving: adding fields to an existing document makes it grow.
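The mechanism can be modelled with a toy allocator. Sizes are in bytes, and no real storage engine is involved. The model assumes the behaviour of the MMAPv1-era storage engine described in the slide. Each record is allocated at document size times a padding factor, and an update that still fits is done in place. An update that does not fit forces a move into a new, larger record. The class name, the fixed 1.5 factor and the sizes are illustrative assumptions, since the real padding factor was adaptive.

```python
class PaddedStore:
    """Toy model of record allocation with a padding factor."""

    def __init__(self, padding_factor: float = 1.5):
        self.padding_factor = padding_factor
        self.records = {}  # doc id -> allocated record size in bytes
        self.sizes = {}    # doc id -> current document size in bytes
        self.moves = 0

    def _allocate(self, size: int) -> int:
        return int(size * self.padding_factor)

    def insert(self, doc_id: str, size: int) -> None:
        self.records[doc_id] = self._allocate(size)
        self.sizes[doc_id] = size

    def update(self, doc_id: str, new_size: int) -> bool:
        """Return True if the update happened in place, False if it moved."""
        self.sizes[doc_id] = new_size
        if new_size <= self.records[doc_id]:
            return True                                  # fits in the padding
        self.records[doc_id] = self._allocate(new_size)  # copy to a new record
        self.moves += 1
        return False

store = PaddedStore(padding_factor=1.5)
store.insert("course", 200)         # record of 300 bytes
print(store.update("course", 280))  # True: grows inside the padding
print(store.update("course", 450))  # False: new fields overflowed the record
print(store.moves)                  # 1
```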
PREALLOCATE FIELDS?
Field preallocation can fix document moving issues in some use cases.
Default values must be used to preallocate, and this MUST BE HANDLED in the application.
NULL is not a default value :-) as it has its own BSON type, which stores no value bytes: replacing it with a real value still makes the document grow.
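Why null does not preallocate can be checked against the BSON specification. A null element stores only its type byte and field name, while a date always takes 8 value bytes and an int32 takes 4. The sketch below computes BSON sizes for a few types by following the spec. It is not a full encoder, and the closedAt field and the epoch “not yet set” sentinel are hypothetical choices the application would have to handle. It compares setting a field preallocated with null against one preallocated with a type-appropriate default.

```python
from datetime import datetime, timezone

def bson_size(doc: dict) -> int:
    """Encoded BSON size of `doc` (per bsonspec.org) for a few value types."""
    return 4 + sum(_element_size(k, v) for k, v in doc.items()) + 1

def _element_size(key: str, value) -> int:
    header = 1 + len(key.encode("utf-8")) + 1  # type byte + cstring field name
    if value is None:
        return header                          # null: no value bytes at all
    if isinstance(value, bool):                # check before int (bool is an int)
        return header + 1
    if isinstance(value, int):
        return header + (4 if -2**31 <= value < 2**31 else 8)  # int32 / int64
    if isinstance(value, (float, datetime)):
        return header + 8                      # double / UTC datetime
    if isinstance(value, str):
        return header + 4 + len(value.encode("utf-8")) + 1
    if isinstance(value, dict):
        return header + bson_size(value)       # embedded document
    raise TypeError(f"type not covered by this sketch: {type(value)!r}")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # app-defined "not set" value
closed_at = datetime(2013, 10, 31, tzinfo=timezone.utc)

with_null = {"code": "CRS0001", "closedAt": None}
with_default = {"code": "CRS0001", "closedAt": EPOCH}

growth_null = bson_size({**with_null, "closedAt": closed_at}) - bson_size(with_null)
growth_default = bson_size({**with_default, "closedAt": closed_at}) - bson_size(with_default)
print(growth_null, growth_default)  # 8 0
```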
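TUNING UPDATES AND INSERTS
MongoDB stores BSON documents as a sequence of fields and values, not as a hash table.
Writing the first field of a document (or of a nested document) is considerably faster than writing the last one, because every preceding field has to be walked past to reach it.
Intra-document hierarchy can help: grouping fields into nested documents lets a whole sub-document be skipped in one step.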
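Here is a toy model of the cost described above. It counts fields visited and does not measure real MongoDB timings. Since a BSON document is a sequence, reaching a field means stepping over every field before it. A nested document is length-prefixed, so the whole sub-document can be skipped in one step. The example groups 100 hypothetical fields into 10 sub-documents of 10, an arbitrary layout chosen for illustration, which shortens the walk to the last field.

```python
def fields_visited(doc: dict, path: str) -> int:
    """Count the fields stepped over to reach a dotted `path`, BSON style."""
    steps = 0
    current = doc
    for part in path.split("."):
        for key in current:   # dicts keep insertion order, like BSON
            steps += 1        # a whole sub-document counts as one step
            if key == part:
                current = current[key]
                break
        else:
            raise KeyError(path)
    return steps

flat = {f"f{i}": i for i in range(100)}
nested = {
    f"g{g}": {f"f{g * 10 + i}": g * 10 + i for i in range(10)}
    for g in range(10)
}

print(fields_visited(flat, "f0"), fields_visited(flat, "f99"))  # 1 100
print(fields_visited(nested, "g9.f99"))                          # 20
```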
Any questions?