Nuxeo: from SQL to MongoDB
Florent Guillaume — Director of R&D, Nuxeo
2014-07-03
The Nuxeo Model
[Figure: Nuxeo Platform architecture. Labels: Document (<META>, BLOBS); Repository and BlobStore (Store, Read); Cache; Persistence Engine (Insert/Update, Select); backends: SQL DB (VCS), MongoDB (DBS), FS.]
Nuxeo Core — Rich Documents
• Scalars
• Strings, Integers, Floats, Booleans, Dates
• Binary blobs (stored using separate BinaryStore service)
• Arrays of scalars
• Complex properties (sub-documents)
• Lists of complex properties
• System properties
• Id, type, facets, lifecycle state, ACL, version flags...
Nuxeo Core — Rich Documents
• Scalar properties and arrays
• dc:title = "My Document"
• dc:contributors = ["bob", "pete", "mary"]
• dc:created = 2014-07-03T12:15:07+0200
• ecm:uuid = 52a7352b-041e-49ed-8676-328ce90cc103
• ecm:primaryType = "MyFile"
• ecm:majorVersion = 2, ecm:minorVersion = 0
• ecm:isLatestMajorVersion = true, ecm:isLatestVersion = false
Nuxeo Core — Rich Documents
• Complex properties and lists of them
• primaryAddress = { street = "1 rue René Clair", zip = "75018", city = "Paris", country = "France" }
• files = [
    { name = "doc.txt", length = 1234, mime-type = "text/plain", data = 0111fefdc8b14738067e54f30e568115 },
    { name = "doc.pdf", length = 29344, mime-type = "application/pdf", data = 20f42df3221d61cb3e6ab8916b248216 }
  ]
Nuxeo Core — Rich Operations
• CRUD
• Create
• Retrieve
• Update
• Delete
• Move
• Copy
• ... but in a Hierarchy
Nuxeo Core — Rich Features
• Security based on ACLs and inheritance
• block bob for Write, allow members for Read
• Proxies (multi-filing)
• Versioning
• Placeless documents (versions, tags, relations...)
• Facets (dynamic typing)
• Locking
• Search (NXQL)
• SELECT * FROM File WHERE files/*/name = 'doc.txt'
Nuxeo Core — Hierarchy
• Parent-child relationship
• Recursion
• Find all the children to change something
• Lifecycle state
• Security
• Search on a subset of the hierarchy
• ... AND ecm:path STARTSWITH '/workspaces/receipts'
SQL vs DBS/MongoDB
Storage — SQL
• Stores data in a set of JOINed tables
• Star schema, around the main hierarchy
• Lists as JOINed table with item/pos
• Complex properties as sub-documents (children)
• Lists of complex properties as ordered sub-documents
• Id generated by application or database
• String / native UUID / serial integer
Storage — SQL (base hierarchy)
Storage — SQL (simple props)
Storage — SQL (complex props)
Storage — MongoDB
• Standard JSON documents
• Property names fully prefixed
• Lists as arrays of scalars
• Complex properties as sub-documents
• Complex lists as arrays of sub-documents
• Id generated by MongoDB
• Counter using findAndModify, $inc and returnNew
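The counter-based id generation above can be sketched as follows. This is an in-memory stand-in, not the Nuxeo implementation: the `Counters` class, its method name, the `seq` field and the upsert behaviour are assumptions made for illustration, and a lock emulates the atomicity that a single findAndModify with $inc and returnNew gives on the server.

```python
import threading


class Counters:
    """In-memory stand-in for a MongoDB counters collection (hypothetical)."""

    def __init__(self):
        self._docs = {}                # counter name -> {"seq": int}
        self._lock = threading.Lock()  # emulates server-side atomicity

    def find_and_increment(self, name, step=1):
        """Emulate findAndModify with {$inc: {seq: step}} and returnNew.

        The counter is created on first use (an assumed upsert), and the
        value AFTER the increment is returned, so no two callers get the same id.
        """
        with self._lock:
            doc = self._docs.setdefault(name, {"seq": 0})
            doc["seq"] += step
            return doc["seq"]


counters = Counters()
ids = [counters.find_and_increment("ecm:id") for _ in range(3)]
print(ids)
```

Returning the post-increment value in the same atomic step is the point of returnNew: a separate read after the increment could observe another caller's update.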
Storage — MongoDB
"ecm:id": "52a7352b-041e-49ed-8676-328ce90cc103",
"dc:title": "My Document",
"dc:contributors": ["bob", "pete", "mary"],
"dc:created": ISODate("2014-07-03T12:15:07+0200"),
"ecm:primaryType": "MyFile",
"ecm:majorVersion": NumberLong(2),
"ecm:minorVersion": NumberLong(0),
"ecm:isLatestMajorVersion": true,
"ecm:isLatestVersion": false,
Storage — MongoDB
"primaryAddress": { "street": "1 rue René Clair", "zip": "75018", "city": "Paris", "country": "France" },
"files": [
  { "name": "doc.txt", "length": 1234, "mime-type": "text/plain", "data": "0111fefdc8b14738067e54f30e568115" },
  { "name": "doc.pdf", "length": 29344, "mime-type": "application/pdf", "data": "20f42df3221d61cb3e6ab8916b248216" }
],
"ecm:acp": [
  { "name": "local", "acl": [
    { "grant": false, "perm": "Write", "user": "bob" },
    { "grant": true, "perm": "Read", "user": "pete" },
    { "grant": true, "perm": "Read", "user": "members" }
  ] }
]
Hierarchy — SQL
• Parent-child relationship
• hierarchy.parentid column
• Recursion optimized through ancestors table
• For each document list all its ancestors
• Maintained by database triggers (create, delete, move, copy)
• Alternative for PostgreSQL: array column with all ancestors
Hierarchy — SQL
Hierarchy — MongoDB
• Parent-child relationship
• ecm:parentId field
• Recursion optimized through ecm:ancestorIds array
• Maintained by framework (create, delete, move, copy)
Hierarchy — MongoDB
"ecm:parentId": "afb488e7",
"ecm:ancestorIds": ["00000000", "18ba9e90", "afb488e7"],
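As a rough illustration of how a framework might maintain ecm:ancestorIds on create and move, here is a minimal in-memory sketch. The helper names (`ancestors_of`, `add`, `descendants`, `move`) and the folder id "c0ffee00" are hypothetical; a plain dict stands in for the MongoDB collection.

```python
def ancestors_of(docs, parent_id):
    """Ancestor ids for a child of parent_id: the parent's ancestors plus the parent."""
    if parent_id is None:
        return []
    return docs[parent_id]["ecm:ancestorIds"] + [parent_id]


def add(docs, doc_id, parent_id):
    """Create a document with its ancestor list computed once, at creation time."""
    docs[doc_id] = {"ecm:id": doc_id, "ecm:parentId": parent_id,
                    "ecm:ancestorIds": ancestors_of(docs, parent_id)}


def descendants(docs, doc_id):
    """Emulates the query {"ecm:ancestorIds": doc_id}: one lookup, no recursion."""
    return [d for d in docs.values() if doc_id in d["ecm:ancestorIds"]]


def move(docs, doc_id, new_parent_id):
    """Move a document and rewrite ecm:ancestorIds on its whole subtree."""
    new_prefix = ancestors_of(docs, new_parent_id)
    if doc_id == new_parent_id or doc_id in new_prefix:
        raise ValueError("move would create a cycle")
    doc = docs[doc_id]
    old_len = len(doc["ecm:ancestorIds"])
    subtree = descendants(docs, doc_id)
    doc["ecm:parentId"] = new_parent_id
    doc["ecm:ancestorIds"] = new_prefix
    for child in subtree:
        # Keep the part of the path from the moved document down, swap the prefix.
        child["ecm:ancestorIds"] = new_prefix + child["ecm:ancestorIds"][old_len:]


docs = {}
add(docs, "00000000", None)
add(docs, "18ba9e90", "00000000")
add(docs, "afb488e7", "18ba9e90")
add(docs, "52a7352b", "afb488e7")
add(docs, "c0ffee00", "00000000")   # hypothetical second folder
move(docs, "afb488e7", "c0ffee00")
print(docs["52a7352b"]["ecm:ancestorIds"])
```

The trade-off is the usual one for materialized ancestor lists: subtree reads become a single indexed match, while a move has to touch every descendant.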
Proxies — SQL
• Reference to target document
• proxies.targetid column
• Holds only hierarchy-based information, no content
• Parent, name, ACL...
• Additional JOIN during search
Proxies — MongoDB
• Copy of the target document
• ecm:proxyTargetId field
• Target document knows who's pointing to it
• ecm:proxyIds field
• Maintained by framework
• Copy needs to be kept up to date when target changes
• Maintained by framework
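The copy-and-back-reference scheme above might look like the sketch below. It is an assumption-laden illustration, not Nuxeo's code: the function names are hypothetical, the set of fields treated as proxy-owned (`ecm:name`, `ecm:acp`, and so on) is a guess based on the "parent, name, ACL" description, and a dict stands in for the collection.

```python
import copy

# Fields assumed to belong to each document individually rather than to shared content.
OWN_FIELDS = {"ecm:id", "ecm:parentId", "ecm:name", "ecm:acp",
              "ecm:proxyTargetId", "ecm:proxyIds"}


def create_proxy(docs, proxy_id, target_id, parent_id, name):
    """Create a proxy as a copy of the target's content, and record the back-reference."""
    target = docs[target_id]
    proxy = {k: copy.deepcopy(v) for k, v in target.items() if k not in OWN_FIELDS}
    proxy.update({"ecm:id": proxy_id, "ecm:parentId": parent_id,
                  "ecm:name": name, "ecm:proxyTargetId": target_id})
    docs[proxy_id] = proxy
    target.setdefault("ecm:proxyIds", []).append(proxy_id)
    return proxy


def update_target(docs, target_id, changes):
    """Apply content changes to the target, then refresh every proxy copy."""
    target = docs[target_id]
    target.update(changes)
    for proxy_id in target.get("ecm:proxyIds", []):
        docs[proxy_id].update(copy.deepcopy(changes))


docs = {"t1": {"ecm:id": "t1", "ecm:parentId": "f1", "ecm:name": "doc",
               "dc:title": "My Document"}}
create_proxy(docs, "p1", "t1", "f2", "doc-proxy")
update_target(docs, "t1", {"dc:title": "Renamed"})
print(docs["p1"]["dc:title"])
```

Because the proxy carries its own copy of the content, a search can match it directly, at the cost of the extra writes in `update_target`.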
Proxies — Semantics
• What to do when:
• Target removed (→ forbid)
• Proxy removed
• Proxy + target removed at the same time (→ ok)
• Target copied
• Proxy copied (→ new proxy to original target)
• Proxy + target copied at the same time (todo)
Security — SQL
• Generic ACP stored in acls table
• Precomputed Read ACLs needed for search
• Ordered list of identities having access, with blocking
• ["Management", "Supervisors", "-Temps", "bob"]
• Read ACLs are given an identifier
• Which identities have access to which Read ACL is precomputed
• Maintained by database triggers
• Search matches using JOIN
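One plausible reading of the ordered Read ACL with blocking is sketched below: entries are checked in order, and the first entry naming one of the user's identities decides, with a leading "-" meaning deny. The `can_read` function is hypothetical and this first-match interpretation is an assumption, not something the slides spell out.

```python
def can_read(read_acl, principals):
    """Walk the ordered Read ACL; the first entry matching an identity decides."""
    for entry in read_acl:
        blocked = entry.startswith("-")
        identity = entry[1:] if blocked else entry
        if identity in principals:
            return not blocked
    return False  # nobody listed: no access


read_acl = ["Management", "Supervisors", "-Temps", "bob"]

print(can_read(read_acl, {"bob", "members"}))   # bob is listed and not blocked
print(can_read(read_acl, {"bob", "Temps"}))     # blocked as a member of Temps
```

Order matters here, which is why this form needs more than a plain set comparison at search time.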
Security — SQL
Security — SQL
Security — MongoDB
• Generic ACP stored in ecm:acp field
• Precomputed Read ACLs needed for search
• Simple set of identities having access
• ecm:racl: ["Management", "Supervisors", "bob"]
• Semantic restrictions on blocking
• Maintained by framework
• Search matches if intersection
• {"ecm:racl": {"$in": ["bob", "members", "Everyone"]}}
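The set-intersection check can be illustrated with a small sketch. `read_filter` builds the same filter shape as the slide; `matches_in` is a hypothetical in-memory emulation of how $in behaves against an array field (it matches when at least one array element is in the given list).

```python
def read_filter(principals):
    """Security clause to AND into every search for this user's identities."""
    return {"ecm:racl": {"$in": list(principals)}}


def matches_in(doc, query):
    """Tiny emulation of $in on an array field: true if any value is shared."""
    (field, cond), = query.items()
    return bool(set(doc.get(field, [])) & set(cond["$in"]))


doc = {"ecm:racl": ["Management", "Supervisors", "bob"]}

print(matches_in(doc, read_filter(["bob", "members", "Everyone"])))
print(matches_in(doc, read_filter(["pete", "members", "Everyone"])))
```

A plain set carries no ordering, so it cannot express "allowed unless in group X"; that is where the semantic restrictions on blocking come from.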
Search — SQL
• Translated from NXQL to SQL
• JOIN of all required star/list/complex properties tables
• Additional UNION + JOINs for proxies
• Additional JOIN for security
• Can have correlations (reuse same JOIN)
• Fulltext index(es) on fulltext.simpletext / fulltext.binarytext columns
Search — MongoDB
• Translated from NXQL to MongoDB syntax
• Proxies queried directly
• Security queried by set intersection
• One fulltext index for ecm:fulltextSimple / ecm:fulltextBinary fields
• Some limitations
Search — MongoDB Limitations
• Only one fulltext search per query, restrictions on position
• No generic boolean NOT, must be pushed down as negative operators
• Search is field/value based
• No multi-field operators (title = description, expirationDate > modificationDate)
• No multi-field arithmetic (amount + bonus < 1000)
• Subdocument correlation with $elemMatch is less generic than full JOINs
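To make the "NOT must be pushed down" limitation concrete, here is a minimal sketch of such a rewrite over a toy expression tree. The tuple-based AST and the `to_mongo` function are hypothetical; the output uses only standard MongoDB operators ($and, $or, $eq, $ne, $in, $nin, $gt and friends, and the field-level $not).

```python
NEGATED = {"eq": "$ne", "ne": "$eq", "in": "$nin", "nin": "$in"}
PLAIN = {"eq": "$eq", "ne": "$ne", "in": "$in", "nin": "$nin",
         "gt": "$gt", "gte": "$gte", "lt": "$lt", "lte": "$lte"}


def to_mongo(expr, negate=False):
    """Translate a tuple AST to a MongoDB filter, pushing NOT down to the leaves."""
    op = expr[0]
    if op == "not":
        return to_mongo(expr[1], not negate)
    if op in ("and", "or"):
        # De Morgan: NOT (a AND b) == (NOT a) OR (NOT b), and vice versa.
        flipped = {"and": "or", "or": "and"}[op] if negate else op
        return {"$" + flipped: [to_mongo(e, negate) for e in expr[1:]]}
    _, field, value = expr
    if not negate:
        return {field: {PLAIN[op]: value}}
    if op in NEGATED:
        return {field: {NEGATED[op]: value}}
    # Range comparisons have no exact opposite operator (missing fields would
    # behave differently), so wrap them in the field-level $not instead.
    return {field: {"$not": {PLAIN[op]: value}}}


expr = ("not", ("and", ("eq", "dc:title", "My Document"),
                       ("gt", "ecm:majorVersion", 1)))
print(to_mongo(expr))
```

Every negation ends up attached to a single field, which is the only place this query language accepts one.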
Transactions — SQL
• Standard SQL database capabilities
• Atomic commit
• Two-phase commit (prepare/commit) also usable, although costly
• Rollback
• Transient data is data modified in the database but not yet committed
• Transient data is visible alongside committed data for retrieval and search
Transactions — MongoDB
• No atomic commit beyond a single document
• Commit using a big batch of create/delete/update accumulated in-memory
• Not atomic, others can see partial state
• No transient space
• Emulate transient space in-memory, flush at commit time
• All accesses and searches must check the transient space as well as MongoDB
Transactions — MongoDB
• No rollback
• Rollback by dropping the in-memory transient space
• Operations involving several documents in relation
• Move, delete, copy, ancestors or recursion checks
• Using transient space + MongoDB for them is too complex
• Flush to MongoDB before doing them (commit)
• Must be able to be rolled back if needed (transaction compensation)
• Others can see state that may later turn out to be invalid
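The emulated transient space described in the two slides above can be sketched as a session that overlays pending changes on the store. This is a simplified illustration under stated assumptions: the `Session` class and its method names are hypothetical, a dict stands in for MongoDB, and searches take a Python predicate rather than a real query.

```python
DELETED = object()  # marker for documents removed in the transient space


class Session:
    def __init__(self, store):
        self.store = store     # dict standing in for the MongoDB collection
        self.transient = {}    # id -> pending document, or DELETED

    def save(self, doc):
        self.transient[doc["ecm:id"]] = dict(doc)

    def remove(self, doc_id):
        self.transient[doc_id] = DELETED

    def get(self, doc_id):
        """Reads check the transient space first, then the store."""
        if doc_id in self.transient:
            doc = self.transient[doc_id]
            return None if doc is DELETED else doc
        return self.store.get(doc_id)

    def query(self, predicate):
        """Searches must merge pending state over what the store already holds."""
        merged = {**self.store, **self.transient}
        return [d for d in merged.values() if d is not DELETED and predicate(d)]

    def commit(self):
        """Flush document by document: not atomic, others may see partial state."""
        for doc_id, doc in self.transient.items():
            if doc is DELETED:
                self.store.pop(doc_id, None)
            else:
                self.store[doc_id] = doc
        self.transient.clear()

    def rollback(self):
        """Nothing reached the store yet, so dropping the transient space is enough."""
        self.transient.clear()


store = {"a": {"ecm:id": "a", "dc:title": "Old"}}
session = Session(store)
session.save({"ecm:id": "a", "dc:title": "New"})
print(session.get("a")["dc:title"], store["a"]["dc:title"])
```

This cheap rollback only holds until the first flush; once a multi-document operation has forced an early flush, undoing it requires compensation, as the slide notes.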
MongoDB — Restrictions
• Eventual consistency and no transactions
• Prevents strong checks
• Duplicate name in a folder
• Move creating cycles
• Remove target before proxy
• Create document in a deleted folder
• Prevents full consistency of hierarchical processing
• Read ACLs, quotas
• Needs background jobs that check consistency
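A background consistency job of the kind mentioned above could, for example, scan for two of the listed problems: duplicate names in a folder and documents whose parent folder no longer exists. The sketch below is hypothetical (function name, the `ecm:name` field, and the in-memory dict are all assumptions) and only detects problems; how to repair them is a separate policy decision.

```python
from collections import Counter


def find_inconsistencies(docs):
    """Return (duplicated (parentId, name) pairs, ids whose parent is missing)."""
    names = Counter((d["ecm:parentId"], d["ecm:name"]) for d in docs.values())
    duplicates = [key for key, count in names.items() if count > 1]
    orphans = [
        doc_id for doc_id, d in docs.items()
        if d["ecm:parentId"] is not None and d["ecm:parentId"] not in docs
    ]
    return duplicates, orphans


docs = {
    "root": {"ecm:parentId": None, "ecm:name": ""},
    "f1": {"ecm:parentId": "root", "ecm:name": "folder"},
    "a": {"ecm:parentId": "f1", "ecm:name": "doc.txt"},
    "b": {"ecm:parentId": "f1", "ecm:name": "doc.txt"},   # same name, same folder
    "c": {"ecm:parentId": "gone", "ecm:name": "x"},       # created in a deleted folder
}
duplicates, orphans = find_inconsistencies(docs)
print(duplicates, orphans)
```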
MongoDB — Features
• Bulk operations
• Map-reduce for aggregations
• Quotas / count / folder content last modified
• Conditional updates
• Locks
• Prevent dirty writes
• GridFS to store binaries
• Sharding
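The "conditional updates" item in the list above, applied to locks, can be sketched as a compare-and-set on a single document. The code is an in-memory emulation under assumptions: `update_if`, `lock`, `unlock` and the `ecm:lockOwner` field name are hypothetical, and the atomicity that MongoDB provides for a single-document update is not reproduced here.

```python
def update_if(docs, doc_id, expected, changes):
    """Emulate an update whose filter includes expected field values.

    Returns True if the document matched and was changed, False otherwise.
    A missing field compares equal to None, as a null match would in MongoDB.
    """
    doc = docs.get(doc_id)
    if doc is None or any(doc.get(k) != v for k, v in expected.items()):
        return False
    doc.update(changes)
    return True


def lock(docs, doc_id, user):
    """Take the lock only if nobody holds it."""
    return update_if(docs, doc_id, {"ecm:lockOwner": None}, {"ecm:lockOwner": user})


def unlock(docs, doc_id, user):
    """Release the lock only if this user holds it."""
    return update_if(docs, doc_id, {"ecm:lockOwner": user}, {"ecm:lockOwner": None})


docs = {"d": {"ecm:id": "d"}}
print(lock(docs, "d", "bob"), lock(docs, "d", "pete"))
```

The same pattern prevents dirty writes: include the value (or a change token) the writer last read in the filter, and treat a non-matching update as a conflict.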
DBS — Future Work
Future Work
• DBS used for more services
• Directories / Vocabularies / User database
• Audit log
• DBS for other backends
• Elasticsearch
• Redis
• PostgreSQL / JSON
• Other...
Thanks!
We're Hiring!