ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Apr 15, 2017

Big Data Spain
Transcript
Page 1: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015
Page 2: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB: Scaling PostgreSQL like MongoDB

@NoSQLonSQL

Álvaro Hernández <[email protected]>

Page 3: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB @NoSQLonSQL

About *8Kdata*

● Research & Development in databases

● Consulting, Training and Support in PostgreSQL

● Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today)

● About myself: CTO at 8Kdata: @ahachete, http://linkd.in/1jhvzQ3

www.8kdata.com

Page 4: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB in brief

Page 5: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB in one slide

● Document-oriented, JSON, NoSQL db

● Open source (AGPL)

● MongoDB compatibility (wire protocol level)

● Uses PostgreSQL as a storage backend

Page 6: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage

● Data is stored in tables. No blobs

● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately

● Subdocuments are classified by “type”. Each “type” maps to a different table

Page 7: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage (II)

● A “structure” table keeps the subdocument “schema”

● Keys in JSON are mapped to attributes, which retain the original name

● Tables are created dynamically and transparently to match the exact types of the documents
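
As a rough illustration of the "one table per type" idea, the sketch below assigns a table name to each distinct subdocument type. It is not ToroDB's actual code: the `TypeRegistry` class, the definition of a "type" as the set of (key, value type) pairs of the scalar attributes, and the `t_N` numbering scheme are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Any


class TypeRegistry:
    """Hypothetical registry: one table name per distinct subdocument type.

    Assumption: a "type" is the set of (key, value type) pairs of the scalar
    attributes. The real ToroDB definition may differ.
    """

    def __init__(self) -> None:
        self._tables: dict[frozenset[tuple[str, str]], str] = {}

    def table_for(self, subdoc: dict[str, Any]) -> str:
        signature = frozenset(
            (key, type(value).__name__)
            for key, value in subdoc.items()
            if not isinstance(value, dict)  # nested placeholders carry no column
        )
        if signature not in self._tables:
            # First time this type is seen: this is where a table would be created.
            self._tables[signature] = f"t_{len(self._tables) + 1}"
        return self._tables[signature]


registry = TypeRegistry()
first = registry.table_for({"a": 42, "b": "hello world!"})
second = registry.table_for({"j": 42, "deeper": {}})
third = registry.table_for({"a": 21, "b": "hello"})  # same type as the first
```

Two subdocuments with the same keys and value types land in the same table, while a new combination triggers a new table.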

Page 8: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage internals

{
    "name": "ToroDB",
    "data": { "a": 42, "b": "hello world!" },
    "nested": {
        "j": 42,
        "deeper": { "a": 21, "b": "hello" }
    }
}

Page 9: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage internals

The document is split into the following subdocuments:

{ "name": "ToroDB", "data": {}, "nested": {} }

{ "a": 42, "b": "hello world!"}

{ "j": 42, "deeper": {}}

{ "a": 21, "b": "hello"}
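
The split shown above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ToroDB's implementation: the function name `split_document` is hypothetical, arrays are ignored, and nested objects are replaced by an empty `{}` placeholder exactly as in the subdocuments listed on the slide.

```python
from __future__ import annotations

from typing import Any


def split_document(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Split a nested document into flat subdocuments, one per nested object.

    Each nested object is replaced by an empty placeholder in its parent and
    emitted as its own subdocument. Arrays are not handled in this sketch.
    """
    flat: dict[str, Any] = {}
    children: list[dict[str, Any]] = []
    for key, value in doc.items():
        if isinstance(value, dict):
            flat[key] = {}  # placeholder: the child is stored separately
            children.append(value)
        else:
            flat[key] = value
    result = [flat]
    for child in children:
        result.extend(split_document(child))
    return result


document = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}
subdocuments = split_document(document)
```

For the example document this yields four subdocuments, in the same order as listed on the slide.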

Page 10: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage internals

select * from demo.t_3
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │ _id                        │ name   │
├─────┼───────┼────────────────────────────┼────────┤
│ 0   │ ¤     │ \x5451a07de7032d23a908576d │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘

select * from demo.t_1
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a  │ b            │
├─────┼───────┼────┼──────────────┤
│ 0   │ ¤     │ 42 │ hello world! │
│ 0   │ 1     │ 21 │ hello        │
└─────┴───────┴────┴──────────────┘

select * from demo.t_2
┌─────┬───────┬────┐
│ did │ index │ j  │
├─────┼───────┼────┤
│ 0   │ ¤     │ 42 │
└─────┴───────┴────┘

Page 11: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage internals

select * from demo.structures
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │ _structure                                                                 │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│ 0   │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘

select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│ 0   │ 0   │
└─────┴─────┘
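
To see how the structure row ties the per-type tables back together, here is a small Python sketch that rebuilds the example document from the rows shown above. It rests on a reading of the slide output that the source does not spell out: `"t"` is taken to be the table number (`t_1`, `t_2`, `t_3`) and `"i"` the value of the `index` column. The `rebuild` helper and the in-memory `TABLES` layout are hypothetical, and the `_id` column is left out for brevity.

```python
from __future__ import annotations

from typing import Any

# Rows keyed by (did, index) for each numbered table, copied from the slides.
# The index column is NULL (shown as ¤) for most rows, hence None.
TABLES: dict[int, dict[tuple[int, int | None], dict[str, Any]]] = {
    1: {(0, None): {"a": 42, "b": "hello world!"},
        (0, 1): {"a": 21, "b": "hello"}},
    2: {(0, None): {"j": 42}},
    3: {(0, None): {"name": "ToroDB"}},
}

STRUCTURE = {"t": 3, "data": {"t": 1},
             "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}}


def rebuild(structure: dict[str, Any], tables: dict, did: int) -> dict[str, Any]:
    """Reassemble a document by walking the structure and fetching each row.

    Assumption: "t" and "i" are reserved structure keys, so this sketch would
    misbehave for documents that themselves use keys named "t" or "i".
    """
    row = tables[structure["t"]][(did, structure.get("i"))]
    doc = dict(row)
    for key, child in structure.items():
        if key not in ("t", "i"):
            doc[key] = rebuild(child, tables, did)
    return doc


document = rebuild(STRUCTURE, TABLES, did=0)
```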

Page 12: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB storage and I/O savings

29% - 68% storage required, compared to Mongo 2.6

Page 13: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB: query “by structure”

● ToroDB is effectively partitioning by type

● Structures (schemas, partitioning types) are cached in ToroDB memory

● Queries only scan a subset of the data

● Negative queries are served directly from memory
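
The sketch below illustrates why a cache of structures lets negative queries skip the database entirely. It is only an assumption-laden model of the idea: `StructureCache`, `find_by_path`, the dotted attribute paths, and the `run_sql` callback are all hypothetical names, not ToroDB APIs.

```python
from __future__ import annotations

from typing import Callable


class StructureCache:
    """In-memory set of known structures, each reduced to its attribute paths."""

    def __init__(self) -> None:
        self._structures: dict[int, frozenset[str]] = {}

    def register(self, sid: int, paths: set[str]) -> None:
        self._structures[sid] = frozenset(paths)

    def candidate_sids(self, path: str) -> list[int]:
        """Ids of the structures whose documents can contain `path`."""
        return sorted(sid for sid, paths in self._structures.items() if path in paths)


def find_by_path(cache: StructureCache, path: str,
                 run_sql: Callable[[list[int]], list[dict]]) -> list[dict]:
    sids = cache.candidate_sids(path)
    if not sids:
        return []  # negative query: answered from memory, no SQL is issued
    return run_sql(sids)  # only the tables behind these structures are scanned


cache = StructureCache()
cache.register(0, {"name", "data.a", "data.b", "nested.j",
                   "nested.deeper.a", "nested.deeper.b"})

issued: list[list[int]] = []


def fake_run_sql(sids: list[int]) -> list[dict]:
    issued.append(sids)  # stand-in for a real query against PostgreSQL
    return [{"did": 0}]


missing = find_by_path(cache, "does.not.exist", fake_run_sql)
found = find_by_path(cache, "data.a", fake_run_sql)
```

A query on a path that no structure contains returns immediately, while a query on a known path is restricted to the matching structures.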

Page 14: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Scaling ToroDB like MongoDB

Page 15: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Big Data: NoSQL vs SQL

http://www.networkworld.com/article/2226514/tech-debates/what-s-better-for-your-big-data-application--sql-or-nosql-.html

Page 16: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Scalability?

Page 17: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Ways to scale

● Vertical scalability
  ➔ Concurrency scalability
  ➔ Hardware scalability
  ➔ Query scalability

● Read scalability (replication)

● Write scalability (horizontal, sharding)

Page 18: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Vertical scalability

Concurrency scalability

● SQL is usually better (e.g. PostgreSQL):
  ➔ Finer locking
  ➔ MVCC
  ➔ Better caching

● NoSQL often needs sharding within the same host to scale

Page 19: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Vertical scalability

Hardware scalability

● Scaling with the number of cores?

● Process/threading model?

Query scalability

● Use of indexes? Use of more than one?

● Table/collection partitioning?

● ToroDB “by-type” partitioning

Page 20: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Read scalability: replication

● Replicate data to slave nodes, available read-only: scale-out reads

● Both NoSQL and SQL support it

● Binary replication usually faster (e.g. PostgreSQL's Streaming Replication)

● Not free from undesirable phenomena

Page 21: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

Dirty and stale reads (“call me maybe”)

Page 22: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

MongoDB write acknowledgement

https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

Page 23: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

MongoDB's dirty and stale reads

Dirty reads: a primary in the minority partition accepts a write that other clients see, but it later steps down and the write is rolled back (fixed in 3.2?)

Stale reads: a primary in the minority partition serves a value that ought to be current, but a newer value was written to the other primary, in the majority partition

Page 24: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Write scalability (sharding)

● NoSQL better prepared than SQL

● But many compromises in data modeling (schema design): no FKs

● There are also solutions for SQL:
  ➔ Shared-disk, limited scalability (RAC)
  ➔ Sharding (like pg_shard)
  ➔ PostgreSQL's FDWs

Page 25: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Read scalability (replication) in MongoDB and ToroDB

Page 26: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Replication protocol choice

● ToroDB is based on PostgreSQL

● PostgreSQL has either binary streaming replication (async or sync) or logical replication

● MongoDB has logical replication

● ToroDB uses MongoDB's protocol

Page 27: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

MongoDB's replication protocol

● Every change is recorded as a JSON document in an idempotent format (collection: local.oplog.rs)

● Slaves pull these documents from master (or other slaves) asynchronously

● Changes are applied and feedback is sent upstream
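
The idempotency of the oplog format can be illustrated with a simplified Python model. The field names `op`, `ns`, `o` and `o2` and the operation codes `i`, `u`, `d` follow MongoDB's oplog entries, but everything else is an assumption made for the example: the `apply_oplog_entry` function is hypothetical, real entries carry more fields (such as `ts`), and only `$set`-style updates are modelled.

```python
from __future__ import annotations

import copy
from typing import Any


def apply_oplog_entry(collection: dict[Any, dict[str, Any]],
                      entry: dict[str, Any]) -> None:
    """Apply one simplified oplog entry to an in-memory collection keyed by _id."""
    op = entry["op"]
    if op == "i":
        # Insert behaves as an upsert, so replaying it is harmless.
        doc = entry["o"]
        collection[doc["_id"]] = dict(doc)
    elif op == "u":
        # o2 selects the document; o carries resulting values, not deltas.
        target = collection.get(entry["o2"]["_id"])
        if target is not None:
            target.update(entry["o"]["$set"])
    elif op == "d":
        collection.pop(entry["o"]["_id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


oplog = [
    {"op": "i", "ns": "demo.bds15", "o": {"_id": 1, "n": 1}},
    {"op": "u", "ns": "demo.bds15", "o2": {"_id": 1}, "o": {"$set": {"n": 5}}},
    {"op": "i", "ns": "demo.bds15", "o": {"_id": 2, "n": 7}},
    {"op": "d", "ns": "demo.bds15", "o": {"_id": 2}},
]

replica: dict[Any, dict[str, Any]] = {}
for entry in oplog:
    apply_oplog_entry(replica, entry)
after_first_pass = copy.deepcopy(replica)

for entry in oplog:  # replay the same batch, e.g. after a reconnect
    apply_oplog_entry(replica, entry)
```

Because updates record the resulting field values rather than increments, replaying the same batch leaves the replica in the same state, which is what lets a slave safely re-pull entries it may have already applied.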

Page 28: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

MongoDB slave states

● Secondary: slave is more or less up to date and pulling “diffs” from other nodes

● InitialSync: copy * from all databases, all collections. Used to init slaves or when sync is lost (rollback didn't find common root; resync is requested)

● Rollback: there is data to DELETE

Page 29: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

MongoDB replication implementation

https://github.com/stripe/mosql

Page 30: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Announcing ToroDB v0.4 (snap)

Supporting MongoDB replication

Page 31: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB v0.4

● ToroDB works as a secondary (slave) of a MongoDB master (or of another slave)

● Implements the full replication protocol (not an oplog tailable query)

● Replicates from MongoDB to PostgreSQL

● Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)

Page 32: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Advantages of ToroDB w/ replication

● Native SQL

● Query “by type”

● Better SQL scaling

● Less concurrency contention

● Better hardware utilization

● No need for ETL from Mongo to PG!

Page 33: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB: native SQL

● NoSQL is trying to get back to SQL

● ToroDB is SQL native!

● Insert with Mongo, query with SQL!

● Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc.

Page 34: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB: native SQL

Page 35: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Introducing ToroDB VIEWs

db.bds15.insert({
    id: 5,
    person: {
        name: "Alvaro",
        surname: "Hernandez",
        contact: { email: "[email protected]", verified: true }
    }
})

db.bds15.insert({
    id: 6,
    person: {
        name: "Name",
        surname: "Surname",
        age: 31,
        contact: { email: "[email protected]" }
    }
})

Page 36: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

Introducing ToroDB VIEWs

torodb$ select * from bds15.person ;
┌─────┬───────────┬────────┬─────┐
│ did │ surname   │ name   │ age │
├─────┼───────────┼────────┼─────┤
│ 0   │ Hernandez │ Alvaro │ ¤   │
│ 1   │ Surname   │ Name   │ 31  │
└─────┴───────────┴────────┴─────┘
(2 rows)

torodb$ select * from bds15."person.contact";
┌─────┬──────────┬────────────────────────┐
│ did │ verified │ email                  │
├─────┼──────────┼────────────────────────┤
│ 0   │ t        │ [email protected]      │
│ 1   │ ¤        │ [email protected]      │
└─────┴──────────┴────────────────────────┘
(2 rows)
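
Since both views expose the document id as `did`, they can be joined with plain SQL. The sketch below shows one such query, with several loud assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, it recreates the two views as ordinary tables without the `bds15` schema, and the e-mail values are invented placeholders because the originals are redacted in the transcript. The helper names `build_demo` and `verified_people` are hypothetical.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Recreate the two views from the slide as tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (did INTEGER, surname TEXT, name TEXT, age INTEGER);
        CREATE TABLE "person.contact" (did INTEGER, verified INTEGER, email TEXT);
        INSERT INTO person VALUES (0, 'Hernandez', 'Alvaro', NULL);
        INSERT INTO person VALUES (1, 'Surname', 'Name', 31);
        -- Placeholder e-mail addresses: the real ones are redacted in the source.
        INSERT INTO "person.contact" VALUES (0, 1, 'alvaro@example.invalid');
        INSERT INTO "person.contact" VALUES (1, NULL, 'name@example.invalid');
        """
    )
    return conn


def verified_people(conn: sqlite3.Connection) -> list[tuple]:
    """Join both views on did and keep only contacts flagged as verified."""
    return conn.execute(
        """
        SELECT p.name, p.surname, c.email
        FROM person AS p
        JOIN "person.contact" AS c ON c.did = p.did
        WHERE c.verified
        ORDER BY p.did
        """
    ).fetchall()


conn = build_demo()
rows = verified_people(conn)
```

The row whose `verified` value is NULL is filtered out, just as it would be in PostgreSQL, so only the first document is returned.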

Page 37: ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015