MongoDB Replication

Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Jul 05, 2015

One of the strongest arguments for using a NoSQL database is its focus on distribution, both for replication and sharding. This talk takes a short look at what replication is, why you should use it, and what is so difficult about it. We then look at MongoDB's implementation in general and finally focus on what can go wrong. In a practical demo, you will see how to find the right balance between performance and data safety and how to apply it in your Java application.

MongoDB Replication

Philipp Krenn

@xeraa

Motivation

Availability & data safety

Read scalability

Helping backups

Data migration

Delayed members

Oplog tailing (Meteor.js)

https://meteorhacks.com/mongodb-oplog-and-meteor.html

Basics

Terminology

Primary + Secondaries

Master + Slaves: problematic, hence renamed

Arbiter

(Diagrams from http://docs.mongodb.org)

Adding an arbiter:

> rs.addArb("arbiter.example.com:3000")

Limits

50 replica set members (12 before 2.7.8)

7 voting members
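Both elections and majority write concerns count votes among voting members, so the limit of 7 voting members caps the required majority at 4. A minimal sketch of that arithmetic in plain JavaScript; `majorityOf` and `faultTolerance` are hypothetical helper names, not mongo shell functions.

```javascript
// Hypothetical helpers illustrating majority arithmetic for voting members.

// Smallest number of votes that is strictly more than half of the voters.
function majorityOf(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1 || votingMembers > 7) {
    throw new RangeError("expected 1 to 7 voting members");
  }
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members may be unreachable while a majority is still possible.
function faultTolerance(votingMembers) {
  return votingMembers - majorityOf(votingMembers);
}

for (const n of [3, 4, 5, 7]) {
  console.log(`${n} voters: majority ${majorityOf(n)}, tolerates ${faultTolerance(n)} down`);
}
```

Under this arithmetic, four voters tolerate only one unreachable member, the same as three, which is one motivation for adding an arbiter to an even number of data-bearing members.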

Example

Single instance

$ mkdir 1
$ mongod --dbpath 1 --port 27001 --logpath log1
$ mongo --port 27001
> db.test.insert({ name: "Philipp", city: "Wien" })
> db.test.find()

Stop instance

Add replication

$ mkdir 2
$ mkdir 3
$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongod --replSet javantura --dbpath 2 --port 27002 --logpath log2 --oplogSize 20
$ mongod --replSet javantura --dbpath 3 --port 27003 --logpath log3 --oplogSize 20

Connect

$ hostname
$ mongo --port 27001
> db.test.find()

Configure replication

Start on the old instance (port 27001), otherwise its data is lost

PK-MBP is the host name printed by $ hostname

> rs.initiate()
> rs.status()
> rs.add("PK-MBP:27002")
> rs.add("PK-MBP:27003")
> rs.status()
> db.isMaster()
> db.test.find()
> db.test.insert({ name: "Peter", city: "Steyr" })
> db.test.find()
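Instead of `rs.initiate()` followed by `rs.add()`, `rs.initiate()` also accepts a complete configuration document. The sketch below only builds such a document in plain JavaScript; `buildReplSetConfig` is a hypothetical helper, and the host names are the demo's `PK-MBP` ports.

```javascript
// Hypothetical helper: builds a replica set configuration document of the
// shape rs.initiate() accepts ({ _id, members: [{ _id, host }] }).
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match the --replSet name of every mongod
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

const config = buildReplSetConfig("javantura", [
  "PK-MBP:27001", // the old instance holding the data; run rs.initiate() here
  "PK-MBP:27002",
  "PK-MBP:27003",
]);

console.log(JSON.stringify(config, null, 2));
// In the mongo shell on port 27001, this could be passed as rs.initiate(config).
```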

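Read from secondaries

$ mongo --port 27002
> db.test.find() // fails: secondaries reject reads by default
> rs.slaveOk()
> db.test.find() // now succeeds
> db.test.insert({ name: "Dieter", city: "Graz" }) // fails: only the primary accepts writes

rs.slaveOk() only applies to the current connection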

Failover

Kill the primary with [Ctrl]+[C]

Write to the new primary:

> rs.status()
> db.test.insert({ name: "Dieter", city: "Graz" })
> db.test.find()

Restart the old primary; it rejoins as a secondary

$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongo --port 27001
> rs.status()
> rs.slaveOk()
> db.test.find()

Inner details

The oplog is the capped collection oplog.rs in the local database

> use local
> show collections
me 0.000MB / 0.008MB
oplog.rs 0.000MB / 20.000MB
replset.minvalid 0.000MB / 0.008MB
slaves 0.000MB / 0.008MB
startup_log 0.003MB / 10.000MB
system.indexes 0.001MB / 0.008MB
system.replset 0.000MB / 0.008MB
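A capped collection has a fixed size, keeps documents in insertion order and overwrites the oldest ones once it is full. The toy model below assumes a capacity counted in entries, whereas MongoDB bounds oplog.rs in bytes (20MB here, set by `--oplogSize 20`); `CappedCollection` is a hypothetical class.

```javascript
// Toy model of a capped collection: fixed capacity, insertion order,
// oldest entries overwritten when full.
// Assumption: capacity is a number of entries, not bytes as in MongoDB.
class CappedCollection {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }

  insert(doc) {
    if (this.entries.length === this.capacity) {
      this.entries.shift(); // drop the oldest entry to make room
    }
    this.entries.push(doc);
  }

  // Natural order equals insertion order, like a plain find() on the oplog.
  find() {
    return this.entries.slice();
  }
}

const oplog = new CappedCollection(3);
for (let i = 1; i <= 5; i++) {
  oplog.insert({ ts: i, op: "i" });
}
console.log(oplog.find().map((e) => e.ts)); // [ 3, 4, 5 ]
```

For replication this means that a secondary which falls further behind than the oplog reaches back can no longer catch up from it and needs a full resync.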

> db.oplog.rs.find()
{
  "h": NumberLong("-265486071808715859"),
  "ns": "test.test",
  "o": {
    "_id": ObjectId("541a8ed285ea5f8ae059d530"),
    "name": "Dieter",
    "city": "Graz"
  },
  "op": "i",
  "ts": Timestamp(1411026642, 1),
  "v": 2
}
...
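In an oplog entry, `op` is the operation type (`i` insert, `u` update, `d` delete, `n` no-op), `ns` the namespace, `o` the document or change and `ts` the timestamp. Secondaries replay these entries, and since the oplog records writes in an idempotent form, applying an entry twice yields the same data. A simplified replay over an in-memory `Map`, assuming one namespace, numeric `_id`s and updates that are either a flat `$set` or a full replacement identified by `o2._id`; `applyOplog` is a hypothetical name.

```javascript
// Simplified oplog replay into a Map keyed by _id (hypothetical sketch).
function applyOplog(collection, entries) {
  for (const entry of entries) {
    switch (entry.op) {
      case "i": // insert: o is the full document
        collection.set(entry.o._id, { ...entry.o });
        break;
      case "u": { // update: o2 identifies the document, o holds the change
        const id = entry.o2._id;
        const current = collection.get(id);
        if (current === undefined) break; // nothing to update
        if (entry.o.$set) {
          collection.set(id, { ...current, ...entry.o.$set });
        } else {
          collection.set(id, { ...entry.o, _id: id }); // full replacement
        }
        break;
      }
      case "d": // delete: o holds the _id
        collection.delete(entry.o._id);
        break;
      case "n": // no-op entries carry no data change
        break;
      default:
        throw new Error(`unsupported op ${entry.op}`);
    }
  }
  return collection;
}

const entries = [
  { op: "i", ns: "test.test", o: { _id: 1, name: "Dieter", city: "Graz" } },
  { op: "u", ns: "test.test", o2: { _id: 1 }, o: { $set: { city: "Wien" } } },
  { op: "i", ns: "test.test", o: { _id: 2, name: "Peter", city: "Steyr" } },
  { op: "d", ns: "test.test", o: { _id: 2 } },
];

const once = applyOplog(new Map(), entries);
const twice = applyOplog(applyOplog(new Map(), entries), entries);
console.log(JSON.stringify([...once.values()]));
console.log(JSON.stringify([...once]) === JSON.stringify([...twice])); // true
```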

Election

Heartbeat

2s interval

10s without a response until an election
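The two timeouts amount to a simple failure detector. A hypothetical sketch: given the time of each member's last heartbeat response, it lists members silent for more than 10s and reports whether the primary is among them. The function names and the millisecond clock are assumptions for illustration.

```javascript
// Hypothetical failure detector using the slide's numbers: heartbeats are sent
// every 2s, and a member counts as unreachable after 10s without a response.
const UNREACHABLE_AFTER_MS = 10000;

// lastHeartbeat maps a member name to the time (ms) of its last response.
function unreachableMembers(lastHeartbeat, nowMs) {
  return Object.entries(lastHeartbeat)
    .filter(([, lastMs]) => nowMs - lastMs > UNREACHABLE_AFTER_MS)
    .map(([name]) => name);
}

// The remaining members may start an election once the primary is unreachable.
function primaryLost(primary, lastHeartbeat, nowMs) {
  return unreachableMembers(lastHeartbeat, nowMs).includes(primary);
}

// As seen from PK-MBP:27002.
const lastHeartbeat = { "PK-MBP:27001": 0, "PK-MBP:27003": 10000 };
console.log(primaryLost("PK-MBP:27001", lastHeartbeat, 8000)); // false: 8s of silence
console.log(primaryLost("PK-MBP:27001", lastHeartbeat, 12000)); // true: 12s of silence
```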

Election rules

1. Priority
2. Optime
3. Connections
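A toy ranking that applies the three rules in the slide's order. It is a simplification, not mongod's election code: `optime` is a plain number, "connections" is modelled as the count of other voting members a candidate can reach (it must see a majority including itself), and priority 0 members are never eligible.

```javascript
// Toy ranking of election candidates: priority, then optime, then connections.
function pickCandidate(members, votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  const eligible = members.filter(
    (m) => m.priority > 0 && m.reachable + 1 >= majority // +1: a member counts itself
  );
  eligible.sort(
    (a, b) =>
      b.priority - a.priority || // higher priority first
      b.optime - a.optime || // then the most recent data
      b.reachable - a.reachable // then the best connected
  );
  return eligible.length > 0 ? eligible[0].name : null;
}

// The primary PK-MBP:27001 is gone; both secondaries still see each other.
const members = [
  { name: "PK-MBP:27002", priority: 1, optime: 1411026642, reachable: 1 },
  { name: "PK-MBP:27003", priority: 1, optime: 1411026650, reachable: 1 },
];
console.log(pickCandidate(members, 3)); // PK-MBP:27003
```

In mongod, the other members can additionally veto a candidate, for example one that is behind the most recent operation they know of.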

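Priority

cfg = rs.conf()
cfg.members[0].priority = 0 // priority 0: this member never becomes primary
cfg.members[1].priority = 1
cfg.members[2].priority = 2 // highest priority: preferred primary
rs.reconfig(cfg) // run on the current primary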

Optime

Connections

Election

A candidate node asks for votes

Others can veto

A node votes yes for at most one candidate within 30s

A majority of yes votes elects a new primary
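The voting rules can be sketched as a small model: a voter says yes to at most one candidate within 30s, any veto ends the attempt, and a majority of yes votes elects the candidate. `Voter`, `runElection` and the `hasNewerData` veto reason are hypothetical; mongod has several veto reasons this sketch does not model.

```javascript
// Simplified model of the voting rules (hypothetical, not mongod's code).
const VOTE_LOCK_MS = 30000; // one yes vote per voter within 30s

class Voter {
  constructor(name) {
    this.name = name;
    this.lastYes = null; // { candidate, atMs } of the most recent yes vote
  }

  // Answers "veto", "yes" or "no" to a candidate asking at time nowMs.
  // hasNewerData is a stand-in for mongod's real veto reasons.
  vote(candidate, nowMs, { hasNewerData = false } = {}) {
    if (hasNewerData) return "veto";
    const locked =
      this.lastYes !== null &&
      this.lastYes.candidate !== candidate &&
      nowMs - this.lastYes.atMs < VOTE_LOCK_MS;
    if (locked) return "no"; // already said yes to someone else within 30s
    this.lastYes = { candidate, atMs: nowMs };
    return "yes";
  }
}

// Any veto stops the election; otherwise a majority of yes votes wins.
function runElection(votes, votingMembers) {
  if (votes.includes("veto")) return false;
  const yes = votes.filter((answer) => answer === "yes").length;
  return yes >= Math.floor(votingMembers / 2) + 1;
}

// Primary PK-MBP:27001 is gone; PK-MBP:27002 votes for itself and asks PK-MBP:27003.
const voter2 = new Voter("PK-MBP:27002");
const voter3 = new Voter("PK-MBP:27003");
const votes = [voter2.vote("PK-MBP:27002", 0), voter3.vote("PK-MBP:27002", 0)];
console.log(runElection(votes, 3)); // true
// Within 30s, PK-MBP:27003 will not say yes to a competing candidate.
console.log(voter3.vote("PK-MBP:27003", 5000)); // no
```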

Issues

CAP

Select availability or consistency

Partition tolerance is a prerequisite for distributed systems

"The network is reliable": http://aphyr.com/posts/288-the-network-is-reliable

Rollback

The old primary rolls back unreplicated changes once it rejoins the replica set

Rollback file

rollback/ in the data folder

File name: <database>.<collection>.<timestamp>.bson
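Conceptually, rollback finds the last oplog entry the old primary shares with the new primary and undoes everything after it, saving the affected documents under `rollback/`. A simplified sketch: it matches entries by `ts` only (mongod also compares the `h` hash shown in the oplog example), models only inserts, and `findRollback` and `groupByNamespace` are hypothetical names.

```javascript
// Simplified rollback model (hypothetical, not mongod's implementation).

// Returns the old primary's entries after the last one the new primary also has.
function findRollback(oldPrimaryOplog, newPrimaryOplog) {
  const known = new Set(newPrimaryOplog.map((e) => e.ts));
  let lastCommon = -1;
  oldPrimaryOplog.forEach((entry, i) => {
    if (known.has(entry.ts)) lastCommon = i;
  });
  return oldPrimaryOplog.slice(lastCommon + 1);
}

// Groups rolled-back documents per namespace, mirroring the
// <database>.<collection> part of the file names in rollback/.
function groupByNamespace(entries) {
  const groups = {};
  for (const entry of entries) {
    if (!groups[entry.ns]) groups[entry.ns] = [];
    groups[entry.ns].push(entry.o);
  }
  return groups;
}

// The old primary accepted ts 3 but failed before any secondary copied it;
// the new primary then wrote ts 4.
const newPrimary = [{ ts: 1 }, { ts: 2 }, { ts: 4 }];
const oldPrimary = [
  { ts: 1 },
  { ts: 2 },
  { ts: 3, op: "i", ns: "test.test", o: { _id: 3, name: "Dieter", city: "Graz" } },
];

const rolledBack = findRollback(oldPrimary, newPrimary);
console.log(JSON.stringify(groupByNamespace(rolledBack)));
// {"test.test":[{"_id":3,"name":"Dieter","city":"Graz"}]}
```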

Election time

Can take 5 to 7 minutes at times

http://www.tokutek.com/2014/07/explaining-ark-part-2-how-elections-and-failover-currently-work/

Missing synchronization during election

The old primary sends its last changes to a single node only

If that node does not become the new primary: rollback

Remember

Replication is asynchronous

Multiple primaries

Unlikely but possible

Bugs: https://jira.mongodb.org/browse/SERVER-9765

Test script with no replies: https://groups.google.com/forum/#!topic/mongodb-dev/-mH6BOYyzeI

Kyle Kingsbury (@aphyr): Call Me Maybe, http://aphyr.com/tags/jepsen

PostgreSQL, Redis, MongoDB, Riak, ZooKeeper, RabbitMQ, etcd + Consul, Elasticsearch

http://aphyr.com/posts/284-call-me-maybe-mongodb

05/2013 version 2.4

Up to 42% data lost

Data written to old primary: rollback

WriteConcern

Configure durability vs. performance

https://github.com/mongodb/mongo-java-driver/blob/master/src/main/com/mongodb/WriteConcern.java

WriteConcern.UNACKNOWLEDGED

w=0, j=0

Fire and forget

Default until 11/2012

WriteConcern.ACKNOWLEDGED

w=1, j=0

Current default

Operation completed successfully in memory

WriteConcern.JOURNALED

w=1, j=1

Operation written to the journal file

Since 1.8, single server durability

WriteConcern.FSYNCED

w=1, fsync=true

Operation written to disk

WriteConcern.REPLICA_ACKNOWLEDGED

w=2, j=0

Acknowledged by primary and at least one secondary

w is the number of servers that must acknowledge the write

WriteConcern.MAJORITY

w=majority, j=0

Acknowledgement by the majority of nodes

wtimeout recommended, otherwise the write can wait indefinitely if no majority is reachable

Nearly no data lost, but high overhead
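The write concerns above differ in what the driver waits for. A toy model of that check in plain JavaScript: `w` is the number of members that must have applied the write (or `"majority"` of the voting members), and `j` additionally requires the primary's journal. `isSatisfied` and the state snapshot are hypothetical; `fsync` and `wtimeout` are left out, and the keys merely reuse the Java driver's constant names.

```javascript
// Toy model of when a write concern is satisfied (hypothetical, simplified):
// w = members that must have applied the write, or "majority" of voting members;
// j = the primary must also have written it to its journal.
function isSatisfied(writeConcern, state) {
  const { w = 1, j = false } = writeConcern;
  if (w === 0) return true; // fire and forget: nothing to wait for
  const required = w === "majority" ? Math.floor(state.votingMembers / 2) + 1 : w;
  if (state.applied < required) return false; // applied includes the primary
  return !j || state.journaledOnPrimary;
}

// Keys reuse the Java driver's constant names; values follow the slides.
const concerns = {
  UNACKNOWLEDGED: { w: 0 },
  ACKNOWLEDGED: { w: 1 },
  JOURNALED: { w: 1, j: true },
  REPLICA_ACKNOWLEDGED: { w: 2 },
  MAJORITY: { w: "majority" },
};

// Snapshot: only the primary has applied the write, in memory, journal not yet written.
const state = { votingMembers: 3, applied: 1, journaledOnPrimary: false };
for (const [name, writeConcern] of Object.entries(concerns)) {
  console.log(name, isSatisfied(writeConcern, state));
}
```

For this snapshot, only UNACKNOWLEDGED and ACKNOWLEDGED print true. A client asking for w=2 or majority keeps waiting while the check stays false, which is why a wtimeout is recommended.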

Write concern performance

https://blog.serverdensity.com/mongodb-on-google-compute-engine-tips-and-benchmarks/

3 x 1,000 inserts on GCE

Local 10GB system disk

Dedicated 200GB disk

Dedicated 200GB for data and journal

n1-standard-2

n1-highmem-8

Thanks! Questions?

Now, later today, or @xeraa

Backup Slides

Oplog

Replication via logs

MongoDB: Operations log (Oplog)

MySQL: Binary log (Binlog)

Naive approach: transmit the original query

Statement-Based Replication (SBR)

DELETE FROM test.table WHERE quantity > 20 LIMIT 1

db.collection.remove({ quantity: { $gt: 20 } }, true) // justOne: true

Ambiguous: which row or document is removed depends on each server's order

Unambiguous representation

Row-Based Replication (RBR): Oplog
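The SQL and mongo shell examples above remove "one" matching document, but which one depends on each server's natural order. A small in-memory simulation, where arrays stand in for collections and `removeOne` is a hypothetical helper, shows replicas diverging under statement-based replication and agreeing when the removed `_id` is shipped instead, as the oplog does.

```javascript
// Removes the first document matching the predicate, in the array's order.
function removeOne(docs, predicate) {
  const index = docs.findIndex(predicate);
  if (index === -1) return { docs, removedId: null };
  return { docs: docs.filter((_, i) => i !== index), removedId: docs[index]._id };
}

const overTwenty = (doc) => doc.quantity > 20;

// Same data, different natural order on the two servers.
const primary = [{ _id: 1, quantity: 30 }, { _id: 2, quantity: 40 }];
const secondary = [{ _id: 2, quantity: 40 }, { _id: 1, quantity: 30 }];

// Statement-based: each server re-runs the statement against its own order.
const sbrPrimary = removeOne(primary, overTwenty);
const sbrSecondary = removeOne(secondary, overTwenty);
console.log(sbrPrimary.removedId, sbrSecondary.removedId); // prints 1 2: the replicas diverge

// Row-based (oplog style): the secondary deletes exactly the _id the primary removed.
const removedId = sbrPrimary.removedId;
const rbrSecondary = removeOne(secondary, (doc) => doc._id === removedId);
console.log(sbrPrimary.removedId, rbrSecondary.removedId); // prints 1 1: the replicas agree
```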

MongoDB

Asynchronous replication

Secondaries can get the Oplog from...

their primary

another secondary with more recent data
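Syncing from another secondary is called chained replication. A simplified sketch of choosing a sync source: among reachable members whose oplog is ahead, pick the one with the lowest ping. The selection criteria are a simplification, and the host names, `optime` numbers and `pingMs` values are hypothetical.

```javascript
// Simplified sync source choice for chained replication (hypothetical):
// among reachable members whose oplog is ahead of ours, take the lowest ping.
function chooseSyncSource(local, members) {
  const candidates = members.filter(
    (m) => m.name !== local.name && m.reachable && m.optime > local.optime
  );
  candidates.sort((a, b) => a.pingMs - b.pingMs);
  return candidates.length > 0 ? candidates[0].name : null;
}

const local = { name: "db3.example.com:27017", optime: 100 };
const members = [
  { name: "db1.example.com:27017", reachable: true, optime: 120, pingMs: 40 }, // primary, far away
  { name: "db2.example.com:27017", reachable: true, optime: 110, pingMs: 2 }, // secondary, close by
  { name: "db3.example.com:27017", reachable: true, optime: 100, pingMs: 0 }, // this member
];

console.log(chooseSyncSource(local, members)); // db2.example.com:27017
```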

Oplog size

32bit: 48MB

64bit OS X: 183MB

64bit *nix, Windows: 1GB to 50GB (5% free disk)
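The default sizes above can be written as a small function. A sketch assuming sizes in megabytes, with "1GB to 50GB" taken as 1024MB to 51200MB and a hypothetical `os` field; these are the defaults of that era, and `--oplogSize` (20MB in the demo) overrides them.

```javascript
// Default oplog size from the slide, in MB (defaults of that era, sketch only).
// Assumptions: "1GB to 50GB" means 1024 to 51200 MB; os is a hypothetical field.
function defaultOplogSizeMB({ is64bit, os, freeDiskMB }) {
  if (!is64bit) return 48; // 32-bit builds
  if (os === "osx") return 183; // 64-bit OS X
  const fivePercent = Math.floor((freeDiskMB * 5) / 100); // 5% of free disk
  return Math.min(Math.max(fivePercent, 1024), 51200); // clamp to 1GB..50GB
}

console.log(defaultOplogSizeMB({ is64bit: true, os: "linux", freeDiskMB: 200 * 1024 })); // 10240
```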