OCTOBER 11-14, 2016 • BOSTON, MA
Apr 16, 2017
Who am I?
• I’m Mark Miller
• I’m a Lucene junkie (2006)
• I’m a Lucene committer (2008)
• And a Solr committer (2009)
• And a member of the ASF (2011)
• And a former Lucene PMC Chair (2014-2015)
• I’ve done a lot of core Solr work and co-created SolrCloud
This talk is about how SolrCloud tries to protect your data.
And about some things that should change.
Failure Cases (Shards of an index can be treated independently)
• A Leader dies (loses its ZK connection)
• A Replica dies, or an update from the leader to a replica fails
• A Replica is partitioned (e.g. it can talk to ZK, but not to the shard leader)
[Diagram: Replica (R), Leader (L), ZooKeeper (ZK)]
Replica Recovery
• A replica will recover from the leader on startup.
• A replica will recover if an update from the leader to the replica fails.
• A replica may recover from the leader in the leader election sync-up dance.

[Diagram: Replica (R), Leader (L), ZooKeeper (ZK)]
Replica Recovery Dance
• Start buffering updates from the leader
• Publish Recovering to ZK
• Wait for the leader to see the Recovering state
• On the first recovery try, PeerSync
• Otherwise, full index replication
• Commit on leader
• Replicate index
• Replay buffered documents

[Diagram: Replica (R), Leader (L), ZooKeeper (ZK)]
RecoveryStrategy
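The dance above can be sketched as a sequence of steps. This is a minimal toy model, assuming an index is just a list of docs; the function and field names are illustrative, not Solr's actual RecoveryStrategy API.

```python
# Toy model of the replica recovery dance; names are illustrative.

def peer_sync(replica, leader, window=2):
    # Crude stand-in for PeerSync: succeed only if the replica is missing
    # at most `window` updates, then pull it level with the leader.
    if len(leader["index"]) - len(replica["index"]) > window:
        return False
    replica["index"] = list(leader["index"])
    return True

def recover(replica, leader, attempt):
    replica["buffering"] = True            # start buffering updates from leader
    replica["state"] = "RECOVERING"        # publish Recovering to ZK
    # (the leader is assumed to have seen the RECOVERING state by now)
    if attempt == 1 and peer_sync(replica, leader):
        method = "PeerSync"                # first try: cheap peer sync
    else:
        leader["committed"] = True         # commit on leader
        replica["index"] = list(leader["index"])  # full index replication
        method = "Replication"
    replica["index"] += replica.pop("buffer")     # replay buffered documents
    replica["buffering"] = False
    replica["state"] = "ACTIVE"
    return method

replica = {"index": [1, 2], "buffer": [5], "state": "DOWN"}
leader = {"index": [1, 2, 3, 4], "committed": False}
recover(replica, leader, attempt=1)        # PeerSync: only 2 docs behind
```

The key design point the sketch keeps: buffered updates are replayed *after* the index catches up, so the replica never misses writes that arrived during recovery.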
A Replica is Partitioned
• In the early days we half punted on this.
• Now, when a leader cannot reach a replica, it will put it in LIR (Leader Initiated Recovery) in ZK.
• A replica in LIR will realize that it must recover before clearing its LIR status.
• We worked through some bugs, but this is very solid now.
[Diagram: Replica (R), Leader (L), ZooKeeper (ZK); the leader-to-replica link is broken]
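The LIR handshake above can be sketched in a few lines. This is a minimal sketch, assuming a dict as a stand-in for ZooKeeper state; the function names are illustrative, not Solr's real API.

```python
# Toy model of Leader Initiated Recovery (LIR); names are illustrative.

zk = {"lir": set()}   # replicas the leader has marked for recovery

def on_update_to_replica_failed(replica_name):
    # The leader could not deliver an update: put the replica in LIR via ZK.
    zk["lir"].add(replica_name)

def replica_tick(replica_name, do_recover):
    # A replica that finds itself in LIR must recover from the leader
    # before it may clear its LIR status and serve again.
    if replica_name in zk["lir"]:
        do_recover(replica_name)          # run the full recovery dance
        zk["lir"].discard(replica_name)   # cleared only after recovery succeeds
        return "recovered"
    return "active"
```

The point of routing the marker through ZK is that the partitioned replica may still be able to reach ZK even when it cannot reach the leader, so it can still learn it is stale.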
Leader Recovery
• The ‘best effort’ leader recovery dance:
• If it’s after startup and the last published state is not active, it can’t be leader.
• Otherwise, try to peer sync with the shard.
• On success, try to peer sync from the replicas to the leader.
• If any of those syncs fail, ask the replicas to recover from the leader.
[Diagram: Replica (R), Leader (L), ZooKeeper (ZK)]
SyncStrategy / ElectionContext
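The ‘best effort’ sync can be sketched with version numbers standing in for update logs. This is a minimal sketch and all names are illustrative; the real logic lives in SyncStrategy / ElectionContext.

```python
# Toy model of the leader election sync dance; names are illustrative.

def peer_sync(node, peers, window=2):
    # Crude PeerSync stand-in: succeed only if the node is within `window`
    # updates of the most up-to-date peer, then pull it level.
    newest = max([p["version"] for p in peers], default=node["version"])
    if newest - node["version"] > window:
        return False
    node["version"] = max(node["version"], newest)
    return True

def try_to_become_leader(candidate, replicas, after_startup):
    if after_startup and candidate["last_published"] != "ACTIVE":
        return False                      # can't be leader after startup
    if not peer_sync(candidate, replicas):
        return False                      # couldn't sync with the shard
    for r in replicas:                    # sync replicas from the new leader
        if not peer_sync(r, [candidate]):
            r["state"] = "RECOVERING"     # ask this replica to recover instead
    return True
```

Note how a replica that is too far behind is not a blocker: the new leader simply tells it to do a full recovery rather than failing the election.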
Leader Election Forward Progress Stall…
• Each replica decides for itself if it thinks it should be leader.
• Everyone may think they are unfit.
• Only replicas that last published ACTIVE will attempt to be leader after the first election.
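The eligibility rule can be sketched as follows. This is a minimal illustrative model, not Solr's election code; the replica dicts and `elect()` helper are assumptions.

```python
# Toy model of leader eligibility; names are illustrative.

def can_stand(replica, first_election):
    # After the first election, only replicas whose last published state
    # was ACTIVE will attempt to become leader.
    return first_election or replica["last_published"] == "ACTIVE"

def elect(replicas, first_election=False):
    candidates = [r for r in replicas if can_stand(r, first_election)]
    # None means no replica stood for election: the shard is stalled.
    return candidates[0]["name"] if candidates else None
```

If every replica last published a non-ACTIVE state (e.g. they all lost ZK at once), `elect()` returns None, which is exactly the stall condition discussed next.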
Leader Election Forward Progress Stall…
• While rare, if all replicas in a shard lose their connection to ZK at the same time, no replica will become leader without intervention.
• There is a manual API to intervene, but this should be done automatically.
• In practice, this tends to happen for reasons that can be ‘tuned’ out of.
• Still needs to be improved.
User chooses durability requirements
• You can specify how many replicas you want to see success from to consider an update successful: the min_rf param.
• An update won’t fail based on that criterion, though - Solr simply flags the achieved replication factor in the response.
• If your replication factor is not achieved, that also does not mean the update is rolled back.
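The semantics above are worth seeing concretely: success is reported even when the requested factor is missed, and nothing is rolled back. This is a minimal sketch of a toy update path; the field names are illustrative, not Solr's actual response format.

```python
# Toy model of min_rf semantics; names are illustrative.

def update(doc, replicas, min_rf):
    acks = 1                              # the leader itself has the update
    for r in replicas:
        if r["reachable"]:
            r["docs"].append(doc)
            acks += 1
    # The update is NOT failed or rolled back when acks < min_rf; the
    # achieved replication factor is only reported so the client can react.
    return {"status": "OK", "rf": acks, "achieved": acks >= min_rf}
```

A client that wants stronger durability therefore has to inspect the achieved factor itself and retry or alert, since the leader keeps the update either way.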
User chooses durability requirements
• If we improve some of this…
• We can stop trying so hard.
• And put it on the user to specify a replication factor that controls how ‘safe’ updates are.
Handling Cluster Shutdown / Startup
• What if an old replica returns?
• How to ensure every replica participates in election?
• What if no replica thinks it should be leader?
• Staggered shutdowns?
• Explicit cluster commands might help.