
ZooKeeper (and other things)

Feb 21, 2017

Transcript
Page 1: ZooKeeper (and other things)

ZooKeeper (and other things)

@jhalt

Page 2: ZooKeeper (and other things)

Part 1: The User’s Perspective

Page 3: ZooKeeper (and other things)

What is ZooKeeper?

A highly available, scalable, consistent, distributed, configuration, consensus, group membership, leader election, queuing, naming, file system, and coordination service.

Page 4: ZooKeeper (and other things)

…but for real

• It’s just a datastore
• That’s durable
  – (data syncs to disk)

• And replicated
  – (disks on different servers)

• And ordered
  – (data is ordered the same on all servers)

• And consistent *
  – (clients see the same data at the same time) *

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Page 5: ZooKeeper (and other things)

ZK Guarantees

• Reliability
  – Durable writes, network partition handling

• Atomicity
  – No partial failures

• Sequential Consistency
  – Updates applied in order they were sent

• Single System Image *
  – All clients see the same view

• Timeliness
  – Timely eventing, failure detection

http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html#ch_zkGuarantees

Page 6: ZooKeeper (and other things)

ZK Features / Concepts

• Znode
  – Data object
  – Hierarchical
  – Identified by a path
  – Version tagged (enables compare-and-set; see the sketch below)

• Watchers
  – Watch for Znode creation / changes / deletion

• Ephemeral Znodes
  – Go away when a client session ends
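
Because every Znode carries a version, a read-modify-write can be made safe as compare-and-set: read the data together with its version, then write back conditioned on that version. A minimal sketch against the Java client; the /counter path, the class name, and the assumption of an already-connected ZooKeeper handle are all illustrative.

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class CounterCas {
    // Compare-and-set sketch: atomically increment an integer stored at /counter.
    // Retries when another client updated the znode first (BadVersionException).
    static void increment(ZooKeeper zk) throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            int value = Integer.parseInt(new String(zk.getData("/counter", false, stat)));
            try {
                // Succeeds only if the znode is still at the version we just read.
                zk.setData("/counter", String.valueOf(value + 1).getBytes(), stat.getVersion());
                return;
            } catch (KeeperException.BadVersionException e) {
                // Lost the race; re-read and retry.
            }
        }
    }
}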

Page 7: ZooKeeper (and other things)

ZooKeeper API

• create (path, nodeType)
• get (path, watcher)
• set (path, value, version)
• delete (path)
• exists (path, watcher)
• getChildren (path, watcher)
• multi (ops)
• sync (path)
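
These calls map almost one-to-one onto the Java client. A small, self-contained sketch of the whole surface; the connect string, paths, and payloads are placeholders, not anything from the deck.

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkApiTour {
    public static void main(String[] args) throws Exception {
        // Connect and wait for the session to be established before issuing calls.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // create (path, nodeType): a persistent znode holding a few bytes.
        zk.create("/demo", "hello".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // get (path, watcher): read data and set a one-shot watch for the next change.
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo",
                event -> System.out.println("changed: " + event.getPath()), stat);
        System.out.println("read: " + new String(data));

        // set (path, value, version): conditional update using the version just read.
        zk.setData("/demo", "world".getBytes(), stat.getVersion());

        // exists / getChildren also take watchers (the boolean form uses the default watcher).
        zk.exists("/demo", false);
        List<String> children = zk.getChildren("/", false);
        System.out.println("children of /: " + children);

        // delete (path): version -1 means "any version".
        zk.delete("/demo", -1);
        zk.close();
    }
}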

Page 8: ZooKeeper (and other things)

Anatomy of a Service

• Writes go through a single, elected leader
• Reads happen anywhere

Page 9: ZooKeeper (and other things)

Anatomy of a Client Session

• Client connects to one of a list of servers
• Client heartbeats regularly to maintain session
• If client fails to send heartbeat, ZK cluster deletes client’s ephemeral Znodes
• If server fails to respond to heartbeat, client will attempt to migrate session to another server
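
In the Java client, the session timeout and the connection watcher are both constructor arguments; the library does the heartbeating and the server fail-over itself. A sketch of reacting to the session events described above; the hosts, timeout, and recovery policy are illustrative, not prescriptive.

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

class SessionAwareClient implements Watcher {
    private ZooKeeper zk;

    void connect() throws IOException {
        // The library heartbeats within this 15s session timeout and fails over
        // to another server from the connect string on its own.
        zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, this);
    }

    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case SyncConnected:
                break; // connected, or reconnected within the session timeout
            case Disconnected:
                break; // lost this server; the library will try the others
            case Expired:
                // Session timed out on the server side: ephemeral znodes are gone.
                // The only recovery is a brand new session.
                try {
                    connect();
                } catch (IOException ignored) { }
                break;
            default:
                break;
        }
    }
}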

Page 10: ZooKeeper (and other things)

Anatomy of a Write

1. Client sends write to node
2. Node forwards write to leader
3. Leader sends write to all followers
4. Followers commit write
5. After W followers respond:
   1. Leader commits write
   2. Leader responds to client
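
This is not ZooKeeper's actual Zab code, just an illustrative, runnable sketch of step 5: the leader treats the write as committed once a majority of the ensemble (itself plus W followers, where the quorum is floor(N/2) + 1) has acknowledged it.

public class QuorumWriteSketch {
    public static void main(String[] args) {
        // Hypothetical ack pattern from 4 followers in a 5-node ensemble.
        boolean[] followerAcks = {true, true, false, true};
        int ensembleSize = 1 + followerAcks.length;   // leader + followers = 5
        int quorum = ensembleSize / 2 + 1;            // majority = 3
        int acks = 1;                                 // the leader counts itself
        for (boolean ack : followerAcks) {
            if (ack) acks++;
            if (acks >= quorum) {
                System.out.println("committed after " + acks + " acks (quorum = " + quorum + ")");
                return;                               // leader would now reply to the client
            }
        }
        System.out.println("no quorum reached; write not committed");
    }
}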

Page 11: ZooKeeper (and other things)

Common Use Cases

• Configuration Management
  – Storm, Cassandra, S4

• Leader Election
  – Hadoop, HBase, Kafka, Spark, Solr, Pig, Hive, Neo4j

• Cluster Management
  – ElasticSearch, Mesos, Akka, Juju, Flume, Accumulo

• Work Queues
  – TaskFlow

• Load Balancing
• Sharding
• Distributed Locks

Page 12: ZooKeeper (and other things)

Configuration Management

Administrator Process
setData(“/config/param1”, “value”)

Client Process
getData(“/config/param1”, watcher)
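
Fleshed out slightly with the Java client (same path as above; the class name is made up, the handle `zk` is assumed to be connected, and /config/param1 is assumed to exist). The key detail is that watches are one-shot, so the reader re-registers on every notification.

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

class ConfigExample {
    // Administrator process: publish a new value (version -1 = unconditional write).
    static void publish(ZooKeeper zk, String value) throws KeeperException, InterruptedException {
        zk.setData("/config/param1", value.getBytes(), -1);
    }

    // Client process: read the value and re-register the one-shot watch on every change.
    static void readConfig(ZooKeeper zk) throws KeeperException, InterruptedException {
        byte[] value = zk.getData("/config/param1", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    readConfig(zk);
                } catch (Exception e) {
                    // handle / retry in real code
                }
            }
        }, null);
        System.out.println("param1 = " + new String(value));
    }
}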

Page 13: ZooKeeper (and other things)

Leader Election

Candidate Process
// Check for a leader
getData(“/servers/leader”, watch)

// If empty, create a leader
create(“/servers/leader”, hostname, EPHEMERAL)
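
A sketch of the same recipe in the Java client, rearranged slightly to avoid the check-then-create race: every candidate simply attempts the ephemeral create, and whoever succeeds is leader; losers watch the znode and run again when the leader's session ends. The class and method names are hypothetical; the path and hostname usage follows the slide.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class LeaderElection {
    // Whoever creates the ephemeral znode is leader; it vanishes with their session.
    static void runForLeader(ZooKeeper zk, String hostname)
            throws KeeperException, InterruptedException {
        try {
            zk.create("/servers/leader", hostname.getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println(hostname + " is now the leader");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else is leader: watch the znode and run again when it goes away.
            Stat current = zk.exists("/servers/leader", event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    try {
                        runForLeader(zk, hostname);
                    } catch (Exception ignored) { }
                }
            });
            if (current == null) {
                runForLeader(zk, hostname);   // leader vanished in the meantime; try again now
            }
        }
    }
}

A production-grade recipe usually uses EPHEMERAL_SEQUENTIAL children and has each candidate watch only its predecessor, so a leader failure does not wake every candidate at once.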

Page 14: ZooKeeper (and other things)

Cluster Management

Master process
1. Create watch on “/nodes”
2. On alert, getChildren(“/nodes”)
3. Check for changes

Node processes
4. Create ephemeral “/nodes/node-[i]”
5. Update periodically with status changes
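
A sketch of both roles with the Java client; the connected `zk` handles, the existence of the /nodes parent, and the class name are assumptions for illustration.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class ClusterMembership {
    // Master: list live members and re-arm the child watch (watches fire only once).
    static void watchMembers(ZooKeeper zk) throws KeeperException, InterruptedException {
        List<String> nodes = zk.getChildren("/nodes", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                try {
                    watchMembers(zk);
                } catch (Exception ignored) { }
            }
        });
        System.out.println("live nodes: " + nodes);   // diff against the previous list here
    }

    // Node i: register an ephemeral znode that disappears if this node's session dies.
    static void register(ZooKeeper zk, int i, byte[] status)
            throws KeeperException, InterruptedException {
        zk.create("/nodes/node-" + i, status,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // Periodic status updates: zk.setData("/nodes/node-" + i, newStatus, -1);
    }
}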

Page 15: ZooKeeper (and other things)

Work Queues

Master process
1. Create watch on “/tasks”
2. On alert, getChildren(“/tasks”)
3. Assign tasks to workers via:
   1. create “/workers/worker-[i]/task-[j]”
4. Watch for deletion indicating task complete

Worker process
5. Create watch on “/workers/worker-[i]”
6. On alert, getChildren for worker and do work
7. Delete tasks when complete
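
A rough sketch of the assignment and completion halves with the Java client. Connected `zk` handles and the /workers/worker-[i] parent znodes are assumed to exist; class and method names are made up for illustration.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class WorkQueueSketch {
    // Master: assign task j to worker i; the task znode doubles as the "pending" flag.
    static void assign(ZooKeeper zk, int i, int j, byte[] taskData)
            throws KeeperException, InterruptedException {
        String path = zk.create("/workers/worker-" + i + "/task-" + j, taskData,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Deletion of the task znode is the worker's completion signal.
        zk.exists(path, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                System.out.println("completed: " + event.getPath());
            }
        });
    }

    // Worker i: pick up assigned tasks, do the work, delete each znode when done.
    static void drain(ZooKeeper zk, int i) throws KeeperException, InterruptedException {
        String myPath = "/workers/worker-" + i;
        List<String> tasks = zk.getChildren(myPath, event -> {
            try {
                drain(zk, i);                 // re-arm the one-shot child watch
            } catch (Exception ignored) { }
        });
        for (String task : tasks) {
            byte[] data = zk.getData(myPath + "/" + task, false, null);
            // ... perform the work described by data ...
            zk.delete(myPath + "/" + task, -1);
        }
    }
}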

Page 16: ZooKeeper (and other things)

Part 2: The Scientist’s Perspective

Page 17: ZooKeeper (and other things)

Fallacies of Distributed Computing

• 1. The network is reliable
  – Partitions can cause diverging data (inconsistency)

• 2. Latency is zero
  – Clients can see stale data (eventual consistency)

• 3. Bandwidth is infinite
• 4. The network is secure
• 5. Topology doesn't change
• 6. There is one administrator
• 7. Transport cost is zero
• 8. The network is homogeneous

https://blogs.oracle.com/jag/resource/Fallacies.html - Conceived by Peter Deutsch
https://aphyr.com/posts/288-the-network-is-reliable

Page 18: ZooKeeper (and other things)

Achieving Consistency

• It’s all about CONSENSUS
  – Multiple processes agreeing on a value

• Distributed consensus is hard
• ZK uses consensus to ensure consistency
• With consensus we can safely do
  – Atomic commits
  – Distributed locks
  – Group membership
  – Leader election
  – Service discovery

Page 19: ZooKeeper (and other things)

Consensus in ZooKeeper

• ZooKeeper uses leader election
• All writes go through the leader
• Leader is elected via consensus

Page 20: ZooKeeper (and other things)

Availability in ZooKeeper

• ZooKeeper uses majority quorums
• “Ensembles” should be odd numbers (see the quorum arithmetic below)
  – 1 – Fine for testing
  – 3 – Can survive 1 failure
  – 5 – Can survive 2 failures

• Multi-way partitions can cause other havoc
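
The arithmetic behind those numbers, as a tiny runnable sketch: a majority quorum is floor(N/2) + 1, so an ensemble of N tolerates N minus quorum failures, and an even-sized ensemble buys nothing over the next odd size down.

public class QuorumMath {
    public static void main(String[] args) {
        // Majority quorum sizing; note 6 tolerates no more failures than 5.
        for (int n : new int[] {1, 3, 5, 6}) {
            int quorum = n / 2 + 1;
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorum, n - quorum);
        }
    }
}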

Page 21: ZooKeeper (and other things)

Consistency Models

• Linearizable consistency
  – Writes are processed in order
  – Writes are immediately visible to all participants
  – Allows safe state mutation, CAS

• Sequential consistency
  – Writes are processed in order
  – Writes may not be visible to other participants till later

• Serializable consistency
  – Writes are processed in any order, at any time

http://www.ics.forth.gr/tech-reports/2013/2013.TR439_Survey_on_Consistency_Conditions.pdf
http://www.bailis.org/blog/linearizability-versus-serializability/
https://aphyr.com/posts/313-strong-consistency-models

Page 22: ZooKeeper (and other things)

Consistency Tradeoffs

Page 23: ZooKeeper (and other things)

Consistency in ZooKeeper

• Writes are Linearizable
  – Threading through leader ensures linearizability

• Reads are Sequential
  – Followers may not have latest writes, but writes are seen in order

• Linearizable reads via sync/read?
  – No. No. No.
  – Could only work if operations were executed atomically by the leader

https://ramcloud.stanford.edu/~ongaro/thesis.pdf - Section 6.3 covers linearizable reads from followers
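
For completeness, the sync-then-read pattern looks roughly like this in the Java client (the class name is made up and the `zk` handle is assumed to be connected). It forces the server we are reading from to catch up with the leader before the read, which helps with staleness but, as the slide says, still does not make the read linearizable.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;

class SyncThenRead {
    static byte[] syncRead(ZooKeeper zk, String path) throws Exception {
        CountDownLatch caughtUp = new CountDownLatch(1);
        // sync() asks the server we're connected to to catch up with the leader first.
        zk.sync(path, (rc, p, ctx) -> caughtUp.countDown(), null);
        caughtUp.await();
        // Fresher than a plain follower read, but a newer write may already have
        // committed elsewhere by the time getData runs - hence not linearizable.
        return zk.getData(path, false, null);
    }
}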

Page 24: ZooKeeper (and other things)

What about CAP?

• CP or AP?
  – AP – Every request receives a response
  – CP – All nodes see the same data at the same time

• ZooKeeper requires majority quorum
  – It’s not AP

• ZooKeeper reads are not linearizable
  – It’s not CP

• 3.4 adds read-only mode
  – It can be AP!

https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf

Page 25: ZooKeeper (and other things)

ZooKeeper Pitfalls

• Log purging
• Session timeouts
  – Cause ephemeral data to be deleted
  – Client side GC or swapping?

• Server side disk/NIC IO latency
  – Lack of dedicated disks for ZK servers

Page 26: ZooKeeper (and other things)

ZooKeeper Limitations

• Consistency comes at a cost
• Not horizontally scalable
  – Data is not partitioned – fully replicated
    • 1 MB data entries by default
    • Entire dataset lives in memory
  – All writes go through leader and require quorum
    • Max ops/sec
  – All sessions are replicated
    • Max sessions

Page 27: ZooKeeper (and other things)

Takeaways

• ZooKeeper is a distributed datastore
• It provides sequential consistency
• It is resilient to ⌈N/2⌉ − 1 failures (e.g., 2 of 5 servers)
• It is not horizontally scalable
• It can be highly available for reads only

Page 28: ZooKeeper (and other things)

fin.

Questions?