Jonathan Boulle @baronboulle | [email protected] etcd - overview and future

Paris Container Day 2016 : Etcd - overview and future (CoreOS)

Apr 16, 2017



Xebia France
Transcript
Page 1

Jonathan Boulle @baronboulle | [email protected]

etcd - overview and future

Page 2

Why etcd?

Page 3

Uncoordinated Upgrades

Page 4

[Diagram: machines upgraded without coordination leave the service Unavailable]

Uncoordinated Upgrades

Page 5

Motivation: CoreOS cluster reboot lock

- Decrement a semaphore key atomically

- Reboot and wait...

- After reboot increment the semaphore key
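The reboot lock above is a counting semaphore whose decrement must be atomic. The Go sketch below keeps the semaphore in memory and uses `sync/atomic` compare-and-swap as a stand-in for a CAS on an etcd key; `RebootSemaphore`, `TryAcquire` and `Release` are hypothetical names, not the CoreOS implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RebootSemaphore is a hypothetical in-memory stand-in for the semaphore key.
// In etcd the count would live under a key and be updated with CAS.
type RebootSemaphore struct {
	available int64 // how many machines may reboot right now
}

// TryAcquire atomically decrements the semaphore if a slot is free.
// It loops only when another machine changed the value concurrently.
func (s *RebootSemaphore) TryAcquire() bool {
	for {
		cur := atomic.LoadInt64(&s.available)
		if cur <= 0 {
			return false // no free slot: wait and try again later
		}
		if atomic.CompareAndSwapInt64(&s.available, cur, cur-1) {
			return true // slot taken: this machine may reboot
		}
	}
}

// Release atomically increments the semaphore once the reboot has finished.
func (s *RebootSemaphore) Release() {
	atomic.AddInt64(&s.available, 1)
}

// Value reports the current count.
func (s *RebootSemaphore) Value() int64 {
	return atomic.LoadInt64(&s.available)
}

func main() {
	sem := &RebootSemaphore{available: 1}
	fmt.Println(sem.TryAcquire()) // true: the first machine may reboot
	fmt.Println(sem.TryAcquire()) // false: the second machine must wait
	sem.Release()                 // the first machine is back up
	fmt.Println(sem.TryAcquire()) // true: the second machine may reboot now
}
```

The retry loop only handles lost races; a machine that finds the count at zero backs off instead of spinning.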

Page 6

CoreOS updates coordination

[Diagram: reboot semaphore value 3]

Page 7

CoreOS updates coordination

[Diagram: reboot semaphore value 3]

Page 8

CoreOS updates coordination

[Diagram: reboot semaphore value 2]

Page 9

CoreOS updates coordination

[Diagram: reboot semaphore value 0]

Page 10

CoreOS updates coordination

[Diagram: reboot semaphore value 0]

Page 11

CoreOS updates coordination

[Diagram: reboot semaphore value 0]

Page 12

CoreOS updates coordination

[Diagram: reboot semaphore value 0]

Page 13

CoreOS updates coordination

[Diagram: reboot semaphore value 1]

Page 14

CoreOS updates coordination

[Diagram: reboot semaphore value 0]

Page 15

CoreOS updates coordination

Page 16

Store Application Configuration

[Diagram: config]

Page 17

Store Application Configuration

[Diagram: config; Start / Restart]

Page 18

Store Application Configuration

[Diagram: config; Update]

Page 19

Store Application Configuration

[Diagram: config; Unavailable]

Page 20

Requirements

Strong Consistency
- Mutually exclusive at any time, for locking purposes

Highly Available
- Resilient to single points of failure & network partitions

Watchable
- Push configuration updates to applications

Page 21

Requirements: CAP

- Consistency, Availability, Partition Tolerance: choose 2
- We want CP
- We want something like Paxos
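Choosing CP with a Paxos-like protocol means a write commits only once a majority of members accept it: during a partition, only the side that still holds a majority keeps serving writes, and the minority side becomes unavailable. This small Go sketch computes the quorum size and the number of tolerated failures for a few cluster sizes; the function names are illustrative.

```go
package main

import "fmt"

// QuorumSize is a strict majority of members. Any two majorities share at
// least one member, which is what prevents two conflicting decisions.
func QuorumSize(members int) int { return members/2 + 1 }

// FaultTolerance is how many members can fail or be partitioned away
// while the remaining members still form a majority.
func FaultTolerance(members int) int { return (members - 1) / 2 }

func main() {
	for _, n := range []int{1, 3, 4, 5, 7} {
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, QuorumSize(n), FaultTolerance(n))
	}
	// Note that 4 members tolerate no more failures than 3, which is why
	// odd cluster sizes are the usual choice.
}
```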

Page 22

Common problem

[Diagram: GFS, Paxos, Big Table, Spanner, CFS, Chubby]

Google - “All” infrastructure relies on Paxos

Page 23

Common problem

Amazon - Replicated log powers EC2

Microsoft - Boxwood powers storage infrastructure

Hadoop - ZooKeeper is the heart of the ecosystem

Page 24

COMMON PROBLEM

#GIFEE (Google's Infrastructure For Everyone Else) and Cloud Native Solution

Page 25

10,000 Stars on GitHub

250 contributors

Google, Red Hat, EMC, Cisco, Huawei, Baidu, Alibaba...

Page 26

THE HEART OF CLOUD NATIVE

Kubernetes, Cloud Foundry's Diego, Docker's SwarmKit, many others

Page 27

ETCD KEY VALUE STORE

Fully Replicated, Highly Available, Consistent

Page 28

PUT(foo, bar), GET(foo), DELETE(foo)

Watch(foo)

CAS(foo, bar, bar1)

Key-value Operations
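A minimal single-node sketch of the PUT, GET, DELETE and CAS semantics listed above, assuming a mutex-guarded Go map. It leaves out replication (which etcd performs through Raft) and Watch; the `Store` type is hypothetical rather than etcd's API.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical single-node key-value store.
type Store struct {
	mu   sync.Mutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// Put sets key to value unconditionally.
func (s *Store) Put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Get returns the value and whether the key exists.
func (s *Store) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

// Delete removes key if present.
func (s *Store) Delete(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, key)
}

// CAS sets key to newValue only if it currently equals oldValue, and reports
// whether the swap happened. Check and write share one lock, so no other
// operation can slip in between them.
func (s *Store) CAS(key, oldValue, newValue string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.data[key]; !ok || cur != oldValue {
		return false
	}
	s.data[key] = newValue
	return true
}

func main() {
	s := NewStore()
	s.Put("foo", "bar")
	fmt.Println(s.CAS("foo", "bar", "bar1")) // true: the value was "bar"
	fmt.Println(s.CAS("foo", "bar", "bar2")) // false: the value is now "bar1"
	v, _ := s.Get("foo")
	fmt.Println(v) // bar1
	s.Delete("foo")
	_, ok := s.Get("foo")
	fmt.Println(ok) // false
}
```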

Page 29

DEMO

play.etcd.io

Page 30

Runtime Reconfiguration

Point-in-time Backup

Extensive Metrics

etcd Operationality

Page 31

ETCD v3

Successor of etcd v2

Page 32

ETCD v3

Better Performance

Page 33

ETCD v3

Massively Scalable

Page 34

ETCD v3

More Efficient & Powerful APIs

Page 35

gRPC Based API

~4x Faster vs JSON

HTTP/2 Improves Efficiency

Page 36

Multi-Version

Put(foo, bar)

Put(foo, bar1)

Put(foo, bar2)

Get(foo) -> bar2

Page 37

Multi-Version

Put(foo, bar)

Put(foo, bar1)

Put(foo, bar2)

Get(foo, 1) -> bar
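The multi-version behavior can be sketched as a store that never overwrites: each Put bumps a store-wide revision and appends to the key's history, so a read at an older revision finds the value that was current then. Revision numbering here follows the slides (the first Put is revision 1); the types are hypothetical, and compaction of old revisions is not modeled. With the real Go client, a historical read is a Get with the `clientv3.WithRev` option.

```go
package main

import (
	"fmt"
	"sort"
)

// version records one write to a key at a store-wide revision.
type version struct {
	rev   int64
	value string
}

// MVCCStore is a hypothetical multi-version store: writes append, never overwrite.
type MVCCStore struct {
	rev     int64
	history map[string][]version // per key, in increasing revision order
}

func NewMVCCStore() *MVCCStore {
	return &MVCCStore{history: make(map[string][]version)}
}

// Put records value at the next revision and returns that revision.
func (s *MVCCStore) Put(key, value string) int64 {
	s.rev++
	s.history[key] = append(s.history[key], version{rev: s.rev, value: value})
	return s.rev
}

// Get returns the latest value of key.
func (s *MVCCStore) Get(key string) (string, bool) {
	return s.GetAt(key, s.rev)
}

// GetAt returns the value key had as of revision rev.
func (s *MVCCStore) GetAt(key string, rev int64) (string, bool) {
	h := s.history[key]
	// Binary search for the first write made after rev; the one before it wins.
	i := sort.Search(len(h), func(i int) bool { return h[i].rev > rev })
	if i == 0 {
		return "", false // the key did not exist yet at rev
	}
	return h[i-1].value, true
}

func main() {
	s := NewMVCCStore()
	s.Put("foo", "bar")
	s.Put("foo", "bar1")
	s.Put("foo", "bar2")
	latest, _ := s.Get("foo")
	old, _ := s.GetAt("foo", 1)
	fmt.Println(latest, old) // bar2 bar
}
```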

Page 38

Tx.If(
    Compare(Value("foo"), ">", "bar"),
    Compare(Version("foo"), "=", 2),
    ...
).Then(
    Put("ok", "true"), ...
).Else(
    Put("ok", "false"), ...
).Commit()

Mini-Transactions
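The mini-transaction above reads as: if every comparison holds, apply the Then operations, otherwise the Else operations, all as one atomic step. This single-node Go sketch evaluates value and version comparisons under one mutex; `TxnStore`, `Cmp`, `Value` and `Version` are hypothetical stand-ins named after the pseudocode (etcd's Go client, `clientv3`, offers a similar chained `Txn(ctx).If(...).Then(...).Else(...).Commit()` builder).

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// entry holds a value and its version (number of writes to the key).
type entry struct {
	value   string
	version int64
}

// Cmp is one condition of the If clause: compare a key's value or version.
type Cmp struct {
	key       string
	byVersion bool
	op        string // "=", ">" or "<"
	value     string
	version   int64
}

// Op is a Put executed by the Then or Else branch.
type Op struct{ key, value string }

// Value and Version build conditions, named after the slide's pseudocode.
func Value(key, op, v string) Cmp { return Cmp{key: key, op: op, value: v} }
func Version(key, op string, n int64) Cmp {
	return Cmp{key: key, byVersion: true, op: op, version: n}
}

// TxnStore is a hypothetical single-node store; one mutex makes each
// transaction's compare-then-write step atomic.
type TxnStore struct {
	mu   sync.Mutex
	data map[string]entry
}

func NewTxnStore() *TxnStore { return &TxnStore{data: map[string]entry{}} }

// put must be called with mu held; it bumps the key's version.
func (s *TxnStore) put(key, value string) {
	s.data[key] = entry{value: value, version: s.data[key].version + 1}
}

func (s *TxnStore) Put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.put(key, value)
}

// holds applies op to the sign of a comparison result d.
func holds(d int64, op string) bool {
	return (op == "=" && d == 0) || (op == ">" && d > 0) || (op == "<" && d < 0)
}

// eval checks one condition; a missing key has value "" and version 0.
func (s *TxnStore) eval(c Cmp) bool {
	e := s.data[c.key]
	if c.byVersion {
		return holds(e.version-c.version, c.op)
	}
	return holds(int64(strings.Compare(e.value, c.value)), c.op)
}

// Txn runs thenOps if every condition holds and elseOps otherwise, under
// one lock, and reports which branch ran (true means Then).
func (s *TxnStore) Txn(conds []Cmp, thenOps, elseOps []Op) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	ok := true
	for _, c := range conds {
		ok = ok && s.eval(c)
	}
	ops := elseOps
	if ok {
		ops = thenOps
	}
	for _, op := range ops {
		s.put(op.key, op.value)
	}
	return ok
}

func main() {
	s := NewTxnStore()
	s.Put("foo", "bas")
	s.Put("foo", "baz") // foo is now "baz" at version 2
	ok := s.Txn(
		[]Cmp{Value("foo", ">", "bar"), Version("foo", "=", 2)},
		[]Op{{"ok", "true"}},
		[]Op{{"ok", "false"}},
	)
	fmt.Println(ok, s.data["ok"].value) // true true
}
```

Because the comparisons and the chosen writes happen under the same lock, no other client can change `foo` between the check and the write, which is the property that makes such transactions usable for locks and leader election.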

Page 39

l = CreateLease(15 * second)

Put(foo, bar, l)

l.KeepAlive()

l.Revoke()

Leases

Page 40

w = Watch(foo)
for {
    r = w.Recv()
    print(r.Event) // PUT
    print(r.KV)    // foo,bar
}

Streaming Watch
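Streaming watch pushes changes to the client instead of having it poll. The Go sketch below models each watch as a buffered channel that receives every later PUT or DELETE on its key; `WatchStore` is hypothetical, the buffer size of 16 is arbitrary, and etcd itself delivers events over long-lived gRPC streams.

```go
package main

import (
	"fmt"
	"sync"
)

// Event mirrors the slide's r.Event / r.KV: an event type plus the key-value.
type Event struct {
	Type  string // "PUT" or "DELETE"
	Key   string
	Value string
}

// WatchStore is a hypothetical store that pushes each change on a key to
// all watchers of that key.
type WatchStore struct {
	mu       sync.Mutex
	data     map[string]string
	watchers map[string][]chan Event
}

func NewWatchStore() *WatchStore {
	return &WatchStore{data: map[string]string{}, watchers: map[string][]chan Event{}}
}

// Watch returns a channel receiving every later event on key. A full buffer
// would block writers; a real server has to handle slow consumers instead.
func (s *WatchStore) Watch(key string) <-chan Event {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan Event, 16)
	s.watchers[key] = append(s.watchers[key], ch)
	return ch
}

// notify must be called with mu held so events arrive in write order.
func (s *WatchStore) notify(ev Event) {
	for _, ch := range s.watchers[ev.Key] {
		ch <- ev
	}
}

func (s *WatchStore) Put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	s.notify(Event{Type: "PUT", Key: key, Value: value})
}

func (s *WatchStore) Delete(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, key)
	s.notify(Event{Type: "DELETE", Key: key})
}

func main() {
	s := NewWatchStore()
	w := s.Watch("foo")
	go func() { // a writer elsewhere in the system
		s.Put("foo", "bar")
		s.Delete("foo")
	}()
	for i := 0; i < 2; i++ {
		r := <-w
		fmt.Println(r.Type, r.Key) // PUT foo, then DELETE foo
	}
}
```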

Page 41

Synchronization LoC

Page 42

ETCD v2

machine coordination -> O(10k)

Page 43

ETCD v3

app/container coordination -> O(1M)

Page 44

Reliability

99% at small scale is easy
- Failure is infrequent and human-manageable

99% at large scale is not enough
- Not manageable by humans

99.99% at large scale
- Reliable systems at the bottom layer

Page 45

HOW DO WE ACHIEVE RELIABILITY?

WAL, Snapshots, Testing

Page 46

Write Ahead Log

Append only
- Simple is good

Rolling CRC protected
- Storage & OSes can be unreliable

Page 47

Snapshots

Torturing DBs for Fun and Profit (OSDI 2014)
- The simpler database is safer
- LMDB was the winner

BoltDB: an append-only B+tree
- A simpler LMDB written in Go

Page 48

Testing Cluster Failures
- Inject failures into running clusters

White box runtime checking
- Hash state of the system
- Progress of the system
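White-box checking of the state can be sketched by hashing each member's key-value state deterministically and comparing the hashes, plus a check that the cluster revision keeps moving. This Go sketch assumes every compared member has applied the log up to the same revision; `StateHash`, `ConsistentMembers` and `MadeProgress` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// StateHash returns a deterministic FNV-1a hash of a member's state.
// Keys are sorted first because Go map iteration order is random.
func StateHash(kv map[string]string) uint64 {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New64a()
	for _, k := range keys {
		// Length prefixes keep ("ab","c") and ("a","bc") from colliding.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(kv[k]), kv[k])
	}
	return h.Sum64()
}

// ConsistentMembers reports whether all members have the same state hash.
// Assumes every member has applied entries up to the same revision.
func ConsistentMembers(members []map[string]string) bool {
	if len(members) == 0 {
		return true
	}
	want := StateHash(members[0])
	for _, m := range members[1:] {
		if StateHash(m) != want {
			return false
		}
	}
	return true
}

// MadeProgress reports whether the revision advanced during a round of
// failure injection; a stuck revision points to lost availability.
func MadeProgress(revBefore, revAfter int64) bool { return revAfter > revBefore }

func main() {
	a := map[string]string{"foo": "bar", "x": "1"}
	b := map[string]string{"x": "1", "foo": "bar"}
	c := map[string]string{"foo": "bar1", "x": "1"}
	fmt.Println(ConsistentMembers([]map[string]string{a, b}))    // true
	fmt.Println(ConsistentMembers([]map[string]string{a, b, c})) // false
	fmt.Println(MadeProgress(100, 100))                          // false
}
```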

Page 49

Testing Cluster Health with Failures

- Issue lock operations across the cluster
- Ensure the correctness of the client library
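Correctness of lock operations can be checked after the fact: no two clients may hold the lock at overlapping times. This Go sketch assumes each client records when it acquired and released the lock on a shared clock, which is a simplification (a real checker must allow for clock skew and uncertain release times); `Hold` and `CheckMutualExclusion` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Hold is one observed critical section on a shared test clock (assumed).
type Hold struct {
	Client             string
	Acquired, Released int64
}

// CheckMutualExclusion returns the first overlapping pair of holds, if any.
// After sorting by acquisition time, as long as no overlap has been found
// the previous hold is the one released last, so comparing neighbours is enough.
// A release and an acquire at the same instant count as a clean handoff.
func CheckMutualExclusion(holds []Hold) (Hold, Hold, bool) {
	h := append([]Hold(nil), holds...)
	sort.Slice(h, func(i, j int) bool { return h[i].Acquired < h[j].Acquired })
	for i := 1; i < len(h); i++ {
		if h[i].Acquired < h[i-1].Released {
			return h[i-1], h[i], false
		}
	}
	return Hold{}, Hold{}, true
}

func main() {
	good := []Hold{{"a", 0, 5}, {"b", 5, 9}, {"c", 12, 15}}
	bad := []Hold{{"a", 0, 5}, {"b", 3, 9}}
	_, _, fine := CheckMutualExclusion(good)
	fmt.Println(fine) // true
	x, y, fine := CheckMutualExclusion(bad)
	fmt.Println(fine, x.Client, y.Client) // false a b
}
```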

Page 50

TESTING CLUSTER

dash.etcd.io

Page 51

Punishing Functional Tests

Page 52

Punishing Functional Tests

Page 53

etcd/raft Reliability

Designed for testability and flexibility

Used by large-scale database systems and others
- CockroachDB, TiKV, Dgraph

Page 54

etcd vs others

Do one thing

Page 55

etcd vs others

Only do the One Thing

Page 56

etcd vs others

Do it Really Well

Page 57

etcd Reliability

Do it Really Well

Page 58

ETCD v3.0 BETA

Efficient and Scalable

Page 59

BETA AVAILABLE TODAY

github.com/coreos/etcd

Page 60

FUTURE WORK

Proxy, Caching, Watch Coalescing, Secondary Index

Page 61

ETCD and KUBERNETES

The Data Store

Page 62

[Diagram: a scheduler & API node and seven worker nodes, each running a kubelet]

Page 63

etcd and Kubernetes

- Kubernetes currently uses the V2 API

- Migration to V3 is very actively in progress

- Currently opt-in, default in the future

Page 64

etcd v3 and Kubernetes

- Follow along: https://github.com/kubernetes/kubernetes/issues/22448

- Try it out!

Page 65

etcd v3 will support Kubernetes as it scales to 5,000 nodes and beyond

Page 66

Performance 1K keys

Page 67

Performance

Snapshot caused performance degradation

etcd2 - 600K keys

Page 68

Performance etcd2 - 600K keys

Snapshot triggered elections

Page 69

ZooKeeper Performance

Non-blocking full snapshot

Efficient memory management

Page 70

Performance ZooKeeper default

Page 71

Performance

Snapshot-triggered election

ZooKeeper default

Page 72

Performance

Snapshot

ZooKeeper default

Page 73

Performance

GC

ZooKeeper snapshot disabled

Page 74

Reliable Performance

- Similar to ZooKeeper with snapshot disabled
  - Incremental snapshot

- No garbage collection pauses
  - Off-heap storage

Page 75

Performance: etcd3 / ZooKeeper (snapshot disabled)

Page 76

Performance: etcd3 / ZooKeeper (snapshot disabled)

Page 77

Memory

[Chart: memory usage of 10GB, 2.4GB and 0.8GB; 512MB data - 2M 256B keys]

Page 78

GET INVOLVED

github.com/coreos/etcd