David Vossel <[email protected]>, Pacemaker: OpenStack's PID 1, OpenStack Summit, May 21, 2015

Pacemaker: OpenStack's Pid 1

Jul 28, 2015


David Vossel
Transcript
Page 1: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

PACEMAKER

OpenStack Summit, May 21, 2015

OpenStack's PID 1

Page 2: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Story Time

Page 3: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

And how Pacemaker Saved the Day

The Future of HA

Page 4: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

There once was a database

Page 5: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

There once was a database

DB

Page 6: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Not like the other databases

There once was a database

DB

Page 7: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

DB

Not like the other databases

There once was a database

A distributed self replicating database

Page 8: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Not like the other databases

There once was a database

A distributed self replicating database

Active/Active Replicated Database

DB DB DB DB

Page 9: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Everyone: HOORAY!

Page 10: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Everyone: Load Balance?

Clients

????????????????????

Page 11: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

HAProxy enters the scene.

Page 12: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

proxy

HA Proxy: No Problem

Clients

Page 13: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

proxy

HA Proxy: No Problem

Clients

Everyone: Hooray!

Page 14: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

proxy

Clients

* Everyone: ummmmm

Page 15: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Everyone: What if a node dies?

Active/Active Replicated Database

DB DB DB DB

proxy

Clients

Page 16: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

proxy

HA Proxy: No Problem

Everyone: What if a node dies?

Page 17: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

proxy

* Everyone: ummmmm

Page 18: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

proxy

Everyone: But... What if proxy dies?

Page 19: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

proxy

HA Proxy:...

Everyone: But... What if proxy dies?

Page 20: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

proxy

Everyone: What?????

HA Proxy:...

Page 21: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

proxy

Everyone: Uhhh???!!!!??

HA Proxy:...

????????????????????

Page 22: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Active/Active Replicated Database

DB DB DB DB

Clients

Everyone: Hello????

HA Proxy:...

????????????????????

Page 23: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Clients

Everyone: Anyone?

????????????????????

Page 24: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Everyone: I knew this cloud thing wouldn't work...

Page 25: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

KeepaliveD: Wait! Guys... I've got an idea!!!

Everyone: I knew this cloud thing wouldn't work...

Page 26: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

KeepaliveD enters the scene.

Page 27: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Everyone: Whoa, Proxy Failover!

Page 28: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD keepaliveD

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Everyone: Awesome!!

Page 29: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD keepaliveD

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Everyone: Awesome!!

Keepalived: I know, right?!!

Page 30: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD

Clients

Everyone: No Way!!!

Keepalived: I Rock!

keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Page 31: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD

Clients

keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Everyone: Wait... Hold up

Page 32: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD

Clients

keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Everyone: But.... What actually happens to the “other” nodes?

Page 33: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Peripeteia - per·i·pe·tei·a

a sudden reversal of fortune or change in circumstances, especially in reference to fictional narrative.

Page 34: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD keepaliveD keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Keepalived: What Other Nodes?

Keepalived: What Other Nodes?

Page 35: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Clients: HEY?! How come the thing I just wrote isn't in the database?

Page 36: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Clients: HEY?! How come the thing I just wrote isn't in the database?

Clients: Yeah, what's going on?!

Page 37: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

keepaliveD keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

* Everyone: I've made a huge mistake....

Page 38: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Missing Piece?

Page 39: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Page 40: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker: System Level HA

● System level HA is holistic.

Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Page 41: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker

Pacemaker: System Level HA

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

● System level HA is holistic.

● Defines the policy of how to recover a set of applications

Page 42: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker

Pacemaker: System Level HA

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

● System level HA is holistic.

● Defines the policy of how to recover a set of applications

● Enforces the policy to achieve system wide deterministic behavior.

Page 43: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

● We don't question what happened to the other nodes... We know.

Pacemaker Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Back to the Story... How does Pacemaker Help?

Page 44: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

● We don't question what happened to the other nodes... We know.

● And how do we know this?

Pacemaker Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Back to the Story... How does Pacemaker Help?

Page 45: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Introducing STONITH

Page 46: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Introducing STONITH

Shoot the Other Node in the Head

Page 47: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

STONITH = Pacemaker's Fencing Daemon.

● Pacemaker knows the state of lost/misbehaving nodes

Pacemaker Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Page 48: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

STONITH = Pacemaker's Fencing Daemon.

● Pacemaker knows the state of lost/misbehaving nodes

● Because with fencing via STONITH... that state is dead.

Pacemaker Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP
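The fencing rule on this slide can be sketched as a toy model: a lost node is first fenced, and only once fencing has succeeded are its resources recovered elsewhere. This is a minimal illustration, not Pacemaker's implementation. `ToyCluster`, `fence`, and `handle_lost_node` are hypothetical names, and the fence call is a stand-in for a real fence device.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model: cluster members and where each resource runs."""
    members: set
    placement: dict  # resource name -> node name
    fenced: set = field(default_factory=set)

    def fence(self, node):
        # Stand-in for a real fence device (for example a power switch).
        # After this returns True, the node's state is known: it is dead.
        self.fenced.add(node)
        self.members.discard(node)
        return True

    def handle_lost_node(self, node):
        """Recover a lost node's resources only after fencing succeeds."""
        if not self.fence(node):
            raise RuntimeError(f"could not fence {node}; refusing to recover")
        survivors = sorted(self.members)
        if not survivors:
            raise RuntimeError("no surviving nodes to recover onto")
        moved = {}
        for resource, owner in sorted(self.placement.items()):
            if owner == node:
                # Spread the recovered resources across the survivors.
                target = survivors[len(moved) % len(survivors)]
                self.placement[resource] = target
                moved[resource] = target
        return moved


if __name__ == "__main__":
    cluster = ToyCluster(
        members={"node1", "node2", "node3"},
        placement={"vip": "node1", "proxy": "node1", "db": "node2"},
    )
    print(cluster.handle_lost_node("node1"))
```

The point of the ordering is that no resource is restarted elsewhere while the old copy might still be running on a node whose state is unknown.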

Page 49: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Quick Recap...

Page 50: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Without Pacemaker+STONITH

keepaliveD keepaliveD keepaliveD

Active/Active Replicated Database Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Clients

Keepalived: I dunno. Who cares?

Keepalived: I dunno. Who cares?

Page 51: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

With Pacemaker+STONITH...

Pacemaker Pacemaker

Active/Active Replicated Database

DB DB DB DB

proxy proxy proxy proxy

VIP VIP VIP VIP

Pacemaker: They are dead because I killed them.

Clients

Page 52: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Takeaway.

● Pacemaker and load balancing are NOT mutually exclusive.

Page 53: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Takeaway.

● Pacemaker and load balancing are NOT mutually exclusive.

● Pacemaker and HAProxy are meant for one another.

Page 54: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Takeaway.

● Pacemaker and load balancing are NOT mutually exclusive.

● Pacemaker and HAProxy are meant for one another.

HAProxy

Pacemaker

Page 55: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker: The Distributed PID 1

Page 56: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Modern PID 1's role.

● systemd:

● Launches services in parallel

● yet observes strict ordering between dependent services.

● Monitors and recovers failed resources.

[Diagram: systemd start order for galera, rabbitmq, redis, Nova, and ceilometer. Unrelated dependencies start in parallel. Ordering is enforced: these services can start in parallel only after their dependencies start.]
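The "parallel, yet ordered" start behaviour can be sketched with a topological sort that emits batches: everything in one batch may start in parallel, and a batch only begins once the previous one has finished. This is an illustration rather than how systemd or Pacemaker schedule internally. The dependency map (Nova after galera and rabbitmq, ceilometer after redis) is an assumption read off the diagram, and `start_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Assumed dependency map: service -> services that must be started first.
DEPENDS_ON = {
    "galera": set(),
    "rabbitmq": set(),
    "redis": set(),
    "nova": {"galera", "rabbitmq"},
    "ceilometer": {"redis"},
}


def start_batches(depends_on):
    """Return a list of batches; services within a batch can start in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything whose dependencies are already started is ready now.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(start_batches(DEPENDS_ON), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

`graphlib` is in the Python standard library from version 3.9 onward.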

Page 57: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Problem

● OpenStack services are not isolated to a local machine.

systemd

Galera Cluster

Form Galera cluster

Then Start Nova.

Nova

NODE1 NODE2 NODE3 NODE4

systemd systemd systemd

Page 58: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Problem

● OpenStack services are not isolated to a local machine.

● systemd can't coordinate this.

systemd

NODE1 NODE2 NODE3 NODE4

systemd systemd systemd

Galera Cluster

Form Galera cluster

Then Start Nova.

Nova

Page 59: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Fix.

● But Pacemaker can

● because Pacemaker is distributed.

Pacemaker

NODE1 NODE2 NODE3 NODE4

Galera Cluster

Form Galera cluster

Then Start Nova.

Nova

Page 60: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Fix.

● Pacemaker, just like systemd, can...

● Launch services in parallel

● yet observe strict ordering between dependent services.

● Monitor and recover failed resources.

[Diagram: Pacemaker start order for galera, rabbitmq, redis, Nova, and ceilometer. Unrelated dependencies start in parallel. Ordering is enforced: these services can start in parallel only after their dependencies start.]

Page 61: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Fix.

● Except Pacemaker can coordinate this across any number of nodes.

Pacemaker

NODE1 NODE2 NODE3 NODE4

RabbitMQ Cluster

NODE5 NODE6 NODE7 NODE8 NODE9

Galera Cluster Redis Cluster

Ceilometer Cluster Nova Cluster

Page 62: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Fix.

● Except Pacemaker can coordinate this across any number of nodes.

● With any number of resources

Pacemaker

[Diagram: NODE1 through NODE9 managed by Pacemaker, running HAProxy instances, per-service VIPs (VIP-Galera, VIP-Rabbit, VIP-Redis, VIP-Nova, VIP-keys, VIP-cinder, VIP-celio, VIP-glance, VIP-neutron), and the Galera, RabbitMQ, Redis, Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, Neutron, Swift, and Heat clusters.]

Page 63: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Resource Constraints

● Pacemaker has unique capabilities for managing resources and modeling complex resource dependencies.

● Examples:

● Start resource X then start resource Y

● Colocate resource X with resource Y

● Resource X prefers node A over node B

● Resource X prefers node A between 8am-5pm
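The location and colocation examples above can be sketched as a small scoring model: each preference adds a score for a node, optionally only during certain hours, and a colocated resource simply follows its partner. This is a hedged illustration only. `LocationPreference`, `pick_node`, and `place_colocated` are hypothetical names, the scores are made up, and Pacemaker's real placement logic is considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationPreference:
    """Hypothetical rule: `resource` gains `score` on `node` during a time window."""
    resource: str
    node: str
    score: int
    start_hour: int = 0   # inclusive
    end_hour: int = 24    # exclusive

    def applies_at(self, hour):
        return self.start_hour <= hour < self.end_hour


def pick_node(resource, preferences, nodes, hour):
    """Return the node with the highest total score for `resource` at `hour`."""
    totals = {node: 0 for node in nodes}
    for pref in preferences:
        if pref.resource == resource and pref.node in totals and pref.applies_at(hour):
            totals[pref.node] += pref.score
    # Highest score wins; ties are broken by node name so results are repeatable.
    return min(totals, key=lambda node: (-totals[node], node))


def place_colocated(primary, followers, preferences, nodes, hour):
    """Place `primary`, then put every colocated follower on the same node."""
    node = pick_node(primary, preferences, nodes, hour)
    return {name: node for name in [primary, *followers]}


if __name__ == "__main__":
    prefs = [
        LocationPreference("X", "nodeB", 50),
        LocationPreference("X", "nodeA", 100, start_hour=8, end_hour=17),
    ]
    nodes = ["nodeA", "nodeB"]
    print(place_colocated("X", ["Y"], prefs, nodes, hour=10))
    print(place_colocated("X", ["Y"], prefs, nodes, hour=20))
```

With these assumed scores, X (and Y with it) lands on node A during the 8am-5pm window and on node B outside it.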

Page 64: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Architecture: HA OpenStack Controller Nodes

Page 65: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

How it works... in one sentence

Pacemaker managing Virtual IPs + Load Balancers + Controller services to maximize the availability of OpenStack APIs

Page 66: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker

HA-proxy

VIP

Service Service Service

NODE1 NODE2 NODE3

Each distributed service has a front end Virtual IP tied to a Load Balancer.

How it works... in one sentence

Pacemaker managing Virtual IPs + Load Balancers + Controller services to maximize the availability of OpenStack APIs

HA-proxy HA-proxy

Page 67: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Each distributed service has a front end Virtual IP tied to a Load Balancer.

Each Load Balancer routes VIP traffic to its respective Active/Active service instances

How it works... in one sentence

Pacemaker managing Virtual IPs + Load Balancers + Controller services to maximize the availability of OpenStack APIs

Pacemaker

HA-proxy

VIP

Service Service Service

NODE1 NODE2 NODE3

HA-proxy HA-proxy

Page 68: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker

HA-proxy

VIP

Service Service Service

NODE1 NODE2 NODE3

Each distributed service has a front end Virtual IP tied to a Load Balancer.

Each Load Balancer routes VIP traffic to its respective Active/Active service instances

How it works... in one sentence

HA-proxy

Pacemaker managing Virtual IPs + Load Balancers + Controller services to maximize the availability of OpenStack APIs

HA-proxy

Page 69: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Each distributed service has a front end Virtual IP tied to a Load Balancer.

Each Load Balancer routes VIP traffic to its respective Active/Active service instances

Active/Active OpenStack Controller services, like Nova, Glance, Keystone, Galera, Redis, RabbitMQ, etc.

How it works... in one sentence

Pacemaker managing Virtual IPs + Load Balancers + Controller services to maximize the availability of OpenStack APIs

Pacemaker

HA-proxy

VIP

Service Service Service

NODE1 NODE2 NODE3

HA-proxy HA-proxy

Page 70: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

PROXY CLONE

SERVICE CLONE

Pacemaker

HA-proxy

VIP

Service Service Service

NODE1 NODE2 NODE3

Scaling with Resource Clones

● Pacemaker's ability to clone services makes scaling trivial.

● Want more instances?

HA-proxy HA-proxy

Page 71: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

PROXY CLONE

SERVICE CLONE

Pacemaker

HA-proxy

VIP

Service Service Service

NODE1 NODE2 NODE3

Scaling with Resource Clones

NODE4 NODE5 NODE6

Service Service Service

● To scale a service, increase the number of clone instances Pacemaker is allowed to run for it.

HA-proxy HA-proxy HA-proxy HA-proxy HA-proxy
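Scaling by raising the clone count can be sketched as a placement function: given the nodes and the maximum number of instances, spread the instances out without exceeding a per-node cap. The parameter names mirror Pacemaker's `clone-max` and `clone-node-max` clone options, but `place_clones` itself is a hypothetical helper and not how Pacemaker allocates instances.

```python
def place_clones(nodes, clone_max, clone_node_max=1):
    """Spread up to clone_max instances over nodes, at most clone_node_max per node."""
    placement = {node: 0 for node in nodes}
    remaining = clone_max
    # Each pass hands out at most one more instance per node.
    for _ in range(clone_node_max):
        for node in nodes:
            if remaining == 0:
                return placement
            placement[node] += 1
            remaining -= 1
    return placement


if __name__ == "__main__":
    six_nodes = [f"node{i}" for i in range(1, 7)]
    print(place_clones(six_nodes, clone_max=3))  # three instances running
    print(place_clones(six_nodes, clone_max=6))  # scaled to one per node
```

In this sketch, going from three service instances to six is a one-number change, which is the point the slide makes about clones.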

Page 72: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker

HA-proxy

Galera VIP

Galera Galera Galera

NODE1 NODE2 NODE3

How it works, continued...

● Services interact with one another using each service's Virtual IP

● Example: Both Glance and Nova need access to Galera... Galera is accessed via the front end Virtual IP and those requests are distributed to the backend galera cluster.

Glance Nova

DB Request DB Request

HA-proxy HA-proxy
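The VIP plus load balancer pattern can be sketched as a front end that hands each request to the next healthy backend. This is a toy round-robin model under stated assumptions, not HAProxy's behaviour or configuration. `VirtualIPFrontend`, the backend names, and the 192.0.2.10 address (from the documentation-only range) are all hypothetical.

```python
from itertools import cycle


class VirtualIPFrontend:
    """Toy load balancer: one front-end address, several active backends."""

    def __init__(self, vip, backends):
        self.vip = vip
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # A failed or fenced backend stops receiving traffic.
        self.healthy.discard(backend)

    def route(self):
        """Return the next healthy backend in round-robin order."""
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError(f"no healthy backend behind {self.vip}")


if __name__ == "__main__":
    galera = VirtualIPFrontend("192.0.2.10", ["galera1", "galera2", "galera3"])
    # Glance and Nova both talk to the same VIP; requests are spread out.
    for client in ["glance", "nova", "glance", "nova"]:
        print(client, "->", galera.vip, "->", galera.route())
```

Clients only ever know the VIP, so backends can come and go without any client reconfiguration.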

Page 73: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Deployment Strategies: HA OpenStack Controller Nodes

Page 74: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Collapsed Architecture

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Separate VIP per service

● All controller nodes run the same services.

● VIP+load balancers distribute access to APIs across cloned nodes.

Page 75: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Collapsed Architecture

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

NOVA VIP

Clients

Page 76: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Collapsed Architecture

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Glance VIP

Clients

Page 77: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Collapsed Architecture Scaling

[Diagram: Pacemaker managing NODE1 through NODE6, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Page 78: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Startup Ordering Revisited.

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Bootstrap and start the Galera cluster across multiple nodes.

Page 79: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Startup Ordering Revisited.

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Bootstrap and start the Galera cluster across multiple nodes.

Then Start Nova instances, which depend on an active Galera cluster.

Page 80: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Stop Ordering Revisited.

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

If we're shutting down the Galera cluster...

Page 81: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Stop Ordering Revisited.

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Separate VIP per service

Stop everything that depends on Galera. Like Nova...

Page 82: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Stop Ordering Revisited.

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Then shut down the Galera cluster.
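The stop sequence described on these slides (stop everything that depends on Galera, then Galera itself) can be sketched as a walk over the dependency map in reverse. This is a minimal illustration with a hypothetical helper, `stop_plan`. The dependency map is an assumption, although Glance and Nova needing Galera, and Ceilometer needing Redis, both come from the talk.

```python
# Assumed dependency map: service -> services it depends on.
DEPENDS_ON = {
    "galera": set(),
    "redis": set(),
    "nova": {"galera"},
    "glance": {"galera"},
    "ceilometer": {"redis"},
}


def stop_plan(target, depends_on):
    """Return the stop sequence for target: its dependents first, target last."""
    plan = []
    seen = set()

    def visit(service):
        if service in seen:
            return
        seen.add(service)
        # Anything that depends on this service has to be stopped before it.
        for other, deps in sorted(depends_on.items()):
            if service in deps:
                visit(other)
        plan.append(service)

    visit(target)
    return plan


if __name__ == "__main__":
    print(stop_plan("galera", DEPENDS_ON))
```

Services unrelated to the target, such as Redis and Ceilometer in this map, are left running.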

Page 83: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Complex Startup Ordering

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis Slave, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Separate VIP per service

Start Redis Clone.

Page 84: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Complex Startup Ordering

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy. NODE1 and NODE2 run Redis Slave; NODE3 runs Redis Master.]

Promote one instance of the Redis clone to be the Master instance.

Page 85: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Complex Startup Ordering

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3, with a separate VIP per service. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy. NODE1 and NODE2 run Redis Slave; NODE3 runs Redis Master.]

Then start the Ceilometer cluster, which depends on Redis.
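The three-step sequence from these slides (start the Redis clone everywhere, promote one instance, then start Ceilometer) can be written down as an ordered list of steps. This is only a sketch of the ordering. `startup_steps` and the step labels are hypothetical, and which instance gets promoted is passed in as a parameter rather than decided the way Pacemaker would decide it.

```python
def startup_steps(nodes, master_index=0):
    """Return (action, node) steps in the order they must happen."""
    # 1. Start every Redis instance; all of them begin as slaves.
    steps = [("start redis (slave)", node) for node in nodes]
    # 2. Promote exactly one instance to master.
    steps.append(("promote redis to master", nodes[master_index]))
    # 3. Only now start Ceilometer, which depends on Redis.
    steps.extend(("start ceilometer", node) for node in nodes)
    return steps


if __name__ == "__main__":
    for action, node in startup_steps(["node1", "node2", "node3"], master_index=2):
        print(f"{node}: {action}")
```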

Page 86: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Segregated Architecture

Pacemaker

● Each service runs on its own dedicated hardware.

● Scales much further

● Add capacity where capacity makes sense.

● Requires lots and lots of nodes.

Page 87: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Segregated Architecture

Pacemaker

● Take a closer look.

Page 88: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Segregated Architecture

● Possible to have an entire set of nodes just for load balancing

● Dedicated cluster for Galera

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3 (Galera); NODE4, NODE5, and NODE6 (Proxy); NODE7, NODE8, and NODE9 (Nova); rows of VIPs; and a note "More services that way." pointing to further nodes such as Glance.]

Page 89: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Mixed Architecture

● Mixture of collapsed and segregated.

● Break some components into separate hardware

● Most of cluster is collapsed

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3 running Galera; NODE4, NODE5, and NODE6 running Proxy; a row of VIPs; and NODE7, NODE8, and NODE9 each running Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Redis, Neutron N, Swift, Heat, and Neutron S.]

Page 90: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker Advantages

● Automates bootstrap of services that previously required hand holding.

● Example: Automate Galera bootstrap

1. Find out which galera instance is most up-to-date

2. Bootstrap most current galera instance first.

3. Then sync other galera instances
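The three bootstrap steps above can be sketched as a selection function: compare how far each instance got, bootstrap the most current one, and let the rest join and sync afterwards. This is a hedged illustration, not the logic of any particular resource agent. `bootstrap_plan` is a hypothetical helper, and the input is assumed to be each node's last committed sequence number, however that was collected.

```python
def bootstrap_plan(last_committed):
    """Pick the bootstrap node from a mapping of node name -> last committed seqno.

    Returns (bootstrap_node, joiners): the most up-to-date node, and the
    remaining nodes that should start afterwards and sync from the cluster.
    """
    if not last_committed:
        raise ValueError("no Galera instances reported a sequence number")
    # Highest sequence number wins; ties are broken by node name.
    bootstrap_node = min(last_committed, key=lambda n: (-last_committed[n], n))
    joiners = sorted(n for n in last_committed if n != bootstrap_node)
    return bootstrap_node, joiners


if __name__ == "__main__":
    node, joiners = bootstrap_plan({"node1": 1041, "node2": 1057, "node3": 1049})
    print("bootstrap first:", node)
    print("then start and sync:", ", ".join(joiners))
```

Bootstrapping from anything other than the most current instance would risk losing the writes that only that instance had committed.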

Page 91: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker Advantages, Continued...

● Start/stop distributed services in a graceful, ordered manner.

● Gracefully put a controller node into standby for maintenance.

● Dynamically grow capacity by adding more pacemaker nodes

● Centralized view of distributed service state.

Page 92: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Architecture: HA OpenStack Compute Nodes

Page 93: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

HA for Cattle

● Both Pets and Cattle need High Availability.

● Recognize the techniques used for each are different.

Page 94: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker for Pets and small Herds.

● No limits in the number of resources.

● Pacemaker supports “n-node” clusters.

● Clusters are limited by the Corosync messaging layer to 16 nodes.

Pacemaker

Page 95: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker Remote for the Cattle.

● Pacemaker Remote allows clusters to scale beyond corosync membership layer limitations.

● Pacemaker Remote can scale clusters to 100s, possibly 1000s, of nodes.

Pacemaker + Pacemaker Remote

Page 96: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Solution: Pacemaker Remote

● Pacemaker Remote is a single daemon, pacemaker_remoted

Pacemaker Remote

Remote Node

Page 97: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Solution: Pacemaker Remote

● Pacemaker Remote is a single daemon, pacemaker_remoted

● This daemon is a lightweight way of integrating nodes into the cluster.

[Diagram: Node 1, Node 2, and Node 3 under Pacemaker; Node 4 through Node 16 under Pacemaker Remote, each labeled as a Remote Node.]

Page 98: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Solution: Pacemaker Remote

● Pacemaker Remote is a single daemon, pacemaker_remoted

● This daemon is a lightweight way of integrating nodes into the cluster.

● Cluster services spread out across pacemaker and pacemaker_remote nodes as a single cluster partition.

[Diagram: Node 1, Node 2, and Node 3 under Pacemaker; Node 4 through Node 16 under Pacemaker Remote, each labeled as a Remote Node; Cluster Services span both groups.]

Page 99: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker Remote use-case

● Management of services on Pacemaker Remote can work just like Pacemaker

● But thrives in the cattle use case where every remote instance is identical.

[Diagram: Node 1, Node 2, and Node 3 under Pacemaker with Cluster Services; Node 4 through Node 16 under Pacemaker Remote, shown as Remote Nodes that each run a Cloned service.]

Page 100: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker Remote use case

Pacemaker

Node 1 Node 2 Node 3

● Cloned services scale quite well on pacemaker remote

Cluster Services

Cloned services

Remote Node

Pacemaker Remote

Page 101: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Compute Node HA Strategy.

[Diagram: Pacemaker managing NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

Separate VIP per service

Page 102: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Compute Node HA Strategy.

[Diagram: Pacemaker managing controller nodes NODE1, NODE2, and NODE3, with a separate VIP per service. Each controller runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, Neutron S, and HA Proxy.]

[Diagram: a Remote Node running Pacemaker Remote hosts the Compute Service Group: Libvirtd, Neutron Open vSwitch, Ceilometer Compute, and Nova Compute.]

Page 103: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Why HA Compute Nodes?

● Maximize the Availability of Compute Instances.

● Detection of dead Cattle instances

● Automate recovery of Cattle instances

Page 104: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Why HA Compute Nodes?

● Maximize the Availability of Compute Instances.

● Detection of dead Cattle instances

● Automate recovery of Cattle instances

● Pacemaker Remote also has a secret weapon.

Page 105: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

STONITH + Pacemaker Remote.

Page 106: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

The Future

Page 107: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Limits?

Pacemaker + Pacemaker Remote

Page 108: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

How many services?

Pacemaker + Pacemaker Remote

Page 109: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Pacemaker + Pacemaker Remote

How many services?

How many nodes?

Page 110: Pacemaker: OpenStack's Pid 1

David Vossel <[email protected]>

Questions?

Visit us at

clusterlabs.org