OpenStack HA - Theory to Reality GERD PRÜßMANN SHAMAIL TAHIR SRIRAM SUBRAMANIAN KALIN NIKOLOV
Page 1: Open stack HA -  Theory to Reality

OpenStack HA - Theory to Reality

GERD PRÜßMANN, SHAMAIL TAHIR, SRIRAM SUBRAMANIAN, KALIN NIKOLOV

Page 2: Open stack HA -  Theory to Reality

Gerd Prüßmann, Cloud Architect, Deutsche Telekom AG (@2digitsleft)
Shamail Tahir, Cloud Architect, EMC Office of the CTO (@ShamailXD)
Sriram Subramanian, Founder & Cloud Specialist, CloudDon (@sriramhere)
Kalin Nikolov, Cloud Engineer, PayPal

Page 3: Open stack HA -  Theory to Reality

Agenda

● OpenStack HA - Introduction
● Active/Active
● Active/Passive
● DT Implementation
● eBay/PayPal Implementation
● Summary

Page 4: Open stack HA -  Theory to Reality

OpenStack HA - Introduction

● What does it mean?
● Why is it not by default?
● Stateless vs Stateful
● Challenges
● More than one way
o Active/Passive
o Active/Active

Page 5: Open stack HA -  Theory to Reality

Is This?

Page 6: Open stack HA -  Theory to Reality

Or This?

Page 7: Open stack HA -  Theory to Reality

Active/Active

● API Service Endpoints
● Database
● Networking

Page 8: Open stack HA -  Theory to Reality

Active/Active

● The OpenStack High Availability (HA) concept depends on the components used, e.g. network virtualization, storage backend, database system etc.
● Various technologies are available to realize HA
o Vendors use combinations, e.g. Pacemaker, Corosync, Galera, Keepalived, HAProxy, VRRP, DRBD … or their own tools

The following description is derived from the generic proposal in the OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html

Page 9: Open stack HA -  Theory to Reality

Active/Active

● Target: try to have all services of the platform highly available
o Redundancy and resiliency against single service / node failure
● stateless services are load balanced (HAProxy + Keepalived)
o e.g. API endpoints / nova-scheduler
● stateful services use individual HA technologies
o e.g. RabbitMQ, MySQL DB etc.
o might be load balanced as well
● for some services/agents no built-in HA feature is available

Page 10: Open stack HA -  Theory to Reality

Active/Active - API service endpoints

API endpoints
● deploy on multiple nodes
● configure load balancing with virtual IPs (VIPs) in HAProxy
● use HAProxy's VIPs to configure the respective identity endpoints
● all service configuration files refer to these VIPs only

Schedulers
● nova-scheduler, nova-conductor, cinder-scheduler, neutron-server, ceilometer-collector, heat-engine
● schedulers are configured with the clustered RabbitMQ nodes
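The VIP-based load balancing of API endpoints can be sketched as a small Python helper that renders one HAProxy `listen` section per service. This is only an illustration under assumptions: the VIP `192.0.2.10`, the controller addresses, the section name, and the helper `render_listen` are hypothetical and not taken from the talk; port 5000 is used as the usual Keystone public port.

```python
from typing import Dict


def render_listen(name: str, vip: str, port: int, backends: Dict[str, str]) -> str:
    """Render one HAProxy 'listen' section that balances a service behind a VIP."""
    lines = [
        f"listen {name}",
        f"    bind {vip}:{port}",
        "    balance roundrobin",
    ]
    for node, address in backends.items():
        # 'check' turns on HAProxy's periodic health check for this backend.
        lines.append(f"    server {node} {address}:{port} check")
    return "\n".join(lines)


# Hypothetical controller nodes (addresses from the documentation range).
controllers = {"ctl1": "192.0.2.11", "ctl2": "192.0.2.12", "ctl3": "192.0.2.13"}
keystone_stanza = render_listen("keystone_public", "192.0.2.10", 5000, controllers)
print(keystone_stanza)
```

Keepalived (not shown) would be responsible for moving the VIP between the HAProxy nodes; the service configuration files would then point only at `192.0.2.10`.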

Page 11: Open stack HA -  Theory to Reality

Active/Active - Databases

● MySQL or MariaDB with Galera cluster (wsrep) library extension
o transaction commit level replication
● synchronous multi-master node setup
o min. 3 nodes to get quorum in case of network partition
● write and read to any node
● other database options possible: Percona XtraDB, PostgreSQL etc.
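Why three nodes are the minimum follows from simple majority arithmetic. The sketch below assumes every node carries the same vote and ignores Galera features such as an arbitrator or weighted quorum; the function names are illustrative.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """A partition keeps quorum only if it holds a strict majority of the nodes."""
    return reachable > cluster_size / 2


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of nodes that can be lost while a majority remains."""
    return (cluster_size - 1) // 2


for size in (2, 3, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

With two nodes, a network partition leaves each side with exactly half of the votes, so neither side has a majority; with three nodes, the side that still sees two nodes can continue.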

Page 12: Open stack HA -  Theory to Reality

Active/Active - RabbitMQ

● RabbitMQ nodes clustered
● mirrored queues configured via policy (e.g. ha-mode: all)
● all services use the RabbitMQ nodes
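A mirrored-queue policy with `ha-mode: all` is usually applied with `rabbitmqctl set_policy`. The sketch below only builds the command line and does not run it; the policy name `ha-all` and the queue pattern `^` (match every queue) are assumptions chosen for illustration.

```python
import json
import shlex
from typing import List


def mirror_policy_command(policy_name: str = "ha-all", queue_pattern: str = "^") -> List[str]:
    """Build (but do not run) a rabbitmqctl call that mirrors matching queues to all nodes."""
    definition = json.dumps({"ha-mode": "all"})
    return ["rabbitmqctl", "set_policy", policy_name, queue_pattern, definition]


argv = mirror_policy_command()
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in argv))
```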

Page 13: Open stack HA -  Theory to Reality

Active/Active - Networking

Network
● deploy multiple network nodes
● Neutron DHCP agent - configure multiple DHCP agents (dhcp_agents_per_network)
● Neutron L3 agent
o Automatic L3 agent HA (allow_automatic_l3agent_failover)
o VRRP (l3_ha, max_l3_agents_per_router, min_l3_agents_per_router)
● Neutron L2 agent - no HA available
● Neutron metadata agent - no HA available
● Neutron LBaaS agent - no HA available
● where no HA feature is available: active/passive Pacemaker/Corosync solution
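The Neutron options named on this slide can be written out as a `neutron.conf` fragment. The sketch below renders them with Python's `configparser`; the values (2 DHCP agents, 2 to 3 L3 agents per router) are illustrative assumptions, and treating automatic failover and VRRP as alternatives is a simplification made for this example. Which options exist depends on the Neutron release.

```python
import configparser
import io


def neutron_ha_options(use_vrrp: bool) -> str:
    """Render the [DEFAULT] options named on the slide as neutron.conf text."""
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"]["dhcp_agents_per_network"] = "2"
    if use_vrrp:
        # VRRP-based HA routers.
        cfg["DEFAULT"]["l3_ha"] = "True"
        cfg["DEFAULT"]["max_l3_agents_per_router"] = "3"
        cfg["DEFAULT"]["min_l3_agents_per_router"] = "2"
    else:
        # Rescheduling of routers away from a dead L3 agent.
        cfg["DEFAULT"]["allow_automatic_l3agent_failover"] = "True"
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


print(neutron_ha_options(use_vrrp=True))
```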

Page 14: Open stack HA -  Theory to Reality

Active/Active - Example

Deployment example

Page 15: Open stack HA -  Theory to Reality

Active/Passive

● General
● Tools Overview
● Controllers Overview

Page 16: Open stack HA -  Theory to Reality

Active/Passive: General

● Components should leverage a Virtual IP
● The primary tools used for Active/Passive OpenStack configurations are general (non-OpenStack specific): Pacemaker + Corosync, and DRBD

Page 17: Open stack HA -  Theory to Reality

Corosync

● Messaging layer used by the cluster
● Responsibilities include cluster membership and messaging
● Leverages RRP (Redundant Ring Protocol)
o Rings can be set up as A/A or A/P
o UDP only
o mcastport specifies the receive port; mcastport minus 1 is the send port
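The port rule matters when opening firewalls between cluster nodes. A minimal sketch, assuming the commonly used `mcastport` of 5405 for the first ring and a hypothetical 5407 for a second ring:

```python
from typing import Dict


def corosync_udp_ports(mcastport: int) -> Dict[str, int]:
    """Return the UDP ports one ring uses: mcastport receives, mcastport - 1 sends."""
    if not 2 <= mcastport <= 65535:
        raise ValueError("mcastport must be a valid port that leaves room for mcastport - 1")
    return {"receive": mcastport, "send": mcastport - 1}


# Two redundant rings need port pairs that do not overlap.
for ring, port in enumerate((5405, 5407)):
    print(f"ring {ring}: {corosync_udp_ports(port)}")
```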

Page 18: Open stack HA -  Theory to Reality

Pacemaker

● Cluster Resource Manager
● Cluster Information Base (CIB)
o Represents current state of resources and cluster configuration (XML)
● Cluster Resource Management Daemon (CRMd)
o Acts as decision maker (one master)
● Policy Engine (PEngine)
o Sends instructions to LRMd and CRMd
● STONITHd
o Fencing mechanism
● Resource Agents
o Standardized interfaces for resources

[Diagram: Pacemaker components CRMd, STONITHd, CIB, PEngine, LRMd]

Page 19: Open stack HA -  Theory to Reality

DRBD

● Distributed Replicated Block Device
● Creates logical block devices (e.g. /dev/drbdX) that have backing volumes
● Reads are serviced locally
● Primary node writes are sent to the secondary node

Page 20: Open stack HA -  Theory to Reality

Active/Passive: Database

[Diagram: Host1 and Host2, each stacking MySQL on DRBD, Pacemaker, and Corosync]

● Use DRBD to back MySQL

● Leverage VIP that can float between hosts

● Manage all resources (including MySQL Daemon) with Pacemaker

● MySQL/Galera is an alternative, but the current version of the HA Guide does not recommend it

Page 21: Open stack HA -  Theory to Reality

Active/Passive: RabbitMQ

[Diagram: Host1 and Host2, each stacking RabbitMQ on DRBD, Pacemaker, and Corosync]

● Use DRBD to back RabbitMQ

● Leverage VIP that can float between hosts

● Ensure the Erlang cookie (.erlang.cookie) is identical on all nodes

o Enables the nodes to communicate with each other

● RabbitMQ clustering does not tolerate network partitions well

Page 22: Open stack HA -  Theory to Reality

Active/Passive: Overview (From Guide)

● Leverage DB, RabbitMQ VIP in configuration files

● Configure Pacemaker Resources for OpenStack Services

o Image API

o Identity

o Block Storage API

o Telemetry Central Agent

o Networking

o L3-Agent

o DHCP

Page 23: Open stack HA -  Theory to Reality

DT Implementation - Overview

● Business Market Place (BMP)
● SaaS offering
● https://portal.telekomcloud.com/
● SaaS applications from software partners (ISVs) and DT offered to SME customers
● Platform based on open source technologies only (OpenStack, CEPH, Linux)
● Project started in 2012 with OpenStack Essex, CEPH
● In production since 3/13

Page 24: Open stack HA -  Theory to Reality

DT Implementation

DTAG scale out project (ongoing)

Target: Migrate production to a new DC and scale out

Requirements:
● scale out compute by 30%, storage by 40%
● eliminate all SPOFs
● set up in two fire protection areas / physically separated DC rooms

Page 25: Open stack HA -  Theory to Reality

DT Implementation

● single region HA OpenStack instance
● all services distributed over two DC rooms
o Compute and Storage distributed equally
o All OpenStack services HA (as far as possible)
o OSS (DNS, NTP, puppet master, Mirror etc., redundant perimeter firewall)
● Instance distribution: 4 Availability Zones, multiple host aggregates and scheduler filters

Page 26: Open stack HA -  Theory to Reality

DT Implementation

● Load Balancing
o HAProxy for MySQL, services, RabbitMQ, APIs (nginx under test)
● MySQL
o Galera multi-master node replication (3 nodes)
● RabbitMQ
o 2-node cluster / mirrored queues
● Neutron
o DHCP: multiple agents started; Pacemaker/Corosync
● API Endpoints
o Load balancing with round robin distribution
● Storage
o 2 shared, distributed CEPH clusters (RBD/S3)

Page 27: Open stack HA -  Theory to Reality

DT Implementation

Tests/Experiences so far

● Load balancing works well
● Database: OpenStack multi-node write issues
o 1 node write / 2 nodes backup: diminishes Galera HA efficiency (monitoring)
● Specific issues with deployment in 2 DC rooms / uneven distribution of services (Galera)
o if the “wrong” room fails: Galera quorum requires a majority!
o room with 2 nodes goes down → 3rd node will deactivate itself → DB outage
● Storage specific:
o CEPH may lose 2/3 of the replicas → heavy replication load on the CEPH cluster
o danger of losing data (OSD/disk failure) → raise replica level / adapt CRUSH map
● Network: recovering from a Neutron / L3 failure: <15 minutes to recover
o pet applications vulnerable - may suffer from hiccups at disasters anyway
● DHCP agent failures
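The “wrong room” problem can be reproduced with a few lines of Python. The sketch assumes equally weighted Galera nodes and hypothetical room names; it reports, for each room, whether the remaining nodes still form a majority after that room is lost.

```python
from typing import Dict


def quorum_after_room_failure(layout: Dict[str, int]) -> Dict[str, bool]:
    """For each room, report whether the cluster keeps quorum if that room fails."""
    total = sum(layout.values())
    result = {}
    for room, nodes in layout.items():
        remaining = total - nodes
        # Quorum needs a strict majority of all configured nodes.
        result[room] = remaining > total / 2
    return result


two_rooms = {"room_a": 2, "room_b": 1}
three_rooms = {"room_a": 1, "room_b": 1, "room_c": 1}
print(quorum_after_room_failure(two_rooms))
print(quorum_after_room_failure(three_rooms))
```

With three nodes in two rooms, losing the room that holds two nodes leaves a single node without a majority, which matches the outage described above. Spreading the same three nodes over three rooms lets the cluster survive the loss of any one room.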

Page 28: Open stack HA -  Theory to Reality

DT Implementation

Plans for the future

● use DVR / VRRP in the future
o make network more resilient and elastic
● a third DC room would be desirable :-)
o CEPH replicas / MONs, MySQL Galera

Page 29: Open stack HA -  Theory to Reality

eBay/PayPal Implementation

The scope of eBay/PayPal OpenStack clouds
● 100% of PayPal web/mid tier
● Most of Dev/QA
● Number of HVs: 8,500
● Number of Virtual Machines: 70,000
● Number of users: several thousand
● Availability zones: 10

Page 30: Open stack HA -  Theory to Reality

eBay/PayPal Implementation

● Database
o MySQL MMM replication, VIP with FailoverPersistence / Galera
● RabbitMQ
o VIP with SingleNode FailoverPersistence or 3 nodes with mirrored queues
● Neutron DHCP / LBaaS
o Corosync/Pacemaker
● API Endpoints
o LB VIPs for every service with either RR or least connection
● Storage
o Shared storage with NFS/iSCSI
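The two balancing methods mentioned for the API VIPs, round robin (RR) and least connection, differ in what they look at when picking a backend. The sketch below is a simplified model, not load balancer code; the backend names and connection counts are made up.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in a fixed rotation, ignoring their current load."""
    return cycle(backends)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the backend that currently holds the fewest open connections."""
    return min(active, key=active.get)


backends = ["api1", "api2", "api3"]
rr = round_robin(backends)
rr_picks = [next(rr) for _ in range(4)]

# Hypothetical snapshot of open connections per backend.
active = {"api1": 12, "api2": 3, "api3": 7}
lc_pick = least_connections(active)
print(rr_picks, lc_pick)
```

Round robin is a reasonable fit for short, uniform API calls, while least connection tends to behave better when some requests are long-lived.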

Page 31: Open stack HA -  Theory to Reality

eBay/PayPal Implementation

Successful HA Implementations
● LoadBalanced HA - VIPs for every service
● LB Single Node Failover Persistence Profile
● Galera/Percona for Identity Service
● Global Identity Service using GLB

Page 32: Open stack HA -  Theory to Reality

eBay/PayPal Implementation

HA Failures
● Corosync/Pacemaker
o Neutron DHCP and LBaaS - missing advanced health checks
● RabbitMQ
o Single Node Failover Persistence
● MySQL Replication
o Single Node Failover Persistence sometimes doesn't work well
o Implemented external monitoring and disabling of the failed member
● VIPs without ECV health checks
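A content-verifying health check looks at what a member answers, not only at whether its port accepts connections. The sketch below assumes that “ECV” refers to this kind of extended content verification as offered by load balancer vendors; the probe functions are stand-ins for real HTTP requests, and the expected string is hypothetical.

```python
from typing import Callable, Tuple


def ecv_healthy(fetch: Callable[[], Tuple[int, str]], expected: str) -> bool:
    """Treat a member as healthy only if it answers 200 and the body contains `expected`."""
    try:
        status, body = fetch()
    except OSError:
        # Connection refused, timeout and similar errors count as a failed check.
        return False
    return status == 200 and expected in body


# Stand-ins for real HTTP probes against a service endpoint.
def good_member() -> Tuple[int, str]:
    return 200, '{"version": "v2.0", "status": "stable"}'


def hung_member() -> Tuple[int, str]:
    return 200, "<html>Service Unavailable</html>"


def dead_member() -> Tuple[int, str]:
    raise ConnectionRefusedError("port closed")


marker = '"status": "stable"'
results = [ecv_healthy(probe, marker) for probe in (good_member, hung_member, dead_member)]
print(results)
```

The second probe shows the case a plain TCP or status-code check would miss: the member still answers, but not with the content a working service would return.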

Page 33: Open stack HA -  Theory to Reality

eBay/PayPal Implementation

Future direction
● HA on Global or Regional Services
o One leg in each Availability Zone (Keystone, LBaaS, Swift)
● RabbitMQ with 3 nodes / mirrored queues
o LB VIP with least connections
● No shared NFS for Glance

Page 34: Open stack HA -  Theory to Reality

eBay/PayPal Global Identity Service

Page 35: Open stack HA -  Theory to Reality

eBay/PayPal Implementation

Lessons Learned
● Try not to overcomplicate
● Simulate failures
o Before placing in production, make sure HA works
● Place your services in different Availability Zones or at least different FaultZones
● Always make backups
o No matter how robust your HA solution is

Page 36: Open stack HA -  Theory to Reality

Call to Action

● OpenStack HA Guide Update Efforts
● WTE Work Group (now known as ‘Enterprise’)
● Share Best Practices

Page 37: Open stack HA -  Theory to Reality

Reference

OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
Percona Resources: https://www.percona.com/resources/mysql-webinars/high-availability-using-mysql-cloud-today-tomorrow-and-keys-your-success
HAProxy Documentation: http://www.haproxy.org/