Nova states summit

Moving to structured state management in OpenStack

Yahoo! and NTT Data

Deployer use cases

• As a deployer I want to ensure that an instance is reserved & provisioned without falling back and/or reporting internal OpenStack errors to users.

• As a deployer I want to be able to allocate, schedule and reserve resources before they are consumed so that I can make advanced/complex/custom scheduling decisions using the combination of those resources as a whole.

• I want to convey to my users that OpenStack is a reliable and dependable system that is resilient to API outages, resource failures…

Developer use cases

• I want to be able to add new (and improved!) states to OpenStack and know, in an easy to understand manner, what the impacts will be on the other states in OpenStack.

• I want to be able to undo (and redo) resource allocation decisions in a transactional and verifiably correct manner on errors or on other ‘smart’ algorithmic placement logic.

• I want to be able to quickly and easily understand an API request from start to finish & I want other developers to have a single place to understand the same.

User use cases

• I want to ensure that my instances are reliably brought up without involving myself to resolve (or raise to support) errors inside of OpenStack.

• I want to ensure that my instances (and associated resources) are optimally scheduled in a reliable and correct manner or not have them scheduled to begin with.

• I want my resources to be fully utilized, and not have zombie resources being ‘locked’ due to the lack of transactional semantics (and recovery) in the underlying code.

The problem

• Hard to [follow, recover from, debug, ensure reliability, correctness, extend, audit…] ad-hoc distributed state transitions.

– Created by continual placement of new features without revisiting the underlying state management system.

• The never ending battle between new hotness vs. stability

– Majority of focus (understandably) on getting OpenStack operational.

– Typical technical debt.

• Acceptable for a new project like OpenStack to get off the ground, but now is the time to focus on features that add stability/scalability...

The problem

• Inter-state ‘cutting’ results in instances which require manual or periodic tasks to recover.

– Distributed systems should always be able to automatically recover from failures, and not require manual/periodic intervention.

• Continually adding local [solutions, fixes, patches]

• Lack of [focus, time, desire] to fix the system as a whole?

• How many inter-state race conditions are hiding underneath the covers??

– Can verification even be done with the current codebase (in a reasonable time period)?

[Diagram: CREATE SERVER API (admin/user) request path. Components shown: nova-api, keystone, glance, RabbitMQ, nova-scheduler, nova-compute, Libvirt, Volume Service, Network Service.]

Create Server - Transitions and States

ID  Service            Operation             vm_state  task_state            power_state
1   Nova API           Initial State         -         -                     -
2   Keystone           Authenticate user     -         -                     -
3   Nova API/Glance    Show image            -         -                     -
4   Nova API/MySQL     Create entry          BUILDING  SCHEDULING            -
5   Nova API/RabbitMQ  Cast to Scheduler     BUILDING  SCHEDULING            -
6   Scheduler          Received at Scheduler BUILDING  SCHEDULING            -
7   Scheduler/RabbitMQ Cast to Compute       BUILDING  SCHEDULING            -
8   Compute            Received at Compute   BUILDING  SCHEDULING            -
9   Compute/Glance     Show image            BUILDING  SCHEDULING            -
10  Compute/MySQL      Update DB             BUILDING  NETWORKING            -
11  Compute/RabbitMQ   Call on Network       BUILDING  NETWORKING            -
12  Network            Allocate Network      BUILDING  NETWORKING            -
13  Compute/Volume     Attach volume         BUILDING  BLOCK_DEVICE_MAPPING  -
14  Compute/MySQL      Update DB             BUILDING  SPAWNING              -
15  Compute/Libvirt    Spawn instance        BUILDING  SPAWNING              -
16  Compute/MySQL      Update DB             ACTIVE    None                  RUNNING

What happens if we cut here??

Or here??
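The ‘cut’ question can be made concrete with a small sketch. It encodes the rows of the create-server table as data and reports which states are left in the database if processing stops after a given step. The names `Step`, `state_after_cut` and `is_stuck` are hypothetical and used only for illustration; they are not part of Nova. The sketch assumes that ‘-’ and ‘None’ in the table both mean ‘no value’, and that an instance is ‘stuck’ whenever its vm_state is still BUILDING.

```python
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """One row of the create-server table (hypothetical helper type)."""
    id: int
    service: str
    operation: str
    vm_state: Optional[str]
    task_state: Optional[str]
    power_state: Optional[str]


# Rows copied from the table; "-" and "None" are both stored as None.
CREATE_SERVER_STEPS: List[Step] = [
    Step(1, "Nova API", "Initial State", None, None, None),
    Step(2, "Keystone", "Authenticate user", None, None, None),
    Step(3, "Nova API/Glance", "Show image", None, None, None),
    Step(4, "Nova API/MySQL", "Create entry", "BUILDING", "SCHEDULING", None),
    Step(5, "Nova API/RabbitMQ", "Cast to Scheduler", "BUILDING", "SCHEDULING", None),
    Step(6, "Scheduler", "Received at Scheduler", "BUILDING", "SCHEDULING", None),
    Step(7, "Scheduler/RabbitMQ", "Cast to Compute", "BUILDING", "SCHEDULING", None),
    Step(8, "Compute", "Received at Compute", "BUILDING", "SCHEDULING", None),
    Step(9, "Compute/Glance", "Show image", "BUILDING", "SCHEDULING", None),
    Step(10, "Compute/MySQL", "Update DB", "BUILDING", "NETWORKING", None),
    Step(11, "Compute/RabbitMQ", "Call on Network", "BUILDING", "NETWORKING", None),
    Step(12, "Network", "Allocate Network", "BUILDING", "NETWORKING", None),
    Step(13, "Compute/Volume", "Attach volume", "BUILDING", "BLOCK_DEVICE_MAPPING", None),
    Step(14, "Compute/MySQL", "Update DB", "BUILDING", "SPAWNING", None),
    Step(15, "Compute/Libvirt", "Spawn instance", "BUILDING", "SPAWNING", None),
    Step(16, "Compute/MySQL", "Update DB", "ACTIVE", None, "RUNNING"),
]


def state_after_cut(last_completed_id: int) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (vm_state, task_state, power_state) if processing stops after this step."""
    for step in CREATE_SERVER_STEPS:
        if step.id == last_completed_id:
            return (step.vm_state, step.task_state, step.power_state)
    raise ValueError(f"unknown step id: {last_completed_id}")


def is_stuck(last_completed_id: int) -> bool:
    """Assumption: an instance left in BUILDING needs outside help to recover."""
    vm_state, _, _ = state_after_cut(last_completed_id)
    return vm_state == "BUILDING"


if __name__ == "__main__":
    stuck = [s.id for s in CREATE_SERVER_STEPS if is_stuck(s.id)]
    print("A cut after any of these steps leaves the instance in BUILDING:", stuck)
```

Under these assumptions, a cut after any of steps 4 through 15 leaves a database entry in BUILDING with nothing in the flow itself responsible for moving it forward or back.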

Solutions solutions solutions

• Nova has mostly stabilized (code-wise)

– It appears to be a good time to rethink, and rework, some of the foundations (with as minimal an impact as we can).

– Eventually, as other core components (quantum) stabilize, similar analysis can be done there (if needed).

• Prototyping a potential solution and discussing next steps with the community.

– That’s why we are here folks

Create request without orchestration

https://docs.google.com/document/d/1xpUszQFEtKmRAf1Wz_XpwyJslhI5X6siM29amPnKifE

Create request with orchestration

https://docs.google.com/document/d/1xpUszQFEtKmRAf1Wz_XpwyJslhI5X6siM29amPnKifE

Key Benefits

• Less scattering of state management

– Makes it easier to understand…

• Less scattering of recovery scenarios

– Clearly defined rollbacks…

• Faster and more dependable resource acquisition

– Compute node will perform initialization and final acquisition of resources.

– Reservations and initial acquisitions will be done before the request to provision instances, hence faster VM spawns.

• Scheduler can make better ‘overall’ scheduling decisions.

– Ex. no need for compute <-> scheduler retry hacks

– Can make advanced scheduling decisions based on volume choices, locality, network choices... When you are able to acquire/release resources before their use, anything is possible…

– No more need for 'hinting'...

• Creates a single place where others can extend or alter nova state transitions to plug in their own ‘custom/internal’ state transitions.
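As a rough illustration of ‘clearly defined rollbacks’ and of reserving resources before they are used, the sketch below runs a list of tasks in order and, on the first failure, reverts the completed ones in reverse order. It is only a minimal sketch of the general idea, not the prototype presented in the talk; `run_flow`, `reserve` and `failing_spawn` are hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A task is (name, apply, revert); both callables receive a shared context dict.
Task = Tuple[str, Callable[[Dict], None], Callable[[Dict], None]]


def run_flow(tasks: List[Task], context: Dict) -> bool:
    """Apply tasks in order; on failure, revert completed tasks in reverse order."""
    completed: List[Tuple[str, Callable[[Dict], None]]] = []
    try:
        for name, apply, revert in tasks:
            apply(context)
            completed.append((name, revert))
    except Exception:
        # Roll back only what actually completed, newest first.
        for _name, revert in reversed(completed):
            revert(context)
        return False
    return True


def reserve(resource: str) -> Task:
    """Hypothetical task: reserve a resource, and release it again on revert."""
    def apply(ctx: Dict) -> None:
        ctx["reserved"].append(resource)

    def revert(ctx: Dict) -> None:
        ctx["reserved"].remove(resource)

    return (f"reserve_{resource}", apply, revert)


def failing_spawn() -> Task:
    """Hypothetical task that simulates a failure at the spawn step."""
    def apply(ctx: Dict) -> None:
        raise RuntimeError("simulated spawn failure")

    def revert(ctx: Dict) -> None:
        pass

    return ("spawn", apply, revert)


if __name__ == "__main__":
    ctx = {"reserved": []}
    ok = run_flow([reserve("network"), reserve("volume"), failing_spawn()], ctx)
    print(ok, ctx["reserved"])
```

Because every task carries its own revert step, a failure at the spawn step releases the network and volume reservations instead of leaving them locked, and the whole sequence can be read in one place.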

DEMO AND DISCUSSION

https://etherpad.openstack.org/the-future-of-orch
