Replacing the Engine while the Airplane is Flying: Modifying and replacing software that cannot be taken offline

Modifying the Engine while the Airplane is Flying - 3-12-2018otto.normandale.edu/events/Modifying the Engine while the Airplane is Flying.pdf · path to roll back • Software needs

Mar 27, 2020


dariahiddleston



Story: The Dreaded Downtime Window


How do you avoid the downtime window?

• Load Balancing and Redundancy

• Automated, Rolling Rollouts

Load Balancing

[Diagram: Load Balancer (ncc.com) in front of Application Server 1, Application Server 2, ... Application Server N]
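
The slide does not say how the load balancer picks a server. As a minimal sketch, assuming a simple round-robin policy and hypothetical server names (`app-1`, `app-2`, `app-n`), the idea could look like this:

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def route(self, request):
        # Pick the next server in the rotation and pair it with the request.
        server = next(self._rotation)
        return server, request


# Hypothetical server names; the slide only labels them 1, 2, ... N.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-n"])
assignments = [balancer.route(f"GET /page/{i}")[0] for i in range(5)]
print(assignments)  # ['app-1', 'app-2', 'app-n', 'app-1', 'app-2']
```

Because no single server is the only one taking traffic, any one of them can be taken out for an update without the site going down.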

Automated, Rolling Rollouts

[Diagram sequence: Load Balancer (ncc.com) in front of Application Server 1, Application Server 2, ... Application Server N. The servers are updated one at a time: first Application Server 1, then Application Server 2, then Application Server N. Each server shows "Updating…" while it is being updated, and an asterisk (*) marks a server that has been updated.]
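
The sequence in these slides can be sketched roughly as follows. This is an illustration only: `deploy`, `is_healthy`, and the `in_rotation` set are hypothetical stand-ins for whatever deployment tooling and load balancer API are actually in use, and halting at the first unhealthy server is an assumed policy.

```python
def rolling_rollout(servers, in_rotation, deploy, is_healthy):
    """Update servers one at a time; stop at the first unhealthy one.

    servers:      ordered list of server names
    in_rotation:  set of servers the load balancer currently sends traffic to
    deploy:       callable(server) that installs the new version
    is_healthy:   callable(server) -> bool, checked after the deploy
    Returns the list of servers that were updated successfully.
    """
    updated = []
    for server in servers:
        in_rotation.discard(server)      # drain: stop sending traffic here
        deploy(server)
        if not is_healthy(server):
            # Leave the bad server out of rotation and halt the rollout;
            # the remaining servers keep serving the old version.
            return updated
        in_rotation.add(server)          # healthy: resume traffic
        updated.append(server)
    return updated


servers = ["app-1", "app-2", "app-n"]
in_rotation = set(servers)
versions = {name: "v1" for name in servers}


def deploy(server):
    # Stand-in for a real deployment step.
    versions[server] = "v2"


updated = rolling_rollout(servers, in_rotation, deploy, lambda s: True)
print(updated)  # ['app-1', 'app-2', 'app-n']
```

At every step at most one server is out of rotation, so the others keep answering requests while the update moves through the pool.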


Story: The Case of the Slow Query


[Diagram: four App Servers and one DB Server (SLOW)]


What were some things we did right? 


• Software Versioning: Because we versioned our code, rolling back was possible. 

• Basic Monitoring: At least we knew something was wrong and could see that the database server was where things were slowing down.


What could we have done better? 


• Canary Deployment: Introduce new software on one server first

• Cross‐Training: Only the CTO was familiar with handling production issues

• Don’t Repeat Yourself: Centralize queries that do the same thing rather than spreading them all over the application. 

• Performance Testing: Automated performance tests of a similar scale to production would have likely caught this issue

• Better Metrics/Logs: This would have made it far easier to triage and identify what query was the problem

• Deprecation: Should have made it possible to roll back without losing data

Canary Deployments

[Diagram: Load Balancer (ncc.com) in front of Application Server 1, Application Server 2, and one App Server Running New Version of Software]
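
The slides do not describe how to decide whether the canary server is behaving. One possible check, sketched here under assumptions, compares the canary's error rate with that of the servers still on the old version. The function name and the 1% tolerance are illustrative, not from the talk.

```python
def canary_verdict(baseline_errors, baseline_requests,
                   canary_errors, canary_requests,
                   max_extra_error_rate=0.01):
    """Compare the canary's error rate with the baseline's.

    Returns "promote" when the canary is within the tolerance, "rollback"
    when it is clearly worse, and "wait" when there is no traffic to compare.
    The 1% tolerance is an arbitrary illustrative value.
    """
    if baseline_requests == 0 or canary_requests == 0:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    if canary_rate - baseline_rate > max_extra_error_rate:
        return "rollback"
    return "promote"


print(canary_verdict(5, 1000, 1, 200))    # promote
print(canary_verdict(5, 1000, 40, 200))   # rollback
```

Only the traffic sent to the one canary server is exposed to a bad release, which is the point of introducing the new version on a single server first.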


Automated Performance Testing

• Simulate User Load on the software

• Measure success/failure rate, response time, etc. so that any new major performance issues are found
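
A very small sketch of the two bullets above, assuming a thread pool is an acceptable way to simulate concurrent users. `fake_handler` is a hypothetical stand-in for a real request to the application; a real test would call the deployed service at a scale close to production.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(handler, requests, concurrency=8):
    """Call handler(request) for every request using a thread pool and
    report the success rate and the slowest response time in seconds."""
    def timed_call(request):
        start = time.perf_counter()
        try:
            handler(request)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, requests))

    successes = sum(1 for ok, _ in results if ok)
    return {
        "total": len(results),
        "success_rate": successes / len(results) if results else 0.0,
        "max_seconds": max((elapsed for _, elapsed in results), default=0.0),
    }


def fake_handler(request):
    # Hypothetical stand-in: every tenth request fails, the rest take ~1 ms.
    if request % 10 == 0:
        raise RuntimeError("simulated failure")
    time.sleep(0.001)


report = run_load_test(fake_handler, range(1, 101))
print(report["total"], report["success_rate"])  # 100 0.9
```

Running a test like this automatically against each new version, and failing the build when the success rate or response time regresses, is one way to catch a slow query before it reaches production.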


Better Logs/Metrics

• Metrics that break down how long each query is taking

• Aggregating Logs across multiple servers to make it easy to search through errors

• Could use products like ELK (Elasticsearch + Logstash + Kibana) or Splunk
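
Before reaching for a full product, the first bullet can be approximated in application code. The following is a minimal in-process sketch; the query names and the `timed_query` helper are hypothetical, and a real system would ship these numbers to a metrics or log aggregation service rather than keep them in a dictionary.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

query_timings = defaultdict(list)  # query name -> list of durations in seconds


@contextmanager
def timed_query(name):
    """Record how long the wrapped block takes under the given query name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        query_timings[name].append(time.perf_counter() - start)


def slowest_queries(limit=3):
    """Return (name, total_seconds, call_count) tuples, slowest first."""
    totals = [(name, sum(times), len(times))
              for name, times in query_timings.items()]
    return sorted(totals, key=lambda row: row[1], reverse=True)[:limit]


with timed_query("load_user"):
    time.sleep(0.001)      # stand-in for a fast query
with timed_query("list_orders"):
    time.sleep(0.02)       # stand-in for the slow query

print(slowest_queries()[0][0])  # list_orders
```

With a breakdown like this, identifying which query is the problem becomes a lookup rather than guesswork.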


What about a Simple Rollback? 

Rollout: Convert old data to the new format (Old Format → New Format)

Rollback: Convert new data back to the old format (New Format → Old Format)

But what do you do if the formats are incompatible, or the conversion to/from the new data format takes hours?


Deprecation Patterns

1. Old Data Format

2. New Data Format, but with new features disabled

• Validate Software Works

• New Feature is not enabled, so there is still a path to roll back

• Software needs to be programmed to save/read in both the old and new data format to enable co‐existence with old systems

3. Stop Using Old Data Format Entirely

• Roll Out New Version of Software Completely that no longer uses the old data format

4. Toggle On New Feature

• Toggle on the new feature (which is causing you to change the data format) knowing that backwards compatibility with the old data format is no longer required
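
Step 2 is the part that needs code support. As a sketch only, assume a hypothetical format change in which an old record stores a single `full_name` field and the new one stores `first_name` and `last_name`. The reader accepts both formats, and a flag controls which one is written:

```python
def read_record(raw):
    """Accept either format and return the in-memory (new-shaped) record."""
    if "full_name" in raw:                       # old format
        first, _, last = raw["full_name"].partition(" ")
        return {"first_name": first, "last_name": last}
    return {"first_name": raw["first_name"], "last_name": raw["last_name"]}


def write_record(record, write_new_format):
    """Save in the old format until the flag is flipped, so that servers
    still running the old version (or a rollback) can read the data."""
    if write_new_format:
        return {"first_name": record["first_name"],
                "last_name": record["last_name"]}
    return {"full_name": f'{record["first_name"]} {record["last_name"]}'}


old_row = {"full_name": "Ada Lovelace"}
record = read_record(old_row)

# While the flag is off, data round-trips through the old format unchanged.
print(write_record(record, write_new_format=False) == old_row)  # True
# After the flag is flipped, the same reader still understands the result.
print(read_record(write_record(record, write_new_format=True)) == record)  # True
```

The field names and the flag are illustrative. The property that matters is that every version running at the same time can read whatever any other version writes, which is what keeps the rollback path open.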


Story: Too Smart for our own good

[Diagram: User Management Service and Organization Management Service]

Dependent Healthchecks


Rolling it Out


How would you fix this? 


Emergency patching


What could we learn? 


• Healthchecks are good, but sometimes extra automation can hurt you

• Circular Dependencies should be avoided

• Many architecture problems are actually people problems

• Sometimes issues only surface after running in production
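
The transcript does not spell out exactly how the outage unfolded, so the following is only an assumption about how dependent healthchecks and a circular dependency can combine badly: if each service reports healthy only once the services it depends on are healthy, two services that depend on each other can never both come up from a cold start. The service names follow the slides; the startup model is hypothetical.

```python
def start_all(services, depends_on, max_rounds=10):
    """Simulate startup where a service is marked healthy only once every
    service it depends on is already healthy. Returns the healthy set."""
    healthy = set()
    for _ in range(max_rounds):
        newly_healthy = {
            name for name in services
            if name not in healthy
            and all(dep in healthy for dep in depends_on.get(name, ()))
        }
        if not newly_healthy:
            break                      # nothing more can become healthy
        healthy |= newly_healthy
    return healthy


services = ["user-management", "organization-management"]

# Each service's healthcheck depends on the other: a circular dependency.
circular = {
    "user-management": ["organization-management"],
    "organization-management": ["user-management"],
}
# Only one direction of dependency: no cycle.
one_way = {"user-management": ["organization-management"]}

print(start_all(services, circular))         # set()
print(sorted(start_all(services, one_way)))  # ['organization-management', 'user-management']
```

In the circular case neither service is ever marked healthy, so a load balancer that trusts those healthchecks would keep both out of rotation.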


Questions?