
We The Few - Linux Foundation Events… · • Test Data… let’s not ignore the elephant in the room • Stability, Stability, Stability… we’re talking Operations not Science

Aug 01, 2020


dariahiddleston

We The Few

Critical Team Composition and Responsibilities For Day 2 Operations


Who Am I?

Keith Strini…

Field Facing Solutions Architect that serves as a technology analyst for the US Department of Defense and Intelligence communities. I architect, develop and field information systems across the Joint Services both CONUS and OCONUS (Korea, Japan, Europe, and the Middle East) and NATO


End Vision


Not End Vision


Options Getting to End Vision


Release Engineering Stratification

[Diagram labels: App Operator; Developer Enablement; Release; Platform Reliability; App Profiles; Prod Manifest; Unit/Smoke Test Pipelines; Release Repository; Cadence Calendar; Platform Product Manager; Self Service Deployment; Noob Dev Team. Test stages: unit/smoke, security, uat. Outcomes: pass, fail.]


Platform Reliability

Coordination Point For All Platform Environment Changes.

■  Creates/Coordinates Cadence Meeting
■  Continuously Develops Resiliency Probes based on Post Mortems
■  Maintains Environment Parity
■  Enforces Strict Runtime Version Control
■  Communicates Environment Adversity
■  Creates/Coordinates Resiliency Exercise
■  Instruments Distributed Tracing in Ops

Release Engineering

Coordination Point For All New Releases.

■  Attends Final Pre-Release Demo of Apps
■  Verifies Release Artifacts
■  Coordinates Initial Release Date
■  Collaborates on Downstream Environment Triage

Developer Enablement

Coordination Point For All New Development Efforts.

■  Creates/Coordinates Platform onboarding meeting
■  Provides Latest Information about Platform Environments
■  Provides tooling around Dev Services, CI/CD environment, governs code repositories, etc.

Coordination Practices · Execution Practices · Resiliency Practices


Troubleshooting Complex System Failure


Simple System Failure (Daddy I want to build a car)


Simple Complexity


Failure is Inevitable, Hope is Not a Strategy

Care And Feeding Of New Releases To Ensure Early Intervention

Decides on Feature Maturity From a Stability Perspective

Been There, Done That, Got the swag!

Deep Understanding of How New Efforts Deploy Into Operations

Focuses Developers on Contract Based Testing For Integration

Unit/Smoke · Contract · Feature Flagging · Canary · Distributed Tracing

Developers · DevEnablement · Release · Operations · Operations


Decoupled Integration – Contract Testing

•  If Speed Is What you Want, End-To-End Testing is not how you get there.
•  Getting feedback… this week?
•  Are you mocking me?
•  Single Source of Truth…
•  Verifying the Goods
•  Isolation testing of Single Services (Provider or Consumer)
•  I do not think that means what you think it means (Semantic Testing)
•  Complexity From Simplicity, Not Complexity From Complexity
•  Test Data… let’s not ignore the elephant in the room
•  Stability, Stability, Stability… we’re talking Operations not Science Experiments
•  Ah Sunsets…
•  Paying off Technical debt by subtraction and addition
•  You get me… you really get me
•  Consumer defined APIs
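The "consumer defined APIs" and "isolation testing of single services" bullets can be illustrated with a minimal sketch of a consumer-driven contract check. This is not taken from the talk: the `Contract` shape, the `verify_provider` helper, and the service and field names are all hypothetical, and real projects would normally use a dedicated contract-testing tool rather than hand-rolled checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """What one consumer expects from a provider endpoint (hypothetical shape)."""
    consumer: str
    endpoint: str
    required_fields: dict  # field name -> expected Python type


def verify_provider(contract, response):
    """Return a list of violations; an empty list means the contract is honored."""
    violations = []
    for name, expected_type in contract.required_fields.items():
        if name not in response:
            violations.append(f"missing field: {name}")
        elif not isinstance(response[name], expected_type):
            violations.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(response[name]).__name__}"
            )
    return violations


# The consumer publishes only the fields it actually reads (illustrative names).
orders_contract = Contract(
    consumer="billing-service",
    endpoint="/orders/{id}",
    required_fields={"id": int, "total_cents": int, "currency": str},
)

# Provider-side check, run in isolation against canned handler responses.
good_response = {"id": 7, "total_cents": 1999, "currency": "USD", "note": "extra"}
bad_response = {"id": "7", "currency": "USD"}

print(verify_provider(orders_contract, good_response))
print(verify_provider(orders_contract, bad_response))
```

Because the check needs only the contract and one service, either side can run it without standing up the other, which is where the faster feedback compared with end-to-end testing comes from.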


Maintaining Operational Velocity – Feature Flagging

•  I dunno. You tell me what you want.
•  Non Techs getting in on the Action
•  Ok so most of it works but I gotta send it back?
•  Beauty of context encapsulation
•  Waiting for a feature like you
•  I see how we do 1 app but how do I manage 1000s?
•  So what if we don’t know exactly what our users want?
•  Ah Sunsets…
•  Paying off Technical debt by subtraction and addition
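One way to picture the flagging bullets above, including the "sunsets" point, is a flag with a percentage rollout and a removal date. The sketch below is an assumption-laden illustration, not the speaker's tooling: the `Flag` fields, the hashing scheme, and the flag names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    name: str
    rollout_percent: int  # 0..100, share of users who see the feature
    sunset: date          # date by which the flag should be removed


def bucket(flag_name, user_id):
    """Map a user to a stable bucket in 0..99 so repeated calls agree."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id):
    """A user sees the feature when their bucket falls under the rollout share."""
    return bucket(flag.name, user_id) < flag.rollout_percent


def overdue(flags, today):
    """Names of flags past their sunset date: candidates for removal."""
    return [f.name for f in flags if today > f.sunset]


flags = [
    Flag("new-checkout", rollout_percent=25, sunset=date(2021, 1, 31)),
    Flag("legacy-export", rollout_percent=100, sunset=date(2020, 6, 30)),
]

print(is_enabled(flags[0], "user-42"))
print(overdue(flags, today=date(2020, 8, 1)))
```

Raising `rollout_percent` is a configuration change rather than a redeploy, and the `overdue` report is one simple way to keep retired flags from accumulating as technical debt.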


Predictive Fire Fighting In Operations – Canary/Distributed Tracing

•  Almost trust you
•  Canaries – Profiling the CPU, memory, disk usage, cache synch
•  Rollback/Roll Forward Strategies
•  Blue/Green Deployments
•  The case of stateless
•  The case of stateful (transactions, migrating data)
•  Infrastructure Isolation
•  A/B testing

•  It’s not you, it’s me
•  Distributed Tracing
•  Yes we are talking scale here
•  But that’s a lot of instrumentation
•  Correlation is tough
•  Good definitions of SLO/SLIs
•  Threshold tuning
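The canary profiling, rollback, and threshold tuning bullets can be sketched as a comparison of a canary against its baseline. Everything here is an illustrative assumption: the `Metrics` and `Thresholds` fields, the sample numbers, and the promote/rollback rule are hypothetical, and the thresholds are exactly the values an operations team would expect to tune.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float      # fraction of failed requests, 0..1
    p95_latency_ms: float
    cpu_percent: float
    memory_mb: float


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float      # absolute ceiling, e.g. derived from an SLO
    max_latency_ratio: float   # allowed canary p95 / baseline p95
    max_resource_ratio: float  # allowed canary CPU or memory / baseline


def judge_canary(baseline, canary, t):
    """Return ("promote" or "rollback", reasons) for one canary observation."""
    reasons = []
    if canary.error_rate > t.max_error_rate:
        reasons.append("error rate above ceiling")
    if canary.p95_latency_ms > baseline.p95_latency_ms * t.max_latency_ratio:
        reasons.append("p95 latency regressed")
    if canary.cpu_percent > baseline.cpu_percent * t.max_resource_ratio:
        reasons.append("CPU usage regressed")
    if canary.memory_mb > baseline.memory_mb * t.max_resource_ratio:
        reasons.append("memory usage regressed")
    return ("rollback" if reasons else "promote", reasons)


limits = Thresholds(max_error_rate=0.01, max_latency_ratio=1.25, max_resource_ratio=1.5)
baseline = Metrics(error_rate=0.002, p95_latency_ms=120.0, cpu_percent=40.0, memory_mb=512.0)
healthy = Metrics(error_rate=0.003, p95_latency_ms=130.0, cpu_percent=42.0, memory_mb=530.0)
degraded = Metrics(error_rate=0.02, p95_latency_ms=200.0, cpu_percent=41.0, memory_mb=520.0)

print(judge_canary(baseline, healthy, limits))
print(judge_canary(baseline, degraded, limits))
```

Comparing against a live baseline rather than fixed numbers keeps the check meaningful when overall load shifts; only the error-rate ceiling is absolute in this sketch.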


Operations as the Caretaker of Code?

•  Your baby is ugly, Our baby is cute
•  Platform as Product helps align our interests
•  Automation helps us be responsive as a team to our end users
•  Lots of upfront pain is better than chronic pain indefinitely
•  Success as defined by rhythm


Get Off My Lawn!

•  Change Inherently Creates Failure
•  Alignment of Values
•  One Team One Fight
•  Joint SLOs
•  Platform SLIs
•  Application SLIs
•  Instrumentation
•  Growing up is hard to do
•  Graduating Product Teams to Self-Service Deployments
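As a rough illustration of how platform SLIs and application SLIs might roll up into a joint SLO, the sketch below counts a request as good only when both layers served it well, then reports how much error budget is left. The per-request data model, the sample counts, and the 99.5% target are assumptions made for this example, not figures from the talk.

```python
def sli(good_events, total_events):
    """Fraction of events that met the target; 1.0 when there was no traffic."""
    return 1.0 if total_events == 0 else good_events / total_events


def error_budget_remaining(sli_value, slo_target):
    """Share of error budget left: 1.0 untouched, 0.0 spent, negative overspent.

    Assumes slo_target is strictly below 1.0.
    """
    allowed = 1.0 - slo_target
    spent = 1.0 - sli_value
    return 1.0 - spent / allowed


# Each request is recorded as (platform_ok, application_ok); sample data only.
requests = [(True, True)] * 9980 + [(False, True)] * 5 + [(True, False)] * 15

total = len(requests)
platform_sli = sli(sum(1 for p, _ in requests if p), total)
application_sli = sli(sum(1 for _, a in requests if a), total)
joint_sli = sli(sum(1 for p, a in requests if p and a), total)

JOINT_SLO = 0.995  # hypothetical shared target
print(platform_sli, application_sli, joint_sli)
print(error_budget_remaining(joint_sli, JOINT_SLO))
```

Keeping the two per-layer SLIs next to the joint one shows which team's failures are consuming a budget that both teams share, which fits the "One Team One Fight" framing.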


Growing Up Is Hard to Do

•  Resiliency Exercises as the litmus test
•  Communicating the attitude that Stability is a team sport
•  Starting the Cycle Over
•  Capturing Lessons Learned from every class
•  Knowledge Transparency aids the greater team


In Conclusion

“When learning something new you have to practice going slow, if you want to eventually go fast forever”


Questions?