DevOps Roadtrip Minneapolis

Apr 08, 2017

Technology

VictorOps
Transcript
Page 1: DevOps Roadtrip Minneapolis
Page 2: DevOps Roadtrip Minneapolis

JASON HAND | DevOps Evangelist

• Holds over 15 years of experience as a developer, system administrator, and support specialist

• Fully immersed in the world of agile development and the DevOps movement with Colorado tech startups

#DevOpsRoadTrip

Page 3: DevOps Roadtrip Minneapolis

#DevOpsRoadTrip

Page 4: DevOps Roadtrip Minneapolis
Page 5: DevOps Roadtrip Minneapolis

A little about VictorOps…

VictorOps is the real-time incident management platform that combines the power of people and data to embolden DevOps pros to handle incidents as they occur.

#DevOpsRoadTrip

Page 6: DevOps Roadtrip Minneapolis
Page 7: DevOps Roadtrip Minneapolis

Why Are We Here?

Page 8: DevOps Roadtrip Minneapolis
Page 9: DevOps Roadtrip Minneapolis
Page 10: DevOps Roadtrip Minneapolis
Page 11: DevOps Roadtrip Minneapolis
Page 12: DevOps Roadtrip Minneapolis
Page 13: DevOps Roadtrip Minneapolis

Culture

Page 14: DevOps Roadtrip Minneapolis
Page 15: DevOps Roadtrip Minneapolis
Page 16: DevOps Roadtrip Minneapolis
Page 17: DevOps Roadtrip Minneapolis
Page 18: DevOps Roadtrip Minneapolis

Culture

Page 19: DevOps Roadtrip Minneapolis

“How Organizations Process Information”

Ron Westrum: A Typology of Organizational Cultures

The 2014 State of DevOps Report shows that in the context of IT, job satisfaction is the biggest predictor of profitability, market share, and productivity. The biggest predictor of job satisfaction, in turn, is how effectively organizations process information, as determined by a model created by sociologist Ron Westrum, shown below. 1

1: https://continuousdelivery.com/implementing/culture/

Page 20: DevOps Roadtrip Minneapolis
Page 21: DevOps Roadtrip Minneapolis
Page 22: DevOps Roadtrip Minneapolis

Words are how we think – stories are how we link.

- Christina Baldwin

Oral narrative is and for a long time has been the chief basis of culture itself.

- John D. Niles

Stories from the road

Page 23: DevOps Roadtrip Minneapolis
Page 24: DevOps Roadtrip Minneapolis
Page 25: DevOps Roadtrip Minneapolis
Page 26: DevOps Roadtrip Minneapolis

Cynefin

Page 27: DevOps Roadtrip Minneapolis

Unordered

Complex
Cause & Effect Only Apparent in Hindsight
Probe – Sense – Respond

Chaotic
Cause & Effect Cannot Be Related
Act – Sense – Respond

Ordered

Complicated
Cause & Effect Requires Analysis
Sense – Analyze – Respond

Obvious
Cause & Effect Obvious From Experience
Sense – Categorize – Respond

Page 28: DevOps Roadtrip Minneapolis
Page 29: DevOps Roadtrip Minneapolis

The systems we engineer, maintain, and improve are

Complicated .. or ..

Known unknowns

Page 30: DevOps Roadtrip Minneapolis

The systems we engineer, maintain, and improve are

Complex

Unknown unknowns

Page 31: DevOps Roadtrip Minneapolis
Page 32: DevOps Roadtrip Minneapolis

What is the

Root Cause?

Page 33: DevOps Roadtrip Minneapolis

What are the..

Contributing Factors?

Page 34: DevOps Roadtrip Minneapolis

Identifying a “root cause” helps us to …

Put it back how it was

Page 35: DevOps Roadtrip Minneapolis

What we really want is to..

Continuously Improve

Page 36: DevOps Roadtrip Minneapolis

Incident Management Maturity

Chart axes: Time To Repair (TTR) and Continuous Improvement Efforts

Reactive (chaotic)

✓ No automation

✓ No operational stack awareness

✓ Poor collaboration between teams (Dev & Ops)

✓ Documentation not available

✓ No standardized communication

✓ High focus on consistent continuous learning

Tactical (obvious)

✓ Uses a NOC

✓ Some monitoring & alerting instrumentation

✓ Collaboration in crisis

✓ "Mission critical" processes are available

✓ Understood crisis communication protocols

✓ Remediation data available to IT Operations

Integrated (complicated)

✓ Team rotations, paging policies, role hunting

✓ Continuous improvement of key health indicators

✓ Technical collaboration across all incidents

✓ Docs up to date and easily accessible

✓ Consistent real-time communication practices

Strategic (complex)

✓ Automated docs and remediation

✓ Actionable Alerts with full context

✓ High collaboration among all teams

✓ Documentation part of remediation

✓ Targeted, proactive crisis comms

✓ High focus on continuous learning

Page 37: DevOps Roadtrip Minneapolis

Reactive(chaotic)

✓No automation

✓No operational stack awareness

✓Poor collaboration between teams (Dev & Ops)

✓Documentation not available

✓No standardized communication

✓High focus on consistent continuous learning

Page 38: DevOps Roadtrip Minneapolis

Tactical(obvious)

✓Uses a NOC

✓Some monitoring & alerting instrumentation

✓Collaboration in crisis

✓"Mission critical" processes are available

✓Understood crisis communication protocols

✓Remediation data available to IT Operations

Page 39: DevOps Roadtrip Minneapolis

Integrated(complicated)

✓Team rotations, paging policies, role hunting

✓Continuous improvement of key health indicators

✓Technical collaboration across all incidents

✓Docs up to date and easily accessible

✓Consistent real-time communication practices

Page 40: DevOps Roadtrip Minneapolis

Strategic(complex)

✓Automated docs and remediation

✓Actionable Alerts with full context

✓High collaboration among all teams

✓Documentation part of remediation

✓Targeted, proactive crisis comms

✓High focus on continuous learning

Page 41: DevOps Roadtrip Minneapolis

“Six Trends Shape DevOps Adoption, Q1 2015” Forrester report

• The Foundation For Success Is In Place . . . Mostly

• Fear Of Failure Will Hamper Advancement

• Monitoring And Analytics Strategies Must Make A Big Leap Forward

• The Focus On Customer Experience Is Not Second Nature . . . Yet

• Change And Release Processes Are Not Delivering Business Needs

• You Must Prioritize And Focus Sourcing Strategies

Page 42: DevOps Roadtrip Minneapolis

Automation

Awareness

Collaboration

Documentation

User Empathy

Learning

Page 43: DevOps Roadtrip Minneapolis

Learning

Page 44: DevOps Roadtrip Minneapolis

Failure not seen as opportunity to learn

Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report

Page 45: DevOps Roadtrip Minneapolis

Awareness

http://blog.vmware.com

Page 46: DevOps Roadtrip Minneapolis

© 2015 Forrester Research, Inc. Reproduction Prohibited 46

Single Source Of Truth Lacking In Many Orgs – 95% only most of the time or less

Source: April 15, 2015 “Six Trends That Will Shape DevOps Adoption”, Forrester report

Page 47: DevOps Roadtrip Minneapolis

Collaboration

http://neolivemarketing.com/wp-content/uploads/2015/09/Collaboration.jpg

Page 48: DevOps Roadtrip Minneapolis

Teams siloed throughout life cycle

Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report

Page 49: DevOps Roadtrip Minneapolis

User Empathy

https://open.buffer.com/wp-content/uploads/2015/12/empathy3.jpg

Page 50: DevOps Roadtrip Minneapolis

© 2015 Forrester Research, Inc. Reproduction Prohibited 50

IT teams aren’t measured on customer experience goals.

Page 51: DevOps Roadtrip Minneapolis

Automation

http://thelifedesignproject.com/wp-content/uploads/2009/09/373881476_217d24ef6d.jpg

Page 52: DevOps Roadtrip Minneapolis
Page 53: DevOps Roadtrip Minneapolis

Delays in Notifications Lead to Customers Finding the Problem First

Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report

Page 54: DevOps Roadtrip Minneapolis

Documentation

http://blog.vmware.com

Page 55: DevOps Roadtrip Minneapolis

Reduce MTTR

State of DevOps Report (2015) – by Puppet Labs

Page 56: DevOps Roadtrip Minneapolis

Automation

Awareness

Collaboration

Documentation

User Empathy

Learning

Page 57: DevOps Roadtrip Minneapolis

jhand.co/DRT_SF

Page 58: DevOps Roadtrip Minneapolis
Page 59: DevOps Roadtrip Minneapolis
Page 60: DevOps Roadtrip Minneapolis
Page 61: DevOps Roadtrip Minneapolis
Page 62: DevOps Roadtrip Minneapolis
Page 63: DevOps Roadtrip Minneapolis
Page 64: DevOps Roadtrip Minneapolis

Bridget Kromhout | Pivotal - Cloud Foundry Principal Technologist

• Bridget Kromhout is a Principal Technologist for Cloud Foundry at Pivotal.

• After years as an operations engineer (most recently at DramaFever), she traded in oncall for more travel.

• A frequent speaker at tech conferences, she helps organize tech meetups at home in Minneapolis, serves on the program committee for Velocity, and acts as a global core organizer for devopsdays.

• She podcasts at Arrested DevOps, occasionally blogs at bridgetkromhout.com, and is active in a Twitterverse near you.

#DevOpsRoadTrip

Page 65: DevOps Roadtrip Minneapolis

@bridgetkromhout

Monitoring

Page 66: DevOps Roadtrip Minneapolis

@bridgetkromhout

lives: Minneapolis, Minnesota

works: Pivotal

podcasts: Arrested DevOps

organizes: devopsdays

Bridget Kromhout

Page 67: DevOps Roadtrip Minneapolis

@bridgetkromhout

Traded oncall… …for more travel (Similar effect on sleep)

Page 68: DevOps Roadtrip Minneapolis

@bridgetkromhout

Page 69: DevOps Roadtrip Minneapolis

@bridgetkromhout

“…measuring value, throughput, and performance… revenue rather than cost”

The Art of Monitoring (2016) James Turnbull

artofmonitoring.com

Page 70: DevOps Roadtrip Minneapolis

@bridgetkromhout

Image credit: James Ernest

Page 71: DevOps Roadtrip Minneapolis

@bridgetkromhout

The Art of Monitoring (2016) James Turnbull

Monitoring containers

artofmonitoring.com

Page 72: DevOps Roadtrip Minneapolis

@bridgetkromhout

“Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics”

Large-scale cluster management at Google with Borg - Verma et al. 2015
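To make the Borg quote concrete, here is a minimal sketch of a task that publishes its own health over HTTP, using only the Python standard library. The `/healthz` path, the metric names, and the `make_server` helper are assumptions chosen for illustration; they are not taken from the Borg paper or from the talk.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()
REQUESTS_SERVED = 0  # hypothetical counter standing in for real performance metrics


def health_snapshot() -> dict:
    """Return the health and metrics this task publishes about itself."""
    return {
        "status": "ok",
        "uptime_seconds": round(time.time() - START_TIME, 3),
        "requests_served": REQUESTS_SERVED,
    }


class HealthHandler(BaseHTTPRequestHandler):
    """Serve the snapshot as JSON on a single, hypothetical path."""

    def do_GET(self):
        global REQUESTS_SERVED
        if self.path != "/healthz":
            self.send_error(404)
            return
        REQUESTS_SERVED += 1
        body = json.dumps(health_snapshot()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) the health server; the port is arbitrary."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling `make_server().serve_forever()` would start it. The point of the pattern is that a monitoring system can scrape every task the same way instead of needing per-application checks.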

Page 73: DevOps Roadtrip Minneapolis

@bridgetkromhout

The Art of Monitoring (2016) — James Turnbull

Monitoring Maturity Model

artofmonitoring.com

Page 74: DevOps Roadtrip Minneapolis

@bridgetkromhout

Image credit: Wikipedia

“Any organization that designs a system… will produce a design whose structure is a copy of the organization's communication structure.”

Mel Conway

Page 75: DevOps Roadtrip Minneapolis

@bridgetkromhout

silos are for grain

Page 76: DevOps Roadtrip Minneapolis

@bridgetkromhout

three Friday mornings in Minneapolis

removed restored

Page 77: DevOps Roadtrip Minneapolis

@bridgetkromhout

Thank you!

Page 78: DevOps Roadtrip Minneapolis
Page 79: DevOps Roadtrip Minneapolis

Andy Domeier | SPS Commerce

Director System Operations

• Andy has been in Technology Operations leadership with SPS Commerce for the past 11 years.

• Andy spends many mental cycles collaborating to find effective patterns for monitoring and operating complex, changing systems.

• Andy also spends time solving for priority organization and alignment and the organization of knowledge.

#DevOpsRoadTrip

Page 80: DevOps Roadtrip Minneapolis

HOW EFFECTIVE IS YOUR INCIDENT RESPONSE?

Andy Domeier

@ajdomie

Page 81: DevOps Roadtrip Minneapolis

Agenda

Styles of Incident Response

Healthy Incident Response

Tips & Tricks

© SPS COMMERCE

Page 82: DevOps Roadtrip Minneapolis

STYLE #1 - DENIAL

That’s not possible!

No Wai!

Page 83: DevOps Roadtrip Minneapolis

Page 84: DevOps Roadtrip Minneapolis

STYLE #2 - CONFUSED

Ummmm

Hmmmm

(crickets)

How is this possible?

Page 85: DevOps Roadtrip Minneapolis

Page 86: DevOps Roadtrip Minneapolis

STYLE #3 - LAZY

It’s the Database

It’s the Network

Just Restart It

Page 87: DevOps Roadtrip Minneapolis

Page 88: DevOps Roadtrip Minneapolis

STYLE #4 - ANGRY

Why did you do that?

What did you change?

#!%& $#!@ #%$! &#!^ #$@

Page 89: DevOps Roadtrip Minneapolis

Page 90: DevOps Roadtrip Minneapolis

STYLE #5 - FIREDRILL

OMG WTF FML

“Buckshot”

Page 91: DevOps Roadtrip Minneapolis

Page 92: DevOps Roadtrip Minneapolis

LET’S GET REAL

Page 93: DevOps Roadtrip Minneapolis

Page 94: DevOps Roadtrip Minneapolis

HOW DO WE KNOW THERE IS A FIRE?

• Good way – Alarm

Page 95: DevOps Roadtrip Minneapolis

HOW DO WE KNOW THERE IS A FIRE?

• Bad way – Humans

Page 96: DevOps Roadtrip Minneapolis

WHO PUTS THE FIRE OUT?

• If you catch it right away?

Page 97: DevOps Roadtrip Minneapolis

WHO PUTS THE FIRE OUT?

• If it’s out of control?

Page 98: DevOps Roadtrip Minneapolis

INCIDENT RESPONSE TEAM

Page 99: DevOps Roadtrip Minneapolis

WHAT’S YOUR SYSTEM?

• #monoliths – Familiar, All or None, Less Agility

• #microservices – Complex, semi-isolated, Agile

Page 100: DevOps Roadtrip Minneapolis

WHERE’S YOUR DATA?

• Monitoring Tools

– Base IT

– Logging

– APM

– Metrics

Page 101: DevOps Roadtrip Minneapolis

RESPOND IN ISOLATION

Page 102: DevOps Roadtrip Minneapolis

RESPOND AS A TEAM

• Hey Danielle, it looks like the site is acting up, and when looking around, the only outlier I have found so far is a CPU spike on the DB. Can you help me investigate this a bit more?

Page 103: DevOps Roadtrip Minneapolis

#CHATOPS

• Share Screens & Visualize Data

• Display Alerts w/ Integrations

• Automatic History Retention

• Enables Collaboration for All

• And my Favorite…

Page 104: DevOps Roadtrip Minneapolis

#CHATOPS – CELEBRATE WITH GIFS

Page 105: DevOps Roadtrip Minneapolis

TIPS FOR HEALTHY INCIDENT RESPONSE

• Make health data as transparent and central as possible

– Helps the Team “Know where the fire is”

• Share data in chat

– Use the metric from your tools

• “Be Transparent”

• Team Response Nurtures Team Follow Up
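The "share data in chat" tip can be sketched as a small formatter that turns one reading from a monitoring tool into a chat message. Everything here is hypothetical: the `MetricReading` fields, the example values, and the assumption that the chat tool accepts a JSON body with a single `text` field, as many incoming-webhook style integrations do.

```python
import json
from dataclasses import dataclass


@dataclass
class MetricReading:
    tool: str     # which monitoring tool produced the number
    name: str     # metric name as the tool reports it
    value: float
    unit: str
    host: str


def chat_message(reading: MetricReading, note: str) -> str:
    """Render one metric reading plus a human note as a JSON chat payload."""
    text = (
        f"[{reading.tool}] {reading.name} on {reading.host}: "
        f"{reading.value:g}{reading.unit}. {note}"
    )
    return json.dumps({"text": text})


# Hypothetical reading echoing the "cpu spike on the DB" example.
reading = MetricReading(tool="APM", name="cpu", value=97.5, unit="%", host="db-01")
print(chat_message(reading, "Only outlier so far; can someone help dig in?"))
```

Posting the number itself, rather than a description of it, keeps the whole team looking at the same data and leaves it in the chat history for follow-up.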

Page 106: DevOps Roadtrip Minneapolis

TIPS FOR HEALTHY INCIDENT RESPONSE

• Always tie things back to the customer

– Simple but often overlooked

– Opportunity to link the team to the business

Page 107: DevOps Roadtrip Minneapolis

THANK YOU!

Andy Domeier

@ajdomie

Page 108: DevOps Roadtrip Minneapolis
Page 109: DevOps Roadtrip Minneapolis

Ben Overmyer | Star Tribune

Digital Manager, Operations

• Ben is the Digital Manager of Operations at the Minneapolis Star Tribune.

• He has over a decade of experience as a back end software engineer, two years of experience as a dedicated operations engineer, and great enthusiasm for the DevOps culture.

• Besides the Star Tribune, he’s worked for an eclectic mix of organizations, including the USGS, a game company in New Zealand, and a beauty products marketing company.

• When not hacking on servers, apps, or people, he acts as art director and author for a tabletop gaming company.

#DevOpsRoadTrip

Page 110: DevOps Roadtrip Minneapolis

EVOLVING INCIDENT MANAGEMENT

STAR TRIBUNE DEVOPS

Page 111: DevOps Roadtrip Minneapolis

IN THE BEGINNING

▸ Forwarded phone line

▸ An on-call list maintained in a wiki

▸ Every week, manually change to the next person on the list

▸ …and overrides or substitutions?

Page 112: DevOps Roadtrip Minneapolis

EARLY MONITORING

▸ Zabbix monitoring set up for a handful of causes

▸ Zabbix alerts sent via email to a distribution list

▸ Sometimes no one would see these alerts until hours or, in rare cases, days later

Page 113: DevOps Roadtrip Minneapolis

THE PAIN POINTS

▸ Manual maintenance of the calling tree data

▸ Manual rotation of the support phone line forwarding

▸ Poor documentation of incident life cycles

▸ No sense of incident frequency beyond “this was a bad couple weeks”

▸ If the on-call person didn’t respond, there was no escalation process other than calling the head of Digital

Page 114: DevOps Roadtrip Minneapolis

PHASE 1: VICTOROPS

Page 115: DevOps Roadtrip Minneapolis

ADOPTING VICTOROPS

▸ Automated rotations

▸ Multiple teams

▸ Automatic escalation processes

▸ Easy schedule overrides and changes

▸ APIs for programmatic incident interaction
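As a rough illustration of what "automated rotations" and "easy schedule overrides" replace in the wiki-and-phone-forwarding process, the weekly hand-off can be computed rather than edited by hand. This is a sketch only, not how VictorOps implements scheduling; the roster names, start date, and `on_call` function are hypothetical.

```python
from datetime import date


def on_call(roster, rotation_start, day, overrides=None):
    """Return who is on call on `day` for a weekly rotation.

    `overrides` maps specific dates to a substitute and wins over the rotation.
    """
    overrides = overrides or {}
    if day in overrides:
        return overrides[day]
    weeks_elapsed = (day - rotation_start).days // 7
    return roster[weeks_elapsed % len(roster)]


# Hypothetical roster; the rotation starts on Monday, April 3, 2017.
roster = ["alice", "bob", "carol"]
start = date(2017, 4, 3)

print(on_call(roster, start, date(2017, 4, 5)))   # first week of the rotation
print(on_call(roster, start, date(2017, 4, 12)))  # second week
print(on_call(roster, start, date(2017, 4, 12), {date(2017, 4, 12): "carol"}))
```

Because the answer is derived from the date, nobody has to remember to "change to the next person on the list", and a substitution is a single entry rather than an edit to the whole schedule.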

Page 116: DevOps Roadtrip Minneapolis

THE NATURE OF ALERTS

▸ OK, we can set up programmatic alerts. Now what?

▸ Integrating Zabbix, New Relic, and CloudWatch

▸ Discovering alert floods

▸ Move to alerting on symptoms, not causes

▸ …but still monitoring causes
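One way to read "alerting on symptoms, not causes … but still monitoring causes" is sketched below: the paging decision looks only at what users experience (here, the share of failed requests), while a cause-level metric is carried along as context. The threshold, field names, and `evaluate` function are assumptions for illustration, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    requests: int
    errors: int
    db_cpu_percent: float  # cause-level metric: recorded, never paged on directly


ERROR_RATE_PAGE_THRESHOLD = 0.05  # hypothetical: page above 5% failed requests


def evaluate(sample: Sample):
    """Return (page, context); page only on the user-facing symptom."""
    error_rate = sample.errors / sample.requests if sample.requests else 0.0
    page = error_rate > ERROR_RATE_PAGE_THRESHOLD
    context = {
        "error_rate": error_rate,
        "db_cpu_percent": sample.db_cpu_percent,
    }
    return page, context


# A busy database with happy users does not page anyone...
print(evaluate(Sample(requests=1000, errors=10, db_cpu_percent=99.0)))
# ...while failing requests page even though the database looks idle.
print(evaluate(Sample(requests=1000, errors=80, db_cpu_percent=20.0)))
```

Keeping the cause metrics in the alert context is what lets the responder start from the symptom and still have the likely causes at hand.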

Page 117: DevOps Roadtrip Minneapolis

PHASE 2: THE STATUS SITE

Page 118: DevOps Roadtrip Minneapolis

THE SPIDEY-SENSE FACTOR

▸ Humans are good at catching certain kinds of problems

▸ “This doesn’t feel right” and gaps in monitoring

▸ The evolution of the Sev incident system

Page 119: DevOps Roadtrip Minneapolis

THE STATUS SITE: MANUAL ALERTING FOR NON-TECH USERS

▸ Want to let certain non-tech users report Sev incidents

▸ Initially just a password-protected form

▸ Uses the VictorOps alert ingestion API for triggering alerts

▸ Uses the VictorOps public API for fetching information

▸ Each Sev alert is created with its own entity_id

▸ Lets admin users share status updates
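A hedged sketch of what the status-site form might build before calling the alert ingestion API. The field names (`message_type`, `entity_id`, `entity_display_name`, `state_message`, `state_start_time`) follow the VictorOps REST endpoint integration as best I recall it and should be checked against current documentation; the `build_sev_alert` helper and the `entity_id` format are hypothetical. No request is sent here.

```python
import json
import time
import uuid


def build_sev_alert(severity: int, summary: str, reporter: str) -> dict:
    """Build an alert payload for a manually reported Sev incident."""
    # A fresh entity_id per report keeps each Sev alert a separate incident
    # (assumption: alerts sharing an entity_id are treated as the same incident).
    entity_id = f"sev{severity}-{uuid.uuid4().hex[:8]}"
    return {
        "message_type": "CRITICAL",
        "entity_id": entity_id,
        "entity_display_name": f"Sev {severity}: {summary}",
        "state_message": f"Reported via status site by {reporter}",
        "state_start_time": int(time.time()),
    }


alert = build_sev_alert(1, "Homepage not loading", "newsroom")
print(json.dumps(alert, indent=2))
```

The resulting dictionary would be serialized with `json.dumps` and POSTed to the team's own ingestion URL, which is deliberately left out of the sketch.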

Page 120: DevOps Roadtrip Minneapolis

MONTHLY INCIDENT REPORTING

▸ Monthly reports include a list of all Sev incidents, when they started, when they ended, what the alert text was, and what the resolution was

▸ Combine automated and chat messages in VictorOps with data gathered from other sources

▸ Present this data as automatically as possible in the Status Site
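The monthly report described above can be sketched as a filter and a summary over incident records. The `Incident` fields mirror the items listed on the slide (start, end, alert text, resolution); the sample incidents and the added mean time to repair are illustrative assumptions, not Star Tribune data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime
    ended: datetime
    alert_text: str
    resolution: str


def monthly_report(incidents, year: int, month: int) -> dict:
    """Summarize the incidents that started in the given month."""
    rows = [i for i in incidents if (i.started.year, i.started.month) == (year, month)]
    rows.sort(key=lambda i: i.started)
    total = sum((i.ended - i.started for i in rows), timedelta())
    mean_ttr = total / len(rows) if rows else timedelta()
    return {"count": len(rows), "mean_time_to_repair": mean_ttr, "incidents": rows}


# Hypothetical incidents for illustration only.
incidents = [
    Incident(datetime(2017, 3, 20, 14, 0), datetime(2017, 3, 20, 15, 30),
             "Sev 1: logins failing", "Rolled back deploy"),
    Incident(datetime(2017, 3, 2, 9, 0), datetime(2017, 3, 2, 9, 30),
             "Sev 2: site slow", "Restarted cache"),
    Incident(datetime(2017, 2, 27, 23, 0), datetime(2017, 2, 27, 23, 10),
             "Sev 3: stale feed", "Re-ran import"),
]

report = monthly_report(incidents, 2017, 3)
for row in report["incidents"]:
    print(row.started, row.ended, row.alert_text, "|", row.resolution)
print("Mean time to repair:", report["mean_time_to_repair"])
```

Having the count and durations computed each month is what replaces "this was a bad couple weeks" with a trend that can be compared over time.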

Page 121: DevOps Roadtrip Minneapolis

PHASE 3: EVOLUTION

Page 122: DevOps Roadtrip Minneapolis

NEXT STEPS

▸ Integration of summarized data collected from Datadog/CloudWatch/etc. into incident reporting

▸ Reports for users that shouldn’t have access to VictorOps

▸ Integration of the Status Site into Slack

Page 123: DevOps Roadtrip Minneapolis

▸ @bovermyer

▸ benovermyer.com

Page 124: DevOps Roadtrip Minneapolis

Q&A

Page 125: DevOps Roadtrip Minneapolis

BREAK TIME

#DevOpsRoadTrip

Page 126: DevOps Roadtrip Minneapolis

Breakout Sessions

◻ ChatOps - Jason Hand

◻ Leveraging Data to Establish a Healthy Culture - Andy Domeier

◻ Monitoring and Microservices – Bridget Kromhout

◻ Blameless Culture – Heather Mickman

◻ Devs vs. Ops On-Call, How and Why to Get started – Ben Overmyer

#DevOpsRoadTrip

Page 127: DevOps Roadtrip Minneapolis

Page 128: DevOps Roadtrip Minneapolis

Page 129: DevOps Roadtrip Minneapolis

Page 130: DevOps Roadtrip Minneapolis

Heather Mickman | Target

Senior Director of Platform Engineering

• Heather Mickman is the Senior Director of Platform Engineering at Target and a DevOps enthusiast.

• Heather has 20+ years of IT experience in various roles and industries including retail, transportation, and high tech manufacturing.

• She is currently working on building the platforms used by software engineers at Target including a multi-provider cloud platform, API Gateway, telemetry tooling, data stores, and messaging.

• She has a passion for technology, building high performing teams, driving a culture of innovation, and having fun along the way. Heather lives in Minneapolis with her 2 sons and mini dachshund.

#DevOpsRoadTrip

Page 131: DevOps Roadtrip Minneapolis
Page 132: DevOps Roadtrip Minneapolis
Page 133: DevOps Roadtrip Minneapolis
Page 134: DevOps Roadtrip Minneapolis
Page 135: DevOps Roadtrip Minneapolis
Page 136: DevOps Roadtrip Minneapolis
Page 137: DevOps Roadtrip Minneapolis
Page 138: DevOps Roadtrip Minneapolis
Page 139: DevOps Roadtrip Minneapolis
Page 140: DevOps Roadtrip Minneapolis
Page 141: DevOps Roadtrip Minneapolis
Page 142: DevOps Roadtrip Minneapolis
Page 143: DevOps Roadtrip Minneapolis
Page 144: DevOps Roadtrip Minneapolis
Page 145: DevOps Roadtrip Minneapolis
Page 146: DevOps Roadtrip Minneapolis
Page 147: DevOps Roadtrip Minneapolis
Page 148: DevOps Roadtrip Minneapolis
Page 149: DevOps Roadtrip Minneapolis

Q&A

Page 150: DevOps Roadtrip Minneapolis

Automation

Awareness

Collaboration

Documentation

User Empathy

Learning

Page 151: DevOps Roadtrip Minneapolis

jhand.co/DRT_MSP

Page 152: DevOps Roadtrip Minneapolis

Page 181: DevOps Roadtrip Minneapolis

How do you Score?

Page 182: DevOps Roadtrip Minneapolis

Incident Management Maturity

Chart axes: Time To Repair (TTR) and Continuous Improvement Efforts

Reactive (0 – 4) (chaotic)

✓ No automation

✓ No operational stack awareness

✓ Poor collaboration between teams (Dev & Ops)

✓ Documentation not available

✓ No standardized communication

✓ High focus on consistent continuous learning

Tactical (5 – 9) (obvious)

✓ Uses a NOC

✓ Some monitoring & alerting instrumentation

✓ Collaboration in crisis

✓ "Mission critical" processes are available

✓ Understood crisis communication protocols

✓ Remediation data available to IT Operations

Integrated (10 – 14) (complicated)

✓ Team rotations, paging policies, role hunting

✓ Continuous improvement of key health indicators

✓ Technical collaboration across all incidents

✓ Docs up to date and easily accessible

✓ Consistent real-time communication practices

Strategic (15 – 18) (complex)

✓ Automated docs and remediation

✓ Actionable Alerts with full context

✓ High collaboration among all teams

✓ Documentation part of remediation

✓ Targeted, proactive crisis comms

✓ High focus on continuous learning
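The score bands on this slide can be expressed as a simple lookup. The ranges and stage names come from the slide; how the 0 to 18 score itself is tallied is not stated in the transcript, so the sketch below only maps an already-computed score to a stage. The `maturity_stage` function name is hypothetical.

```python
# Score bands as shown on the slide: (low, high, stage).
STAGES = [
    (0, 4, "Reactive (chaotic)"),
    (5, 9, "Tactical (obvious)"),
    (10, 14, "Integrated (complicated)"),
    (15, 18, "Strategic (complex)"),
]


def maturity_stage(score: int) -> str:
    """Map an incident management maturity score to its stage."""
    for low, high, name in STAGES:
        if low <= score <= high:
            return name
    raise ValueError(f"score must be between 0 and 18, got {score}")


print(maturity_stage(7))
print(maturity_stage(15))
```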

Page 183: DevOps Roadtrip Minneapolis

RAFFLE TIME

#DevOpsRoadTrip

Page 184: DevOps Roadtrip Minneapolis

DENVER - SEATTLE - SAN FRANCISCO - MINNEAPOLIS - NEW YORK CITY