Kevina Finn-Braun, Salesforce. J. Paul Reed, Release Engineering Approaches. DevOps Enterprise Summit, 2015. The Blameless Cloud: Bringing Actionable Retrospectives to Salesforce.

DOES15 - Finn-Braun and Reed - The Blameless Cloud: Bringing Actionable Retrospectives to Salesforce

Apr 15, 2017

Gene Kim
Transcript
Page 1: DOES15 - Finn-Braun and Reed - The Blameless Cloud: Bringing Actionable Retrospectives to Salesforce

KEVINA FINN-BRAUN, SALESFORCE

J. PAUL REED, RELEASE ENGINEERING APPROACHES

DEVOPS ENTERPRISE SUMMIT, 2015

THE BLAMELESS CLOUD: BRINGING ACTIONABLE RETROSPECTIVES TO SALESFORCE

Page 2

KEVINA FINN-BRAUN

• Director of Site Reliability Service Management at Salesforce

• Business Continuity at Yahoo

• Geeks out on Group Dynamics and Behavior

• @kfinnbraun on

• Prepping for the zombie apocalypse

@kfinnbraun @jpaulreed #DOES15

Page 3

J. PAUL REED

• @jpaulreed on

• Host of The Ship Show, @shipshowpodcast on

• Principal Consultant, Release Engineering Approaches

• Spend my days talking to organizations about “The DevOps™”

Page 4

“SITE RELIABILITY” AT SALESFORCE

• Primary operational team supporting availability

• Acceptance and validation activities

• Develop and implement operational improvements for SFDC

• “Game days”

Page 5

SERVICE RELIABILITY HURDLES AT SFDC

• Inconsistent application of process, leading to inconsistent information collection

• Incident handling/remediation crossing silo boundaries

• Confusion over service ownership, due to restructured responsibilities

• Disjointed, “heavyweight” meetings

• Postmortems centered around “The Old View” of human error

Page 6

LANGUAGE OF THE “OLD VIEW”

• “5 whys”

• “Root cause” analysis

• “Why didn’t you[r team]…”

• “You[r team] should have…”

• “Best practices”

Page 7

Page 8

THE TIMELINE

• October 2014: First Meeting

• January 2015: “Blow up” HA Forum

• April 2015: Status Check, including assessment shared with senior leaders

• May 2015: Service ownership roles shift

Page 9

THE TIMELINE

• October 2014: First Meeting

• January 2015: “Blow up” HA Forum

• April 2015: Status Check, including assessment shared with senior leaders

• May 2015: Service ownership roles shift

• July 2015: Initial Workshop on “The New View”

• August 2015: Identified first group for coaching

• August 2015 to today: Continued focus and deep-dive on WSRR

• August 2015 to today: Weekly sessions with the initial group

Page 10

[Flowchart: Root Cause Analysis Workflow]

Goal: Root cause identified five business days from incident resolution.

Steps shown:

• Incident, Event, Bug

• Initial Analysis

• Request RCA/Failure Analysis

• Facilitator opens investigations and schedules post mortem meeting

• Identify corrective actions and implementation plans; assign actions to scrum teams

• Unable to ascertain root cause; update record with “KE Status”

• Engage scrum teams as required

• RCM Process

• Tier 3 support communicate RCM to customer(s)

• HA Forum

• Weekly meetings to follow up with scrum master on progress

• Additional work items from HA are assigned

• Update record and set status to “resolved”

• END

Decision points (Y/N):

• RC Known?

• RC Identified?

• RCM Needed?

• Impact to Customer/Production or ability to release?

• Review @HA?

• Corrective Actions complete?

HA? Incident Guidelines:

• Severity 0, 1: YES

• Severity 2: Maybe (instance & incident length?)

• Functional Regression: Maybe

• Incorrect/Incomplete Release: YES

• Deployment Delayed or Rolled Back: Maybe

ROOT CAUSE ANALYSIS WORKFLOW

• Designed & implemented two years ago

• Anchored the process around the weekly “HA Forum”

• Intended to apply to all incidents…

• In practice, focused on high profile incidents
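The “HA? Incident Guidelines” box in the workflow diagram reads like a small decision table, so it can be sketched in code. The following is a minimal illustration only, not anything shown in the talk: the function name, parameter names, and the `NOT_LISTED` value are hypothetical, and the precedence rule (a “YES” criterion outranks a “Maybe” criterion) is an assumption, since the slide does not say how to combine criteria.

```python
from enum import Enum
from typing import Optional


class HAReview(Enum):
    YES = "YES"
    MAYBE = "Maybe"
    NOT_LISTED = "not listed"  # hypothetical: the slide gives no answer for this case


def ha_forum_review(
    severity: Optional[int] = None,
    *,
    functional_regression: bool = False,
    incorrect_or_incomplete_release: bool = False,
    deployment_delayed_or_rolled_back: bool = False,
) -> HAReview:
    """Map an incident's attributes to the guideline's review answer.

    Assumption: when several criteria apply, a YES criterion wins over Maybe.
    """
    # Criteria the slide marks "YES".
    if severity in (0, 1) or incorrect_or_incomplete_release:
        return HAReview.YES
    # Criteria the slide marks "Maybe" (for Severity 2 the slide adds the
    # open question "instance & incident length?", which is not modeled here).
    if severity == 2 or functional_regression or deployment_delayed_or_rolled_back:
        return HAReview.MAYBE
    # Anything else is simply not covered by the guideline box.
    return HAReview.NOT_LISTED


if __name__ == "__main__":
    print(ha_forum_review(1).value)
    print(ha_forum_review(2).value)
    print(ha_forum_review(3, functional_regression=True).value)
```

A “Maybe” result still needs a human judgment call; the sketch only makes the listed criteria explicit.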

Page 11

[Flowchart: Root Cause Analysis Workflow, repeated from Page 10]

Page 12

[Flowchart: Root Cause Analysis Workflow, repeated from Page 10]

ROOT CAUSE ANALYSIS WORKFLOW IN REALITY

• Silo transition boundaries evident in the workflow

• Some had little/no contact, via the process, with other teams required to perform their job

• Sampling of incident reports uncovered consistent inconsistencies

• The “Bermuda Blob”

Page 13

GETTING A FEEL FOR THE WEATHER

Page 14

Page 15

HEADFIRST INTO THE STORM

Page 16

LANGUAGE: MATTERS

• “HA Forum” ➡ “WSRR”

• “WAR” (What is it good for?)

• Postmortem versus Retrospective

• Problem Team versus Solution Team

• Root Cause versus Proximate Cause

Page 17

BEHAVIOR: MATTERS

• Intra-team behavior

• Inter-team behavior

• This is not “#NAFB”

• “People in complex systems create safety. … The occasional human contribution to failure occurs because complex systems need an overwhelming human contribution for safety.” (Sidney Dekker)

Page 18

STRUCTURE: MATTERS

Page 19

STRUCTURE: MATTERS

Page 20

“BLAMELESS” “POSTMORTEMS”?

• Brené Brown, research sociologist, on vulnerability

• “Blame is a way to discharge pain and discomfort”

• Postmortem has a heavy connotation

• “Awesome postmortems?” Really?!

Page 21

Axes: Language, Behaviors

Stages: Novice, Beginner, Competent, Proficient, Expert

Page 22

Axes: Language, Behaviors

Stages: Novice, Beginner, Competent, Proficient, Expert

Novice

• Language: “Incidents are bad; my job is on the line”; “I’m getting sent to the principal’s office because of this outage”

• Behaviors: Completes the post-incident “paperwork”; no formal retrospective/hallway retrospectives

Page 23

Axes: Language, Behaviors

Stages: Novice, Beginner, Competent, Proficient, Expert

Novice

• Language: “Incidents are bad; my job is on the line”; “I’m getting sent to the principal’s office because of this outage”

• Behaviors: Completes the post-incident “paperwork”; no formal retrospective/hallway retrospectives

Beginner

• Language: “Let’s fix this as fast as possible”; “What’s the correct fix to avoid this specific issue in the future?”

• Behaviors: Some information (inconsistently) recorded; jump to a focus on why

Page 24

Axes: Language, Behaviors

Stages: Novice, Beginner, Competent, Proficient, Expert

Novice

• Language: “Incidents are bad; my job is on the line”; “I’m getting sent to the principal’s office because of this outage”

• Behaviors: Completes the post-incident “paperwork”; no formal retrospective/hallway retrospectives

Beginner

• Language: “Let’s fix this as fast as possible”; “What’s the correct fix to avoid this specific issue in the future?”

• Behaviors: Some information (inconsistently) recorded; jump to a focus on why

Competent

• Language: “Let’s review the timeline/incident report to answer that”; “We need to find the root cause of this incident”

• Behaviors: Follows the prescribed format for retrospectives; have and incorporate complete dataset for the incident into the retrospective

Page 25

Axes: Language, Behaviors

Stages: Novice, Beginner, Competent, Proficient, Expert

Novice

• Language: “Incidents are bad; my job is on the line”; “I’m getting sent to the principal’s office because of this outage”

• Behaviors: Completes the post-incident “paperwork”; no formal retrospective/hallway retrospectives

Beginner

• Language: “Let’s fix this as fast as possible”; “What’s the correct fix to avoid this specific issue in the future?”

• Behaviors: Some information (inconsistently) recorded; jump to a focus on why

Competent

• Language: “Let’s review the timeline/incident report to answer that”; “We need to find the root cause of this incident”

• Behaviors: Follows the prescribed format for retrospectives; have and incorporate complete dataset for the incident into the retrospective

Proficient

• Language: “Now that we’ve established what happened, how did it happen?”; “How did these multiple factors influence our complex system?”

• Behaviors: Identifies inherent bias in self and others; perspectives solicited from all involved team members/functional groups

Page 26

Axes: Language, Behaviors

Stages: Novice, Beginner, Competent, Proficient, Expert

Novice

• Language: “Incidents are bad; my job is on the line”; “I’m getting sent to the principal’s office because of this outage”

• Behaviors: Completes the post-incident “paperwork”; no formal retrospective/hallway retrospectives

Beginner

• Language: “Let’s fix this as fast as possible”; “What’s the correct fix to avoid this specific issue in the future?”

• Behaviors: Some information (inconsistently) recorded; jump to a focus on why

Competent

• Language: “Let’s review the timeline/incident report to answer that”; “We need to find the root cause of this incident”

• Behaviors: Follows the prescribed format for retrospectives; have and incorporate complete dataset for the incident into the retrospective

Proficient

• Language: “Now that we’ve established what happened, how did it happen?”; “How did these multiple factors influence our complex system?”

• Behaviors: Identifies inherent bias in self and others; perspectives solicited from all involved team members/functional groups

Expert

• Language: “How does our team/system contribute to our successes?”; “What can we incorporate from this incident to better respond next time?”

• Behaviors: Able to facilitate retrospectives by healthily helping others address tendency to blame/personal & systemic bias; retrospective outcomes are fed back into the system and prioritized

Page 27

RETROSPECTIVES FACILITATE THE SERVICE (AND DEVELOPMENT!) IMPROVEMENT PROCESS

Page 28

BEING “TOO BUSY” TO LEARN OR IMPROVE MEANS YOU ARE IN A DOWNWARD SPIRAL, BY DEFINITION

Page 29

IT’S NOT ABOUT THE OUTCOME. IT’S ABOUT THE RESPONSE.

Page 30

WHY + HOW IS MORE IMPORTANT THAN WHAT

Page 31

YOU ARE NEVER DONE.

Page 32

YOU. ARE. NEVER. DONE.

Page 33

OUR FORECAST FOR THE FUTURE

• Evolving the concept of Service Ownership

• Salesforce-specific Retrospective Guides

• Global “live-site” coaching

• Refocus on getting the business what it wants

Page 34

AVENUES FOR COLLABORATION

• How does the described Dreyfus model apply in other organizations?

• Would love to hear stories from other enterprises about their retrospective process, who does them, and where they live within the organization

Page 36

PHOTO CREDITS

• Slide 1: https://en.wikipedia.org/wiki/File:Golden_Fog,_San_Francisco.jpg

• Slide 4: Courtesy Kevina Finn-Braun/Salesforce

• Slide 6: https://www.flickr.com/photos/hannaneh/6464986121

• Slide 7: https://www.youtube.com/watch?v=_DEToXsgrPc#t=1h5m50s

• Slide 13: http://kathmajp.weebly.com/all-movie-reviews/movie-review-twister

• Slide 14: http://thevane.gawker.com/heres-everything-they-got-wrong-and-right-in-the-movi-1609968202

• Slide 15: https://www.flickr.com/photos/ravedelay/17761863929

Page 37

PHOTO CREDITS

• Slide 16: Screenshot of aviationweather.gov

• Slide 17: https://www.flickr.com/photos/ravedelay/17534032771/

• Slide 18: https://www.youtube.com/watch?v=8veT5QspylE#t=15m30s

• Slide 19: https://www.flickr.com/photos/jkirkhart35/4984385396

• Slide 20: https://www.youtube.com/watch?v=iCvmsMzlF7o

• Slide 33: https://commons.wikimedia.org/wiki/File:Rainbow_background.jpg

• Slide 35: https://en.wikipedia.org/wiki/File:Clouds_spilling_over_San_Francisco.jpg