Transcript

Critical Issue Escalation: Our Process

Evan Hamilton, Head of Community, UserVoice

OMG EVERYTHING IS BROKEN

Why do we need a process here?

• Our customers need a working product (or they’ll leave)

• When things go wrong without a plan, chaos ensues

• We don’t work 24/7

• We don’t want to wake everyone up every time there is an issue

So what is a critical issue?

Work hours:

• Interrupting core functionality (Ex: settings not saving consistently)

• OR losing/corrupting data

• OR serious consequences (Ex: loss of a major account)

• AND can be reproduced (or has been reported by enough people that it must be happening)

Off hours:

• Blocking core functionality (Ex: can’t access feature)

• AND affecting multiple people

• OR losing/corrupting data

• AND can be reproduced (or has been reported by enough people that it must be happening)
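The two checklists above can be sketched as a single check. This is only an illustration, not a tool the team uses: the `Issue` fields and the `is_critical` function are hypothetical names, and the grouping of the AND/OR bullets is an assumption (the final "AND can be reproduced" is read as applying to the whole list).

```python
from dataclasses import dataclass


@dataclass
class Issue:
    # All field names are hypothetical; they mirror the bullets above.
    interrupts_core: bool = False          # e.g. settings not saving consistently
    blocks_core: bool = False              # e.g. can't access a feature at all
    affects_multiple_people: bool = False
    loses_or_corrupts_data: bool = False
    serious_consequences: bool = False     # e.g. loss of a major account
    reproducible: bool = False             # reproduced, or reported by enough people


def is_critical(issue: Issue, work_hours: bool) -> bool:
    """Apply the work-hours or off-hours checklist to an issue."""
    if work_hours:
        impact = (
            issue.interrupts_core
            or issue.blocks_core  # assumption: blocking also counts as interrupting
            or issue.loses_or_corrupts_data
            or issue.serious_consequences
        )
    else:
        impact = (
            (issue.blocks_core and issue.affects_multiple_people)
            or issue.loses_or_corrupts_data
        )
    # Assumption: reproducibility is required in every case.
    return impact and issue.reproducible
```

Under this reading, a reproducible "settings not saving consistently" bug is critical during work hours but can wait until morning if it shows up off hours.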

Step 0: Spot the Issue

Ticket Queue

Support team monitors all day, and at least twice each evening. If any of the team will be unavailable for an extended period of time, they’ll deputize someone from Sales or Community.

Social Media

Community team monitors all day, and at least twice each evening. If any of the team is unavailable for an extended period of time, they’ll potentially deputize someone from Support.

The Rest of the Team

They may not be on the Customer Team, but if they see a critical issue, it’s their responsibility to report it.

Step 1: Create a Trello bug.

Issue history FTW. Ad-hoc communication FTL.

Step 2: Contact a Developer

• Choose the relevant product area (Systems, Front-End, or Code) & contact the dev at the top of that list.

• Office hours? Ping them in HipChat.

• Off hours? Call them, don’t text or chat or email.

• If they don’t respond to 2 pings within 10 minutes, move down the list.

Dev Escalation List

• System dev: Kevin

• App devs: Jonathan, Mark, Austin, Joey, Raimo, Rich

• Interface devs: Joshua, John, Brad, Rich

For System Issues (site is down/slow, emails don’t work): contact system + app dev

For Interface Issues (the interface looks broken, won’t work, etc): contact interface + app dev

For all other issues: contact app dev
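The routing rules and the "2 pings, then move down the list" rule can be sketched as follows. This is a minimal illustration under stated assumptions: `ping` is a hypothetical callable that contacts one dev (HipChat during office hours, a phone call off hours), waits its share of the 10-minute window, and returns True if the dev answers. The function names are also made up for the example.

```python
# Escalation lists as given on the slide, in contact order.
ESCALATION = {
    "system": ["Kevin"],
    "app": ["Jonathan", "Mark", "Austin", "Joey", "Raimo", "Rich"],
    "interface": ["Joshua", "John", "Brad", "Rich"],
}

# Which lists to work through for each kind of issue.
ROUTING = {
    "system": ["system", "app"],        # site is down/slow, emails don't work
    "interface": ["interface", "app"],  # the interface looks broken or won't work
    "other": ["app"],
}

MAX_PINGS = 2  # pings per dev before moving down the list


def first_responder(area, ping):
    """Return the first dev on the area's list who answers, or None."""
    for dev in ESCALATION[area]:
        for _ in range(MAX_PINGS):
            if ping(dev):  # hypothetical: contacts the dev and waits for a reply
                return dev
    return None


def escalate(issue_kind, ping):
    """Find one responding dev per relevant list for this kind of issue."""
    areas = ROUTING.get(issue_kind, ROUTING["other"])
    return {area: first_responder(area, ping) for area in areas}
```

For a system issue, `escalate("system", ping)` works through the system list and then the app list, so the reporter ends up with one system dev and one app dev (or None for a list where nobody answered).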

Devs: did you get the call? Then:

• Respond affirmatively to the person who contacted you

• Join the Engineering room on HipChat and let others know someone is working on it

NO additional customer team members should be communicating with the dev solving the problem – only the one who first reported it.

More voices confuse and distract.

Step 3: Inform the Customer Team

Email the whole customer team so they know about the issue (and that you’re working with the devs)

Is it work hours? Also @all everyone in the Support room on HipChat

Step 4: Is it super-critical?

Ask Developer (before they fix the bug):

• Roughly how many people might this be affecting?

• Roughly what issues might this be causing?

• (If they’re too busy fixing it, consider calling in a second dev)

• Customer team: it’s your job to ensure this happens

How do I know if it’s Super-Critical?

• Does this affect more than 20% of accounts?

• Is this a very frustrating or visible bug (vs just an annoyance)?

*This is somewhat arbitrary.
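As a rough sketch, the two questions above could be combined like this. The function name and parameters are hypothetical, and treating either "yes" as sufficient is an assumption; the slide does not say whether both answers are required.

```python
SUPER_CRITICAL_SHARE = 0.20  # "more than 20% of accounts" (somewhat arbitrary)


def is_super_critical(affected_accounts: int, total_accounts: int,
                      very_frustrating_or_visible: bool) -> bool:
    """Return True if the issue should be escalated as super-critical."""
    if total_accounts <= 0:
        raise ValueError("total_accounts must be positive")
    widespread = affected_accounts / total_accounts > SUPER_CRITICAL_SHARE
    # Assumption: either condition alone is enough.
    return widespread or very_frustrating_or_visible
```

The account counts would come from the developer's rough estimate in the previous step, so the result is only as good as that estimate.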

If Super-Critical:

• Call the Head of Support & Head of Community

• Community Department should tweet about the issue (make sure to reschedule any other tweets - “check out our blog” would be an unfortunate tweet during an outage)

• Leave a maximum of 30 minutes between any public messages about critical bugs and 15 minutes between public messages about downtime

• DO NOT suggest a timeframe (it may change)

• DO NOT talk about the cause (you may be wrong)

• Going to require a long fix? Publish a blog post & Facebook status too
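The update cadence above is easy to lose track of mid-incident. A small sketch of a reminder helper, assuming the two incident kinds are labeled `"critical_bug"` and `"downtime"` (hypothetical labels, as are the function names):

```python
from datetime import datetime, timedelta

# Maximum silence between public messages, per incident kind.
MAX_GAP = {
    "critical_bug": timedelta(minutes=30),
    "downtime": timedelta(minutes=15),
}


def next_update_due(last_message_at: datetime, kind: str) -> datetime:
    """Latest time by which the next public message should go out."""
    return last_message_at + MAX_GAP[kind]


def is_overdue(last_message_at: datetime, kind: str, now: datetime) -> bool:
    """True if the maximum gap for this kind of incident has been exceeded."""
    return now > next_update_due(last_message_at, kind)
```

Note that the helper only tracks timing. What the message says is still governed by the rules above: no timeframe, no cause.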

Step 5: Respond to Issues

Who answers what?

Work hours? Community handles social media, Support handles tickets.

Off hours? Support handles both (but call in backup if needed).

Regardless, make sure you’re in the Support room in HipChat so you can communicate with the team.

*This is somewhat arbitrary.

Step 6: Solve and Verify

• Dev should fix the issue (duh).

• Dev should verify that the fix will stick (may require calling in a second dev)

• Customer Team member should also verify that issues are resolved

Step 7: Report Damage and Close the Loop (the 7 questions)

Dev should answer these questions for the Customer Team member:

1. What did our customers experience? (Please be explicit: don’t just say what was broken, explain the experience our customers would have had when trying to accomplish this task.)

2. How many/which customers were affected?

3. When did this issue start? When was it resolved?

4. What caused it?

5. What are we doing to avoid it in the future?

6. What are the chances that there will be related issues in the short-term future?

7. What was the damage (data, accounts, etc)?
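One way to make sure none of the seven questions gets skipped is to treat the answers as a form. The sketch below is an illustration only: `IncidentReport`, its field names, and `unanswered` are hypothetical, with one field per question in the order listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentReport:
    # One free-text answer per question, in the order asked.
    customer_experience: str = ""
    customers_affected: str = ""
    started_and_resolved: str = ""
    cause: str = ""
    prevention: str = ""
    risk_of_related_issues: str = ""
    damage: str = ""


def unanswered(report: IncidentReport) -> list:
    """Names of the questions that are still blank."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]
```

The Customer Team member could hold off on sending the answers around until `unanswered(report)` comes back empty.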

The Loop-Closing:

1. Entire Engineering Team and Customer Team should be sent these answers so they’re clued in

2. Customer Team member should follow up with all customers who reported the issue*

3. If mass communication occurred, publish announcement of the fix to those channels

*And give the one who reported it a discount for their next month of billing!

Post-issue Communication: should we blog about it?

• The litmus test: would I be angry if I experienced this and then heard nothing? Then blog.

• (If it only affected a small # of accounts, email them)

• If extremely severe, consider reimbursements as well.

Hooray, we’ve saved the day!

Evan Hamilton

@evanhamilton

evan@uservoice.com

More content at http://community.uservoice.com
