David Levy AdventuresInSql.com SQL Saturday #67 Chicago

What To Do When It All Goes So Wrong

Jul 08, 2015



David Levy

As IT professionals, we will inevitably see situations where everything goes wrong. At times we are somewhat lucky, and this just means diminished functionality or a slow system. Other times our organization is temporarily out of business. Regardless of the scope of the issue, how we react can directly affect how quickly things return to normal. This session will cover how to communicate issues, including what to say, who to say it to, and when to say it. Part of managing communication is getting everyone into a room and forcing them to talk, so time will be spent on designing an effective war room. The session will also cover how setting out to prove that an issue is ours lets us get to the root cause more quickly.
Transcript
Page 1: What To Do When It All Goes So Wrong

David Levy

AdventuresInSql.com

SQL Saturday #67 Chicago

Page 2

More than 11 years in IT

SQL Server DBA for over 3 years

Previous Life as a Developer

Blogger

◦ http://adventuresinsql.com

◦ Syndicated on SQLServerCentral.com

◦ Syndicated on SQLServerPedia.com

@dave_levy on Twitter

Page 3

Peak Time of Peak Sales Day

Typical Sales: $100K/Hour

Order Entry Screen is Locked Up

Users Initially Report Slowness

Now the “Sales Center” Application is Just “Clocking”

Page 4

Let Everyone Know There Is a Problem

◦ Prevents Duplicated Effort

◦ Allows Others to Speak Up

Recent Changes

Related Issues

http://www.freedigitalphotos.net/images/view_photog.php?photogid=1983

Page 5

Send Up a Flare

◦ Send to an IT-Only Distribution Group

◦ Keep the Subject Line General

◦ Provide Broad Overview Including:

Systems Impacted

Major Symptoms Including Error Messages

Number of People Impacted

Any Location Specific Information
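The flare checklist above can be turned into a reusable template, so that nothing is forgotten under pressure. Here is a minimal sketch using Python's standard `email.message.EmailMessage`. The `build_flare` helper, its parameters, and the example address are hypothetical. The sketch only composes the message; sending it through your mail relay is left out.

```python
from email.message import EmailMessage


def build_flare(to_group, systems, symptoms, people_impacted, locations):
    """Compose a general 'flare' email for an IT-only distribution group.

    Hypothetical helper: the fields mirror the slide's checklist.
    """
    msg = EmailMessage()
    msg["To"] = to_group
    # Keep the subject line general: name the system, not a suspected cause.
    msg["Subject"] = f"{systems[0]} Issues"
    body = [
        "Systems impacted: " + ", ".join(systems),
        "Major symptoms: " + "; ".join(symptoms),
        f"People impacted: {people_impacted}",
        "Locations: " + (", ".join(locations) if locations else "All"),
    ]
    msg.set_content("\n".join(body))
    return msg


flare = build_flare(
    "it-emergencies@example.com",          # hypothetical distribution group
    ["Sales Center"],
    ["Order Entry screen not responding"],
    "All Sales Center users",
    [],                                    # no location-specific information
)
print(flare["Subject"])  # prints: Sales Center Issues
```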

Page 6

What Resources Do You Need?

◦ Subject Matter Experts

◦ Specialized Equipment

Page 7

Never Assign Blame

Only State Facts

Page 8

To: IT Emergencies

Subject: Sales Center Issues

Sales Center Users are reporting that the Order Entry screen has quit responding. We are currently investigating the issue with the Sales Center Development Team. We will provide updates as we know more.

Page 9

Collect

Process

Respond

Page 10

What Are the Symptoms?

What Locations are Involved?

Page 11

What Systems are Involved?

◦ SQL Server

◦ AS/400

◦ Mainframe

◦ Web Farm

◦ Major Network Components like Load Balancers

Page 12

What Has Changed?

◦ Look at Change Control Calendar

◦ Talk to Primary On-Calls for Related Systems

Page 13

Anything in the Logs?

◦ Windows Logs

◦ Application Specific Logs

◦ Custom Exception Handling Systems

Page 14

What are Performance Indicators Showing?

◦ Perfmon

◦ SQL Wait Stats

◦ Third-party tools
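SQL Server exposes wait statistics through `sys.dm_os_wait_stats`. Its counters are cumulative since the last restart or manual clear, so a single snapshot tells you little during an incident. A common approach is to take two snapshots a short interval apart and rank wait types by how much they grew. The sketch below assumes both snapshots have already been read into plain dicts of `wait_type -> wait_time_ms` (the query and database driver are left out). The `top_waits` helper is hypothetical, and the numbers in the usage lines are made up.

```python
def top_waits(before, after, n=5):
    """Return the n wait types that accumulated the most wait time between snapshots.

    before/after: dicts mapping wait_type -> cumulative wait_time_ms,
    e.g. two reads of sys.dm_os_wait_stats taken a minute apart.
    """
    deltas = {wait: after[wait] - before.get(wait, 0) for wait in after}
    # Ignore wait types that did not grow (or were reset) during the interval.
    growing = [(wait, delta) for wait, delta in deltas.items() if delta > 0]
    growing.sort(key=lambda item: item[1], reverse=True)
    return growing[:n]


# Made-up snapshot values, for illustration only.
before = {"PAGEIOLATCH_SH": 1000, "LCK_M_X": 50, "CXPACKET": 400}
after = {"PAGEIOLATCH_SH": 1200, "LCK_M_X": 9050, "CXPACKET": 400}
print(top_waits(before, after))  # prints: [('LCK_M_X', 9000), ('PAGEIOLATCH_SH', 200)]
```

A sudden jump in lock waits, as in this made-up example, would point the investigation toward blocking rather than disk or parallelism.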

Page 15

Analyze Collected Information

◦ Are There Any Obvious Signs of Trouble?

◦ Can the Problem be Linked to a Change?

◦ Can Any Patterns be Identified?

Page 16

Prove It Is Your Issue

◦ Shows Humility

◦ Shows Respect for Everyone Else’s Time

◦ Avoids Appearing Arrogant

Page 17

Prove It Is Your Issue

◦ Construct Tests to Prove Theories in Order of Likelihood Until Problem Proven or Theories Exhausted

Faster than arguing about what it is not

How can you know it is not your issue?
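The "prove it is your issue" loop is essentially an ordered search: test the most likely theory first and stop at the first one that is proven. Here is a minimal sketch, assuming each theory carries a rough likelihood the team agrees on and a test that returns True when the theory is proven. The `Theory` type and `prove_in_order` function are hypothetical names for illustration.

```python
from typing import Callable, List, NamedTuple, Optional


class Theory(NamedTuple):
    description: str
    likelihood: float         # team's rough estimate, 0.0 to 1.0
    test: Callable[[], bool]  # returns True if the theory is proven


def prove_in_order(theories: List[Theory]) -> Optional[Theory]:
    """Test theories from most to least likely; stop at the first one proven."""
    for theory in sorted(theories, key=lambda t: t.likelihood, reverse=True):
        if theory.test():
            return theory
    # Theories exhausted without proof: go back and collect more data.
    return None


theories = [
    Theory("Missing index on order table", 0.3, lambda: True),
    Theory("Blocking on order table", 0.6, lambda: False),
]
result = prove_in_order(theories)
print(result.description if result else "No theory proven")
```

The blocking theory is tested first because it is rated more likely. It fails, so the missing-index theory is tested next, and the search stops there.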

Page 18

List Potential Actions

◦ Rank by effort, confidence, and level of risk

◦ Develop action plans for best options and re-rank

◦ Each potential action should have a rollback plan
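The slide does not prescribe a formula for combining the three rankings. One possible sketch assumes the team assigns each action a 1-to-5 score for effort, confidence, and risk, then sorts by highest confidence, then lowest risk, then lowest effort. It refuses to rank any action that lacks a rollback plan. The `Action` class, the scoring scale, and the tie-break order are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    effort: int      # assumed scale: 1 (minutes) .. 5 (hours)
    confidence: int  # assumed scale: 1 (a hunch) .. 5 (strong evidence)
    risk: int        # assumed scale: 1 (harmless) .. 5 (could make things worse)
    rollback: str    # how to undo the action; must not be empty


def rank_actions(actions):
    """Order actions: highest confidence first, then lowest risk, then lowest effort."""
    missing = [a.name for a in actions if not a.rollback.strip()]
    if missing:
        raise ValueError(f"No rollback plan for: {', '.join(missing)}")
    return sorted(actions, key=lambda a: (-a.confidence, a.risk, a.effort))


candidates = [
    Action("Add missing index", effort=2, confidence=4, risk=3, rollback="Drop the index"),
    Action("Recycle application pool", effort=1, confidence=2, risk=2, rollback="None needed"),
    Action("Roll back last deployment", effort=4, confidence=4, risk=2, rollback="Redeploy the release"),
]
for action in rank_actions(candidates):
    print(action.name)
```

After writing action plans for the top options, update their scores and call `rank_actions` again to re-rank.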

Page 19

Define Measures

◦ What will indicate things have gotten better?

Adding this index will reduce disk I/O by 10 million reads per second

The execution time of query x will drop from 6 minutes to 50 milliseconds

Page 20

Define Measures

◦ What will indicate things have gotten worse?

Disk I/O may go up

The execution time of query x may go up

Adding this index may slow inserts from the order upload process
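Both kinds of measure can be written down before the change: a target that shows things got better and a limit that shows things got worse. Here is a minimal sketch, assuming a single lower-is-better metric such as query duration. The `Measure` class and `judge` function are hypothetical. The numbers reuse the slides' "query x" example, where 6 minutes is 360,000 ms and the target is 50 ms, and the limit is set to the baseline so that any slowdown counts as worse.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    metric: str
    baseline: float  # value observed before the change
    target: float    # at or below this, things have gotten better
    limit: float     # above this, things have gotten worse


def judge(measure: Measure, observed: float) -> str:
    """Classify an observation for a lower-is-better metric (durations, reads)."""
    if observed > measure.limit:
        return "worse: execute rollback plan"
    if observed <= measure.target:
        return "better"
    return "inconclusive: collect more data"


query_x = Measure("query x duration (ms)", baseline=360_000, target=50, limit=360_000)
print(judge(query_x, 45))       # prints: better
print(judge(query_x, 400_000))  # prints: worse: execute rollback plan
print(judge(query_x, 1_000))    # prints: inconclusive: collect more data
```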

Page 21

Communicate Your Intentions

Make the Change

◦ Follow a written plan

◦ Make a single change

◦ A single person should make the change

◦ Document any additional steps taken

Start Over by Collecting More Data

Page 22

Signs You Need to Convene a War Room

◦ Having Trouble Finding Anything Wrong

◦ 30 Minutes Without Progress

◦ An Issue Appears to Span Multiple Systems

◦ Having Difficulty Getting People Engaged
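The warning signs above amount to a simple checklist that the incident lead can run every few minutes. Here is a minimal sketch of that checklist, assuming the lead records the time of the last real progress, the systems suspected so far, and two yes/no judgments. The `war_room_signs` function is hypothetical, and the 30-minute threshold is the one from the slide.

```python
from datetime import datetime, timedelta


def war_room_signs(last_progress, now, systems_involved, found_anything, people_engaged):
    """List which of the slide's warning signs are present; any sign suggests a war room."""
    signs = []
    if not found_anything:
        signs.append("having trouble finding anything wrong")
    if now - last_progress >= timedelta(minutes=30):
        signs.append("30 minutes without progress")
    if len(set(systems_involved)) > 1:
        signs.append("issue appears to span multiple systems")
    if not people_engaged:
        signs.append("difficulty getting people engaged")
    return signs


now = datetime(2015, 7, 8, 12, 0)
print(war_room_signs(now - timedelta(minutes=45), now, ["SQL Server", "Web Farm"], True, True))
```

An empty list means no sign is present. Any entry is a reason to send the "convening a war room" email.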

Page 23

Get Everyone in a Room

No Changes Made Outside the Room

No Heroes

◦ Watch out for people doing a lot of typing

◦ Avoid changes that take more than a few minutes

Have a Call in Number for Remote Coworkers

Page 24

Have a Technology Kit

◦ Old Switch

◦ Patch Cords

◦ Mice + Mouse Pads

◦ Power Strips

Page 25

Monitor Your Guest List

◦ 1-2 Representatives From Each Team

◦ Try to Keep Management Out

◦ Watch for Disruptive People

Page 26

To: IT Emergencies

Subject: Sales Center Issues

We are convening a war room for the Sales Center issue. Everyone working on the issue, please meet in the North Conference Room. Remote/WFH coworkers should dial into the conference bridge at 888-888-1234, participant code: 1234.

Page 27

Collect

Process

Respond

Page 28

Whiteboard the Issue

◦ Every System Gets Its Own Column

◦ Write All Facts on White Board

◦ Closed Items Get Crossed Out Not Erased

◦ Include a Resolution for Each Closed Item
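The whiteboard rules describe an append-only log: one column per system, items that are closed but never erased, and a resolution required on every closed item. Here is a minimal sketch of that structure, for example as a shared record for remote participants who cannot see the physical board. The `Whiteboard` class and its methods are hypothetical.

```python
from collections import defaultdict


class Whiteboard:
    """Append-only incident board: one column per system; closed items stay visible."""

    def __init__(self):
        self.columns = defaultdict(list)  # system -> list of item dicts

    def add_fact(self, system, fact):
        self.columns[system].append({"fact": fact, "resolution": None})
        return len(self.columns[system]) - 1  # index used to close the item later

    def close(self, system, index, resolution):
        if not resolution:
            raise ValueError("every closed item needs a resolution")
        # "Cross out" by recording a resolution; nothing is ever deleted.
        self.columns[system][index]["resolution"] = resolution

    def capture(self):
        """Render the board as text, e.g. when capturing it at close-out."""
        lines = []
        for system, items in self.columns.items():
            lines.append(f"[{system}]")
            for item in items:
                if item["resolution"] is None:
                    lines.append(f"  - {item['fact']}")
                else:
                    lines.append(f"  - (closed) {item['fact']} -> {item['resolution']}")
        return "\n".join(lines)


wb = Whiteboard()
i = wb.add_fact("SQL Server", "High lock waits on order table")
wb.add_fact("Web Farm", "App pool CPU normal")
wb.close("SQL Server", i, "Blocking session identified")
print(wb.capture())
```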

Page 29

Share the Floor

◦ Likely Issue Owner Has the Lead

◦ Make Sure Everyone is Heard

◦ Contributing Often Involves Staying Out of the Way

◦ Don’t Be Afraid to Fade Back and Run The Whiteboard

Page 30

Never Call “Not-It” and Leave

◦ Not Helpful

◦ You May be Wrong

◦ Appears Arrogant

Page 31

Keep an Eye on Time

◦ Provide Regular Updates to Management

◦ Bring in Food Around Meal Times

Raises Spirits

Brings in More People to Help

Page 32

To: IT Emergencies

Subject: Sales Center Issues Update

The Sales Center war room is still going. We are currently looking into a driver issue with IBM. All necessary resources have been engaged.

Page 33

Keep People in Reserve

◦ Each Team Should Divide up the Day

◦ Rotate People In and Out

◦ Send Someone Home Early to Come in Early

Page 34

Closing Out

◦ Communicate Resolution

◦ Capture Contents of Whiteboard

◦ Clean Up Room

Page 35

To: IT Emergencies

Subject: Sales Center Issues Resolved

The Sales Center issue has been resolved. The issue was caused by a patch that was applied over the weekend. Now that the patch has been backed out, everything has returned to normal.

Page 36

?

Page 37