Jason Hand – DevOps Evangelist Tips & Tricks to Reduce TTR for the Next Incident @jasonhand

Tips & Tricks To Reducing TTR

Jul 29, 2015


VictorOps
Transcript
Page 1: Tips & Tricks To Reducing TTR

Jason Hand – DevOps Evangelist

Tips & Tricks to Reduce TTR for the Next Incident

@jasonhand

Page 2: Tips & Tricks To Reducing TTR

Time to Resolution (TTR)

•  The total amount of time taken to resolve an incident

•  MTTR (Mean Time To Resolution*): TTR summarized over time. As a mean, it describes the most "typical" value in a set of TTR values; the lower, the better.

*Resolve = Repair = Recover (the terms are used interchangeably)
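As a worked illustration of the two definitions above, the sketch below computes TTR per incident and MTTR as the arithmetic mean of those values. The incident timestamps are made up, and the sketch assumes TTR runs from the moment the alert fires to the moment the incident is resolved.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incidents: (time the alert fired, time the incident was resolved).
incidents = [
    (datetime(2015, 7, 1, 9, 0), datetime(2015, 7, 1, 9, 45)),
    (datetime(2015, 7, 8, 14, 10), datetime(2015, 7, 8, 15, 40)),
    (datetime(2015, 7, 20, 2, 30), datetime(2015, 7, 20, 3, 0)),
]


def ttr_minutes(start: datetime, end: datetime) -> float:
    """Time to Resolution for a single incident, in minutes."""
    return (end - start) / timedelta(minutes=1)


def mttr_minutes(records) -> float:
    """Mean Time To Resolution: the arithmetic mean of the per-incident TTRs."""
    return mean(ttr_minutes(start, end) for start, end in records)


print([ttr_minutes(s, e) for s, e in incidents])  # [45.0, 90.0, 30.0]
print(mttr_minutes(incidents))  # 55.0
```

Because a single very long outage can pull the mean upward, some teams track `statistics.median` of the same values alongside MTTR.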

Page 3: Tips & Tricks To Reducing TTR

•  Incident Lifecycle: Alerting → Triage → Investigation → Identification → Resolution → Documentation
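One way to use the lifecycle when hunting for TTR savings is to time each phase separately and see where the minutes actually go. The sketch below assumes an incident's TTR is the sum of the minutes spent in each phase. The timings are invented for illustration, and whether Documentation time belongs inside TTR is a policy choice (the example leaves it out).

```python
from enum import Enum


class Phase(Enum):
    """Incident lifecycle phases, in the order listed above."""
    ALERTING = 1
    TRIAGE = 2
    INVESTIGATION = 3
    IDENTIFICATION = 4
    RESOLUTION = 5
    DOCUMENTATION = 6


def ttr_from_phases(durations: dict) -> float:
    """Sum the minutes spent in each phase; phases with no timing count as zero."""
    return sum(durations.get(phase, 0.0) for phase in Phase)


def slowest_phase(durations: dict) -> Phase:
    """Return the phase that consumed the most time, the likeliest place to cut TTR."""
    return max(durations, key=durations.get)


# Hypothetical per-phase timings (minutes) for one incident.
example = {
    Phase.ALERTING: 1.0,
    Phase.TRIAGE: 4.0,
    Phase.INVESTIGATION: 25.0,
    Phase.IDENTIFICATION: 10.0,
    Phase.RESOLUTION: 5.0,
}

print(ttr_from_phases(example))  # 45.0
print(slowest_phase(example))  # Phase.INVESTIGATION
```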

Page 4: Tips & Tricks To Reducing TTR
Page 5: Tips & Tricks To Reducing TTR

Alerting

A “zero-time” alerting platform that finds people instantly can only reduce average TTR by a very small percentage.

Notify on-call members
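Notifying on-call members first requires knowing who is on call at the moment the alert fires. The sketch below assumes a simple round-robin weekly rotation. The names, start date, and `notify_on_call` helper are hypothetical placeholders, and the "page" is only a formatted string, not a call to any real paging service.

```python
from datetime import datetime, timedelta

# Hypothetical weekly rotation; names and start time are placeholders.
ROTATION = ["alice", "bob", "carol"]
ROTATION_START = datetime(2015, 7, 6, 9, 0)  # a Monday, 09:00
SHIFT_LENGTH = timedelta(weeks=1)


def on_call_at(when: datetime) -> str:
    """Return who is on call at `when` under a round-robin weekly rotation."""
    if when < ROTATION_START:
        raise ValueError("time precedes the start of the rotation")
    shifts_elapsed = (when - ROTATION_START) // SHIFT_LENGTH  # whole shifts completed
    return ROTATION[shifts_elapsed % len(ROTATION)]


def notify_on_call(message: str, when: datetime) -> str:
    """Stand-in for a real paging call: returns who would be paged and why."""
    return f"paging {on_call_at(when)}: {message}"


print(notify_on_call("disk full on db01", datetime(2015, 7, 14, 3, 0)))
```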

Page 6: Tips & Tricks To Reducing TTR

Victor’s Tips

“Include useful content & context in the alerts.”

“Use custom notifications to distinguish critical alerts.”
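Both tips can be pictured as properties of the alert itself: it carries enough context for a responder to start working immediately, and its severity selects a distinct notification. The field names, severity levels, and notification styles below are assumptions chosen for illustration, not a VictorOps payload format.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class Alert:
    """An alert that carries context, not just a bare 'something is wrong'."""
    summary: str
    severity: Severity
    host: str
    metric: str
    value: float
    threshold: float
    runbook_url: str = ""  # hypothetical link to the relevant runbook

    def body(self) -> str:
        """Render the context a responder needs before logging in anywhere."""
        lines = [
            f"[{self.severity.value.upper()}] {self.summary}",
            f"host={self.host} {self.metric}={self.value} (threshold {self.threshold})",
        ]
        if self.runbook_url:
            lines.append(f"runbook: {self.runbook_url}")
        return "\n".join(lines)


# Hypothetical mapping from severity to a distinct notification style.
NOTIFICATION_STYLE = {
    Severity.CRITICAL: "phone call + loud push sound",
    Severity.WARNING: "push notification",
    Severity.INFO: "email digest",
}


def notification_for(alert: Alert) -> str:
    """Pick a notification style so critical alerts are unmistakable."""
    return NOTIFICATION_STYLE[alert.severity]
```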

Page 7: Tips & Tricks To Reducing TTR

Triage

Assign degrees of urgency to incidents
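Assigning degrees of urgency can be as simple as a few explicit rules, so that the queue is worked most-urgent first. The priority levels, inputs, and thresholds in this sketch are illustrative assumptions; each team would set its own.

```python
from enum import IntEnum


class Urgency(IntEnum):
    """Lower number = more urgent (P1 is handled first)."""
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


def triage(customer_facing: bool, users_affected_pct: float, workaround_exists: bool) -> Urgency:
    """Assign an urgency level; the 25% threshold is an illustrative assumption."""
    if customer_facing and users_affected_pct >= 25:
        return Urgency.P1
    if customer_facing and not workaround_exists:
        return Urgency.P2
    if customer_facing:
        return Urgency.P3
    return Urgency.P4


# Hypothetical open incidents, sorted so the most urgent comes first.
queue = [
    ("slow reports page", triage(True, 5, True)),
    ("checkout failing", triage(True, 60, False)),
    ("internal batch job late", triage(False, 0, True)),
]
queue.sort(key=lambda item: item[1])
```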

Page 8: Tips & Tricks To Reducing TTR

Victor’s Tips

“Get the right alerts to the right people through routing.”

“Establish a single source of truth for all activities of an incident.”
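The two tips fit together: routing decides who hears about an alert, and a single per-incident timeline records everything that happens afterward, whichever tool it came from. The routing keys, team names, and `IncidentTimeline` class below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical routing rules: alert routing key -> owning team.
ROUTES = {
    "database": "dba-team",
    "network": "netops",
    "web": "app-team",
}
DEFAULT_TEAM = "ops-catchall"


def route(routing_key: str) -> str:
    """Pick the team that should receive an alert; unknown keys go to a default team."""
    return ROUTES.get(routing_key, DEFAULT_TEAM)


class IncidentTimeline:
    """One append-only record per incident: the single place to look for what happened."""

    def __init__(self) -> None:
        self._events = defaultdict(list)

    def record(self, incident_id: str, when: datetime, source: str, text: str) -> None:
        """Add an event from any source (monitoring, chat, a responder's note)."""
        self._events[incident_id].append((when, source, text))

    def history(self, incident_id: str) -> list:
        """All events for an incident, oldest first."""
        return sorted(self._events[incident_id])
```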

Page 9: Tips & Tricks To Reducing TTR

Investigation

•  Log in

•  Check the logs

•  Analyze metrics

•  Review wikis

•  Discuss with the team
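"Check the logs" usually starts with narrowing them to the window around the alert. This sketch assumes a hypothetical plain-text format of `<ISO-8601 timestamp> <LEVEL> <message>` and returns only ERROR lines within a configurable window before or after the alert time; real log formats will differ.

```python
from datetime import datetime, timedelta


def errors_near(log_lines, alert_time: datetime, window: timedelta = timedelta(minutes=10)):
    """Return (timestamp, message) for ERROR lines within `window` of the alert.

    Assumes a hypothetical format: '<ISO-8601 timestamp> <LEVEL> <message>'.
    """
    hits = []
    for line in log_lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # skip lines that don't match the assumed format
        stamp, level, message = parts
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # unparseable timestamp
        if level == "ERROR" and abs(when - alert_time) <= window:
            hits.append((when, message))
    return hits
```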

Page 10: Tips & Tricks To Reducing TTR

Victor’s Tips

“Collaborate & Share.”

“Connect with the right resources and team members.”

Page 11: Tips & Tricks To Reducing TTR

Identification

“Everything will be better if I fix this one thing.”

Page 12: Tips & Tricks To Reducing TTR

Victor’s Tips

“Provide quick access to accurate metrics & runbooks.”
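Quick access can mean bundling the runbook link and a summary of recent metric values into whatever the responder sees first. The runbook URLs and metric names below are hypothetical, and the metric values are assumed to have already been fetched from a monitoring system.

```python
from statistics import mean

# Hypothetical runbook index: alert metric name -> runbook location.
RUNBOOKS = {
    "disk_used_pct": "https://wiki.example.com/runbooks/disk-full",
    "http_5xx_rate": "https://wiki.example.com/runbooks/web-errors",
}


def context_for(metric: str, recent_values: list) -> dict:
    """Bundle the runbook link and a quick summary of recent metric values."""
    return {
        "metric": metric,
        "runbook": RUNBOOKS.get(metric, "no runbook yet: write one after the incident"),
        "latest": recent_values[-1] if recent_values else None,
        "min": min(recent_values, default=None),
        "max": max(recent_values, default=None),
        "mean": mean(recent_values) if recent_values else None,
    }
```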

Page 13: Tips & Tricks To Reducing TTR

Resolution

Self-documenting what teams do to solve the problem

Bidirectional integration with your favorite chat client and the VictorOps timeline

Team members performing system actions to fix the problem(s)
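The idea behind a bidirectional chat integration is that nobody has to copy notes by hand: messages posted in the incident room land on the timeline, and timeline events (alerts, acknowledgments, state changes) are echoed back into the room. This is a toy in-memory sketch of that flow; `Event` and `BridgedIncident` are hypothetical names and no real chat or VictorOps API is used.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    when: datetime
    author: str
    text: str


@dataclass
class BridgedIncident:
    """Keeps a chat room and an incident timeline in sync in both directions (toy model)."""
    timeline: list = field(default_factory=list)
    chat_room: list = field(default_factory=list)

    def on_chat_message(self, event: Event) -> None:
        # Chat -> timeline: what responders say and do is recorded automatically.
        self.chat_room.append(event)
        self.timeline.append(event)

    def on_timeline_event(self, event: Event) -> None:
        # Timeline -> chat: alerts and state changes are visible where people talk.
        self.timeline.append(event)
        self.chat_room.append(event)
```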

Page 14: Tips & Tricks To Reducing TTR

Victor’s Tips

“Be vocal & share what is taking place.”

Page 15: Tips & Tricks To Reducing TTR

Documentation

Write down and talk about what we did

Runbook

Page 16: Tips & Tricks To Reducing TTR

Victor’s Tips

“Conduct (blameless) post-mortems.”
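If the incident timeline was captured as the incident unfolded, a post-mortem can start from a generated draft rather than from memory. This sketch takes (timestamp, description) events and produces a skeleton with relative times and the incident's TTR. It assumes the last event marks resolution, and the section headings are one common blameless layout (asking what happened and what to change, not who is at fault), not a prescribed template.

```python
from datetime import datetime


def postmortem_draft(title: str, events: list) -> str:
    """Build a blameless post-mortem skeleton from (timestamp, description) events.

    Assumes the first event is the initial alert and the last marks resolution.
    """
    if not events:
        raise ValueError("need at least one event")
    events = sorted(events)
    start = events[0][0]
    lines = [f"Post-mortem: {title}", "", "Timeline (minutes since first alert):"]
    for when, description in events:
        offset = (when - start).total_seconds() / 60
        lines.append(f"  +{offset:.0f} min  {description}")
    ttr = (events[-1][0] - start).total_seconds() / 60
    lines += [
        "",
        f"Time to resolution: {ttr:.0f} min",
        "",
        "What went well:",
        "What could be improved:",
        "Action items (e.g. new or updated runbooks):",
    ]
    return "\n".join(lines)
```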

Page 17: Tips & Tricks To Reducing TTR

Tips & Tricks to Reduce TTR for the Next Incident

Summary

Page 18: Tips & Tricks To Reducing TTR

“Conduct (blameless) post-mortems.”

“Be vocal & share what is taking place.”

“Provide quick access to accurate metrics & runbooks.”

“Collaborate & Share.”

“Connect with the right resources and team members.”

“Get the right alerts to the right people through routing.”

“Establish a single source of truth for all activities of an incident.”

“Include useful content & context in the alerts.”

“Use custom notifications to distinguish critical alerts.”

Page 19: Tips & Tricks To Reducing TTR

Jason Hand – DevOps Evangelist

Tips & Tricks to Reduce TTR for the Next Incident

@jasonhand

Thank You
