Telling Tales and Solving Crimes with New Relic

Apr 16, 2017

Data & Analytics

James Ford
Transcript
Page 1: Telling Tales and Solving Crimes with New Relic

Telling Tales & Solving Crimes

uncovering the practical, business side of New Relic

Page 2: Telling Tales and Solving Crimes with New Relic

“Just like automated deployments and unit tests, New Relic is going to change how we work.”

Page 3: Telling Tales and Solving Crimes with New Relic

Reacting fast

Application Performance Monitoring enables us to deal with issues quickly and definitively.

Page 4: Telling Tales and Solving Crimes with New Relic

Case Study 1: JavaScript Errors on the Live Site

Page 5: Telling Tales and Solving Crimes with New Relic

Case Study 1 - ‘jQuery’ is undefined

That’s bad. That’s very, very bad.

jQuery is key to this application. It needs to work.

Page 6: Telling Tales and Solving Crimes with New Relic

We tested this - how did that happen?

Page 7: Telling Tales and Solving Crimes with New Relic

Scenario:

● Nobody has reported an issue (yet).

● We didn’t pick up on any issues in testing.

● Critical issue - the website won’t work without it.

Recent changes:

● Loading jQuery from a Content Delivery Network.

● Feature-detection-based embedding of jQuery.

Page 8: Telling Tales and Solving Crimes with New Relic

Data inspection time...

It happens predominantly in Internet Explorer, but this time the browser version is not to blame.

Page 9: Telling Tales and Solving Crimes with New Relic

[manual testing]

Repeating our test process to ensure that everything we’ve tested for is still working as expected.

Page 10: Telling Tales and Solving Crimes with New Relic

When all else fails, Google it.

Page 11: Telling Tales and Solving Crimes with New Relic

● Some Corporate (or Educational Institution) networks will be ‘protected’ by disallowing external resources from Content Delivery Networks.

● This will result in 404 Errors when requesting files, which would explain the errors we see in New Relic.

● Our solution: put a fallback version of the file locally to continue supporting these customers.

Result!
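The fallback idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the names `loadFirstAvailable` and `loadViaScriptTag` and both URLs are hypothetical. It tries each source in order and moves on to the next one when a request fails.

```typescript
// Hypothetical loader signature: attempt one URL, then report success or failure.
type ScriptLoader = (url: string, onLoad: () => void, onError: () => void) => void;

// Try each URL in order; call done() with the first URL that loads,
// or with null if every source fails.
function loadFirstAvailable(
  urls: string[],
  load: ScriptLoader,
  done: (loadedFrom: string | null) => void
): void {
  const tryNext = (index: number): void => {
    if (index >= urls.length) {
      done(null); // nothing left to try
      return;
    }
    const url = urls[index];
    load(url, () => done(url), () => tryNext(index + 1));
  };
  tryNext(0);
}

// Browser loader: inject a <script> tag and listen for its load/error events.
const loadViaScriptTag: ScriptLoader = (url, onLoad, onError) => {
  const script = document.createElement("script");
  script.src = url;
  script.onload = () => onLoad();
  script.onerror = () => onError();
  document.head.appendChild(script);
};

// Only runs in a browser. Both paths are placeholders, not the real ones.
if (typeof document !== "undefined") {
  loadFirstAvailable(
    ["https://cdn.example.com/jquery.min.js", "/js/vendor/jquery.min.js"],
    loadViaScriptTag,
    (loadedFrom) => {
      if (loadedFrom === null) {
        console.error("jQuery could not be loaded from any source");
      }
    }
  );
}
```

Keeping the loader as a parameter means the fallback order can be checked without a browser, which helps when the failure only shows up on networks we cannot reproduce in manual testing.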

Page 12: Telling Tales and Solving Crimes with New Relic

Most importantly, we’ve identified issues that real users are experiencing, then debugged and resolved them without the customer or the client having to report any issues or provide any details.

Which also means we fixed the issue in a fraction of the time!

Page 13: Telling Tales and Solving Crimes with New Relic

Case Study 2: Server Crash

Page 14: Telling Tales and Solving Crimes with New Relic

Channels of communication

[diagram: End Users, Technical Contact, Product Owner, DevOps, Development Team, Customer Support]

Page 15: Telling Tales and Solving Crimes with New Relic

At the time it all kicks off… (11am)

Product Owner & Technical Contact AWOL (everyone’s allowed a lunch break)

Page 16: Telling Tales and Solving Crimes with New Relic

11:00am

Warning: High load on server

DevOps team gets advance warning of high load on the server and begins investigating.

Page 17: Telling Tales and Solving Crimes with New Relic

11:31am

Alert: Server unavailable

Alert: Downtime

500 Error

Server Crash! Email alerts for everyone!

Customer Support team knows of the downtime instantly.

Page 18: Telling Tales and Solving Crimes with New Relic

Fortunately, DevOps have already been aware of the issue for 30 minutes and are attempting to fix it.

When the server finally dies, they talk to the Development Team about the possibility of simply rebooting the server.

Solution & implications agreed: Server is rebooted.

Page 19: Telling Tales and Solving Crimes with New Relic

12:00pm

Rebooted server comes online and restores service.

Page 20: Telling Tales and Solving Crimes with New Relic

Server Crash: Fallout

● First instance of downtime since we started using New Relic.

● Server ‘officially’ down for 29 minutes. (Customer Support were only aware of downtime for the final 11 minutes.)

● Enhanced visibility of server health meant remediation steps were underway before the downtime started.

● Downtime issue was resolved in the same time it took for meetings about the downtime to be arranged.

● Once normal operation is resumed, we can use New Relic data & server logs to perform a ‘post mortem’ on the incident.
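As a quick check on the figures above, the timeline from the slides can be worked through in a few lines. This is only a sketch: the helper names are hypothetical, times are written on a 24-hour clock, and the restore time is taken to be noon, which is what the 29-minute figure implies.

```typescript
// Convert an "HH:MM" 24-hour clock string into minutes since midnight.
function toMinutes(clock: string): number {
  const [hours, minutes] = clock.split(":").map(Number);
  return hours * 60 + minutes;
}

// Minutes elapsed between two clock times on the same day.
function minutesBetween(start: string, end: string): number {
  return toMinutes(end) - toMinutes(start);
}

// Times as reported in the slides (restore time assumed to be noon).
const incidentTimeline = { warning: "11:00", crash: "11:31", restored: "12:00" };

// Head start DevOps had before the server went down.
const leadTimeMinutes = minutesBetween(incidentTimeline.warning, incidentTimeline.crash);

// Length of the 'official' downtime.
const downtimeMinutes = minutesBetween(incidentTimeline.crash, incidentTimeline.restored);

console.log(`Lead time: ${leadTimeMinutes} min, downtime: ${downtimeMinutes} min`);
```

The lead time comes out at 31 minutes, in line with the roughly 30 minutes of advance warning described earlier, and the downtime at 29 minutes.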

Page 21: Telling Tales and Solving Crimes with New Relic

“Real user data is much, much better than artificially-created lab results”