Diagnosing Internet Outages Young Xu, Product Marketing Analyst

Apr 16, 2017

ThousandEyes
Transcript
Page 1: Diagnosing Internet Outages

Diagnosing Internet Outages Young Xu, Product Marketing Analyst

Page 2: Diagnosing Internet Outages

About ThousandEyes

ThousandEyes delivers visibility into every network your organization relies on.

•  Founded by network experts; strong investor backing
•  Relied on for critical operations by leading enterprises
•  Recognized as an innovative new approach

27 of the Fortune 500
5 of the top 5 SaaS companies
4 of the top 6 US banks

Page 3: Diagnosing Internet Outages

I see an outage. Is this affecting just this one test, just me, or everybody?

Page 4: Diagnosing Internet Outages

Overview: Internet Outage Detection

Traffic Outage Detection
•  Detect outages in ISPs and understand their impact both globally and as it relates to your organization

Routing Outage Detection
•  See the global and account scope, as well as likely root cause of BGP reachability outages

Page 5: Diagnosing Internet Outages

How Traffic Outage Detection Works

1.  Anonymized traffic data is aggregated from all tests across the entire user base
2.  Algorithms then look for patterns in path traces terminating in the same ISP

[Diagram labels. Agents: New York Cloud Agent, Boston Enterprise Agent, Los Angeles Cloud Agent. ISPs: Level 3 in San Jose, Cogent in Denver. Targets: Salesforce, Google, NY Times. Accounts: Customer 1, Customer 2.]
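
The two steps above can be illustrated with a minimal sketch. It is not ThousandEyes' actual algorithm: the `PathTrace` fields, the interface names and the `min_tests` threshold are all assumptions made for illustration. The idea is only that traces which fail at the same interface, across different tests and accounts, point to a shared failure inside that ISP.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PathTrace:
    """One anonymized path trace (all field names are hypothetical)."""
    account: str          # anonymized account identifier
    test: str             # test name or ID
    terminal_isp: str     # ISP that owns the last responding hop
    terminal_iface: str   # last responding interface
    reached_target: bool  # True if the trace reached its destination


def detect_traffic_outages(traces, min_tests=2):
    """Group failed traces by terminating interface and flag shared failure points."""
    tests_by_iface = defaultdict(set)
    accounts_by_iface = defaultdict(set)
    for t in traces:
        if t.reached_target:
            continue  # only traces that end inside a network are of interest
        key = (t.terminal_isp, t.terminal_iface)
        tests_by_iface[key].add((t.account, t.test))
        accounts_by_iface[key].add(t.account)
    # Keep only interfaces where enough distinct tests terminate (assumed threshold).
    return {
        key: {"tests": len(tests), "accounts": len(accounts_by_iface[key])}
        for key, tests in tests_by_iface.items()
        if len(tests) >= min_tests
    }


traces = [
    PathTrace("customer-1", "salesforce", "Level 3", "sjc-1", False),
    PathTrace("customer-2", "google", "Level 3", "sjc-1", False),
    PathTrace("customer-2", "nytimes", "Cogent", "den-1", False),
    PathTrace("customer-1", "google", "Level 3", "sjc-1", True),
]
outages = detect_traffic_outages(traces)
print(outages)
```

In this made-up data, two tests from two accounts terminate at the same Level 3 interface, so it is flagged; the single failure in Cogent stays below the assumed threshold.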

Page 6: Diagnosing Internet Outages

Traffic Outage Detection

•  Account scope
•  Global scope
•  Severity and scope of the issue at this interface

Page 7: Diagnosing Internet Outages

Routing Outage Detection

Aggregates reachability issues in routing data from 300+ public monitors

•  Global scope
•  Account scope
•  Root cause analysis
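
As a rough illustration of aggregating reachability across monitors, the sketch below computes, per prefix, the share of monitors that cannot reach it, then separates global scope from account scope. The data layout, the monitor names, the 0.5 threshold and the function names are assumptions for this example, not the product's implementation.

```python
def unreachable_share(reports):
    """Map each prefix to the fraction of monitors reporting it unreachable.

    `reports` maps monitor name -> {prefix: reachable?} (hypothetical layout).
    """
    totals, failures = {}, {}
    for prefixes in reports.values():
        for prefix, reachable in prefixes.items():
            totals[prefix] = totals.get(prefix, 0) + 1
            if not reachable:
                failures[prefix] = failures.get(prefix, 0) + 1
    return {p: failures.get(p, 0) / totals[p] for p in totals}


def split_scope(shares, account_prefixes, threshold=0.5):
    """Global scope: every affected prefix. Account scope: those you monitor."""
    affected = {p for p, share in shares.items() if share >= threshold}
    return {"global": affected, "account": affected & set(account_prefixes)}


# Documentation prefixes and made-up monitor names, for illustration only.
reports = {
    "monitor-a": {"198.51.100.0/24": False, "203.0.113.0/24": True, "192.0.2.0/24": False},
    "monitor-b": {"198.51.100.0/24": False, "203.0.113.0/24": True, "192.0.2.0/24": False},
    "monitor-c": {"198.51.100.0/24": False, "203.0.113.0/24": False, "192.0.2.0/24": True},
    "monitor-d": {"198.51.100.0/24": True, "203.0.113.0/24": True, "192.0.2.0/24": True},
}
shares = unreachable_share(reports)
scope = split_scope(shares, account_prefixes=["198.51.100.0/24"])
print(shares)
print(scope)
```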

Page 8: Diagnosing Internet Outages

Recent Major Outages Detected

•  April 23: Hurricane Electric route leak affecting AWS
•  May 3: Trans-Atlantic issues in Level 3
–  https://blog.thousandeyes.com/trans-atlantic-issues-level-3-network/
•  May 20: Tata and TISparkle issues with submarine cable
–  https://blog.thousandeyes.com/smw-4-cable-fault-ripple-effects-across-networks/
•  June 6: Hurricane Electric removed >500 prefixes
•  June 24: Tata cable cut in Singapore affecting Dropbox
•  July 10: Level 3, NTT routing issues affecting JIRA
–  https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
•  July 17: Widespread issues in Telia’s network in Ashburn
–  https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/

Page 9: Diagnosing Internet Outages

Tips for Diagnosing Internet Outages

•  Look for purple indicators and the ‘Outage Detected’ dropdown when investigating issues; these indicate detected outages
•  Use quick links or select specific nodes/ASes to see how paths have changed over time
•  Correlate data from the web, network and routing layers to analyze root cause
•  See our blogs and Knowledge Base articles for more info:
–  Blog on Traffic Outage Detection: https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/
–  Blog on Routing Outage Detection: https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
–  Knowledge Base: https://support.thousandeyes.com/entries/110214366

Page 10: Diagnosing Internet Outages

Demo

Page 11: Diagnosing Internet Outages

1. Network Layer Issues in Telia in Ashburn

Detected outage coincides with packet loss spikes

Ashburn, VA is “ground zero” for this outage

Page 12: Diagnosing Internet Outages

Specific Failure Points in Telia

High severity and wide scope (Outages affecting at least 20 tests for a NA/EU interface are likely to be wide in scope)

Terminal nodes in Telia
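
The rule of thumb on this slide (at least 20 affected tests on a NA/EU interface suggests wide scope) can be written as a small check. The function name and the region codes are assumptions; the slide gives no threshold for other regions, so the sketch returns `None` there instead of guessing.

```python
WIDE_SCOPE_MIN_TESTS = 20  # threshold stated on the slide for NA/EU interfaces


def likely_wide_scope(affected_tests, region):
    """Return True/False for NA/EU interfaces, None where the slide gives no rule."""
    if region not in {"NA", "EU"}:
        return None  # no rule of thumb is given for other regions
    return affected_tests >= WIDE_SCOPE_MIN_TESTS


print(likely_wide_scope(25, "NA"))
print(likely_wide_scope(5, "EU"))
print(likely_wide_scope(25, "APAC"))
```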

Page 13: Diagnosing Internet Outages

2. Hurricane Electric Route Flap Affects Telx

Detected outage coincides with spike in AS path changes

Root cause analysis points to Hurricane Electric and Telx

Page 14: Diagnosing Internet Outages

Route Flap by Hurricane Electric

Hurricane Electric

Routes flap from using HE to NTT, then back to HE
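
A flap of this kind (HE to NTT, then back to HE) can be spotted in a time-ordered series of AS paths. The sketch below is an illustration under stated assumptions: the history format and function names are invented, AS6939 and AS2914 are the public ASNs of Hurricane Electric and NTT, and AS64500/AS64501 are documentation ASNs standing in for the origin and the monitor.

```python
def upstream_of_origin(as_path):
    """Return the AS adjacent to the origin AS, or None for a one-hop path."""
    return as_path[-2] if len(as_path) >= 2 else None


def find_flaps(history):
    """Find A -> B -> A changes of the origin's upstream in (timestamp, as_path) pairs."""
    changes = []
    for ts, path in history:
        upstream = upstream_of_origin(path)
        # Record only the moments where the upstream actually changes.
        if not changes or changes[-1][1] != upstream:
            changes.append((ts, upstream))
    flaps = []
    for (_, a), (t1, b), (t2, c) in zip(changes, changes[1:], changes[2:]):
        if a == c and a != b:
            flaps.append({"from": a, "via": b, "left": t1, "returned": t2})
    return flaps


HE, NTT = 6939, 2914           # Hurricane Electric, NTT
ORIGIN, MONITOR = 64500, 64501  # documentation ASNs, placeholders only

history = [
    (0, (MONITOR, HE, ORIGIN)),
    (60, (MONITOR, HE, ORIGIN)),
    (120, (MONITOR, NTT, ORIGIN)),
    (180, (MONITOR, HE, ORIGIN)),
]
flaps = find_flaps(history)
print(flaps)
```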

Page 15: Diagnosing Internet Outages

Causing Traffic Issues in Hurricane Electric

Hurricane Electric

Page 16: Diagnosing Internet Outages

3. NTT and Level 3 Routing Issues Affect JIRA

JIRA saw 0% availability and 100% packet loss

Most affected interfaces are in Ashburn, VA

Page 17: Diagnosing Internet Outages

Traffic Terminating in NTT

Traffic paths originally traversed Level 3 and NTT

Traffic paths then change to traverse only NTT, terminating there

Page 18: Diagnosing Internet Outages

JIRA’s /24 Prefix Becomes Unreachable

As the primary upstream ISP, Level 3 is associated with the most affected routes

Routes through upstream ISPs NTT and Level 3 all withdrawn

Page 19: Diagnosing Internet Outages

Routers Begin Using Misconfigured /16 Prefix

The backup /16 prefix directs to NTT, not JIRA’s network. This is why the traffic path changed to traverse only NTT, terminating there when JIRA’s IP couldn’t be found in NTT’s network.
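
The fallback from the /24 to the /16 follows from longest-prefix matching: routers prefer the most specific route that covers a destination, and use a covering prefix only when the more specific one is gone. A minimal sketch with Python's standard `ipaddress` module, using private addresses and made-up next-hop labels as placeholders (JIRA's real prefixes are not given in the slides):

```python
import ipaddress


def best_route(routes, destination):
    """Pick the most specific (longest) prefix covering the destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop


# Hypothetical routing table: a specific /24 and a covering /16.
routes = {
    "10.20.30.0/24": "jira-network",
    "10.20.0.0/16": "ntt",
}
before = best_route(routes, "10.20.30.5")

# The /24 is withdrawn, so only the covering /16 remains.
del routes["10.20.30.0/24"]
after = best_route(routes, "10.20.30.5")

print(before)
print(after)
```

Before the withdrawal the /24 wins; afterwards the same destination matches only the /16, so traffic is delivered toward NTT, consistent with the path change described on the slide.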

Page 20: Diagnosing Internet Outages

See what you’re missing.

Watch the webinar:

https://www.thousandeyes.com/resources/diagnosing-internet-outages-webinar