Atlas Services Remote Analysis Report Sample

Jul 07, 2015

This sample report shows the type of insights that you receive with the monthly Atlas Services remote analysis report.
Transcript
Page 1: Atlas Services Remote Analysis Report Sample

CONFIDENTIAL 1

Remote Analysis Report
Enabling Continual Service Improvement in Critical Systems

Overall Health
Web Application
Database
Middleware
Citrix
Storage
Supporting Application Infrastructure
Application Communication
Network

PREPARATION
Month: October 2014
Report: Sample
Prepared for: Customer
Analyst: Analyst
ExtraHop Networks
Configuration: EH8000
Firmware: 4.0
ID: XXXXX

Aug Sep Oct

Nicole Pennington
This sample demonstrates the type of in-depth insight that your organization will receive from your monthly Atlas Services Remote Analysis Reports. Annotations in this document highlight the types of analysis provided.
Page 2: Atlas Services Remote Analysis Report Sample

Atlas Services | Remote Analysis Report Day 1 – Day 7

WEB APPLICATION
A review of the web application protocols, including HTTP and HTTPS.

FINDINGS:

File Not Found errors (HTTP status code 404) on device1 have significantly decreased. (Trend: Resolved) ↑

Investigate Internal Server errors (HTTP status code 500) that occurred on the AAAAA server and were associated with a single URI. Internal Server errors were not previously noted on this server. (New finding) ☀

Investigate improvements that can be made to the ZZZZZ server, which is experiencing a lengthy processing time on average. Processing time on this server has become less severe since the previous analysis period. (Trend: Improvement) ↗

Reviews of previous findings can give you confidence that the actions taken are addressing the issues.
Page 3: Atlas Services Remote Analysis Report Sample

CRITICAL CONCERNS: 86.9% of HTTP responses on the AAAAA server were Internal Server errors (HTTP status code 500). Internal Server errors indicate that the HTTP server encountered an unexpected condition that prevented it from fulfilling the request.

Internal Server errors on AAAAA (indicated by the vertical red bars) appeared to correlate with the HTTP transaction rate (indicated by the green line). At peak, 3,859 Internal Server errors occurred on this device in a single hour.

100% of Internal Server errors on AAAAA occurred while attempting to access a single URI resource, xxxx.xxxxxxx/PrePayService.

Trend graphs help determine if errors occur during acute events or if they are part of a chronic problem.
Page 4: Atlas Services Remote Analysis Report Sample

IMPROVEMENT OPPORTUNITIES: Several HTTP servers are experiencing lengthy processing time on average. Notice that the ZZZZZ server accounted for 55,742 responses and experienced an average processing time of over 2 seconds.

Utilizing the ExtraHop Heatmaps feature, we see that a high concentration of transactions on ZZZZZ experienced approximately 5 seconds of processing time. A darker area on the graph below indicates a high concentration of transactions.

Note the large standard deviation tied to processing time for the xxx.xxx.xxx.xx:xxxx/EAI/OA URI. This indicates that the processing times experienced for this URI were very “dispersed” and had a large amount of variation, meaning that much larger processing times were also observed. Using these standard deviation and mean measurements, we can conclude that approximately 1,277 transactions experienced processing times of approximately 12.7 seconds.
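As a rough illustration of how a mean and a standard deviation can be turned into a transaction count, the sketch below assumes normally distributed processing times, which real processing times often are not. The function name and the sample inputs are hypothetical; the report does not state the mean, the standard deviation, or the method actually used.

```python
import math


def estimate_count_above(total, mean, std_dev, threshold):
    """Estimate how many of `total` transactions exceed `threshold`.

    Assumption (not from the report): processing times follow a
    normal distribution with the given mean and standard deviation.
    """
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    z = (threshold - mean) / std_dev
    # Upper-tail probability of the standard normal distribution.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return total * tail


# Hypothetical inputs: 10,000 transactions, mean 2.0 s, std dev 5.0 s.
print(round(estimate_count_above(10_000, 2.0, 5.0, 12.0)))
```

A threshold two standard deviations above the mean leaves a little over 2% of transactions in the tail under this assumption, so a large standard deviation implies a meaningful number of very slow transactions even when the mean looks moderate.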

Heatmaps give a visual representation of processing times.
Page 5: Atlas Services Remote Analysis Report Sample

DATABASE
A review of all parsed database protocol traffic, regardless of the type of database. Protocols include (if licensed): TNS (Oracle), TDS (MS SQL), DB2, Informix, Sybase, PostgreSQL, and MySQL.

FINDINGS:

Investigate database errors on the BBBBB server that occurred constantly; these errors were related to failed logins for the ZZZ_ZZZZZ database. (New finding) ☀

CRITICAL CONCERNS: None noted.

IMPROVEMENT OPPORTUNITIES: 1.0% of all database responses were errors.

93.3% of all database errors were concentrated on the BBBBB server. Also note that the number of errors on this server was approximately 200% of the number of responses it sent, indicating that each response sent from this server resulted in roughly two errors.
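An error rate above 100% is possible when errors are counted separately from responses. The sketch below is a minimal illustration; the function name and the counts are hypothetical and are not taken from the report.

```python
def error_rate_percent(errors, responses):
    """Return errors as a percentage of responses.

    The result can exceed 100 when more than one error is
    recorded per response.
    """
    if responses <= 0:
        raise ValueError("responses must be positive")
    return 100.0 * errors / responses


# Hypothetical counts: 2,000 errors recorded against 1,000 responses.
print(error_rate_percent(2_000, 1_000))
```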

Percentage calculations allow for quick determination of the relative impact of findings.
Page 6: Atlas Services Remote Analysis Report Sample

Error rate on this server (indicated below by the red vertical bars) stayed in excess of 700 errors per hour for a majority of the observation period.

100% of database errors from BBBBB were returned to the YYYYYY client.

Additionally, 100% of database errors on BBBBB had one of two messages. These messages suggest that 100% of errors on BBBBB result from the YYYYYY client attempting to log on to BBBBB and open the ZZZ_ZZZZZ database. 100% of these login and open attempts are failing. Investigate scheduled tasks that may be causing these errors.

Also worth noting are the processing times observed on this database server. While a majority of transactions were not concerning (75% of all database transactions took, at most, 3 milliseconds of processing time), note that database transactions on BBBBB experienced as much as a minute of processing time.
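The statement that 75% of transactions took at most 3 milliseconds is a 75th-percentile measurement. A minimal sketch of the nearest-rank percentile method follows; the function name and the sample values are hypothetical, and the report does not say which percentile method ExtraHop uses.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the smallest value such that at least `pct` percent
    of the values are less than or equal to it (nearest-rank method)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical processing times in milliseconds, with one long outlier.
samples_ms = [1, 1, 2, 2, 3, 3, 40, 60_000]
print(percentile_nearest_rank(samples_ms, 75))
print(percentile_nearest_rank(samples_ms, 100))
```

Comparing a middle percentile with the maximum, as the report does, shows how a few very slow transactions can hide behind an otherwise healthy distribution.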

Plotting transactions against errors provides insight into the behavior of error generation.
Page 7: Atlas Services Remote Analysis Report Sample

The ExtraHop Heatmaps feature reveals that a “concentration” of transactions experienced around 3 seconds (3,000 milliseconds) of processing time. A darker area on the graph below indicates a higher concentration of transactions. While a large volume of transactions experienced less than 400 milliseconds of processing time, it may be worth researching what is causing some of the previously discussed failed logins to experience such lengthy processing times.

Page 8: Atlas Services Remote Analysis Report Sample

MIDDLEWARE
A review of all parsed middleware protocol traffic (if licensed): FTP, MQSeries, and Memcache.

FINDINGS:

Investigate FTP errors that occurred on the CCCCC server and appear to correlate with SITE method calls. The overall volume of FTP errors has decreased since the previous analysis period. (Trend: Improvement) ↗

CRITICAL CONCERNS: 16.8% of FTP responses resulted in an error. This is a decrease from the 25.4% FTP error rate noted in the previous report.
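The two error rates can be compared in percentage points and in relative terms. The sketch below works through the arithmetic for the figures quoted above, 25.4% and 16.8%; the function name is hypothetical.

```python
def rate_change(previous_pct, current_pct):
    """Return (change in percentage points, relative change in percent)."""
    if previous_pct == 0:
        raise ValueError("previous_pct must be non-zero")
    point_change = current_pct - previous_pct
    relative_change = 100.0 * point_change / previous_pct
    return point_change, relative_change


points, relative = rate_change(25.4, 16.8)
print(f"{points:+.1f} percentage points, {relative:+.1f}% relative")
```

For these inputs the drop is 8.6 percentage points, which is roughly a one-third reduction relative to the previous error rate.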

38.4% of FTP errors originated on the CCCCC server.

Page 9: Atlas Services Remote Analysis Report Sample

Spikes in both FTP error rate (indicated by the vertical red bars) and transaction rate (indicated by the green line) on CCCCC occurred at the same time each day. The nightly spike strongly suggests an automated FTP process that is broken or otherwise misconfigured.

100% of FTP errors outbound from CCCCC were returned to a single client IP (xxx.xxx.xxx.xxx).

100% of FTP errors on CCCCC affected the XXX_XXX user.

FTP errors on CCCCC had two error messages. The messages are available below.

Page 10: Atlas Services Remote Analysis Report Sample

Further analysis of FTP errors suggests that there is a relationship between FTP 500 errors and the use of the FTP SITE method. FTP 500 errors are indicative of erroneous syntax resulting in an unrecognized action that, as a result, could not take place. Looking at the busiest FTP server (CCCCC), we see an almost 1:1 relationship between the use of the SITE method and FTP error code 500.
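One way to quantify an "almost 1:1 relationship" between two time series is a Pearson correlation coefficient over per-interval counts. The sketch below is illustrative only: the hourly counts are hypothetical, and the report does not say that this statistic was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly counts of SITE calls and FTP 500 errors.
site_calls = [0, 2, 120, 3, 0, 118]
ftp_500_errors = [0, 1, 119, 3, 0, 117]
print(round(pearson(site_calls, ftp_500_errors), 3))
```

A coefficient close to 1 supports the idea that the two counts rise and fall together, although it does not by itself show that SITE calls cause the errors.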

IMPROVEMENT OPPORTUNITIES: Not evaluated.

Time trending errors can also help uncover other correlations.
Page 11: Atlas Services Remote Analysis Report Sample

CITRIX
A review of Citrix performance.

FINDINGS:

Investigate lengthy session load times on the DDDDD device that primarily affected two clients and were related to a single application. Citrix load times have slightly decreased since the previous observation period. (Trend: Improvement)

CRITICAL CONCERNS: Several ICA servers are experiencing lengthy load times in excess of 40 seconds per session launch. When launching an ICA session, lengthy load times will delay the start of the ICA session and cause latency in overall application processing. ICA session launches transiting the DDDDD device experienced a high number of launches with long load times.

Citrix analysis can help spot poor application performance, unrelated to the Citrix ICA protocol.
Page 12: Atlas Services Remote Analysis Report Sample

Drilling into DDDDD, we can see that session launches transiting two Cisco devices are primarily affecting two clients: FFFFF and GGGGGG.

The #MMMMMM application was most impacted by lengthy load times. Investigate transactions that may be impacted by lengthy load times for this application.

IMPROVEMENT OPPORTUNITIES: Not evaluated.

Page 13: Atlas Services Remote Analysis Report Sample

STORAGE
A review of all parsed storage protocol traffic. Protocols include (if licensed): CIFS, NFS, and iSCSI.

FINDINGS:

Investigate STATUS_ACCESS_DENIED CIFS errors that transited the NNNNN device and appeared to have originated at yy.yy.yy.yy. The volume of CIFS errors significantly increased since the previous observation period. (Trend: Worse) ↓

CRITICAL CONCERNS: 49.6% of CIFS responses were errors. The severity of CIFS errors ranges widely, from informational to severe. High volumes of errors should be investigated to determine whether corrective action is required or whether changes can be made to reduce unnecessary processing time.

70.7% of CIFS errors transited the NNNNN device.

Page 14: Atlas Services Remote Analysis Report Sample

CIFS errors on NNNNN were returned to 118 client IPs.

Looking client-side at some of the top contributors of CIFS errors on the NNNNN device, it appears that a large portion of CIFS errors that transited NNNNN originated on SSSSS at yy.yy.yy.yy.

Page 15: Atlas Services Remote Analysis Report Sample

The majority of CIFS errors on NNNNN have variations of STATUS_ACCESS_DENIED error messages.

CIFS error rate (indicated by the vertical red bars) on NNNNN directly correlates with transaction rate (indicated by the green line). Investigate transactions that may be impacted by these CIFS errors. At peak, this device experienced 1,049,331 errors over the course of a single hour, or more than 291 errors every second. Note that this server was only active for four days during the observation period.
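The per-second figure follows directly from the hourly peak. A minimal sketch of the conversion, using the 1,049,331 errors quoted above (the function name is hypothetical):

```python
def per_second(count, window_seconds=3600):
    """Convert a count observed over a time window to a per-second rate."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return count / window_seconds


# Peak hourly CIFS error count quoted in the report.
print(f"{per_second(1_049_331):.1f} errors per second")
```

The result is slightly above 291, consistent with the "more than 291 errors every second" stated in the text.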

IMPROVEMENT OPPORTUNITIES: Not evaluated.

Page 16: Atlas Services Remote Analysis Report Sample

SUPPORTING APPLICATION INFRASTRUCTURE
A review of protocol traffic related to supporting application infrastructure, including DNS, SSL, SMTP, and LDAP.

FINDINGS:

Investigate the high volume of DNS response errors concentrated on the HHHHH device that were related to reverse IP lookups. (New finding) ☀

Investigate excessive use of the ANY method by the PPPPP server; a significant volume of ANY method calls originated in Australia. The volume of ANY method calls has slightly decreased since the previous analysis period. (Trend: Improvement) ↗

CRITICAL CONCERNS: 91.4% of all DNS responses were errors. A DNS response error occurs when a client makes a DNS lookup and the DNS server responds with some sort of error. These errors may not break an application, but they add latency to application transactions and cause unnecessary processing on the DNS server.

48.6% of DNS response errors originated on the HHHHH device. Note that 99.5% of requests made to this device result in a DNS response error.

DNS analysis spots problems contributing to overall latency that can often be fixed with minimal effort.
Page 17: Atlas Services Remote Analysis Report Sample

The DNS response error rate (indicated by the vertical red bars) on HHHHH directly correlates with transaction rate (indicated by the green line). Investigate transactions that may be impacted by DNS response errors.

Nearly 100% of DNS response errors outbound from HHHHH were returned to LLLLL via a Cisco device.

DNS response errors outbound from HHHHH are related to a number of reverse IP lookups. Note that these queries result in an error nearly 100% of the time they are called.

Page 18: Atlas Services Remote Analysis Report Sample

Over 15,500,000 instances of the DNS “ANY” method occurred during the observation period. This is a decrease from the volume of ANY method requests noted in the previous report; however, it is still a concerning volume. Use of the ANY method returns all known information about a DNS zone in a single request, and is usually indicative of a DNS Amplification Attack. More information is available here: http://www.us-cert.gov/ncas/alerts/TA13-088A.
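To see which clients account for ANY requests, query records can be tallied by source address. The sketch below assumes a hypothetical record format of (client IP, query type) pairs and uses documentation-range IP addresses; it does not reflect how ExtraHop stores or exposes DNS metrics. The ANY query type is the numeric QTYPE 255 ("*") defined in RFC 1035.

```python
from collections import Counter

QTYPE_ANY = 255  # QTYPE "*" in RFC 1035, commonly called ANY


def any_queries_by_client(queries):
    """Count ANY queries per client.

    `queries` is an iterable of (client_ip, qtype) pairs; this record
    format is a hypothetical one chosen for the example.
    """
    return Counter(ip for ip, qtype in queries if qtype == QTYPE_ANY)


# Hypothetical sample records using documentation-range addresses.
sample = [
    ("192.0.2.10", 255),
    ("192.0.2.10", 255),
    ("198.51.100.7", 1),
    ("203.0.113.5", 255),
]
counts = any_queries_by_client(sample)
print(counts.most_common(1))
```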

86.3% of ANY method calls occurred on the PPPPP DNS server at xx.yy.zz.aa.

The following Geomap identifies the physical location of IPs that sent ANY requests to the server at xx.yy.zz.aa. A denser dot indicates a higher volume of transactions. Note that the AAA.BB.XXX.ZZ IP located in Canberra, Australia accounts for a large portion of these ANY method requests; this may be related to malicious activity.

IMPROVEMENT OPPORTUNITIES: Not evaluated.

Geomaps allow for a geographical visualization of devices communicating on your network.
Page 19: Atlas Services Remote Analysis Report Sample

APPLICATION COMMUNICATION

FINDINGS:

Investigate Zero Windows that occurred on the RRRR device. Zero Windows occurred in spikes; these spikes have become much more severe since the previous observation period. (Trend: Worse) ↓

CRITICAL CONCERNS: More than 77,000,000 Zero Windows were observed on the XXXXXXX network over the course of the seven-day observation period. A Zero Window indicates that the connection between two devices has stalled and that the device sending the Zero Window is unable to keep up with the rate of data that a peer is sending. In effect, the device sending the Zero Window is saying, “send no data until further notice.” 52.4% of Zero Windows were outbound from the RRRR device.
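At the packet level, a Zero Window is a TCP segment whose 16-bit window field is zero. The sketch below reads that field from a raw TCP header. Skipping SYN, FIN, and RST segments is an assumption made for this example, and `build_header` is a hypothetical helper; neither is taken from the report or from ExtraHop's implementation.

```python
import struct

FIN, SYN, RST = 0x01, 0x02, 0x04


def is_zero_window(tcp_header):
    """Return True if a raw TCP header advertises a zero receive window."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    flags = tcp_header[13]
    # The window field occupies bytes 14-15 of the TCP header.
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    # Assumption: ignore connection setup and teardown segments.
    return window == 0 and not flags & (FIN | SYN | RST)


def build_header(flags, window):
    """Hypothetical helper: build a 20-byte TCP header for the example."""
    return struct.pack("!HHIIBBHHH", 50000, 445, 1, 1, 5 << 4, flags, window, 0, 0)


print(is_zero_window(build_header(0x10, 0)))     # ACK with window 0
print(is_zero_window(build_header(0x10, 8192)))  # ACK with open window
```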

TCP analysis provides insight into a commonly overlooked region, where the network meets the application.
Page 20: Atlas Services Remote Analysis Report Sample

At peak, 4,620,000 Zero Windows were sent from RRRR over the course of a single hour, or more than 1,283 Zero Windows sent each second.

60.5% of Zero Windows outbound from RRRR were sent to the TTTTT device.

100% of Zero Windows sent from RRRR were related to the CIFS protocol.

IMPROVEMENT OPPORTUNITIES: Not evaluated.

Tying TCP metrics to an L7 protocol can help diagnose underlying communication problems.
Page 21: Atlas Services Remote Analysis Report Sample

NETWORK

FINDINGS:

Investigate high volume of IP fragments outbound from the UUUUU device. Outbound IP fragments were not previously noted on this device. (New finding) ☀

CRITICAL CONCERNS: More than 29,300,000 IP fragments were sent onto the XXXXXXX network over the course of the seven-day observation period. IP fragmentation may be caused by an MTU mismatch between devices on the network. This results in high volumes of packets being sent across the network, which can overwhelm both the network and the devices on it.
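To illustrate how an oversized datagram multiplies packet counts, the sketch below computes how many IPv4 fragments a payload needs for a given MTU. It assumes a 20-byte IPv4 header without options; the function name and the sample sizes are hypothetical and are not measurements from the report.

```python
IPV4_HEADER_BYTES = 20  # assumes an IPv4 header with no options


def fragment_count(payload_bytes, mtu):
    """Return how many IPv4 packets are needed to carry `payload_bytes`.

    `payload_bytes` is the IP payload size (for UDP, the UDP header
    plus data). A result of 1 means no fragmentation is needed.
    """
    # Fragment data must be a multiple of 8 bytes (except the last one).
    max_data = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    if max_data <= 0:
        raise ValueError("mtu is too small to carry any data")
    return max(1, (payload_bytes + max_data - 1) // max_data)


# Hypothetical sizes: an 8,008-byte payload over a 1,500-byte MTU.
print(fragment_count(8_008, 1_500))
print(fragment_count(1_480, 1_500))
```

In this sketch, a sender that emits large UDP datagrams on a 1,500-byte MTU path turns each datagram into several packets, which is one way a single device can account for a large share of observed fragments.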

44.4% of IP fragments were outbound from the UUUUU device at aa.bbb.ccc.dd.

100% of IP fragments from UUUUU were sent to uu.xx.yy.zz via broadcast traffic on UDP port 8156.

Page 22: Atlas Services Remote Analysis Report Sample

METRICS CHECKLIST

Web Application
5xx Errors: Review of server-side errors ✓
5xx server error rate: Review of HTTP servers experiencing high 5xx error rate ✓
4xx Errors: Review of client-side errors ✓
URIs: Review of processing time by URI ✓
Server Processing Time: A general health check of all HTTP server devices seen by ExtraHop. A review of group level processing time. ✓

Database
Errors: Review of Database errors ✓
Server error rate: Review of Database servers experiencing high error rate ✓
Method Performance: Review of Database method performance ✓
Server Processing Time: A general health check of all DB server devices seen by ExtraHop. A review of group level processing time. ✓

Middleware
Errors: Review of MQSeries errors ✓
Errors: Review of FTP errors ✓
Error Rate: Review of FTP error rate ✓
Server Processing Time: Review of FTP server processing time ✓
Errors: Memcache errors ✓
Misses: Review of Memcache servers experiencing high volume of misses ✓
Hits: Review of Memcache servers experiencing high volume of hits ✓

Citrix
Latency: Review of network latency time for clients attached to a Citrix server ✓
Load Time: Review of client load time for clients attached to a Citrix server ✓
Client Types: Review of Citrix client types used to access Citrix servers ✓

Storage
Errors: Review of CIFS errors ✓
Error Rate: Review of CIFS error rate ✓
Processing time: Review of CIFS processing time ✓
File access time: Review of file access times on high volume CIFS servers ✓
FSInfo: Review of FSInfo queries on high volume CIFS servers ✓
Errors: Review of NFS errors ✓
Error Rate: Review of NFS error rate ✓
Processing time: Review of NFS processing time ✓
File access time: Review of file access times on high volume NFS servers ✓
Errors: Review of iSCSI errors ✓
Error Rate: Review of iSCSI error rate ✓
File access time: Review of file access times on high volume iSCSI servers ✓

Page 23: Atlas Services Remote Analysis Report Sample

METRICS CHECKLIST (CONTINUED)

Supporting Application Infrastructure
Errors: Review of SMTP errors ✓
Error Rate: Review of SMTP error rate ✓
Request Timeouts: Review of DNS request timeouts ✓
Requests vs. Responses: Review DNS requests vs. DNS responses ✓
Response Errors: Review DNS response errors ✓
Server Error Rate: Review of DNS servers experiencing high error rate ✓
Error Rate: Review of DNS error rate ✓
A vs. AAAA: Review of IPv6 DNS lookups and responses ✓
Processing Time: Review of DNS processing time ✓
Errors: Review of LDAP errors ✓
Processing Time: Review of LDAP processing time ✓
SSL Certificate Size: Review of 512-bit SSL certificates. ✓
Expiring Certificates: Review of SSL certificate expiration dates. ✓

Application Communication
Zero Windows: Number of zero window advertisements received. Zero windows are an indication of one side of a TCP conversation overwhelming the other. ✓
Receive Window Throttles: Number of times the advertised receive window of the peer device limits the throughput of the connection. Throttling occurs when a device is trying to slow down the dataflow coming from a peer. ✓
Out of Order: Number of packets sent out of order. ✓
Tinygrams: Inefficient segmentation of TCP payload resulting in more packets on the network. ✓
Aborts: TCP conversation forcibly ended due to error within TCP data framework ✓
Slow Starts: Connection throughput reduced due to TCP slow start congestion avoidance. ✓
Dropped Segments: Packets lost en route between two devices and required retransmission ✓
Round Trip Time: High network latency ✓
RTO: A 1- to 8-second gap in TCP conversations ✓

Network Health
VLANs: A review of relative traffic occurring on different tagged VLANs ✓
Multicast Top Groups: A review of Top Multicast talkers. ✓
Traffic: A measure of all the traffic being passed to the ExtraHop system. ✓
IP Fragmentation: A review of observed IP fragmentation. ✓
Traffic: A review of the L3 traffic profile. ✓
Traffic: A review of proportions of L7 traffic. ✓

Subscribing to the Atlas service gets you scheduled reports that detail the items listed on the checklist above.