Page 1
#sf17eu • Estoril, Portugal Defining requirements for a Packet Capture Strategy 1
John Pittle
SharkFest ’17 Europe
#sf17eu • Estoril, Portugal • 7-10 November 2017
Riverbed Technologies • 9 November 2017
Defining a Requirements Based Packet Capture Strategy
Page 2
Alternate Title: Preparing to navigate Layer-8
Page 3
Audience Profile
• Which IT teams / disciplines are represented in the session today?
• What industries are represented?
Page 4
Speaker Introduction
• Team Lead: App911 Emergency Troubleshooting
• Team Lead: Technology Adoption Services
• Consulting Practice Mentor
• Best Practices Contributor
• Program Owner – Riverbed Performance Management Workshop Series
• Content Developer for Riverbed Performance Management Foundations Course
Page 5
I Love solving complex performance problems
with packets and performance tools
Page 6
Session Premise
• We Love Packets!
• Many performance / availability issues can only be solved with packets and expert analysis
• Analysis is often delayed or deferred because we don’t have the packets or the context we need at the time we need them
• Requirements based design of packet capture and analysis solutions can help ensure you get the funding needed to adequately support the business
Page 7
My Ask for This Session
• Engage and Participate
• Share your experience
• Learn from your Peers
• Improve your Craft and your Value to your Organization
Page 8
Agenda
• Performance Management Landscape
• Packet Related Workflows & Technologies
• Requirements & Business Case Mechanics
• Gap & Risk Heat Maps
• Recommendations and Wrap-up
Page 9
Performance Management Landscape
Visibility and Instrumentation
• End User Experience
• User End Point Monitoring
• Packets
• Flow (NetFlow, J-Flow, sFlow, NBAR, etc.)
• SNMP
• Application Metrics
• Application Logging
• JavaScript Injection
• Host Metrics
• Infrastructure Metrics
Page 10
Hybrid Enterprise
Page 11
End User Devices & Locations
Page 12
SaaS Applications
Page 13
Cloud Hosting & Services
Page 14
On-Prem Data Center(s)
Page 15
Business Partners
Page 16
MPLS Provider(s)
Page 17
Internet Transport(s)
Page 18
Complex!
• How do we get performance visibility to all of this?
Page 19
User End Point Device Monitoring
• EUE Performance
• Before / After Analysis
• Device Health
• Utilization Monitoring
Page 20
Browser EUE - JavaScript Injection
Page 21
Internal Application Components
• Java / .NET Profiling
• JMX Monitoring
• Application Logging
• Vendor Agents
Page 22
Infrastructure Devices / Servers
• SNMP
• WMI
• Vendor Agents
Page 23
Flow Records
• NetFlow
• Enhanced Flow
• sFlow, J-Flow
• NBAR / NBAR2
Page 24
Packet Capture / Collection
• Host Captures
• SPAN / TAP
• Passive Appliances
• Traffic Aggregators
Page 25
Full End to End Visibility
• Packets
• End Point Device
• Infrastructure
• Flow
• JavaScript
• Application
Page 26
Heard in the War Room…
• Users are complaining!!
• App ABC is slow, what infrastructure does it use?
• Link utilization is 80%, who’s using the bandwidth?
• Server utilization is 85%, who’s generating the load?
• How long has it been going on?
• Management wants hourly status updates
• Who owns the fix?
• My area looks fine, it must be the Network
Page 27
• Chaos
• Confusion
• Trust Issues
• Panic
• Unscheduled Overtime
Page 28
Heard in the CIO Staff Meeting
• Are we meeting our SLAs?
• Are customers happy?
• Is IT measurably contributing to company success?
• Are we investing in the right areas? How do we know?
• What’s the impact if we ___________?
Page 29
How do we make the right investments to support the business today and in the future?
Page 30
Complex Requirements!
How can we meet these complex requirements?
Page 31
Holistic Performance Management
• A comprehensive, synergistic, holistic Performance Management strategy is needed to fully answer these questions
• Packet based performance monitoring is a key part of that strategy
Page 32
Questions / Discussion
Page 33
Packet Workflows & Technologies
• Capture
• Performance Monitoring
• Triage and Troubleshooting
• Pre-Release Performance Analysis / Protocol Analysis
• Planning
Page 34
Packet Capture
• Host Based Captures
• Network Devices with Capture Capability
• Passive Appliances
• SPAN / TAP Design
• Packet Aggregation Design
• Packet Aggregation Appliances
Page 35
Manage Multiple Host Capture Agents
Page 36
Manage Multiple Host Agents
Page 37
Preview before downloading
Page 38
Preview before downloading
Page 39
Navigate to most relevant traffic before download
Page 40
Passive Appliances - Capture
• Always on…
• All packets, all the time, based on the traffic presented
• Capture packets into a very large, indexed repository
• Packet Slicing and Filtering
• Preview and filter relevant conversations before downloading for analysis
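Slicing and filtering can be illustrated without an appliance. The sketch below uses tcpdump as a stand-in and only builds the command line (it does not run a capture). The interface name, filter, snap length, and file sizes are assumptions chosen for illustration, not values from the session.

```python
import shlex

def build_capture_command(interface, bpf_filter, snaplen=128,
                          file_size_mb=100, file_count=10,
                          out_path="capture.pcap"):
    """Return a tcpdump argument list for a sliced, filtered, rotating capture."""
    return [
        "tcpdump",
        "-i", interface,           # capture interface (hypothetical name)
        "-s", str(snaplen),        # slicing: keep only the first N bytes of each packet
        "-C", str(file_size_mb),   # roll to a new file after about this many million bytes
        "-W", str(file_count),     # with -C: keep this many files, then overwrite (ring buffer)
        "-w", out_path,            # write raw packets to file instead of decoding them
        bpf_filter,                # filtering: BPF capture filter expression
    ]

# Hypothetical example: headers only, for one database server on TCP port 1433.
cmd = build_capture_command("eth0", "host 10.0.0.5 and tcp port 1433")
print(shlex.join(cmd))
```

Slicing trades payload visibility for retention time: with only headers stored, the same disk holds a much longer history, which is usually enough for TCP level analysis.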
Page 41
Passive Appliance - Continuous Capture
Page 42
SPAN & TAP
• Engineered traffic feeds for performance and security tools
• SPAN design challenges
  • Device / traffic impacts
  • Full duplex over half duplex
  • Oversubscription
• TAP design challenges
  • Full duplex over half duplex
  • Managed vs. unmanaged TAPs
  • Virtual TAPs for ESX
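The "full duplex over half duplex" and oversubscription points come down to simple arithmetic: a mirrored full duplex link can offer transmit plus receive traffic to a monitor port that only sends in one direction. A minimal sketch, with link speed and utilization figures that are assumptions for illustration:

```python
def span_oversubscription(link_gbps, tx_util, rx_util, span_port_gbps):
    """Return (offered_gbps, oversubscribed) for one mirrored full duplex link.

    tx_util and rx_util are fractions (0.0 to 1.0) of link capacity per direction.
    """
    offered = link_gbps * (tx_util + rx_util)  # both directions share one egress port
    return offered, offered > span_port_gbps

# Hypothetical 1 Gbps link at 60% transmit and 70% receive, mirrored to a 1 Gbps port.
offered, over = span_oversubscription(1.0, 0.6, 0.7, 1.0)
print(f"offered {offered:.1f} Gbps, oversubscribed: {over}")
```

When the offered load exceeds the monitor port speed, the mirror drops packets, so loss seen in the capture may not have occurred on the production link.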
Page 43
Packet Aggregators
• Essential in large environments
• Key Features:
  • Filtering, Aggregating, Splitting
  • Header / Layer modifications
  • Time Stamps
  • Packet De-duplication
  • Flow generation
• Highly Scalable
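Packet de-duplication can be sketched as "drop a packet if an identical one was seen very recently". The version below is a simplified illustration, not how any particular aggregator works: it hashes the full packet bytes, whereas real products typically ignore fields that change hop to hop. The 1 ms window is an assumption.

```python
import hashlib

def deduplicate(packets, window_s=0.001):
    """Keep the first copy of each packet seen within window_s seconds.

    packets: iterable of (timestamp_s, raw_bytes), sorted by timestamp.
    """
    last_seen = {}   # digest -> timestamp of the most recent copy
    unique = []
    for ts, raw in packets:
        digest = hashlib.sha256(raw).digest()
        prev = last_seen.get(digest)
        if prev is None or ts - prev > window_s:
            unique.append((ts, raw))      # new packet, or a genuine later repeat
        last_seen[digest] = ts            # sketch only: this table is never pruned
    return unique

# The same packet mirrored from two SPAN sources 0.2 ms apart is kept once.
sample = [(0.0, b"a"), (0.0002, b"a"), (0.5, b"a"), (0.5001, b"b")]
print(len(deduplicate(sample)))
```

The window matters: too wide and genuine TCP retransmissions are discarded along with the duplicates, which hides exactly the evidence a troubleshooter needs.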
Page 44
Questions / Comments
Page 45
Monitoring - Passive Appliances
• Always on, always analyzing app and network performance
• All conversations, all the time, based on the traffic presented
• Transaction level monitoring (Web, SOAP, SQL, etc.)
• TCP level monitoring (Request / Response, Retrans, Congestion, In-flight, Windowing)
• Proactive alerting
• Baselining and historical trends
• Quickly determine problem domain; download relevant packets only when deeper dive is needed
Page 46
Triage & Troubleshooting
• Filter and isolate transactions of interest
• Utilize Automated Expert Analysis
• Overlay traffic with key performance statistics for visual correlation
• End to End Transaction views from multiple capture points
• Analyze performance indicators including protocol effects
Page 47
Example: DB Cloud Replication
• DB instances in AWS East and AWS West
• Full mesh replication between AWS instances, and mirror instances in customer DC-1 / DC-2
• Replication delays between AWS East and DC-2
• DB team used the technical term ‘LAG’
• Impact: Customer closes their data entry session; returns a few minutes later and is unable to see the latest updates (due to the LAG)
Page 48
Real Time Views - Sample
Page 49
Expert Analysis Sample
• Download a 1 minute packet sample
• Chosen from appliance based on low throughput period
• Automated Summary of Delays Analysis
Page 50
Summary Statistics
• Minor packet loss detected, as reported by the seven 3ACK indicators
• Out of sequence packets are not necessarily expected, but we are using Internet transport, so we should expect the unexpected
Page 51
Relevant Statistics
• Microbursts of 18-23 Mbps
• Throughput
• Bytes in Flight
• Out of Sequence
Page 52
Packet Transfers vs. Bytes in Flight
Page 53
Discussion
• What looks like continuous transfer on the appliance summary view is actually short duration bursts of transmissions
• In a 1 minute packet capture we can see dozens of start / stop packet exchanges
• The top chart, “bytes in-flight”, shows spikes and dips that correlate with the packet exchange activity
• Let’s drill down into one of the bursts of packet activity next…
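The gap between a smooth summary view and bursty reality is an averaging effect: the same bytes look very different at 1 second and 10 millisecond granularity. A minimal sketch, assuming packets are available as (timestamp, length) pairs; the sample data is invented for illustration:

```python
from collections import defaultdict

def throughput_per_window(packets, window_s=0.01):
    """Return {window_index: Mbps} for packets given as (timestamp_s, length_bytes)."""
    bytes_per_window = defaultdict(int)
    for ts, length in packets:
        bytes_per_window[int(ts // window_s)] += length
    return {w: (b * 8) / window_s / 1e6 for w, b in sorted(bytes_per_window.items())}

# Hypothetical burst: 20 packets of 1250 bytes inside the first 10 ms, then one late packet.
burst = [(i * 0.0004, 1250) for i in range(20)] + [(0.5, 1250)]
rates = throughput_per_window(burst)
average_mbps = sum(length for _, length in burst) * 8 / 1.0 / 1e6  # averaged over 1 second

print(f"peak 10 ms window: {max(rates.values()):.1f} Mbps")
print(f"1 second average:  {average_mbps:.2f} Mbps")
```

Here the peak window runs at about 20 Mbps while the one second average is well under 1 Mbps, which is why microbursts rarely show up on utilization graphs.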
Page 54
399 ms burst drill down - 2.2 MB
Page 55
Discussion
• Deep dive into a 399 ms burst
• Moved 2.2 MB of payload during this burst
• The top chart of bytes in flight suggests TCP slow-start is playing a role
• Drilling into other bursts shows the exact same TCP slow-start behavior
• Not good for throughput…
• Linux admin reviewed and commented: “hmm, looks like ‘slow start on idle’ is the default for these servers”
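Why restarting slow-start on every burst hurts can be shown with a rough model: if the congestion window restarts small and doubles each round trip, a 2.2 MB burst needs several round trips before the pipe is full. The MSS and initial window below are assumptions, and the model ignores loss and receive window limits. On Linux the related sysctl is `net.ipv4.tcp_slow_start_after_idle`; that this is the setting the admin meant is also an assumption.

```python
import math
from pathlib import Path

def round_trips_in_slow_start(payload_bytes, mss=1460, initial_cwnd=10):
    """Round trips needed to send payload_bytes if cwnd doubles every round trip."""
    segments = math.ceil(payload_bytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd     # one window of segments per round trip
        cwnd *= 2        # idealized slow-start growth, no loss
        rounds += 1
    return rounds

def slow_start_after_idle_setting(path="/proc/sys/net/ipv4/tcp_slow_start_after_idle"):
    """Return the current Linux setting as text, or None if it cannot be read here."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None

print(round_trips_in_slow_start(2_200_000))   # round trips for a 2.2 MB burst
print(slow_start_after_idle_setting())        # "1", "0", or None off Linux
```

Under these assumptions the burst takes 8 round trips from a cold start. On a long path between cloud and data center, that ramp is paid again after every idle gap.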
Page 56
Page 57
Questions / Comments
• Automated expert analysis can be a huge time saver when troubleshooting!
• Diagnosed TCP Slow Start on Idle without looking at decodes
• Packets don’t lie, and the pictures you paint with packets tell the true story
• One more quick sample of expert analysis visualization before we move on…
Page 58
Background
• Insurance company call center
• Reps have a variety of complaints:
  • Dropped calls
  • Screen pop not synchronized with call arrivals
  • CRM app session drops
• Reviewed packets from call center PCs and found periods of packet loss and retransmissions
• Next screens show visualization of TCP RTO effects, which eventually lead to TCP RST
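The reason repeated retransmission timeouts feel like a session drop is exponential backoff: each timeout doubles the wait before the next retry. A minimal sketch of that schedule, assuming a 1 second starting RTO, a 60 second cap, and six retries. Real stacks derive the RTO from measured round trip time and use their own retry limits.

```python
def rto_schedule(initial_rto_s=1.0, max_rto_s=60.0, retries=6):
    """Return a list of (retry_number, wait_before_retry_s, elapsed_s)."""
    schedule, rto, elapsed = [], initial_rto_s, 0.0
    for n in range(1, retries + 1):
        elapsed += rto                    # time spent waiting for this timeout
        schedule.append((n, rto, elapsed))
        rto = min(rto * 2, max_rto_s)     # back off: double the timer, up to the cap
    return schedule

for retry, wait, elapsed in rto_schedule():
    print(f"retry {retry}: waited {wait:.0f}s, {elapsed:.0f}s since the original send")
```

With these assumed values the sixth retry goes out more than a minute after the original packet, long past the point where a call center application or its user has given up.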
Page 59
TCP RTO Visualization 1 of 4
Page 60
TCP RTO Visualization 2 of 4
Page 61
TCP RTO Visualization 3 of 4
Page 62
TCP RTO Visualization 4 of 4
Page 63
Performance Analysis Workflows
• Dev Team Unit Testing
• Load Testing
• Pre-Deployment Performance Assessment
• New Technology Assessments
• 3rd Party Software Qualification
Page 64
Impact Assessments / Planning
• Capacity Planning
• Migration Planning
• Technology Assessments
• Bandwidth Impact Assessment
• End to End Modeling
Page 65
Pre-Migration Assessment Example
Latency Sensitive Conversations
Page 66
Impact of 40 ms Round Trip Latency
Response time increases from 1 minute to 6 minutes
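A back of the envelope check shows what this result implies. If the extra 5 minutes comes entirely from serialized request / response exchanges that each pay the added 40 ms, and the original round trip time was negligible (both assumptions), the transaction makes about 7,500 round trips.

```python
def implied_round_trips(before_s, after_s, added_rtt_s):
    """Estimate serialized round trips from the response time increase."""
    return round((after_s - before_s) / added_rtt_s)

def projected_response_time(before_s, round_trips, added_rtt_s):
    """Project response time after adding latency to a chatty transaction."""
    return before_s + round_trips * added_rtt_s

turns = implied_round_trips(before_s=60, after_s=360, added_rtt_s=0.040)
print(turns)                                       # estimated application turns
print(projected_response_time(60, turns, 0.040))   # seconds at 40 ms
print(projected_response_time(60, turns, 0.010))   # seconds if the path were 10 ms instead
```

Counting application turns in a packet capture before a migration gives this number directly, which is what makes latency sensitive conversations predictable rather than a surprise.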
Page 67
Questions / Discussion
Page 68
Time to Talk Money
Page 69
Requirements / Business Case
• Packets are an essential data source for Performance Management workflows
• Business leaders / budget owners seldom understand the importance
• They need your help to understand how visibility gaps are actually a risk to the business
Page 70
Troubleshooting in the Wild
Page 71
Impact to the Business
• DB Replication Delays impact customer data visibility
• Claims Management Down
• Load Testing brings down production data center
• Call Center Disruption
• eCommerce web page crash during checkout
• 2 hour outage of global eCommerce website
• Finance website crashes after Super Bowl commercial
• Global DNS Failover Troubleshooting
Page 72
Business Case Guidance
• Tie your requirements for packet based capabilities to key apps and key infrastructure services
• Characterize the business risk to your key apps & infrastructure
• Capture current state capabilities
• Identify gaps
• Identify risk to the business
Page 73
Types of Service Delivery Risks
• Poor app performance overall, can’t meet SLAs
• App / Service is non-responsive
• Dependent system is down
• Can’t complete key transactions
• Incomplete visibility
• Poorly performing infrastructure services are impacting everything
Page 74
Business Impact
• Lost Revenue
• Lost Productivity / Overtime Costs
• Penalties / Fines
• Missed Market Opportunities
• Customer Satisfaction / Customer Churn
Page 75
Identify Your Key Apps
• The most important apps to the business
• Characterize scope, scale, user community
• Identify business disruption when these apps are down or performing poorly
• Simple spreadsheet to capture key attributes
Page 76
Key App Attributes
Page 77
Additional Attributes
Page 78
Who has these details?
• Service Delivery Managers
• IT Business Office
• BU Owners
• Operations
Page 79
Current State: Capture / Visibility Capabilities
• For each Key App - what is the most essential traffic to capture?
• What metrics / capability would this give you?
• If you had “full coverage”, how would you describe it?
Page 80
Let’s use a Heat Map!
• Simple Excel Spreadsheets with conditional formatting
• Visualize where we have coverage vs. where we need coverage
• Use color scheme to indicate risk
• Iterations of the heat map can be used to communicate a plan & cost estimates
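The same conditional formatting logic can be written down as a rule: compare how critical an app is with how good the capture coverage is at each point, and color the gap. Everything in this sketch is hypothetical, including the app names, capture points, coverage levels, and scoring thresholds.

```python
# Hypothetical coverage levels, from no capture capability to always-on capture.
COVERAGE_SCORE = {"none": 0, "host": 1, "on-demand": 2, "continuous": 3}

def risk_color(criticality, coverage):
    """criticality: 1 (low) to 3 (high). coverage: a key of COVERAGE_SCORE."""
    gap = criticality - COVERAGE_SCORE[coverage]
    if gap >= 2:
        return "red"       # important app, little or no visibility
    if gap == 1:
        return "yellow"    # partial visibility
    return "green"         # coverage matches or exceeds criticality

def build_heat_map(apps, capture_points, coverage):
    """Return {app: [color per capture point]}; missing entries count as 'none'."""
    return {
        app: [risk_color(crit, coverage.get((app, point), "none")) for point in capture_points]
        for app, crit in apps.items()
    }

apps = {"Claims": 3, "CRM": 2}                          # hypothetical key apps
points = ["User Edge", "Web Tier", "DB Tier"]           # hypothetical capture points
coverage = {("Claims", "User Edge"): "continuous", ("CRM", "Web Tier"): "host"}

for app, row in build_heat_map(apps, points, coverage).items():
    print(f"{app:8s}", row)
```

Re-running the map with a proposed phase of new coverage filled in gives the before and after picture used to communicate a plan.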
Page 81
Current State: Packet Capture Coverage
Page 82
Current State: Packet Capture Coverage
Page 83
Current State: Packet Capture Coverage
Page 84
Current State: Packet Capture Coverage
Page 85
Current State / Future State Roadmap
• Where are my gaps / risks today?
• What do I address first?
• …second?
• …third, and so on?
• What would it take to reduce unplanned downtime for this app by 120 minutes per year?
• What would that be worth to the business?
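The "what would that be worth" question can be answered with a simple estimate once the key app attributes are known. The sketch below combines lost revenue and lost productivity. The revenue rate, user count, and hourly cost are invented placeholders, and only the 120 minutes comes from the slide.

```python
def downtime_value(minutes_avoided, revenue_per_minute,
                   users_affected=0, loaded_cost_per_user_hour=0.0):
    """Estimate the annual value of avoiding unplanned downtime for one app."""
    lost_revenue = minutes_avoided * revenue_per_minute
    lost_productivity = users_affected * loaded_cost_per_user_hour * minutes_avoided / 60
    return lost_revenue + lost_productivity

# Hypothetical app: 500 per minute in revenue, 200 internal users at 40 per hour.
value = downtime_value(120, revenue_per_minute=500,
                       users_affected=200, loaded_cost_per_user_hour=40.0)
print(f"Estimated annual value of 120 fewer downtime minutes: {value:,.0f}")
```

Penalties, churn, and missed opportunities would add to this figure, but even the two terms shown give budget owners a number to weigh against the cost of closing a visibility gap.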
Page 86
Phase 1 – This Quarter
Page 87
Phase 2 – Next Quarter
Page 88
Phase 3 – Two Quarters Out
Page 89
An Alternate Roadmap…
Page 90
Current State: Packet Capture Coverage
Page 91
Alternate Phase 1
Page 92
Comments / Discussion
Page 93
Key Infrastructure – Shared Services
• What are some key shared services in your environment?
• Degradation in these services will impact the entire environment
Page 94
Key Infrastructure – Shared Services
• DNS
• NTP
• Active Directory / LDAP
• Single Sign-on
• Email
• SharePoint Servers
• VPN / Token Gateways
• NAS Storage
• VoIP and related infrastructure
• Etc…
Page 95
Current State – Critical Shared Services
Page 96
Heat Map Demo
Page 97
Questions / Comments
Page 98
General Recommendations
• Leverage host based captures everywhere
• Use passive appliances to get coverage for infrastructure shared services and all application edge traffic (EUE)
• Add supplemental analysis capabilities on top of Wireshark
Page 99
General Recommendations
• Identify key apps where inter-tier packets are most beneficial and expand traffic feeds
• Keep Management informed of current state and your recommended roadmap to increase visibility
Page 100
Wrap-Up
• Packets are an essential component of your overall Performance Management capabilities
• Most companies have significant gaps in their packet capture and analysis workflows
• These gaps represent business risk and can be identified with a rationalized current state assessment tied to key apps and shared services
• Create a future state roadmap that shows the improvements and benefits of addressing gaps
Page 101
Thank You for your Participation!