Page 1
#sf19eu • Palacio Estoril Hotel, Estoril, Portugal • Nov 4 - 8
SharkFest ’19 Europe
Session #14 - TCP Split Brain – Part II
John Pittle
Using Wireshark to Compare & Contrast behavior and TCP state of client vs. server
Riverbed Technologies, Performance Management
[email protected]
Page 2
Premise: TCP Split Brain
• When troubleshooting TCP, you often have to consider both the sender’s unique perspective and the receiver’s unique perspective
• Both endpoints are independent, but at the same time, they do react to packets from the other end
• The joint behavior gets even more interesting when there’s “high” latency in the path
Page 3
Welcome Back!
• Continuation of Part I … after the break
Page 4
Session Goals
• Compare and contrast TCP end point behavior
• Drill down into the “what is it doing?” and “why is it doing that?”
• Promote Wireshark Profiles Feature
• Share experience and ideas
• Expose you to visualizations that help reinforce the end point behavior we will be discussing
Page 5
Part I Comparisons Recap
• 3-way handshake
• Latency
• Expert Info
• Fragment Overlaps / OOS / Retransmissions
Page 6
Summary Slide from Part I
Page 7
Part II Agenda
• Finish Fragmentation Topics
• Bytes in Flight Comparison
• Bonus – If Time Available
Page 8
About me?
• Performance Engineering since 1980
• Protocol Analysis since 1991
• Professional Services with OPNET / Riverbed since 2005
• Love the mystery of a complicated performance issue
• Shaved off beard in 2003…
Page 9
My Ask of You
• Engage
• Participate
• We have a lot of detailed material
• We will explore conflicting, contradictory, and possibly confusing details
• Ask Questions
• Question Answers
Page 10
Application Scenario
• HTTPS Web Application
• Private key is not available
• Host based captures on web server and my laptop
Page 11
Symptoms to Analyze
• Downloading files takes *forever*
• 16 seconds to download a 1.4MB file
• One TCP connection has been isolated as the connection of interest – TCP/52942-443
Page 12
Page 13
Where we left off in Part I
Page 14
Dropped or OOS?
• If we freeze time right here, we can’t be sure if it’s just OOS or really a dropped packet
• We have to examine what comes next…
521543
Page 15
Dropped or OOS?
• Our missing segment is not showing up yet…
521543
Page 16
Anchor on SEQ==521543
Server capture
• SEQ 521543: 2,569 stream bytes
• SEQ 524112: 3,876 stream bytes
Page 17
Anchor on SEQ==521543
Client capture
• SEQ 521543: 1,292 bytes (through 522834)
• 1,277 bytes missing
• SEQ 524112: 1,292 bytes
• SEQ 525404: 1,292 bytes
• Frames shown: #1000, #1001, #1003
Page 18
Anchor on SEQ==521543
Server capture
• SEQ 521543: 2,569 stream bytes
• SEQ 524112: 3,876 stream bytes
Client capture
• SEQ 521543: 1,292 bytes (through 522834)
• 1,277 bytes missing
• SEQ 524112: 1,292 bytes
• SEQ 525404: 1,292 bytes
• Frames shown: #1000, #1001, #1003
Page 19
Other Clues
• Next seq after #1000 should have been 522835
• Wait! Isn’t this the segment that was retransmitted by server? (yes)
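The gap arithmetic behind these clues can be sketched in a few lines of Python. This is only an illustration, not how Wireshark does it internally; the function name `find_gaps` and the tuple layout are assumptions, and the segment values are the ones shown for the client capture.

```python
def find_gaps(segments, start_seq):
    """Return (first_missing_seq, missing_byte_count) for each hole.

    segments: iterable of (seq, length) tuples seen in the capture.
    start_seq: the sequence number we anchor on.
    Sequence-number wraparound is ignored in this sketch.
    """
    gaps = []
    expected = start_seq
    for seq, length in sorted(segments):
        if seq > expected:
            # Bytes between 'expected' and 'seq' never showed up.
            gaps.append((expected, seq - expected))
        expected = max(expected, seq + length)
    return gaps


# Client capture, anchored on SEQ 521543.
client_segments = [(521543, 1292), (524112, 1292), (525404, 1292)]
gaps = find_gaps(client_segments, start_seq=521543)
print(gaps)
```

The single hole starts at 522835 and is 1,277 bytes long, which matches the "next seq should have been 522835" clue.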
Page 20
(Slide From Part I)
FRAME #714
Page 21
Other Clues
• Notice the change in ACK behavior
• Client ACKs every other packet then starts to ACK every packet
• Why is this?
Page 22
Profile Power
• Let’s flip the view a little so we can quickly see SACK fields in the decode summary
Page 23
Profile: SB-SACK
• Client “SACKs” the new segments, but continues to report - I’m missing 522835
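A rough model of what the client is reporting may help here. The sketch below computes a cumulative ACK plus SACK blocks from the segments a receiver holds. The function name `ack_with_sack` is an assumption, and real stacks order SACK blocks by recency rather than by sequence number, so treat this as a simplified picture rather than any particular stack's behavior.

```python
def ack_with_sack(received, expected_seq):
    """Compute the cumulative ACK and SACK blocks a receiver could report.

    received: list of (seq, length) tuples held by the receiver.
    expected_seq: next in-order byte the receiver is waiting for.
    Returns (ack, [(left_edge, right_edge), ...]); wraparound is ignored.
    """
    ack = expected_seq
    blocks = []
    for seq, length in sorted(received):
        end = seq + length
        if seq <= ack:
            # In-order data advances the cumulative ACK.
            ack = max(ack, end)
        elif blocks and seq <= blocks[-1][1]:
            # Contiguous with the previous out-of-order block: extend it.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            # Data beyond a hole starts a new SACK block.
            blocks.append((seq, end))
    return ack, blocks


held = [(521543, 1292), (524112, 1292), (525404, 1292)]
ack, sack_blocks = ack_with_sack(held, expected_seq=521543)
print(ack, sack_blocks)

# Once the missing 1,277 bytes arrive, the hole closes and no SACK is needed.
ack_after, sack_after = ack_with_sack(held + [(522835, 1277)], expected_seq=521543)
print(ack_after, sack_after)
```

The ACK stays pinned at 522835 ("I'm missing this") while the SACK block grows with each new segment.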
Page 24
Discussion
• We can see Client is reporting a missing segment
• Yet, why does server continue to send segments other than the one requested?
• Visualization really helps with this…
Page 25
End Point State Review
The rightmost red bar is selected (see black dots on ends), which populates the summary decode panel at the bottom.
Packet #1002 is our packet of interest for the current discussion.
Page 26
Discuss state of each end point
Page 27
Discuss state of each end point
Page 28
Zoom-in a little…
Page 29
Discuss state of each end point
Page 30
Discuss state of each end point
Page 31
Zoom-in
Page 32
Discuss state of each end point
Page 33
Discuss state of each end point
Page 34
Discuss state of each end point
Page 35
Missing segment finally arrives
• Let’s examine a few more details…
Page 36
107ms Time Delta
• Why 107ms from previous packet?
Page 37
Wireshark Feature Parade
Page 38
With time ref set to DupACK #3
Page 39
Why “Out of Order”?
• Wouldn’t “Retrans” or “Fast Retrans” be more accurate?
Page 40
Why ACK with SACK Fields?
• We’re all caught up, nothing is missing…so why send SACK?
• “Oh, btw Sender – I received these 15 bytes of TCP stream again unexpectedly. Just letting you know”
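This "I got these bytes again" report is what RFC 2883 calls a duplicate SACK (D-SACK). A minimal check for it might look like the sketch below. The function name `is_dsack` and the sequence numbers in the first example are hypothetical (the slide only tells us the block covers 15 bytes), and wraparound is ignored.

```python
def is_dsack(ack, sack_blocks):
    """Return True if the first SACK block reports duplicate data (RFC 2883).

    ack: cumulative ACK number from the TCP header.
    sack_blocks: list of (left_edge, right_edge) in the order carried
                 in the SACK option.
    """
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    # Case 1: the first block lies entirely below the cumulative ACK,
    # so it describes data the receiver had already acknowledged.
    if right <= ack:
        return True
    # Case 2: the first block is contained in the second block.
    if len(sack_blocks) > 1:
        left2, right2 = sack_blocks[1]
        return left2 <= left and right <= right2
    return False


# Hypothetical numbers: 15 bytes already covered by the cumulative ACK.
caught_up_with_dsack = is_dsack(ack=100_000, sack_blocks=[(99_000, 99_015)])
# An ordinary SACK from earlier in the capture: data beyond a hole.
ordinary_sack = is_dsack(ack=522_835, sack_blocks=[(524_112, 526_696)])
print(caught_up_with_dsack, ordinary_sack)
```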
Page 41
Why 119ms delay here?
• What does this time delta suggest about bytes in flight and the congestion window?
Page 42
Here’s the delay – 1 x RTT
Page 43
Comparison Summary
Page 44
5th Leg Completed
Page 45
Discussion
• Did you find this deep dive interesting?
• Anything new you’ve not seen before?
• Do you find visualization helpful?
• Other Comments?
Page 46
Split Brain Comparisons
• Bytes in Flight
Page 47
Review
• What is the meaning of Bytes in Flight?
• How is this metric related to performance?
• BIF for sender and receiver captures look very different, let’s compare a few packet exchanges
Page 48
Dry Cleaners Conveyor Belt
Page 49
Bytes in Flight - Decode
Page 50
Right Click, Apply as Column
Page 51
Server Side - Incrementing
Page 52
Sample - Math Drill Down
• Before the ACK: bytes 1761 through 5636 sent, BIF == 3876
• After ACK 4345 (bytes through 4344 acknowledged): BIF == 1292 (5636 - 4344)
Page 53
Send Segment #2
• Before the send: bytes 1761 through 5636 sent, ACK 4345, BIF == 1292 (5636 - 4344)
• After the send: bytes through 10804 sent, ACK 4345, BIF == 6460 (10804 - 4344)
Page 54
Receive ACK
• Before the ACK: bytes through 10804 sent, ACK 4345, BIF == 6460
• After ACK 6929: bytes through 10804 sent, BIF == 3876 (10804 - 6928)
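The send / ACK arithmetic on the last three slides can be replayed with a small tracker. This is a sketch of the "next sequence minus last ACK" idea only; the class name `BifTracker` is an assumption, and the second send is modeled as one 5,168-byte burst (10804 - 5636), even though on the wire it may well have been several packets.

```python
class BifTracker:
    """Track bytes in flight from the sender's point of view."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # first byte not yet sent
        self.last_ack = initial_seq   # first byte not yet acknowledged

    def on_send(self, seq, length):
        # Only data beyond what was already sent raises next_seq.
        self.next_seq = max(self.next_seq, seq + length)
        return self.bytes_in_flight()

    def on_ack(self, ack):
        # ACK numbers name the next byte the receiver expects.
        self.last_ack = max(self.last_ack, ack)
        return self.bytes_in_flight()

    def bytes_in_flight(self):
        return self.next_seq - self.last_ack


tracker = BifTracker(initial_seq=1761)
bif_after_first_send = tracker.on_send(1761, 3876)   # bytes 1761..5636
bif_after_first_ack = tracker.on_ack(4345)           # through 4344 ACKed
bif_after_second_send = tracker.on_send(5637, 5168)  # bytes 5637..10804
bif_after_second_ack = tracker.on_ack(6929)          # through 6928 ACKed
print(bif_after_first_send, bif_after_first_ack,
      bif_after_second_send, bif_after_second_ack)
```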
Page 55
Server Side – Decreasing?
Page 56
Server Side - Timing
Page 57
Let’s jump to Receiver now…
Page 58
Client Side
Page 59
What else can we learn from BIF?
• So far, we’ve seen how BIF can inform about network health & congestion window
• We can also “get a sense” of Send Buffer sizing and the application’s TCP API options
Page 60
Inferring Send Buffer Size
• Bytes In-flight can give you some insight into how the Send Buffer size might be limiting the Congestion Window
• How can we find the maximum observed Bytes In-Flight?
Page 61
Throughput Throttling
Congestion Control Window
Receive Buffer
In-Flight Data
Notice how the green line never gets above the red line
Page 62
Throughput Throttling
Congestion Control Window
Receive Buffer
In-Flight Data
What’s holding us back now?
Page 63
Inferring Send Buffer Size
• Bytes In-flight can give you some insight into how the Send Buffer size might be limiting Throughput
• How can we find the maximum observed Bytes In-Flight?
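Besides sorting the column in the GUI, one way to find the maximum is to export the field and scan it. The sketch below parses tab-separated `frame.number` / `tcp.analysis.bytes_in_flight` output such as tshark's `-T fields` mode produces. The capture file name and stream index in the comment are placeholders, and the sample rows are made up for illustration.

```python
def max_bytes_in_flight(tshark_output):
    """Return (frame_number, bytes_in_flight) for the largest BIF value.

    tshark_output: text with one "frame<TAB>bif" row per packet, e.g. from
      tshark -r server.pcap -Y "tcp.stream == 0" -T fields \
             -e frame.number -e tcp.analysis.bytes_in_flight
    (file name and stream index above are placeholders).
    Rows without a BIF value, such as pure ACKs, are skipped.
    """
    best = None
    for line in tshark_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 2 or not fields[1].strip():
            continue
        frame, bif = int(fields[0]), int(fields[1])
        if best is None or bif > best[1]:
            best = (frame, bif)
    return best


# Made-up sample rows; frame 11 is a pure ACK with no BIF value.
sample = "10\t1292\n11\t\n12\t6460\n13\t3876\n"
peak = max_bytes_in_flight(sample)
print(peak)
```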
Page 64
Server Capture – Sort by BIF
Page 65
Discussion
• Assume that network is healthy + TCP Congestion Window is “happy” + receive window is higher than BIF
• What does this imply about Send Buffer size and implementation?
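One way to reason about this question: the sender can never have more data outstanding than the smallest of its limits, and that window divided by RTT caps throughput. The sketch below applies that rule. The window sizes are hypothetical values chosen for illustration; only the 129 ms RTT comes from the server capture in this session.

```python
def throughput_ceiling(cwnd, rwnd, send_buffer, rtt_seconds):
    """Return (limiting_factor, ceiling_in_bits_per_second).

    All window sizes are in bytes. At most one window of data can be
    delivered per round trip, so throughput <= window / RTT.
    """
    limits = {"cwnd": cwnd, "rwnd": rwnd, "send_buffer": send_buffer}
    limiter = min(limits, key=limits.get)
    return limiter, limits[limiter] * 8 / rtt_seconds


# Hypothetical sizes: healthy cwnd, generous rwnd, small send buffer.
limiter, ceiling_bps = throughput_ceiling(
    cwnd=1_000_000, rwnd=262_144, send_buffer=65_536, rtt_seconds=0.129
)
print(limiter, round(ceiling_bps / 1_000_000, 2), "Mbit/s")
```

If observed BIF plateaus well below both the congestion window and the receive window, the send buffer is the remaining suspect.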
Page 66
Btw, something was odd…
• Did you notice anything odd in the last decode summary screen?
• Let’s look again…
Page 67
Server Capture – Out of order?
Page 68
Quick Detour via OOS…
Page 69
How can sender segments be OOO?
• Is this even possible?
• Why would this happen?
• Let’s re-sort and have another look…
Page 70
Zoom in…sort by frame #
Page 71
Interpretation vs. Decode
Page 72
Why this is important…
• Heuristics are generally pretty good, but they aren’t perfect
• Corner cases within corner cases…
• If you relied solely on the Expert Info summary you might incorrectly conclude the network is messing with packet order (more than it really is…)
• Human interpretation is key
Page 73
Discussion
Page 74
OK, back to BIF…
Page 75
BIF - Receiver Side
Page 76
In-Flight & OOS / Loss
• Let’s look at how Wireshark calculates Bytes in-flight when there is packet loss or OOS
Page 77
Client – Receive #1275 OOS
• BIF did not increment for #1275
Page 78
Comparison Storyboard #1
Page 79
Comparison Storyboard #2
Page 80
Comparison Storyboard #3
Page 81
Comparison Storyboard #4
Page 82
Comparison Storyboard #5
Page 83
Why doesn’t BIF Increment
Page 84
Discussion
• We’ve compared based on the stream seq #
• We’ve looked at LEN, ACK, BIF, and packet order differences
• We tried to compare the “when”, but this is a real Brain Bender
Page 85
We need to visualize this…
• Now that we’ve seen how Wireshark counts bytes in flight…
• …and the challenges of side by side comparisons…
• …let’s look at how we can gain better insight from using visualization
Page 86
First, a word about Merging
• What if we could merge captures?
• High-fidelity merge of client and server side captures
• Fine tune “Send Time” and “Recv Time”
• Explicitly identify drops
• Accurately measure congestion
• Accurately measure Server Response Time
• Increases Accuracy of Advanced Analytics
Page 87
Merge Melds Split Brain
Page 88
Circa 2019…
Page 89
Merged Packet Exchange with BIF Overlay
Page 90
Quick Orientation
Page 91
In-flight Bytes
Page 92
Symptom Pop-up Labels
Page 93
What does this upward progression tell us about CWND?
Page 94
Why does In-flight drop to 0?
Page 95
What does this delta tell us?
18KB
9KB
Page 96
Wireshark BIF Chart
Page 97
Looking closer @ timing…
Page 98
When does Server hear the news?
Page 99
#998 is dropped
BTW, could this be a “tail drop”?
Page 100
2 of the DupACKs
Page 101
3 of 3 DupACKs
Page 102
Delay Cost of Drop: # of RTTs?
Page 103
Delay Cost of Drop: # of RTTs?
3 RTTs x iRTT (129 ms) == 387 ms
Page 104
Discussion
Page 105
Let’s Visualize BIF for the Entire TCP Connection
Page 106
…drill into the 1.4MB Xfer
Page 107
Delay Analysis during Xfer
Page 108
Review
• BIF for receiver side traffic doesn’t tell you much about actual congestion window
• Segmentation offload can create misleading conclusions on sender capture
• Packet loss on high latency path can have severe impact on performance
Page 109
Comparison Summary
Page 110
6th Leg Completed
Page 111
Bonus Section
• Time Permitting
Page 112
Split Brain Comparisons
• Congestion
Page 113
Remember this from Part I
• Client Capture – 121ms
• Server Capture – 129ms
Page 114
Group Discussion
• What’s the definition of latency?
• What’s the definition of congestion?
• How can you tell latency from congestion?
Page 115
Congestion is not…
• …protocol delay
• …serialization (bandwidth delay)
• …server delay (well maybe…)
• …others?
Page 116
Introducing…1W-TTT
• One Way - Total Transfer Time
Page 117
1-Way Total Transfer Time
• Probably other names for this…
• Time required for bits to leave sender and arrive at receiver…
• TTT == Total Transfer Time
Page 118
One Possible Formula
• 1-Way Total Transfer time (TTT) ………minus
• (Bandwidth Delay + Latency Delay)
• TTT – (BD + LD) == Congestion
Page 119
Exaggerated TTT Example
• Bandwidth = 1,000 bps
• Latency = 5 seconds
• 30,000 bits transferred between Client and Server, starting at T = 0
• Bandwidth delay: 30 seconds
• Latency delay: 5 seconds
• Congestion delay: 15 seconds
• Timeline: 0 to 50 seconds
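The exaggerated example can be checked against the TTT – (BD + LD) formula with a few lines of Python. This is a sketch under the assumption that the 50-second mark on the timeline is the observed one-way TTT; the function name `congestion_delay` is made up for illustration.

```python
def congestion_delay(ttt, bits, bandwidth_bps, latency):
    """Congestion == TTT - (bandwidth delay + latency delay), in seconds."""
    bandwidth_delay = bits / bandwidth_bps   # serialization time
    return ttt - (bandwidth_delay + latency)


# 30,000 bits at 1,000 bps with 5 s latency, observed TTT assumed to be 50 s.
congestion = congestion_delay(ttt=50, bits=30_000, bandwidth_bps=1_000, latency=5)
print(congestion)
```

Bandwidth delay works out to 30 seconds, so whatever remains after subtracting it and the 5 seconds of latency is attributed to congestion.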
Page 120
RTT vs. 1-Way TTT
• RTT is not the same as 1-Way TTT
• What can Wireshark tell us about 1-Way TTT?
• Answer: not a lot…but we can infer a few things
Page 121
Wireshark RTT includes PD
• 1-Way Total Transfer time (TTT) ………minus
• (Bandwidth Delay + Latency Delay + Protocol Delay)
• TTT – (BD + LD + PD) == Congestion
Page 122
Finding congestion with Wireshark
• How can we find examples of congestion in a capture with Wireshark?
• For TCP we can use RTT2ACK, and infer congestion based on our knowledge about how Wireshark calculates RTT2ACK
Page 123
It’s not Perfect…
• …but you’ll get a decent approximation
• …however, be aware…
• … it could be overstated due to protocol delay
Page 124
Sender or Receiver Capture
• Which capture would be most interesting to understand congestion?
• Answer: each one is unique and will tell part of the story (split brain)
Page 125
RTT2ACK Decode
Page 126
Apply as Column
Page 127
Column added to our view…
Page 128
Sort by RTT2ACK
Page 129
Zoom in a little…
Page 130
How else could we do this?
Page 131
How else could we do this?
• Display filter…
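Assuming RTT2ACK here is Wireshark's `tcp.analysis.ack_rtt` field, a display filter such as `tcp.analysis.ack_rtt > 0.2` would keep only the slow ACKs. The sketch below mimics that filter over exported values; the frame numbers and most of the sample values are hypothetical, with only the 342 ms figure taken from the server capture.

```python
def slow_acks(records, threshold_seconds):
    """Mimic the display filter "tcp.analysis.ack_rtt > threshold".

    records: list of (frame_number, ack_rtt_seconds_or_None) tuples.
    Frames with no ack_rtt value (None) never match, just as a display
    filter skips packets that lack the field.
    """
    return [
        frame
        for frame, ack_rtt in records
        if ack_rtt is not None and ack_rtt > threshold_seconds
    ]


# Hypothetical frames; 0.342 s is the value seen in the server capture.
records = [(101, 0.121), (102, None), (103, 0.342), (104, 0.129)]
matches = slow_acks(records, threshold_seconds=0.2)
print(matches)
```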
Page 132
Server Cap Shows 342ms RTT
• Could this be right? We just established RTT is 120-ish ms
• What is interesting about this ACK?
Page 133
Client shows 209ms RTT?
Page 134
Recap
• Server shows 342ms RTT2ACK
• 1TTT + Delayed ACK Timer + 1TTT
• Delayed ACK Timer 209ms (from client capture)
• RTT 342ms – 209ms == 133ms (2 x 1TTT)
• Is this in the ball park?
• What’s the smallest latency we’ve seen so far in these captures?
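The recap arithmetic, written out as a sketch. It assumes the path is symmetric, so each direction gets half of what is left after removing the delayed ACK timer; the function name is made up for illustration.

```python
def one_way_ttt_estimate(sender_rtt_ms, delayed_ack_ms):
    """Split a sender-side RTT2ACK into its assumed components.

    sender RTT == 1TTT + delayed ACK timer + 1TTT
    Returns (two_way_ms, one_way_ms), assuming both directions are equal.
    """
    two_way = sender_rtt_ms - delayed_ack_ms
    return two_way, two_way / 2


# 342 ms from the server capture, 209 ms delayed ACK from the client capture.
two_way_ms, one_way_ms = one_way_ttt_estimate(342, 209)
print(two_way_ms, one_way_ms)
```

133 ms for the two legs is close to the 121 ms to 129 ms latency figures seen earlier in these captures.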
Page 135
Let’s Visualize – Server
342 ms
Page 136
Let’s Visualize – Client
209 ms
Page 137
Discussion
Page 138
But Wait!
I thought we were here to talk about congestion? We are, but it turns out we really need to understand latency first…
Page 139
Can we Predict RTT?
• Forget the terms “Client” vs. “Server” momentarily
• Consider instead the terms “Sender” vs. “Receiver”
• What would you expect the possible time components for RTT to be for a Receiver capture?
• …and for a Sender side capture?
• (assume there’s “some” latency between hosts)
Page 140
Receiver Capture
• Should be….
• Either super fast, or….
• Influenced by Delayed ACK Timer, and/or…
• Influenced by OOS and retransmissions
Page 141
Receiver Capture
Page 142
Scroll down a little…
Page 143
A little more…
Page 144
~Speed of Light RTT2ACK
Page 145
Sender Capture
• Never less than 2 x TTT (…right?)
• Add in Receiver’s Delayed ACK Timer and/or
• Add in Receiver’s time to wait for sender to fix OOS and Packet Loss
Page 146
Sender Capture – Lowest
Page 147
Enough about Latency…
Page 148
More Visualizations
• Congestion is really hard to analyze with lists of packets and time deltas
• Let’s use visualizations starting at the 10K foot view and then drill down to the details
• We’ll use merged captures…
Page 149
10,000 ft/m View
Merge of both captures
Congestion reported for each direction
• Entire capture visualized
• You can see bursts of packet exchanges
• You can see the 10 sec keep alive pattern
• You can see call outs for packet loss
Page 150
10,000 ft/m View
Overlay chart of the Congestion Metric
Congestion reported for each direction
Page 151
Let’s Zoom-In
Page 152
Zoom-in to File Download
Page 153
Advanced Analytics: ~ 20% of overall delay (blue)
Page 154
Why so much congestion?
[Diagram labels: Home Office Orlando, ISP, Internet, NYC VPN, ISP, ISP Transit, SFC VPN Gateway, SFC Web Server]
• Multiple Tunnels & Protocols
• Multiple ISPs
• ESX Host Oversubscribed?
• VPN Client Overhead?
Page 155
Congestion Recap
• Any surprises about the amount of congestion?
• Congestion occurs in both directions independently
• Does this view help to explain how congestion impacts performance?
Page 156
Summary Results
Page 157
Marathon Completed!
Page 158
Summary & Wrap-Up
• Interpreting TCP behavior can be confusing and complicated, especially when there is “high” latency in the path
• Captures from both end points can be beneficial
• You need to split your brain into “server perspective” and “client perspective”
Page 159
Summary & Wrap-Up
• Wireshark features are extremely helpful, but can only take you so far
• Visualization can help you understand behavior and quickly interpret root cause
• Advanced analytics are icing on the cake…
Page 160
Recap Part I Topics
• Background on app, topology, and symptoms
• Compare and Contrast (aka Split Brain)
• 3-way Handshake
• Latency
• Expert Info
• Fragment Overlaps, OOS, Retransmissions
Page 161
Recap Part II Topics
• Continue where we left off
• Compare and contrast…
• Bytes in Flight
• Bonus Topic
• Session Wrap-Up
Page 162
Full Summary Results
Page 163
Final Questions / Comments
Page 164
End of Session