AVOIDING 3AM TROUBLESHOOTING Jack Fenimore FSE, Central and Southern Ohio
What are we troubleshooting?
Did this work before?
Does the traffic go through the F5?
Is it reproducible?
Is there a log server?
Did the timing of the issue coincide with any other changes?
Before beginning, determine which devices are involved
Obtain or create a network diagram from the client to the F5 to the pool members
What does BIG-IP iHealth do?
• Displays a snapshot of the BIG-IP system configuration in a user-friendly format
• Evaluates the configuration against a database of known issues, common errors, and published F5 best practices
• Provides tailored feedback about configuration issues, a description of the issue, recommendations for resolution, and a link to additional information in the AskF5 Knowledge Base
iHealth Diagnostics Page
Reports configuration issues and provides a link to additional
information in AskF5
How Does Traffic Enter a BIG-IP?
• Routing to a listener on the BIG-IP
• Listeners are:
- Self IPs
- SNATs
- NATs
- Virtual Servers
[Figure: Internet and External VLAN, with addresses 10.2.2.100:80, 10.2.2.1, and 10.2.2.50 (NAT to 192.168.4.8)]
Packet Processing Priority
1. Existing connection in connection table
2. Packet filter rule
3. Virtual server
4. SNAT
5. NAT
6. Self-IP
7. Drop
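The priority order above can be read as a first-match lookup. The sketch below is only an illustration of that idea, not BIG-IP code: the `classify` function, the `matchers` table, and the roles assigned to the sample addresses are all assumptions made for the example.

```python
from typing import Callable, Dict

# Order copied from the Packet Processing Priority list; "drop" is the fallback.
PRIORITY = [
    "existing connection",
    "packet filter rule",
    "virtual server",
    "SNAT",
    "NAT",
    "self IP",
]


def classify(packet: dict, matchers: Dict[str, Callable[[dict], bool]]) -> str:
    """Return the first entry in PRIORITY whose matcher claims the packet."""
    for name in PRIORITY:
        match = matchers.get(name)
        if match is not None and match(packet):
            return name
    return "drop"


# Hypothetical matchers: which address plays which role is assumed here.
matchers = {
    "virtual server": lambda p: (p["dst"], p["port"]) == ("10.2.2.100", 80),
    "NAT": lambda p: p["dst"] == "10.2.2.50",
    "self IP": lambda p: p["dst"] == "10.2.2.1",
}

for pkt in (
    {"dst": "10.2.2.100", "port": 80},
    {"dst": "10.2.2.50", "port": 22},
    {"dst": "10.2.2.77", "port": 80},
):
    print(pkt, "->", classify(pkt, matchers))
```

A packet that matches nothing falls through every step and is dropped, which is the last entry in the list.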
BIG-IP Virtual Server Types
• Standard
• Forwarding IP
Standard Virtual Server Packet Flow
[Figure: client 3.3.3.3, virtual server http_vs 2.2.2.2:80, and pool http_pool with members RED and BLUE (1.1.1.1:8080 and 1.1.1.2:8080). BIG-IP LTM has IP 2.2.2.254 on VLAN External and IP 1.1.1.254 on VLAN Internal.]
1. HTTP request from the client. DST: 2.2.2.2:80 SRC: 3.3.3.3
2. BIG-IP LTM chooses RED
3. HTTP request to RED. DST: 1.1.1.1:8080 SRC: 3.3.3.3
4. HTTP response from RED. DST: 3.3.3.3 SRC: 1.1.1.1:8080
5. HTTP response to the client. DST: 3.3.3.3 SRC: 2.2.2.2:80
The default gateway for the RED and BLUE servers is 1.1.1.254 on BIG-IP LTM
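The address rewriting in this flow can be sketched in a few lines. This is a simplified model under stated assumptions, not the BIG-IP implementation: the `Packet` type and both helper functions are hypothetical, and the mapping of BLUE to 1.1.1.2:8080 is assumed from the pool member list.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


VIRTUAL_SERVER = "2.2.2.2:80"
# RED is the member the slide shows being chosen; BLUE's address is assumed.
POOL = {"RED": "1.1.1.1:8080", "BLUE": "1.1.1.2:8080"}


def to_server(request: Packet, member: str) -> Packet:
    """Client side to server side: rewrite only the destination."""
    return replace(request, dst=POOL[member])


def to_client(response: Packet) -> Packet:
    """Server side to client side: restore the virtual server as the source."""
    return replace(response, src=VIRTUAL_SERVER)


request = Packet(src="3.3.3.3", dst=VIRTUAL_SERVER)
print(to_server(request, "RED"))
print(to_client(Packet(src=POOL["RED"], dst="3.3.3.3")))
```

Because the client source address is left unchanged on the server side, the servers reply to 3.3.3.3 directly, which is why their default gateway has to be the BIG-IP.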
[Figure: traffic path through the VS listener (IPv4/IPv6), TCP Express, SSL, HTTP, RAM Cache, Proxy, and load balancing algorithms, with iRules at each stage]
Virtual Server Priority
1. Specific IP address and specific port 10.0.33.199:80
2. Specific IP address and all ports 10.0.33.199:*
3. Network IP address and specific port 10.0.33.0:443 netmask 255.255.255.0
4. Network IP address and all ports 10.0.33.0:* netmask 255.255.255.0
5. All networks and specific port 0.0.0.0:80 netmask 0.0.0.0
6. All networks and all ports 0.0.0.0:* netmask 0.0.0.0
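One way to reproduce the six-step order is to prefer the longest matching netmask and, within the same netmask, a specific port over the wildcard. The sketch below assumes that reading; `pick_listener` and the listener table are illustrative names and values, and real BIG-IP versions may apply further criteria.

```python
import ipaddress
from typing import List, Optional, Tuple

# (destination in CIDR form, port) where port None stands for "*".
Listener = Tuple[str, Optional[int]]


def pick_listener(
    listeners: List[Listener], dst_ip: str, dst_port: int
) -> Optional[Listener]:
    """Return the best-matching listener, or None when nothing matches."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = []
    for cidr, port in listeners:
        net = ipaddress.ip_network(cidr)
        if addr in net and port in (None, dst_port):
            # Rank: longer prefix first, then specific port over wildcard.
            candidates.append((net.prefixlen, port is not None, (cidr, port)))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[:2])[2]


listeners = [
    ("0.0.0.0/0", None),
    ("0.0.0.0/0", 80),
    ("10.0.33.0/24", None),
    ("10.0.33.0/24", 443),
    ("10.0.33.199/32", None),
    ("10.0.33.199/32", 80),
]

print(pick_listener(listeners, "10.0.33.199", 80))
print(pick_listener(listeners, "10.0.33.5", 80))
print(pick_listener(listeners, "192.0.2.1", 22))
```

Note the second call: a request to port 80 inside 10.0.33.0/24 lands on the network wildcard listener rather than on 0.0.0.0:80, because the network match outranks the port match.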
Layer 3
• Ping
• Check routes
• Tracepath utility
• Traceroute from both directions
• Telnet to the remote port
[root@3900-1:Active:In Sync] config # tracepath 10.0.180.1
1: 10.50.0.221 (10.50.0.221) 0.175ms pmtu 1500
1: 10.0.180.1 (10.0.180.1) 2.981ms reached
Resume: pmtu 1500 hops 1 back 1
Connections
10.0.180.250:59918 - 10.50.220.101:80 - any6.any - any6.any
-----------------------------------------------------------
  TMM           2
  Type          any
  Acceleration  none
  Protocol      tcp
  Idle Time     52
  Idle Timeout  300
  Unit ID       1
  Lasthop       /Common/external 00:18:19:9e:b4:75
  Virtual Path  10.50.220.101:80

                ClientSide           ServerSide
  Client Addr   10.0.180.250:59918   any6.any
  Server Addr   10.50.220.101:80     any6.any
  Bits In       5.4K                 0
  Bits Out      4.8K                 0
  Packets In    6                    0
  Packets Out   5                    0

10.0.180.140:51711 - 10.50.220.100:80 - 10.80.0.220:51711 - 10.80.0.51:8080
---------------------------------------------------------------------------
  TMM           3
  Type          any
  Acceleration  none
  Protocol      tcp
  Idle Time     2
  Idle Timeout  300
  Unit ID       1
  Lasthop       /Common/external 00:18:19:9e:b4:75
  Virtual Path  10.50.220.100:80

                ClientSide           ServerSide
  Client Addr   10.0.180.140:51711   10.80.0.220:51711
  Server Addr   10.50.220.100:80     10.80.0.51:8080
  Bits In       52.9K                67.2K
  Bits Out      134.9K               54.2K
  Packets In    21                   15
  Packets Out   36                   25
[root@3900-1:Active:Changes Pending] config # tmsh show ltm persistence persist-records client-addr 10.0.180.140
Sys::Persistent Connections
source-address  10.50.220.100:80  10.80.0.51:8080  3
Total records returned: 1
tmsh show /sys conn
tmsh show /ltm persistence persist-records
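Each connection entry starts with a four-part header. Judging from the ClientSide and ServerSide columns in the sample output, the parts appear to be the client-side client address, the client-side server address (the virtual server), the server-side client address, and the server-side server address (the pool member). The sketch below splits a header on that assumption; `Flow`, `parse_flow`, and `uses_snat` are hypothetical helper names, not part of tmsh.

```python
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    client: str
    virtual: str
    serverside_client: Optional[str]
    server: Optional[str]


def parse_flow(header: str) -> Flow:
    """Split a 'a - b - c - d' connection header into its four endpoints."""
    parts = [p.strip() for p in header.split(" - ")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 endpoints, got {len(parts)}")

    def norm(value: str) -> Optional[str]:
        # 'any6.any' is what the sample shows when no server-side flow exists.
        return None if value == "any6.any" else value

    return Flow(parts[0], parts[1], norm(parts[2]), norm(parts[3]))


def uses_snat(flow: Flow) -> bool:
    """True when the server side sees a different source IP than the client's."""
    if flow.serverside_client is None:
        return False
    client_ip = flow.client.rsplit(":", 1)[0]
    serverside_ip = flow.serverside_client.rsplit(":", 1)[0]
    return client_ip != serverside_ip


flow = parse_flow(
    "10.0.180.140:51711 - 10.50.220.100:80 - 10.80.0.220:51711 - 10.80.0.51:8080"
)
print(flow.server, uses_snat(flow))
```

Read this way, the second sample connection reaches pool member 10.80.0.51:8080 with its source translated to 10.80.0.220, while the first has no server-side flow at all.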
MAC Masquerade
• Unique MAC assigned to a traffic group
• Minimize ARP communication or dropped packets during failover by using a consistent MAC address
• Improve reliability and failover speed
• Improve interoperability with switches that are slow to process gratuitous ARPs (gARPs)
• When a BIG-IP becomes active, it sends a gARP for every virtual IP for which it is now active. If Link Down on Failover is set, it also performs an interface reset, dropping carrier momentarily
• SOL13502 (SOL7214 for v10.x)
Review of Auto Last Hop
• Tracks the source MAC address and VLAN of incoming connections.
• Return traffic from pools is sent to the MAC address that transmitted the request,
• Even if the routing table points to a different network or interface
• The BIG-IP can send return traffic to clients even if no matching route exists.
• Auto Last Hop is a desired behavior, so it is enabled by default.
• F5 Networks recommends leaving it enabled
• Under rare circumstances you may want to disable Auto Last Hop
• If it is disabled, the routing table is used to forward the packet
• SOL11796: Overview of the Auto Last Hop setting
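The behavior described above amounts to remembering where each connection came from and replying the same way. The following is a toy model under that reading only; `LastHopTable` and its methods are hypothetical names, and the MAC address and VLAN are borrowed from the Lasthop field in the earlier connection output.

```python
from typing import Dict, Optional, Tuple

Hop = Tuple[str, str]  # (MAC address, VLAN)


class LastHopTable:
    """Remembers the source MAC and VLAN of each incoming connection."""

    def __init__(self, auto_last_hop: bool = True) -> None:
        self.auto_last_hop = auto_last_hop
        self._seen: Dict[str, Hop] = {}

    def record(self, conn_id: str, src_mac: str, vlan: str) -> None:
        self._seen[conn_id] = (src_mac, vlan)

    def next_hop(self, conn_id: str, routed_hop: Optional[Hop]) -> Optional[Hop]:
        """Pick where return traffic goes; routed_hop is the routing table's answer."""
        if self.auto_last_hop and conn_id in self._seen:
            return self._seen[conn_id]
        return routed_hop


table = LastHopTable()
table.record("conn-1", "00:18:19:9e:b4:75", "/Common/external")
# No matching route (None), yet return traffic still has somewhere to go.
print(table.next_hop("conn-1", routed_hop=None))
```

With `auto_last_hop=False` the same call returns whatever the routing lookup produced, which is `None` here.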
TCP Reset Cause
• Informs where and why a TCP reset was generated. (SOL13223)
• A diagnostic enhancement
• Use as necessary for troubleshooting
• Added for all profiles which could cause a TCP RST
• HTTP
• Stream
• FastL4
• FastHTTP
• etc.
Example log entry:
3900-1 err tmm3[8641]: 01230140:3: RST sent from 10.80.0.50:80 to 10.80.0.221:1115, [0x173b10d:5961] TCP RST from remote system
Viewing Reset Cause
• Insert into TCP reset (packet captures)
- tmsh mod sys db tm.rstcause.pkt {value "enable"}
- The default is "disabled"
• Send to syslog (/var/log/ltm)
- tmsh mod sys db tm.rstcause.log {value "enable"}
- The default is "disabled"
• Show reset cause stats
- tmsh show net rst-cause
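Once reset causes are logged to /var/log/ltm, the entries can be pulled apart with a short script. The pattern below is written against the single sample entry in this deck, so treat it as a sketch: other message variants may not match, and `parse_rst` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Built from the one sample log line available; not an official format spec.
RST_LOG = re.compile(
    r"RST sent from (?P<src>\S+) to (?P<dst>[^,\s]+), "
    r"\[(?P<code>0x[0-9a-fA-F]+:\d+)\] (?P<cause>.+)$"
)


def parse_rst(line: str) -> Optional[Dict[str, str]]:
    """Return source, destination, code, and cause, or None if no match."""
    match = RST_LOG.search(line)
    return match.groupdict() if match else None


sample = (
    "3900-1 err tmm3[8641]: 01230140:3: RST sent from 10.80.0.50:80 "
    "to 10.80.0.221:1115, [0x173b10d:5961] TCP RST from remote system"
)
print(parse_rst(sample))
```

Grouping the parsed `cause` values across a log file gives a quick picture of which side is generating the resets.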
RST Packets Containing Data (RFC 1122)
• What do the RFCs have to say about this?
• A TCP SHOULD allow a received RST to include data.
• It has been suggested that a RST segment could contain ASCII text that encoded and explained the cause of the RST. No standard has yet been established for such data.
• Some other stacks do the same (e.g., HP-UX and MacOS)
• Has been known to cause issues in the field
Pool action on service down
A pool object property that sets how the system responds when the
target pool member becomes unavailable.
• None: Specifies that the system maintains existing connections,
but does not send new traffic to the member (default)
• Reject: Specifies that the system explicitly closes both sides
of the connection when the server goes down
• Drop: Specifies that the system simply cleans up the
connection; no reset is sent
• Reselect: Specifies that the system manages established client
connections by moving them to an alternative pool member
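The four settings can be summarized as a small decision function. This is a descriptive sketch of the slide text, not BIG-IP logic: the names are hypothetical, taking the first remaining member stands in for whatever load balancing would actually choose, and raising an error when no member is left is a choice made for the example.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ServiceDownAction(Enum):
    NONE = "none"
    REJECT = "reject"
    DROP = "drop"
    RESELECT = "reselect"


def handle_member_down(
    action: ServiceDownAction, other_members: List[str]
) -> Tuple[str, Optional[str]]:
    """Return (fate of an established connection, new pool member if any)."""
    if action is ServiceDownAction.NONE:
        return ("kept; member receives no new traffic", None)
    if action is ServiceDownAction.REJECT:
        return ("closed explicitly on both sides", None)
    if action is ServiceDownAction.DROP:
        return ("cleaned up without a reset", None)
    # RESELECT: simplified to the first remaining member (assumption).
    if not other_members:
        raise ValueError("reselect needs at least one alternative pool member")
    return ("moved to an alternative pool member", other_members[0])


for action in ServiceDownAction:
    print(action.value, "->", handle_member_down(action, ["10.80.0.51:8080"]))
```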
I did ABC, and now when I log in to the GUI I see: “The configuration has not yet loaded. If this message persists, it may indicate a configuration problem.”
To determine what is wrong:
tmsh load /sys config partitions all
Attack Prevention and Dynamic Reaping
• SYN flood, DDoS, DoS attack prevention
• SYN Cookies*
• Dynamic Reaping
• Continually monitors existing TCP connections to ensure the integrity of the connection table
• Removes the oldest idle connections when it needs to free memory
• Protects the BIG-IP against SYN attacks from non-spoofed IP addresses that fully negotiate a connection
• Avoid changing default values without Support assistance
[Figure: OSI model, layers 7 (Application) through 1 (Physical)]
* The article http://cr.yp.to/syncookies.html provides an elaborate explanation
of SYN cookies
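The reaping idea, removing the oldest idle connections when resources run short, can be illustrated with a toy connection table. Everything here is an assumption made for the sketch: memory pressure is approximated by a connection-count `capacity`, "oldest idle" is read as "longest idle time", and `reap_idle` is a hypothetical name. It does not reflect BIG-IP's actual thresholds.

```python
from typing import Dict, List


def reap_idle(idle_seconds: Dict[str, int], capacity: int) -> List[str]:
    """Evict the longest-idle connections until the table fits capacity.

    Mutates idle_seconds and returns the evicted ids, longest idle first.
    """
    excess = len(idle_seconds) - capacity
    if excess <= 0:
        return []
    victims = sorted(idle_seconds, key=idle_seconds.get, reverse=True)[:excess]
    for conn_id in victims:
        del idle_seconds[conn_id]
    return victims


# Idle times in seconds; the values are made up for illustration.
table = {"conn-a": 52, "conn-b": 2, "conn-c": 290}
print(reap_idle(table, capacity=2))
print(table)
```

Active connections have low idle times, so they are the last to be reaped in this model.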
Tips on General Configuration
• Set DNS and NTP
• Re-activate your license before upgrading (*Will impact traffic)
• Adjust the Number of Records Per Screen
• Set up a floating IP address on each VLAN
• Understand that the BIG-IP operates in STP pass-through mode
• Virtual Address vs Virtual Server, disabling ARP
• Nagle's algorithm