workspace/skills/f5-troubleshoot/SKILL.md
F5 BIG-IP troubleshooting - virtual server failures, pool member health, connection issues, SSL/TLS problems, iRule errors, persistence issues, and performance degradation. Use when a VIP is not responding, pool members are marked down, users report SSL errors, the application is slow, or iRule TCL errors appear in logs.
The F5 MCP server provides 6 tools. Call them via mcp-call with the required environment variables:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" <tool_name> '{"param":"value"}'
| Tool | Purpose | When to Use |
|------|---------|-------------|
| list_tool | List and inspect object configuration | Verify config is correct |
| show_stats_tool | Show live statistics and counters | Identify traffic flow issues |
| show_logs_tool | Show system logs | Find errors and event correlation |
| update_tool | Modify object configuration | Apply fixes |
| create_tool | Create new objects | Add missing objects |
| delete_tool | Remove objects | Remove problematic objects |
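Every call in this skill follows the same shape, so the command line can be assembled programmatically. The following is a minimal sketch, not part of the F5 MCP server: `build_mcp_call` is a hypothetical helper, and it assumes the four environment variables used above (`F5_IP_ADDRESS`, `F5_AUTH_STRING`, `MCP_CALL`, `F5_MCP_SCRIPT`) are set. It only builds the arguments and does not run anything.

```python
import json
import os


def build_mcp_call(tool_name, params, env=None):
    """Return (env_overrides, argv) mirroring the documented mcp-call line.

    Hypothetical helper: it builds the command but never executes it.
    """
    env = os.environ if env is None else env
    # Variables that the documented command sets in front of python3.
    env_overrides = {
        "IP_ADDRESS": env.get("F5_IP_ADDRESS", ""),
        "Authorization_string": env.get("F5_AUTH_STRING", ""),
    }
    argv = [
        "python3",
        env.get("MCP_CALL", ""),
        "python3 -u " + env.get("F5_MCP_SCRIPT", ""),
        tool_name,
        # Compact JSON, matching the style of the examples in this document.
        json.dumps(params, separators=(",", ":")),
    ]
    return env_overrides, argv
```

Passing `argv` as a list to `subprocess.run` (with `env_overrides` merged into the environment) avoids hand-quoting the JSON argument for the shell.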
Clients report they cannot connect to the application VIP.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Check:
- enabled: true? If disabled, someone took it out of service.
- destination (VIP:port) correct?
- pool assigned?
- sourceAddressTranslation configured? (Without SNAT/automap, return traffic may bypass the BIG-IP.)

Decision tree:

- Virtual server disabled -> update_tool with {"enabled":true}
- No pool assigned -> update_tool with {"pool":"pool_name"}

IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Analyze:
| Metric | Healthy Indicator | Problem Indicator |
|--------|-------------------|-------------------|
| Status availability | available | offline or unknown |
| Current connections | > 0 during business hours | 0 on production VIP |
| Total connections | Incrementing | Flat or zero |
| Client-side bits in | > 0 | Zero (no client traffic arriving) |
| Server-side bits out | > 0 | Zero (no traffic reaching backend) |
| Client bits in = 0 and server bits out = 0 | - | VIP not processing traffic at all |
| Client bits in > 0, server bits out = 0 | - | Traffic arriving but not forwarded to pool |
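The table above can be read as a small decision function. This is a sketch only: the argument names are assumptions, because the exact field names returned by show_stats_tool are not specified here, and the returned strings simply paraphrase the table.

```python
def classify_vip_traffic(availability, client_bits_in, server_bits_out):
    """Map virtual-server counters to the diagnosis in the table.

    Argument names are illustrative, not real show_stats_tool field names.
    """
    if availability != "available":
        return "VIP is %s: check the pool and its members" % availability
    if client_bits_in == 0 and server_bits_out == 0:
        return "VIP not processing traffic at all"
    if client_bits_in > 0 and server_bits_out == 0:
        return "Traffic arriving but not forwarded to pool"
    return "Traffic is flowing on both sides"
```

Counters are cumulative, so comparing two samples taken a few seconds apart gives a better picture than a single reading.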
If status is offline:
The virtual server is marked down because the associated pool has no available members. Proceed to Step 3.
If current connections = 0 but status is available:
The VIP is healthy but no clients are connecting. The issue is upstream of the BIG-IP: check DNS, firewalls, and routing between the clients and the VIP.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Check:
- Member status available? If all members are offline, the pool is down.
- Member state enabled or disabled? Disabled members were intentionally drained.

If all members are offline -> Go to "Pool Member Marked Down" section below.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"200"}'
Scan for:
- 01010028 -- No members available in pool (confirms pool down)
- 01010025 -- Connection limit reached on virtual server
- 0107142f -- SSL handshake failure
- 01070417 -- HTTP parse error
- 01010240 -- Connection queue full

IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"profile"}'
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
Check:
Health monitor is marking one or more pool members as offline.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Record: Which members are offline, which are available, which are disabled.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Analyze:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
Scan for these patterns:
| Log Message | Meaning | Common Cause |
|-------------|---------|--------------|
| 01071681 Pool member ... monitor status down | Health check failed | Server not responding |
| 01071682 Pool member ... monitor status up | Health check recovered | Server came back |
| 01010028 No members available | All members down | Total pool failure |
| FQDN ... cannot be resolved | DNS resolution failure | DNS issue for FQDN pool members |
| monitor ... instance ... timed out | Monitor timeout | Server too slow or unreachable |
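When a pool flaps, the down and up events in the table need to be paired per member to see who is still down. The sketch below assumes log lines that follow the "Pool member ... monitor status down/up" pattern shown in the table; the real line layout returned by show_logs_tool may differ, so the regular expression is an assumption to adjust.

```python
import re

# Assumed layout: "<code> ... member <member-id> monitor status <down|up>".
EVENT_RE = re.compile(
    r"(01071681|01071682).*?member\s+(.+?)\s+monitor status (down|up)"
)


def last_member_state(lines):
    """Return {member: "down" | "up"} using the most recent event per member."""
    state = {}
    for line in lines:  # lines are expected in chronological order
        match = EVENT_RE.search(line)
        if match:
            state[match.group(2)] = match.group(3)
    return state


def members_still_down(lines):
    """List members whose latest monitor event is a down event."""
    return sorted(m for m, s in last_member_state(lines).items() if s == "down")
```

A member that appears with a down event and no later up event is the one to investigate first.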
Common root causes for pool member down:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
From the pool config, identify the monitor name and verify:
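- Send string: is the request correct? (e.g., GET /health HTTP/1.1\r\nHost: app.example.com\r\n\r\n)
- Receive string: does it match what the server actually returns? (e.g., 200 OK or healthy)
- Destination: *:* (use member address:port) or a specific IP:port?

If the server is healthy but the monitor is wrong, fix the monitor: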
Update the pool with a correct monitor:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"monitor":"tcp"},"object_type":"pool","object_name":"pool_webapp"}'
If a member needs to be temporarily removed:
Update the pool without the problematic member:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"members":["10.1.1.10:80","10.1.1.11:80"]},"object_type":"pool","object_name":"pool_webapp"}'
WARNING: This removes the member entirely. Existing connections will be terminated. For graceful drain, disable the member instead if the API supports it.
If a replacement member needs to be added:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"members":["10.1.1.10:80","10.1.1.11:80","10.1.1.14:80"]},"object_type":"pool","object_name":"pool_webapp"}'
Users report intermittent connectivity, session drops, or being load-balanced to a different server mid-session.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Check for connection limit issues:
- Is connectionLimit set and being reached?
- Is clientsideCurConns near the limit?
- Are connection queue full errors in the logs? (01010240)

If connection limit is being hit:
Either increase the limit or scale out with additional pool members:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"connectionLimit":0},"object_type":"virtual","object_name":"vs_webapp_https"}'
Setting connectionLimit to 0 removes the limit entirely.
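Before raising or removing the limit, it helps to quantify how close the virtual server is to it. A minimal sketch, assuming connectionLimit and clientsideCurConns have already been read from the tool output as integers; the 90% threshold is an arbitrary illustration, not an F5 recommendation.

```python
def connection_limit_usage(connection_limit, current_conns):
    """Return the fraction of the limit in use, or None when the limit is 0.

    A connectionLimit of 0 means no limit is configured.
    """
    if connection_limit == 0:
        return None
    return current_conns / connection_limit


def near_limit(connection_limit, current_conns, threshold=0.9):
    """True when usage is at or above the (assumed) warning threshold."""
    usage = connection_limit_usage(connection_limit, current_conns)
    return usage is not None and usage >= threshold
```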
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Persistence troubleshooting:
| Issue | Symptom | Resolution |
|-------|---------|------------|
| No persistence configured | Users lose session on every request | Add cookie or source-addr persistence |
| Source-addr persistence with SNAT | All users from same SNAT IP go to same member | Switch to cookie persistence |
| Cookie persistence but app on HTTP | Persistence cookie not inserted | Ensure HTTP profile is assigned |
| Persistence timeout too short | Users lose session during idle | Increase persistence timeout |
| Persistence timeout too long | Sessions stick to drained member | Lower timeout or use cookie |
| Fallback persistence not set | When primary persistence fails, connections randomize | Set fallback persistence |
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
If one member has vastly more connections than others:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"300"}'
Scan for:
- 01010025 -- Connection limit reached
- 01010240 -- Connection queue full
- 01060102 -- Rate limit reached
- TCL error -- iRule causing connection drops
- reset cause -- Connection resets (RST) from server or BIG-IP

Users see certificate warnings, SSL handshake failures, or HTTPS connections fail entirely.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"profile"}'
Check the SSL client profile assigned to the virtual server:
Common SSL issues:
| Issue | Symptom | Log Pattern |
|-------|---------|-------------|
| Expired certificate | Browser shows "Not Secure" | 0107142f SSL handshake failed |
| Wrong certificate (hostname mismatch) | Browser shows certificate warning | Client disconnects after handshake |
| Missing intermediate CA | Works in some browsers, fails in others | 0107143c certificate verification failed |
| Weak cipher suite only | Modern browsers refuse to connect | 0107142f with no common cipher |
| TLS version mismatch | Client can't negotiate | 0107142f protocol version |
| Client cert required but not sent | Connection refused | 01071065 peer did not return certificate |
| SNI misconfiguration | Wrong cert served for hostname | Client sees cert for different domain |
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Verify the correct SSL profile is assigned in the profiles list with context: clientside.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"300"}'
Key SSL log messages:
| Log Code | Meaning | Action |
|----------|---------|--------|
| 0107142f | SSL handshake failure | Check cipher/version/cert compatibility |
| 0107143c | Certificate verification failure | Check cert chain completeness |
| 01071065 | Peer certificate missing | Client cert auth configured but client has no cert |
| 01070417 | HTTP request on HTTPS port | Client sending plain HTTP to SSL VIP |
| SSL routines:ssl3_read_bytes:sslv3 alert | SSL alert received from peer | Version/cipher mismatch |
Update SSL profile ciphers to modern standards:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"ciphers":"TLSv1.2:TLSv1.3:!SSLv3:!RC4:!3DES:!EXPORT"},"object_type":"profile","object_name":"clientssl_webapp"}'
Assign the correct SSL profile to a virtual server:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"profiles":[{"name":"clientssl_webapp","context":"clientside"},{"name":"http"},{"name":"tcp-wan-optimized","context":"clientside"},{"name":"tcp-lan-optimized","context":"serverside"}]},"object_type":"virtual","object_name":"vs_webapp_https"}'
WARNING: The profiles list is a full replacement. Include ALL desired profiles.
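Because the profiles list is replaced wholesale, a safe pattern is to start from the current list and change only the entry that is wrong. The sketch below assumes the current profiles have been read with list_tool and parsed into dicts with name and optional context keys, as in the update above; `swap_clientside_profile` is a hypothetical helper.

```python
def swap_clientside_profile(profiles, old_name, new_name):
    """Return a full replacement list with one client-side profile renamed.

    All other profiles are copied unchanged so none are dropped by the update.
    """
    swapped = False
    result = []
    for profile in profiles:
        entry = dict(profile)  # copy so the caller's list is not mutated
        if entry.get("name") == old_name and entry.get("context") == "clientside":
            entry["name"] = new_name
            swapped = True
        result.append(entry)
    if not swapped:
        raise ValueError("client-side profile %r not found" % old_name)
    return result
```

The returned list can be passed as the profiles value in url_body.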
Logs show TCL errors or iRule-related failures.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
Scan for iRule error patterns:
| Pattern | Meaning | Common Cause |
|---------|---------|--------------|
| TCL error | Tcl script runtime error | Syntax error, undefined variable, missing command |
| can't read "variable" | Variable not defined | Variable used before assignment or in wrong event |
| command not found | Invalid Tcl or iRule command | Typo or deprecated command |
| HTTP::collect without HTTP::release | Payload collection started but never released | Missing release in all code paths (memory leak) |
| invalid command name "pool" | Pool command in wrong event | pool used in an event where it is not valid |
| too many re-entering calls | Recursive iRule invocation | iRule triggering itself |
| exceeded CPU time limit | iRule taking too long | Complex regex or infinite loop |
| abort | iRule explicitly aborted | Error condition in catch block |
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
Cross-reference the iRule name from the log error with the iRule inventory. Check which virtual servers have this iRule assigned.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"problematic_irule","object_type":"irule"}'
Common iRule bugs to check for:
- HTTP::collect without corresponding HTTP::release in all branches
- Missing default case in switch statements
- log statements in high-traffic events (performance issue, not error)
- Missing error handling (catch) around operations that can fail

Replace the iRule with a corrected version:

IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"apiAnonymous":"when HTTP_REQUEST {\n if { [catch {\n switch -glob [string tolower [HTTP::uri]] {\n \"/api/*\" { pool pool_api_backend }\n default { pool pool_webapp }\n }\n } err] } {\n log local0. \"iRule error: $err\"\n pool pool_webapp\n }\n}"},"object_type":"irule","object_name":"uri_routing"}'
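Hand-escaping Tcl inside JSON inside a shell string is error-prone. One option is to keep the iRule as plain text and let a JSON encoder produce the escapes. This is a sketch: the iRule body is illustrative (it wraps the switch in the `if { [catch { ... } err] }` form), and `irule_update_payload` is a hypothetical helper, not part of the MCP server.

```python
import json
import shlex

# Illustrative iRule body kept as readable, unescaped text.
IRULE_BODY = '''when HTTP_REQUEST {
    if { [catch {
        switch -glob [string tolower [HTTP::uri]] {
            "/api/*" { pool pool_api_backend }
            default { pool pool_webapp }
        }
    } err] } {
        log local0. "iRule error: $err"
        pool pool_webapp
    }
}'''


def irule_update_payload(name, body):
    """Return the JSON argument for update_tool with the iRule text escaped."""
    return json.dumps(
        {
            "url_body": {"apiAnonymous": body},
            "object_type": "irule",
            "object_name": name,
        }
    )


def shell_argument(payload):
    """Quote the payload for pasting into a POSIX shell command line."""
    return shlex.quote(payload)
```

Single-quoting matters because the body contains `$err`, which a double-quoted shell string would expand.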
Alternatively, if the iRule is causing critical failures, remove it from the virtual server immediately:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"rules":[]},"object_type":"virtual","object_name":"vs_webapp_https"}'
This removes all iRules from the virtual server. Traffic will flow to the default pool without any iRule processing. Fix the iRule, then re-attach it.
Application is slow, high latency, or throughput has dropped.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Look for:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Look for:
If distribution is uneven, consider changing load balancing:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"loadBalancingMode":"least-connections-member"},"object_type":"pool","object_name":"pool_webapp"}'
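"Uneven" can be made concrete by comparing the busiest member's share of connections with its fair share. A minimal sketch, assuming per-member current connection counts have been extracted from the pool statistics into a dict; the factor of 2 is an arbitrary illustration.

```python
def busiest_member_share(conns_by_member):
    """Return (member, share of total connections) for the busiest member."""
    total = sum(conns_by_member.values())
    if total == 0:
        return None, 0.0
    member = max(conns_by_member, key=conns_by_member.get)
    return member, conns_by_member[member] / total


def is_uneven(conns_by_member, factor=2.0):
    """True when the busiest member carries more than factor x its fair share."""
    count = len(conns_by_member)
    member, share = busiest_member_share(conns_by_member)
    if member is None or count < 2:
        return False
    return share > factor / count
```

Skew caused by persistence (for example many clients behind one SNAT address) will not be fixed by changing the load balancing mode, so check persistence first.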
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
If members are down, the remaining members are handling more traffic than designed. This is the most common cause of "slow application" reports -- not a BIG-IP issue but a capacity issue.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
Performance-related log patterns:
| Pattern | Meaning | Action |
|---------|---------|--------|
| 01010025 | Connection limit reached | Increase limit or add capacity |
| 01010240 | Connection queue full | Increase queue depth or backend capacity |
| 01060102 | Rate limit reached | Review rate limiting config |
| 01070727 | Pool member rate limit | Member receiving too much traffic |
| memory | BIG-IP memory pressure | Check for memory leaks, iRule issues |
| disk_usage | BIG-IP disk pressure | Check for log rotation issues |
| tmm_semaphore | TMM (Traffic Management Microkernel) contention | BIG-IP itself is overloaded |
| aggressive_mode | Memory aggressive mode enabled | BIG-IP is under severe memory pressure |
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
iRule performance killers:
- log statements on every request -> Disk I/O bottleneck
- HTTP::collect large payloads -> Memory consumption
- DNS::lookup in data path -> Blocking operation, adds latency
- persist uie with large strings -> Persistence table bloat

If the root cause is insufficient backend capacity, add more pool members:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"members":["10.1.1.10:80","10.1.1.11:80","10.1.1.12:80","10.1.1.13:80","10.1.1.14:80"]},"object_type":"pool","object_name":"pool_webapp"}'
WARNING: Members list is a full replacement. Include ALL desired members (existing + new).
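Since the members list is a full replacement, the new list is best derived from the current one rather than typed by hand. The helpers below are a hypothetical sketch that assumes the existing members have already been read with list_tool as "address:port" strings.

```python
def members_with_added(existing, new):
    """Return existing members plus new ones, preserving order, no duplicates."""
    merged = list(existing)
    for member in new:
        if member not in merged:
            merged.append(member)
    return merged


def members_without(existing, removed):
    """Return existing members minus the ones being taken out."""
    return [member for member in existing if member not in removed]
```

Either result can be used as the members value in url_body, which keeps every untouched member in the update.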
Logs indicate high-availability state changes, failover events, or configuration sync failures.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
HA-related log patterns:
| Pattern | Severity | Meaning |
|---------|----------|---------|
| ha_status active -> standby | CRITICAL | This unit has gone standby -- failover occurred |
| ha_status standby -> active | CRITICAL | This unit has become active -- peer failed |
| failover | CRITICAL | Failover event in progress |
| config_sync failed | HIGH | Configuration not synchronizing between peers |
| device_trust | HIGH | Device trust certificate issue |
| heartbeat lost | CRITICAL | HA heartbeat lost -- peer may be down |
| network_failover | CRITICAL | Network-based failover triggered |
After any failover event, immediately verify all virtual servers and pools:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"virtual"}'
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"pool"}'
Confirm all virtual servers are available and all pool members are healthy on the now-active unit.
| Code | Severity | Meaning | First Action |
|------|----------|---------|--------------|
| 01010025 | HIGH | VS connection limit reached | Check stats, increase limit |
| 01010028 | CRITICAL | No pool members available | Check pool health |
| 01010029 | CRITICAL | Pool member monitor down | Check member + monitor |
| 01010240 | HIGH | Connection queue full | Check capacity |
| 01060102 | HIGH | Rate limit reached | Review rate config |
| 0107142f | CRITICAL | SSL handshake failure | Check cert + ciphers |
| 01070417 | HIGH | HTTP parse error | Check client requests |
| 0107143c | WARNING | Cert verification fail | Check cert chain |
| 01071681 | WARNING | Pool member marked down | Check member health |
| 01071682 | INFO | Pool member marked up | Recovery event |
| 01070727 | WARNING | Member rate limit | Check distribution |
| TCL error | HIGH | iRule error | Check iRule code |
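The quick-reference table lends itself to an automated first pass over log output. The sketch below copies the codes and severities from the table; the sample log layout in the usage note is an assumption, and only the codes themselves are matched.

```python
import re

# Codes, severities, and meanings copied from the table above.
LOG_CODES = {
    "01010025": ("HIGH", "VS connection limit reached"),
    "01010028": ("CRITICAL", "No pool members available"),
    "01010029": ("CRITICAL", "Pool member monitor down"),
    "01010240": ("HIGH", "Connection queue full"),
    "01060102": ("HIGH", "Rate limit reached"),
    "0107142f": ("CRITICAL", "SSL handshake failure"),
    "01070417": ("HIGH", "HTTP parse error"),
    "0107143c": ("WARNING", "Cert verification fail"),
    "01071681": ("WARNING", "Pool member marked down"),
    "01071682": ("INFO", "Pool member marked up"),
    "01070727": ("WARNING", "Member rate limit"),
}
SEVERITY_ORDER = ["CRITICAL", "HIGH", "WARNING", "INFO"]
CODE_RE = re.compile(r"\b[0-9a-f]{8}\b")


def scan_logs(lines):
    """Return unique (severity, code, meaning) findings, most severe first."""
    findings = set()
    for line in lines:
        for code in CODE_RE.findall(line):
            if code in LOG_CODES:
                severity, meaning = LOG_CODES[code]
                findings.add((severity, code, meaning))
        if "TCL error" in line:
            findings.add(("HIGH", "TCL error", "iRule error"))
    return sorted(findings, key=lambda f: (SEVERITY_ORDER.index(f[0]), f[1]))
```

Feeding it the lines returned by show_logs_tool gives a short, deduplicated list to start from, with CRITICAL entries first.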
Client reports application down
  |
  +-> Check VIP status (list_tool + show_stats_tool virtual)
       |
       +-> VIP offline?
       |    +-> Check pool (list_tool + show_stats_tool pool)
       |         +-> All members down? -> Check servers + monitors
       |         +-> Some members down? -> Reduced capacity, check remaining
       |         +-> No pool assigned? -> Assign pool (update_tool)
       |
       +-> VIP available but 0 connections?
       |    +-> DNS, firewall, or routing issue upstream of BIG-IP
       |
       +-> VIP available, connections present, but errors?
            +-> Check logs (show_logs_tool)
                 +-> SSL errors? -> Check profiles + certs
                 +-> HTTP errors? -> Check iRules + backend health
                 +-> Connection limits? -> Scale out or increase limits
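The same tree can be expressed as a function that returns the next step. This is a sketch under stated assumptions: the inputs are values the operator has already extracted from list_tool and show_stats_tool, and the final offline branch (pool fully healthy but VIP offline) is not in the tree and is added here only so the function always returns something.

```python
def next_step(vip_available, pool_assigned, members_up, members_total,
              current_connections):
    """Return the next troubleshooting action from the decision tree."""
    if not vip_available:
        if not pool_assigned:
            return "Assign pool (update_tool)"
        if members_up == 0:
            return "Check servers + monitors"
        if members_up < members_total:
            return "Reduced capacity, check remaining members"
        # Assumption: this case is not covered by the tree above.
        return "Pool healthy: re-check virtual server configuration"
    if current_connections == 0:
        return "DNS, firewall, or routing issue upstream of BIG-IP"
    return "Check logs (show_logs_tool)"
```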
| Skill | Integration Point |
|-------|------------------|
| f5-health-check | Run health check first to scope the problem |
| f5-config-mgmt | Apply fixes using proper change workflow |
| servicenow-change-workflow | Create incident tickets for CRITICAL findings |
| drawio-diagram | Visualize traffic flow for complex troubleshooting |
| markmap-viz | Create troubleshooting decision trees |
After completing a troubleshooting session, record findings and resolution in GAIT:
python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"prompt":"F5 troubleshoot: vs_webapp_https not responding to clients","response":"Investigation: VIP status offline due to pool_webapp all members down. Root cause: HTTP health monitor expecting 200 but app returning 301 redirect after deployment. Fix: updated monitor receive string to accept 301. Verification: all 3 pool members now available, VIP status available, client connections incrementing. Logs clear of 01010028 errors.","artifacts":["f5-troubleshoot-report.txt"]}'