Catalyst Center Troubleshooting Workflows

Catalyst Center MCP Server

All Catalyst Center tool calls use this invocation pattern:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT

Variable shorthand used throughout this document:

CATC_CMD="CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT"

How to Call Tools

Use the $MCP_CALL protocol handler to invoke MCP tools:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" TOOL_NAME 'ARGS_JSON'

When to Use

Help desk escalation: user reports "I can't connect"
NOC alert: device unreachable in Catalyst Center
Client connectivity complaints (wired or wireless)
Site-wide outage affecting multiple users
Wireless roaming issues or poor signal complaints
Interface down investigations
Post-change verification when something goes wrong
Correlation between Catalyst Center data and live device state via pyATS

Troubleshooting Principles

Define the problem -- What exactly is reported? Single user, multiple users, entire site? Wired or wireless? When did it start?
Scope the impact -- Use Catalyst Center client counts and device reachability to quantify: how many clients affected? How many devices unreachable?
Gather data from the controller -- Catalyst Center provides a controller-level view that covers hundreds of devices simultaneously. Start here.
Correlate with live device state -- When Catalyst Center data is stale or insufficient, escalate to pyATS for real-time CLI data.
Isolate the fault domain -- Is it the client, the access layer, the distribution/core, or a WAN/upstream issue?
Resolve and verify -- Fix the issue, confirm via both Catalyst Center and pyATS, document.

Issue 1: Device Unreachable

Trigger: Catalyst Center shows one or more devices with reachabilityStatus: Unreachable.

Decision Tree

Device Unreachable in Catalyst Center
|
+-- Step 1: Confirm the scope
|   How many devices are unreachable?
|   Are they at the same site?
|   |
|   +-- Single device unreachable
|   |   --> Go to "Single Device Investigation"
|   |
|   +-- Multiple devices at same site
|   |   --> Go to "Site-Wide Outage Triage" (Issue 4)
|   |
|   +-- Multiple devices across sites
|       --> Go to "Catalyst Center or Upstream Issue"
|
+-- Step 2: Single Device Investigation
|   |
|   +-- Check device details in CatC
|   +-- Check last update time
|   +-- Check collection status and error code
|   +-- Attempt pyATS connectivity test
|   |
|   +-- pyATS connects successfully?
|   |   |
|   |   +-- YES: CatC polling issue (SNMP/NETCONF credentials, ACL)
|   |   +-- NO: Device is truly unreachable
|   |       |
|   |       +-- Ping from adjacent device (pyATS)
|   |       +-- Check upstream interface state
|   |       +-- Check ARP/MAC table on upstream switch
|   |       +-- Physical layer: power, cable, SFP

Step 1: Identify All Unreachable Devices

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"reachabilityStatus":["Unreachable"]}'

Record for each device: hostname, managementIpAddress, platformId, locationName, lastUpdated, collectionStatus, errorCode.

Step 2: Check If It Is Site-Localized

# Get the site hierarchy to understand the topology
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_sites '{}'

Group unreachable devices by locationName. If multiple devices at the same site are unreachable, skip to Issue 4 (Site-Wide Outage).

Step 3: Inspect the Specific Device

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["UNREACHABLE-SW-01"]}'

Analyze:

lastUpdated -- How long has it been unreachable? Minutes = recent event. Hours/days = chronic issue.
collectionStatus -- Could Not Synchronize vs Partial Collection Failure vs Managed
errorCode -- Specific error from Catalyst Center's collection engine

Step 4: Check Interfaces on the Unreachable Device (If CatC Has Cached Data)

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID-from-step-3>"}'

NOTE: This returns the last-known interface state. If the device became unreachable recently, this data may still reflect the pre-failure state. Look for interfaces that were already showing errors before the device went dark.

Step 5: Escalate to pyATS for Live Validation

When $PYATS_MCP_SCRIPT and $PYATS_TESTBED_PATH are available and the device is in the pyATS testbed:

# Attempt to connect and get basic state
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UNREACHABLE-SW-01","command":"show ip interface brief"}'

If pyATS connects but CatC cannot reach the device:

The device is alive but Catalyst Center's polling is blocked
Check SNMP community strings, NETCONF/RESTCONF configuration
Check if an ACL on the device is blocking the CatC management IP
Verify the management VRF routing to the CatC appliance

If pyATS also cannot connect:

The device is truly unreachable from the management network
Ping from an adjacent device in pyATS:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"UPSTREAM-SW-01","command":"ping 10.1.10.1"}'

Check the upstream switch's interface to the unreachable device:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show interfaces GigabitEthernet1/0/1"}'

Check ARP and MAC table on the upstream switch:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show arp"}'

Common Root Causes: Device Unreachable

| Root Cause | CatC Indicators | pyATS Indicators | Resolution | |------------|-----------------|-------------------|------------| | Device powered off | Unreachable, no updates | Connection refused | Check power, PDU, UPS | | Management interface down | Unreachable | Cannot SSH | Verify mgmt interface config via console | | SNMP credential mismatch | Partial Collection Failure | pyATS connects OK | Fix SNMP community on device or CatC | | ACL blocking CatC | Unreachable from CatC | pyATS connects OK | Update ACL to permit CatC management IPs | | Routing issue to mgmt subnet | Unreachable | Ping fails from adjacent device | Check routing table, management VRF | | Upstream link failure | Multiple devices unreachable | Upstream interface down | Physical layer: cable, SFP, port | | Device crash/reload | Recently unreachable | Boot messages in logs | Check show logging, show version uptime |

Issue 2: Client Connectivity Problems

Trigger: User reports cannot connect to the network, slow connectivity, or intermittent drops.

Decision Tree

Client Connectivity Issue
|
+-- Step 1: Identify the client
|   Do we have the MAC address, IP address, or username?
|   |
|   +-- MAC known --> get_client_details_by_mac
|   +-- IP known --> get_clients_list with ipv4_address filter, then get details by MAC
|   +-- Only username/location --> get_clients_list filtered by site
|
+-- Step 2: Is the client visible in CatC?
|   |
|   +-- YES: Client is associating/authenticating
|   |   |
|   |   +-- Check health score
|   |   +-- Wired or wireless?
|   |   |   |
|   |   |   +-- WIRED --> Check connected switch port, VLAN, duplex
|   |   |   +-- WIRELESS --> Check RSSI, SNR, band, AP, SSID
|   |   |
|   |   +-- Check IP assignment (DHCP working?)
|   |   +-- Check connected network device status
|   |
|   +-- NO: Client not visible
|       |
|       +-- Is the access switch/AP reachable?
|       +-- Is the client physically connected (cable/Wi-Fi enabled)?
|       +-- Is 802.1X/MAB failing? (check ISE if available)
|       +-- Is the SSID broadcasting?

Step 1: Get Time Range

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 4 hours"}'

Step 2: Look Up the Client

By MAC address (preferred):

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'

By IP address:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"ipv4_address":["10.1.50.42"]}'

Step 3: Analyze Client Data

For wired clients, check:

connectedNetworkDeviceName -- Which switch is the client on?
connectedNetworkDeviceInterfaceName -- Which port?
vlanId -- Correct VLAN assignment?
ipv4Address -- Valid IP? Or APIPA (169.254.x.x) indicating DHCP failure?
healthScore -- Score 0-10. Below 7 = issue.

For wireless clients, check:

ssid -- Connected to the correct SSID?
band -- 2.4 GHz or 5 GHz?
rssi -- Signal strength (see RSSI table below)
snr -- Signal-to-noise ratio
dataRate -- Current data rate in Mbps
connectedNetworkDeviceName -- Which AP?
healthScore -- Client health score

Step 4: Check the Access Device

If the client's access switch or AP shows issues:

# Is the access device reachable?
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["ACC-SW-01"]}'

# Get the switch's interfaces to check the client-facing port
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<switch-UUID>"}'

Step 5: Escalate to pyATS for Switch-Port Level Troubleshooting

When CatC data is insufficient (e.g., you need to check port security, 802.1X session state, or MAC address table):

# Check the specific switchport
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show interfaces GigabitEthernet1/0/15"}'

# Check authentication sessions (802.1X/MAB)
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show authentication sessions"}'

# Check MAC address table for the client
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show mac address-table"}'

# Check DHCP snooping bindings
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show ip dhcp snooping binding"}'

# Check port-security
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show port-security"}'

Common Root Causes: Client Connectivity

| Root Cause | CatC Indicators | pyATS/CLI Indicators | Resolution | |------------|-----------------|----------------------|------------| | DHCP exhaustion | Client has 169.254.x.x IP | show ip dhcp pool shows 0 free | Expand DHCP scope or reduce lease time | | Port security violation | Client not visible | err-disabled state on port | Clear port-security, investigate | | 802.1X failure | Client not visible | Authentication failed in ISE | Check credentials, certificate, RADIUS | | Wrong VLAN | Client visible but no connectivity | Port in wrong VLAN | Correct VLAN assignment | | Duplex mismatch | Poor health score (wired) | Half-duplex on one end | Set both ends to auto or match manually | | Wireless: low RSSI | Low health score, low RSSI | N/A (wireless) | Move closer to AP, add AP coverage | | Wireless: co-channel | Intermittent drops | N/A (wireless) | Adjust channel plan, reduce AP power | | AP overloaded | Multiple clients with issues | N/A (wireless) | Load balance, add AP capacity | | Upstream link failure | Multiple clients affected | Interface down on distribution | Physical layer or routing issue |

Issue 3: Interface Down Analysis

Trigger: Catalyst Center shows interfaces in admin up / operationally down state, or CatC alerts on interface status changes.

Step 1: Get All Interfaces on the Device

# Find the device first
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-01"]}'

# Fetch interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID>"}'

Step 2: Identify Problem Interfaces

Filter the interface response for:

adminStatus: "UP" AND status: "down" -- Operationally failed interfaces (admin enabled but link is down)
High error counters in the interface data
Interfaces with no description (potential unused/misconfigured ports)

Step 3: Classify the Interface

| Interface Type | Impact | Urgency | |---------------|--------|---------| | Uplink to distribution/core | CRITICAL: Affects all downstream devices and clients | Immediate | | Inter-switch link (trunk) | HIGH: May break VLAN spanning, STP reconvergence | Immediate | | Access port (user-facing) | MEDIUM: Single user affected | Standard | | Management interface | HIGH: Loses management access to the device | Urgent | | Loopback | HIGH if used for routing (OSPF RID, BGP update-source) | Urgent |

Step 4: Escalate to pyATS for Detailed Interface Diagnostics

# Detailed interface statistics
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces TenGigabitEthernet1/0/1"}'

# Check for recent log messages about the interface
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-01"}'

# Check SFP/transceiver status
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces transceiver"}'

# Check EtherChannel status if applicable
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show etherchannel summary"}'

# Check spanning-tree for blocked ports
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show spanning-tree"}'

Interface Down Root Causes

| Symptom | CatC Data | CLI Investigation | Likely Cause | |---------|-----------|-------------------|-------------| | admin UP, oper DOWN | Status mismatch in CatC | No link pulse on interface | Bad cable, SFP, remote end shut | | CRC errors incrementing | Error counters in interface data | show interfaces CRC count | Faulty cable, bad SFP, duplex mismatch | | Err-disabled | Port not visible or shows errors | show interfaces status err-disabled | Port-security, BPDU guard, storm-control | | Flapping (up/down cycles) | Multiple status changes | show logging UPDOWN messages | Loose cable, auto-negotiation failure | | STP blocked | Port up but no traffic | show spanning-tree BLK state | STP topology issue, loop detected |

Issue 4: Site-Wide Outage Triage

Trigger: Multiple users at a single site report connectivity loss, or Catalyst Center shows multiple devices unreachable at the same location.

Decision Tree

Site-Wide Outage
|
+-- Step 1: Quantify the impact
|   How many devices unreachable at this site?
|   How many clients affected?
|   |
|   +-- ALL devices unreachable at site
|   |   --> WAN link failure, upstream router, or site power outage
|   |
|   +-- SOME devices unreachable
|   |   --> Distribution layer or IDF/MDF issue
|   |
|   +-- Devices reachable but clients can't connect
|       --> DHCP, VLAN, wireless controller, or authentication issue
|
+-- Step 2: Identify the failure boundary
|   Which devices ARE still reachable?
|   What is the common upstream for the failed devices?
|
+-- Step 3: Investigate the upstream device
|   Check interfaces, routing, logs on the common upstream
|
+-- Step 4: Resolve and verify
    Fix the issue, confirm all devices come back, verify client counts recover

Step 1: Assess Site Device Status

# Get all devices at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"locationName":["Global/USA/NYC/Floor3"]}'

Categorize the results:

Count reachable vs unreachable devices
Identify device roles (ACCESS, DISTRIBUTION, CORE)
If the distribution switch is unreachable, all downstream access switches will also be unreachable

Step 2: Check Client Impact

# Get time range
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 1 hours"}'

# Count clients at the affected site NOW
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_count '{"start_time":1705312800000,"end_time":1705399200000,"site_hierarchy":["Global/USA/NYC/Floor3"]}'

# Compare with a baseline (e.g., same time window yesterday)
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"yesterday"}'

Impact assessment:

Current client count vs expected baseline
If client count dropped to 0 or near-0: complete site outage
If client count dropped by 50%: partial outage (one IDF or floor affected)

Step 3: Find the Failure Boundary

Check the upstream/distribution devices that serve the affected site:

# Check the distribution switch
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-NYC"]}'

# Check its interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<DIST-UUID>"}'

Step 4: Escalate to pyATS for the Reachable Upstream

If the distribution/core device is still reachable:

# Check all interfaces on the distribution switch
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip interface brief"}'

# Check routing table -- are routes to the affected site present?
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip route"}'

# Check OSPF/BGP adjacencies
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip ospf neighbor"}'

# Check for recent events in the logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-NYC"}'

# Ping the unreachable access switches from the distribution
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"DIST-SW-NYC","command":"ping 10.1.30.1"}'

Site Outage Root Causes

| Scope | Likely Cause | Key Evidence | Resolution | |-------|-------------|-------------|------------| | All devices + all clients | Site power outage | All devices unreachable, no response to pings | Dispatch facilities team, check UPS/PDU | | All devices + all clients | WAN link failure | Edge router reachable but uplink down | Check ISP, activate backup WAN if available | | One floor/IDF | Distribution switch failure | Dist switch unreachable, access switches behind it down | Power cycle, console access, hardware swap | | One floor/IDF | Trunk link failure | Access switches up but no VLAN connectivity | Check trunk port, SFP, cable between access and dist | | Devices up but no clients | DHCP failure | Clients getting APIPA addresses | Check DHCP server, ip helper-address, DHCP relay | | Devices up, wireless clients down | WLC issue | APs reachable but no SSID broadcast | Check WLC status, AP join status | | Devices up, some clients down | VLAN issue | Specific VLAN clients affected | Check VLAN trunking, SVI status |

Issue 5: Wireless Client Roaming Issues

Trigger: Wireless users report drops when moving between floors or areas, or CatC shows poor wireless health scores.

Step 1: Get Client History

# Extended time range to capture roaming events
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 8 hours"}'

# Get detailed client info with wireless views
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'

Step 2: Analyze Roaming Indicators

Look for:

AP changes -- Frequent AP name changes indicate roaming events
RSSI drops -- RSSI dipping below -70 dBm before roaming = sticky client (not roaming early enough)
Band changes -- Client flipping between 2.4 GHz and 5 GHz during movement
SSID consistency -- Client should stay on the same SSID during roaming

Step 3: Check AP Coverage at the Problem Area

# List all APs at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"family":["Unified AP"],"locationName":["Global/USA/NYC/Floor3"]}'

# Count wireless clients per AP to find overloaded APs
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"client_type":"wireless","site_hierarchy":["Global/USA/NYC/Floor3"]}'

Roaming Issue Root Causes

| Symptom | Likely Cause | Resolution | |---------|-------------|------------| | Client holds onto far AP (sticky client) | 802.11k/v not enabled or client doesn't support it | Enable Optimized Roaming, BSS Transition Management | | Client drops during roam | 802.11r (FT) not enabled, or PMK caching disabled | Enable Fast Transition (FT), enable CCKM/PMK caching | | Client roams to 2.4 GHz | Band steering not aggressive enough | Increase band steering threshold, check client capability | | Dead zone between APs | RF coverage gap | Add AP, increase power (carefully), adjust antenna | | Client bounces between 2 APs | Equal signal from both APs | Reduce power on one AP, adjust cell boundaries |

Issue 6: Catalyst Center Collection or API Issues

Trigger: CatC data seems stale, APIs return unexpected errors, or collectionStatus shows failures across many devices.

Check Collection Failures

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Partial Collection Failure"]}'

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Could Not Synchronize"]}'

API Time Range Errors

If client API calls return error 14013 (start time > 30 days ago):

Always use get_api_compatible_time_range first to ensure the time range is valid
The tool automatically clamps start times to the 30-day boundary

If client API calls return error 14006 (data not ready for endTime):

The get_clients_list, get_client_details_by_mac, and get_clients_count tools automatically retry with the API-suggested adjusted endTime
This is normal behavior when querying very recent data (within the last few minutes)

Troubleshooting Report Format

Always produce a structured findings report after completing a troubleshooting session:

Troubleshooting Report
=======================
Catalyst Center: $CCC_HOST
Timestamp: YYYY-MM-DD HH:MM UTC
Ticket/Case: [reference number if applicable]

Problem Statement
-----------------
[What was reported, by whom, when it started]

Impact Assessment
-----------------
Devices affected: X unreachable out of Y total at site
Clients affected: ~Z clients (estimated from client count delta)
Sites affected: [list]
Severity: CRITICAL / HIGH / MEDIUM / LOW

Investigation Timeline
-----------------------
1. [HH:MM] Checked CatC device reachability -- found X devices unreachable
2. [HH:MM] Identified site-localized issue at Global/USA/NYC/Floor3
3. [HH:MM] Checked distribution switch DIST-SW-NYC -- reachable
4. [HH:MM] Found TenGig1/0/1 (uplink to Floor3 IDF) down/down
5. [HH:MM] pyATS logs show %LINK-3-UPDOWN at 14:23 UTC
6. [HH:MM] SFP transceiver showing rx power below threshold

Root Cause
----------
Failing SFP transceiver on DIST-SW-NYC TenGigabitEthernet1/0/1
(uplink to Floor3 access layer IDF)

Resolution
----------
[Steps taken to resolve, or escalation path if unresolved]

Verification
-----------
- All Floor3 access switches returned to Reachable in CatC
- Client count at Floor3 recovered to baseline (335 clients)
- No further interface flaps observed in 30-minute monitoring window

Preventive Measures
--------------------
- Schedule optical monitoring for all uplink SFPs
- Add redundant uplink to Floor3 IDF (single point of failure identified)

GAIT Audit Trail

After completing any troubleshooting session, record the findings and resolution in GAIT:

python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"input":{"role":"assistant","content":"Troubleshooting: Site-wide outage at Global/USA/NYC/Floor3. Root cause: Failing SFP on DIST-SW-NYC Te1/0/1. Impact: 335 clients, 5 access switches unreachable for 47 minutes. Resolution: SFP replaced, services restored. Preventive: Redundant uplink recommended.","artifacts":[]}}'

Catalyst Center Troubleshooting Workflows

Catalyst Center MCP Server

All Catalyst Center tool calls use this invocation pattern:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT

Variable shorthand used throughout this document:

CATC_CMD="CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT"

How to Call Tools

Use the $MCP_CALL protocol handler to invoke MCP tools:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" TOOL_NAME 'ARGS_JSON'

When to Use

Help desk escalation: user reports "I can't connect"
NOC alert: device unreachable in Catalyst Center
Client connectivity complaints (wired or wireless)
Site-wide outage affecting multiple users
Wireless roaming issues or poor signal complaints
Interface down investigations
Post-change verification when something goes wrong
Correlation between Catalyst Center data and live device state via pyATS

Troubleshooting Principles

Define the problem -- What exactly is reported? Single user, multiple users, entire site? Wired or wireless? When did it start?
Scope the impact -- Use Catalyst Center client counts and device reachability to quantify: how many clients affected? How many devices unreachable?
Gather data from the controller -- Catalyst Center provides a controller-level view that covers hundreds of devices simultaneously. Start here.
Correlate with live device state -- When Catalyst Center data is stale or insufficient, escalate to pyATS for real-time CLI data.
Isolate the fault domain -- Is it the client, the access layer, the distribution/core, or a WAN/upstream issue?
Resolve and verify -- Fix the issue, confirm via both Catalyst Center and pyATS, document.

Issue 1: Device Unreachable

Trigger: Catalyst Center shows one or more devices with reachabilityStatus: Unreachable.

Decision Tree

Device Unreachable in Catalyst Center
|
+-- Step 1: Confirm the scope
|   How many devices are unreachable?
|   Are they at the same site?
|   |
|   +-- Single device unreachable
|   |   --> Go to "Single Device Investigation"
|   |
|   +-- Multiple devices at same site
|   |   --> Go to "Site-Wide Outage Triage" (Issue 4)
|   |
|   +-- Multiple devices across sites
|       --> Go to "Catalyst Center or Upstream Issue"
|
+-- Step 2: Single Device Investigation
|   |
|   +-- Check device details in CatC
|   +-- Check last update time
|   +-- Check collection status and error code
|   +-- Attempt pyATS connectivity test
|   |
|   +-- pyATS connects successfully?
|   |   |
|   |   +-- YES: CatC polling issue (SNMP/NETCONF credentials, ACL)
|   |   +-- NO: Device is truly unreachable
|   |       |
|   |       +-- Ping from adjacent device (pyATS)
|   |       +-- Check upstream interface state
|   |       +-- Check ARP/MAC table on upstream switch
|   |       +-- Physical layer: power, cable, SFP

Step 1: Identify All Unreachable Devices

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"reachabilityStatus":["Unreachable"]}'

Record for each device: hostname, managementIpAddress, platformId, locationName, lastUpdated, collectionStatus, errorCode.

Step 2: Check If It Is Site-Localized

# Get the site hierarchy to understand the topology
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_sites '{}'

Group unreachable devices by locationName. If multiple devices at the same site are unreachable, skip to Issue 4 (Site-Wide Outage).

Step 3: Inspect the Specific Device

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["UNREACHABLE-SW-01"]}'

Analyze:

lastUpdated -- How long has it been unreachable? Minutes = recent event. Hours/days = chronic issue.
collectionStatus -- Could Not Synchronize vs Partial Collection Failure vs Managed
errorCode -- Specific error from Catalyst Center's collection engine

Step 4: Check Interfaces on the Unreachable Device (If CatC Has Cached Data)

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID-from-step-3>"}'

Step 5: Escalate to pyATS for Live Validation

When $PYATS_MCP_SCRIPT and $PYATS_TESTBED_PATH are available and the device is in the pyATS testbed:

# Attempt to connect and get basic state
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UNREACHABLE-SW-01","command":"show ip interface brief"}'

If pyATS connects but CatC cannot reach the device:

The device is alive but Catalyst Center's polling is blocked
Check SNMP community strings, NETCONF/RESTCONF configuration
Check if an ACL on the device is blocking the CatC management IP
Verify the management VRF routing to the CatC appliance

If pyATS also cannot connect:

The device is truly unreachable from the management network
Ping from an adjacent device in pyATS:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"UPSTREAM-SW-01","command":"ping 10.1.10.1"}'

Check the upstream switch's interface to the unreachable device:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show interfaces GigabitEthernet1/0/1"}'

Check ARP and MAC table on the upstream switch:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show arp"}'

Common Root Causes: Device Unreachable

Issue 2: Client Connectivity Problems

Trigger: User reports cannot connect to the network, slow connectivity, or intermittent drops.

Decision Tree

Client Connectivity Issue
|
+-- Step 1: Identify the client
|   Do we have the MAC address, IP address, or username?
|   |
|   +-- MAC known --> get_client_details_by_mac
|   +-- IP known --> get_clients_list with ipv4_address filter, then get details by MAC
|   +-- Only username/location --> get_clients_list filtered by site
|
+-- Step 2: Is the client visible in CatC?
|   |
|   +-- YES: Client is associating/authenticating
|   |   |
|   |   +-- Check health score
|   |   +-- Wired or wireless?
|   |   |   |
|   |   |   +-- WIRED --> Check connected switch port, VLAN, duplex
|   |   |   +-- WIRELESS --> Check RSSI, SNR, band, AP, SSID
|   |   |
|   |   +-- Check IP assignment (DHCP working?)
|   |   +-- Check connected network device status
|   |
|   +-- NO: Client not visible
|       |
|       +-- Is the access switch/AP reachable?
|       +-- Is the client physically connected (cable/Wi-Fi enabled)?
|       +-- Is 802.1X/MAB failing? (check ISE if available)
|       +-- Is the SSID broadcasting?

Step 1: Get Time Range

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 4 hours"}'

Step 2: Look Up the Client

By MAC address (preferred):

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'

By IP address:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"ipv4_address":["10.1.50.42"]}'

Step 3: Analyze Client Data

For wired clients, check:

connectedNetworkDeviceName -- Which switch is the client on?
connectedNetworkDeviceInterfaceName -- Which port?
vlanId -- Correct VLAN assignment?
ipv4Address -- Valid IP? Or APIPA (169.254.x.x) indicating DHCP failure?
healthScore -- Score 0-10. Below 7 = issue.

For wireless clients, check:

ssid -- Connected to the correct SSID?
band -- 2.4 GHz or 5 GHz?
rssi -- Signal strength (see RSSI table below)
snr -- Signal-to-noise ratio
dataRate -- Current data rate in Mbps
connectedNetworkDeviceName -- Which AP?
healthScore -- Client health score

Step 4: Check the Access Device

If the client's access switch or AP shows issues:

# Is the access device reachable?
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["ACC-SW-01"]}'

# Get the switch's interfaces to check the client-facing port
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<switch-UUID>"}'

Step 5: Escalate to pyATS for Switch-Port Level Troubleshooting

When CatC data is insufficient (e.g., you need to check port security, 802.1X session state, or MAC address table):

# Check the specific switchport
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show interfaces GigabitEthernet1/0/15"}'

# Check authentication sessions (802.1X/MAB)
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show authentication sessions"}'

# Check MAC address table for the client
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show mac address-table"}'

# Check DHCP snooping bindings
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show ip dhcp snooping binding"}'

# Check port-security
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show port-security"}'

Common Root Causes: Client Connectivity

Issue 3: Interface Down Analysis

Trigger: Catalyst Center shows interfaces in admin up / operationally down state, or CatC alerts on interface status changes.

Step 1: Get All Interfaces on the Device

# Find the device first
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-01"]}'

# Fetch interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID>"}'

Step 2: Identify Problem Interfaces

Filter the interface response for:

adminStatus: "UP" AND status: "down" -- Operationally failed interfaces (admin enabled but link is down)
High error counters in the interface data
Interfaces with no description (potential unused/misconfigured ports)

Step 3: Classify the Interface

Step 4: Escalate to pyATS for Detailed Interface Diagnostics

# Detailed interface statistics
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces TenGigabitEthernet1/0/1"}'

# Check for recent log messages about the interface
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-01"}'

# Check SFP/transceiver status
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces transceiver"}'

# Check EtherChannel status if applicable
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show etherchannel summary"}'

# Check spanning-tree for blocked ports
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show spanning-tree"}'

Interface Down Root Causes

Issue 4: Site-Wide Outage Triage

Trigger: Multiple users at a single site report connectivity loss, or Catalyst Center shows multiple devices unreachable at the same location.

Decision Tree

Site-Wide Outage
|
+-- Step 1: Quantify the impact
|   How many devices unreachable at this site?
|   How many clients affected?
|   |
|   +-- ALL devices unreachable at site
|   |   --> WAN link failure, upstream router, or site power outage
|   |
|   +-- SOME devices unreachable
|   |   --> Distribution layer or IDF/MDF issue
|   |
|   +-- Devices reachable but clients can't connect
|       --> DHCP, VLAN, wireless controller, or authentication issue
|
+-- Step 2: Identify the failure boundary
|   Which devices ARE still reachable?
|   What is the common upstream for the failed devices?
|
+-- Step 3: Investigate the upstream device
|   Check interfaces, routing, logs on the common upstream
|
+-- Step 4: Resolve and verify
    Fix the issue, confirm all devices come back, verify client counts recover

Step 1: Assess Site Device Status

# Get all devices at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"locationName":["Global/USA/NYC/Floor3"]}'

Categorize the results:

Count reachable vs unreachable devices
Identify device roles (ACCESS, DISTRIBUTION, CORE)
If the distribution switch is unreachable, all downstream access switches will also be unreachable

Step 2: Check Client Impact

# Get time range
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 1 hours"}'

# Count clients at the affected site NOW
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_count '{"start_time":1705312800000,"end_time":1705399200000,"site_hierarchy":["Global/USA/NYC/Floor3"]}'

# Compare with a baseline (e.g., same time window yesterday)
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"yesterday"}'

Impact assessment:

Current client count vs expected baseline
If client count dropped to 0 or near-0: complete site outage
If client count dropped by 50%: partial outage (one IDF or floor affected)

Step 3: Find the Failure Boundary

Check the upstream/distribution devices that serve the affected site:

# Check the distribution switch
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-NYC"]}'

# Check its interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<DIST-UUID>"}'

Step 4: Escalate to pyATS for the Reachable Upstream

If the distribution/core device is still reachable:

# Check all interfaces on the distribution switch
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip interface brief"}'

# Check routing table -- are routes to the affected site present?
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip route"}'

# Check OSPF/BGP adjacencies
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip ospf neighbor"}'

# Check for recent events in the logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-NYC"}'

# Ping the unreachable access switches from the distribution
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "${PYATS_PYTHON:-python3} -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"DIST-SW-NYC","command":"ping 10.1.30.1"}'

Site Outage Root Causes

Issue 5: Wireless Client Roaming Issues

Trigger: Wireless users report drops when moving between floors or areas, or CatC shows poor wireless health scores.

Step 1: Get Client History

# Extended time range to capture roaming events
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 8 hours"}'

# Get detailed client info with wireless views
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'

Step 2: Analyze Roaming Indicators

Look for:

AP changes -- Frequent AP name changes indicate roaming events
RSSI drops -- RSSI dipping below -70 dBm before roaming = sticky client (not roaming early enough)
Band changes -- Client flipping between 2.4 GHz and 5 GHz during movement
SSID consistency -- Client should stay on the same SSID during roaming

Step 3: Check AP Coverage at the Problem Area

# List all APs at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"family":["Unified AP"],"locationName":["Global/USA/NYC/Floor3"]}'

# Count wireless clients per AP to find overloaded APs
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"client_type":"wireless","site_hierarchy":["Global/USA/NYC/Floor3"]}'

Roaming Issue Root Causes

Issue 6: Catalyst Center Collection or API Issues

Trigger: CatC data seems stale, APIs return unexpected errors, or collectionStatus shows failures across many devices.

Check Collection Failures

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Partial Collection Failure"]}'

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Could Not Synchronize"]}'

API Time Range Errors

If client API calls return error 14013 (start time > 30 days ago):

Always use get_api_compatible_time_range first to ensure the time range is valid
The tool automatically clamps start times to the 30-day boundary

If client API calls return error 14006 (data not ready for endTime):

The get_clients_list, get_client_details_by_mac, and get_clients_count tools automatically retry with the API-suggested adjusted endTime
This is normal behavior when querying very recent data (within the last few minutes)

Troubleshooting Report Format

Always produce a structured findings report after completing a troubleshooting session:

Troubleshooting Report
=======================
Catalyst Center: $CCC_HOST
Timestamp: YYYY-MM-DD HH:MM UTC
Ticket/Case: [reference number if applicable]

Problem Statement
-----------------
[What was reported, by whom, when it started]

Impact Assessment
-----------------
Devices affected: X unreachable out of Y total at site
Clients affected: ~Z clients (estimated from client count delta)
Sites affected: [list]
Severity: CRITICAL / HIGH / MEDIUM / LOW

Investigation Timeline
-----------------------
1. [HH:MM] Checked CatC device reachability -- found X devices unreachable
2. [HH:MM] Identified site-localized issue at Global/USA/NYC/Floor3
3. [HH:MM] Checked distribution switch DIST-SW-NYC -- reachable
4. [HH:MM] Found TenGig1/0/1 (uplink to Floor3 IDF) down/down
5. [HH:MM] pyATS logs show %LINK-3-UPDOWN at 14:23 UTC
6. [HH:MM] SFP transceiver showing rx power below threshold

Root Cause
----------
Failing SFP transceiver on DIST-SW-NYC TenGigabitEthernet1/0/1
(uplink to Floor3 access layer IDF)

Resolution
----------
[Steps taken to resolve, or escalation path if unresolved]

Verification
-----------
- All Floor3 access switches returned to Reachable in CatC
- Client count at Floor3 recovered to baseline (335 clients)
- No further interface flaps observed in 30-minute monitoring window

Preventive Measures
--------------------
- Schedule optical monitoring for all uplink SFPs
- Add redundant uplink to Floor3 IDF (single point of failure identified)

GAIT Audit Trail

After completing any troubleshooting session, record the findings and resolution in GAIT:

python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"input":{"role":"assistant","content":"Troubleshooting: Site-wide outage at Global/USA/NYC/Floor3. Root cause: Failing SFP on DIST-SW-NYC Te1/0/1. Impact: 335 clients, 5 access switches unreachable for 47 minutes. Resolution: SFP replaced, services restored. Preventive: Redundant uplink recommended.","artifacts":[]}}'

Adoption

automateyournetwork/catc-troubleshoot

$ install --global

Security Scan Results

SKILL.md

Catalyst Center Troubleshooting Workflows

Catalyst Center MCP Server

How to Call Tools

When to Use

Troubleshooting Principles

Issue 1: Device Unreachable

Decision Tree

Step 1: Identify All Unreachable Devices

Step 2: Check If It Is Site-Localized

Step 3: Inspect the Specific Device

Step 4: Check Interfaces on the Unreachable Device (If CatC Has Cached Data)

Step 5: Escalate to pyATS for Live Validation

Common Root Causes: Device Unreachable

Issue 2: Client Connectivity Problems

Decision Tree

Step 1: Get Time Range

Step 2: Look Up the Client

Step 3: Analyze Client Data

Step 4: Check the Access Device

Step 5: Escalate to pyATS for Switch-Port Level Troubleshooting

Common Root Causes: Client Connectivity

Issue 3: Interface Down Analysis

Step 1: Get All Interfaces on the Device

Step 2: Identify Problem Interfaces

Step 3: Classify the Interface

Step 4: Escalate to pyATS for Detailed Interface Diagnostics

Interface Down Root Causes

Issue 4: Site-Wide Outage Triage

Decision Tree

Step 1: Assess Site Device Status

Step 2: Check Client Impact

Step 3: Find the Failure Boundary

Step 4: Escalate to pyATS for the Reachable Upstream

Site Outage Root Causes

Issue 5: Wireless Client Roaming Issues

Step 1: Get Client History

Step 2: Analyze Roaming Indicators

Step 3: Check AP Coverage at the Problem Area

Roaming Issue Root Causes

Issue 6: Catalyst Center Collection or API Issues

Check Collection Failures

API Time Range Errors

Troubleshooting Report Format

GAIT Audit Trail

Related Skills

automateyournetwork/n2n-federation

automateyournetwork/ue5-network-viz

automateyournetwork/humanrail-escalation

automateyournetwork/subnet-calculator

automateyournetwork/catc-troubleshoot

$ install --global

Security Scan Results

SKILL.md

Catalyst Center Troubleshooting Workflows

Catalyst Center MCP Server

How to Call Tools

When to Use

Troubleshooting Principles

Issue 1: Device Unreachable

Decision Tree

Step 1: Identify All Unreachable Devices

Step 2: Check If It Is Site-Localized

Step 3: Inspect the Specific Device

Step 4: Check Interfaces on the Unreachable Device (If CatC Has Cached Data)

Step 5: Escalate to pyATS for Live Validation

Common Root Causes: Device Unreachable

Issue 2: Client Connectivity Problems

Decision Tree

Step 1: Get Time Range

Step 2: Look Up the Client

Step 3: Analyze Client Data

Step 4: Check the Access Device

Step 5: Escalate to pyATS for Switch-Port Level Troubleshooting

Common Root Causes: Client Connectivity

Issue 3: Interface Down Analysis