workspace/skills/pyats-parallel-ops/SKILL.md
Fleet-wide parallel device operations - concurrent health checks, config audits, routing snapshots, severity-sorted reporting, and failure-isolated multi-device automation. Use when checking all devices at once, running bulk health checks, collecting configs from the entire fleet, or comparing state across multiple routers and switches.
In OpenClaw, parallel execution (pCall) is achieved by listing multiple exec commands in a single response. The agent runtime dispatches them concurrently and collects all results before proceeding.
To run the same command on multiple devices in parallel, list the calls together:
# Device 1
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show version"}'
# Device 2
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show version"}'
# Device 3
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"SW1","command":"show version"}'
All three commands execute concurrently. Results arrive independently and are aggregated by the agent.
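For longer device lists, the per-device lines can be generated instead of typed by hand. The following is a minimal Python sketch, not part of the skill itself: `build_show_commands` is a hypothetical helper, and it only assumes the command shape shown above.

```python
import json
import shlex


def build_show_commands(devices, command):
    """Return one exec line per device, ready to be listed in a single response."""
    lines = []
    for name in devices:
        # Compact JSON so the payload matches the style of the hand-written examples.
        payload = json.dumps(
            {"device_name": name, "command": command}, separators=(",", ":")
        )
        lines.append(
            "PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "
            '"python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '
            + shlex.quote(payload)  # single-quote the JSON for the shell
        )
    return lines


if __name__ == "__main__":
    for line in build_show_commands(["R1", "R2", "SW1"], "show version"):
        print(line)
```

Listing the generated lines together in one response keeps the dispatch concurrent, exactly as with the hand-written version.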
Always start by listing all devices in the testbed so you know what to operate on:
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_list_devices '{}'
This returns every device with its name, platform, OS, and connection details. Use this to build the device list for parallel operations.
Run a health check on all devices in the testbed concurrently.
Issue these commands simultaneously -- one set per device:
# R1 - CPU
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show processes cpu sorted"}'
# R2 - CPU
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show processes cpu sorted"}'
# SW1 - CPU
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"SW1","command":"show processes cpu sorted"}'
Then, in a second parallel wave, collect interface status and logs:
# R1 - Interfaces
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show ip interface brief"}'
# R2 - Interfaces
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show ip interface brief"}'
# SW1 - Interfaces
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"SW1","command":"show ip interface brief"}'
# R1 - Logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"R1"}'
# R2 - Logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"R2"}'
# SW1 - Logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"SW1"}'
After all parallel results return, analyze each device individually and produce the fleet summary (see Fleet Report Format below).
Collect the running configuration from every device in parallel for compliance analysis.
# R1 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"R1"}'
# R2 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"R2"}'
# SW1 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"SW1"}'
# SW2 - Running config
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_running_config '{"device_name":"SW2"}'
After collection, apply the pyats-security audit checks to each config and produce a fleet-wide security posture report.
Common config audit checks to apply in parallel:
service password-encryption enabled

Capture the routing table from every device simultaneously for baseline documentation or pre-change verification.
# R1 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show ip route"}'
# R2 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R2","command":"show ip route"}'
# R3 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R3","command":"show ip route"}'
# R4 - Full routing table
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R4","command":"show ip route"}'
After collection, analyze each device's routing table (total routes, per-protocol counts, and the default route), then produce a fleet routing summary:
Fleet Routing Snapshot - YYYY-MM-DD HH:MM UTC
┌──────────┬────────┬────────┬──────┬──────┬──────────────┬─────────┐
│ Device   │ Total  │ Conn.  │ OSPF │ BGP  │ Default Rte  │ Status  │
├──────────┼────────┼────────┼──────┼──────┼──────────────┼─────────┤
│ R1       │ 47     │ 5      │ 12   │ 28   │ via 10.1.1.2 │ HEALTHY │
│ R2       │ 45     │ 4      │ 12   │ 27   │ via 10.1.1.1 │ HEALTHY │
│ R3       │ 38     │ 3      │ 12   │ 21   │ via 10.2.1.1 │ WARNING │
│ R4       │ 0      │ 0      │ 0    │ 0    │ MISSING      │ CRITICAL│
└──────────┴────────┴────────┴──────┴──────┴──────────────┴─────────┘
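The per-device counts in a summary like this could be derived from raw `show ip route` text roughly as follows. This is a sketch under assumptions: it presumes the tool returns unparsed IOS-style output (it may instead return structured data), and `summarize_routes` and its regular expression are illustrative names, not part of the skill or of pyATS.

```python
import re
from collections import Counter

# A route line has its code at column 0 (e.g. "C", "O", "B", "S*"),
# an optional sub-code (e.g. "IA", "E2"), then an IPv4 prefix.
ROUTE_LINE = re.compile(
    r"^(?P<code>[A-Za-z]\*?)\s+(?:(?:IA|E1|E2|N1|N2|EX)\s+)?"
    r"(?P<prefix>\d{1,3}(?:\.\d{1,3}){3})(?:/\d{1,2})?"
)


def summarize_routes(raw_output):
    """Count routes by protocol code and note whether a default route exists."""
    counts = Counter()
    has_default = False
    for line in raw_output.splitlines():
        match = ROUTE_LINE.match(line)
        if not match:
            continue  # header, legend, or continuation line
        counts[match.group("code").rstrip("*")] += 1
        if match.group("prefix") == "0.0.0.0":
            has_default = True
    return {
        "total": sum(counts.values()),  # includes local (L) and static (S) entries
        "connected": counts["C"],
        "ospf": counts["O"],
        "bgp": counts["B"],
        "has_default": has_default,
    }
```

A device whose summary shows `total == 0` or `has_default == False` would be a natural candidate for the CRITICAL row in the snapshot.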
After collecting results from all devices, aggregate findings and sort by severity. This is the standard output format for all fleet operations.
Fleet Health Report - YYYY-MM-DD HH:MM UTC
Testbed: production-network
Devices scanned: 9 | Duration: 12s (parallel)
=== CRITICAL (Immediate Action) ===
[C-001] R4 - UNREACHABLE
Connection timed out after 30s. Verify device is powered on and management IP is reachable.
Impact: No data collected for R4. Manual investigation required.
[C-002] SW2 - CPU 97% (5min avg)
Top process: OSPF-1 Hello (45%), IP Input (32%)
Impact: Risk of control plane failure. OSPF hellos may be missed.
=== HIGH (Fix Within Hours) ===
[H-001] R2 - GigabitEthernet3 down/down
Last state change: 2 hours ago. 47 resets in last 24h.
Impact: Backup WAN link unavailable. No redundancy for site B.
[H-002] SW1 - OSPF neighbor 3.3.3.3 in INIT state
Expected: FULL. Interface: Vlan100. Duration: 45 minutes.
Impact: Inter-VLAN routing for VLAN 100 may be impaired.
=== MEDIUM (Fix Within Days) ===
[M-001] R1 - NTP not synchronized
No peer with '*' in show ntp associations. Clock offset: unknown.
Impact: Log timestamps may be inaccurate for forensics.
[M-002] R3 - 3 OSPF adjacency flaps in last 24h
Neighbors affected: 2.2.2.2 on Gi1 (flapped 3 times).
Impact: Route convergence events. Brief traffic disruption during SPF.
=== HEALTHY ===
WAN1: All checks passed (CPU 12%, Mem 45%, 4/4 interfaces up, OSPF stable)
WAN2: All checks passed (CPU 8%, Mem 38%, 3/3 interfaces up, BGP stable)
SW3: All checks passed (CPU 5%, Mem 22%, 24/24 ports up, STP stable)
=== FLEET SUMMARY ===
┌──────────┬──────────┬──────────────────────────────────────────────┐
│ Device   │ Status   │ Key Finding                                  │
├──────────┼──────────┼──────────────────────────────────────────────┤
│ R4       │ CRITICAL │ Unreachable - connection timeout             │
│ SW2      │ CRITICAL │ CPU 97% - OSPF/IP Input                      │
│ R2       │ HIGH     │ Gi3 down/down - 47 resets                    │
│ SW1      │ HIGH     │ OSPF neighbor INIT - Vlan100                 │
│ R1       │ MEDIUM   │ NTP not synchronized                         │
│ R3       │ MEDIUM   │ 3 OSPF flaps in 24h                          │
│ WAN1     │ HEALTHY  │ All checks passed                            │
│ WAN2     │ HEALTHY  │ All checks passed                            │
│ SW3      │ HEALTHY  │ All checks passed                            │
└──────────┴──────────┴──────────────────────────────────────────────┘
Overall Fleet Status: CRITICAL (2 critical, 2 high, 2 medium, 3 healthy)
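The aggregation step behind this report (sort by severity, number findings within each severity, take the worst severity as the fleet status) can be sketched in Python. The finding dictionaries and the function names here are assumptions made for illustration; the skill does not prescribe a data structure.

```python
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "HEALTHY"]


def sort_findings(findings):
    """Sort most severe first; sorted() is stable, so ties keep input order."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


def label_findings(findings):
    """Assign IDs such as C-001 or H-002, numbered within each severity."""
    counters = {}
    labeled = []
    for finding in sort_findings(findings):
        severity = finding["severity"]
        if severity == "HEALTHY":
            labeled.append({**finding, "id": None})  # healthy devices get no ID
            continue
        counters[severity] = counters.get(severity, 0) + 1
        labeled.append({**finding, "id": f"{severity[0]}-{counters[severity]:03d}"})
    return labeled


def overall_status(findings):
    """Fleet status is the worst severity present; an empty list counts as HEALTHY."""
    return min(
        (f["severity"] for f in findings),
        key=SEVERITY_ORDER.index,
        default="HEALTHY",
    )
```

Keeping the severity order in one list means the report sections, the finding IDs, and the overall status all follow the same ranking.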
When one device fails during parallel execution, it does not block or cancel the other operations:
# If R4 times out, you still get results from R1, R2, R3
# In the fleet report, R4 appears as:
# [C-001] R4 - UNREACHABLE
# Connection timed out. Device excluded from further checks.
The key principle: always produce a report for every device, even if the report says "unreachable."
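In OpenClaw the agent runtime provides this isolation. For readers who want to see the same principle outside the agent, here is a minimal Python sketch using a thread pool; `run_fleet` and the `check` callable are hypothetical, and the result shape is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_fleet(devices, check, max_workers=8):
    """Run check(device) for every device; a failure becomes an UNREACHABLE entry."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, name): name for name in devices}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"status": "OK", "data": future.result()}
            except Exception as exc:
                # One device failing must not cancel or hide the others.
                results[name] = {"status": "UNREACHABLE", "error": str(exc)}
    return results
```

Every device in the input ends up with an entry in the result, which is what lets the report include an "unreachable" row instead of silently dropping the device.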
Group devices by their function in the network to prioritize operations:
Core routers: R1, R2 (check first - highest blast radius)
Distribution: SW1, SW2 (check second)
Access: SW3, SW4, SW5 (check third)
WAN: WAN1, WAN2 (check in parallel with core)
For multi-site networks, group by location:
Site A (HQ): R1, SW1, SW2
Site B (Branch): R2, SW3
Site C (DR): R3, SW4
When validating a change, group by affected vs unaffected:
Affected devices: R1, R2 (check thoroughly - full health check)
Adjacent devices: SW1, R3 (check routing adjacencies and connectivity)
Unaffected devices: SW3, SW4 (spot check - verify no collateral damage)
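The role-based ordering above can be expressed as a small phase plan. This sketch assumes devices have already been grouped into a role-to-names mapping; `ROLE_PHASES` and `plan_phases` are illustrative names, not part of the skill.

```python
# Roles listed in the same tuple are checked together in one parallel phase.
ROLE_PHASES = [("core", "wan"), ("distribution",), ("access",)]


def plan_phases(devices_by_role):
    """Return ordered phases; each phase is a list of devices to check in parallel."""
    phases = []
    for roles in ROLE_PHASES:
        batch = [dev for role in roles for dev in devices_by_role.get(role, [])]
        if batch:  # skip phases with no devices
            phases.append(batch)
    return phases


if __name__ == "__main__":
    fleet = {
        "core": ["R1", "R2"],
        "distribution": ["SW1", "SW2"],
        "access": ["SW3", "SW4", "SW5"],
        "wan": ["WAN1", "WAN2"],
    }
    for number, batch in enumerate(plan_phases(fleet), start=1):
        print(number, batch)
```

The same function works for site-based or change-based grouping if `ROLE_PHASES` is replaced with the relevant group names in priority order.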
| Fleet Size | Strategy |
|------------|----------|
| 1-5 devices | Single parallel wave, all commands at once |
| 6-20 devices | Two waves: critical devices first, then remaining |
| 21-50 devices | Group by role/site, run 10-15 devices per wave |
| More than 50 devices | Group by site, sample 20% per wave, expand if issues found |
For large fleets, start with a sampling strategy: pick 2-3 devices per role per site, run full health checks, then expand to the full fleet only if anomalies are found.
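A simplified Python sketch of how fleet size might translate into waves follows. It is an approximation under stated assumptions: `plan_waves` is a hypothetical helper, it slices the device list in order rather than grouping by role or site, and for very large fleets it only sizes each wave at 20% of the fleet, leaving the "expand only if anomalies are found" decision to the caller.

```python
def plan_waves(devices, critical=(), wave_size=15):
    """Split devices into waves according to fleet size."""
    devices = list(devices)
    count = len(devices)
    if count <= 5:
        # Small fleet: everything in a single wave.
        return [devices] if devices else []
    if count <= 20:
        # Medium fleet: critical devices first, then the remainder.
        first = [d for d in devices if d in critical]
        rest = [d for d in devices if d not in critical]
        return [wave for wave in (first, rest) if wave]
    if count > 50:
        # Large fleet: each wave covers 20% of the fleet (rounded up).
        wave_size = (count + 4) // 5
    return [devices[i:i + wave_size] for i in range(0, count, wave_size)]
```

With this shape, a caller handling a large fleet would run the first wave, inspect the results, and continue to later waves only when the sample shows problems.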