skills/tunnels-for-agents/SKILL.md
Tunneling for AI agent systems — exposing local services to the internet and connecting agents across network boundaries. Covers ngrok, Cloudflare Tunnel, Tailscale Funnel, bore, localhost.run, SSH tunneling, and WireGuard. Agent patterns: webhook callbacks to local agents, tunneling MCP servers, agent-to-agent communication across NATs, and tunnel mesh architectures. Activate on 'tunnel', 'ngrok', 'cloudflare tunnel', 'tailscale funnel', 'SSH tunnel', 'port forwarding', 'expose localhost', 'WireGuard', 'tunnel MCP server', 'NAT traversal'. NOT for: reverse proxy and load balancing (use reverse-proxy-for-agents), container networking (use devops-automator), firewall rules and zero-trust policies (use agentic-zero-trust-security), DNS management (use infrastructure skills).
`npx skillsauth add curiositech/windags-skills tunnels-for-agents`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are an expert in network tunneling with deep knowledge of how AI agent systems use tunnels to cross network boundaries. You understand the security tradeoffs of exposing local services, the operational differences between tunnel tools, and the specific patterns that emerge when agents need to communicate across NATs, firewalls, and cloud/local boundaries.
A tunnel creates a pathway between two network endpoints that would otherwise be unable to communicate directly. For agent systems, this means: exposing a local MCP server so a cloud-hosted LLM can reach it, giving a local agent access to a webhook callback URL, connecting agents running on different developer machines, and bridging the gap between "runs on my laptop" and "accessible from the internet."
```text
Need to expose agent service?
│
├─ Quick dev test (< 1 hour)?
│  ├─ Need request inspection?
│  │  └─► ngrok http 3001 (built-in traffic inspector)
│  └─ Zero signup/install?
│     └─► ssh -R 80:localhost:3001 localhost.run
│
├─ Production MCP server (permanent)?
│  ├─ Own domain + need auth?
│  │  └─► Cloudflare Tunnel + CF Access
│  └─ Demo/testing only?
│     └─► ngrok with stable domain (paid)
│
├─ Agent fleet (multiple machines)?
│  ├─ Need selective public exposure?
│  │  └─► Tailscale mesh + Funnel for specific services
│  └─ All private team access?
│     └─► Tailscale serve (no public exposure)
│
├─ Need to access remote database/API?
│  ├─ Have SSH access to jumpbox?
│  │  └─► ssh -L 5432:db:5432 user@jumpbox
│  └─ Need persistent connection?
│     └─► autossh or WireGuard VPN
│
└─ Self-hosted/no external deps?
   ├─ Simple TCP forwarding?
   │  └─► bore (Rust, minimal relay server)
   └─ Need full VPN mesh?
      └─► WireGuard (manual key management)
```
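An agent that has to choose a tunnel on its own can carry the tree above as a lookup table. The sketch below is illustrative only. The scenario keys (`quick-inspect`, `fleet-private`, and so on) are hypothetical labels invented here. The default port 3001 follows the tree. The printed strings are starting points, not complete, authenticated setups.

```bash
#!/usr/bin/env bash
# pick_tunnel SCENARIO [PORT]
# Prints the tool or starting command for one leaf of the decision tree.
# Scenario names are hypothetical labels, not flags of any real tool.
pick_tunnel() {
  local scenario=$1 port=${2:-3001}
  case "$scenario" in
    quick-inspect)   echo "ngrok http $port" ;;   # ngrok's inspector UI runs on localhost:4040
    quick-nosignup)  echo "ssh -R 80:localhost:$port localhost.run" ;;
    prod-auth)       echo "cloudflared tunnel + Cloudflare Access" ;;
    demo-stable)     echo "ngrok with a stable domain" ;;
    fleet-selective) echo "tailscale mesh + tailscale funnel" ;;
    fleet-private)   echo "tailscale serve" ;;
    remote-db)       echo "ssh -L 5432:db:5432 user@jumpbox" ;;
    remote-persist)  echo "autossh or wireguard" ;;
    selfhost-tcp)    echo "bore" ;;
    selfhost-vpn)    echo "wireguard" ;;
    *) echo "unknown scenario: $scenario" >&2; return 1 ;;
  esac
}
```

For example, `pick_tunnel quick-nosignup 8080` prints `ssh -R 80:localhost:8080 localhost.run`. Because the mapping lives in one function, the documented tree and the agent's behavior are updated in the same place.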
Bandwidth and Auth Decision Matrix:
| Scenario | Tool | Auth Method | Bandwidth Limit | Setup Time |
|----------|------|-------------|-----------------|------------|
| MCP demo to client | ngrok free | OAuth/IP restrict | 1 GB/month | 30 s |
| Production MCP | Cloudflare Tunnel | CF Access (SSO) | Unlimited | 5 min |
| Agent fleet mesh | Tailscale | SSO + ACLs | Unlimited | 2 min/machine |
| Database access | SSH tunnel | SSH keys | Unlimited | 10 s |
| CI/CD tunnels | bore | Shared secret | Unlimited | 1 min |
Symptoms: Agents time out sporadically; the tunnel shows as "connected" but traffic fails; errors vary from "connection refused" to "timeout after 30s"
Detection: `curl -v tunnel-url` returns a connection error while the tunnel process is still running
Root cause: A NAT or firewall idle timeout silently dropped the tunnel's TCP connection, and the tunnel client hasn't detected it yet
Fix: Add keepalives. For SSH, use `ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=3`, which declares the session dead after about 90 s without a reply. ngrok has a built-in keepalive. Tailscale handles this automatically
Prevention: Use autossh for SSH tunnels; monitor tunnel health with automated curl checks
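The tunnel process keeps running after its connection dies, so its health has to be judged from outside, using a request that is allowed to time out. This is a minimal sketch. It assumes `curl` is installed and that your service exposes a lightweight path such as `/health`, which is an assumption about your service. `watch_tunnel` and the restart command you pass it are hypothetical helpers, not part of any tunnel tool.

```bash
#!/usr/bin/env bash
# check_tunnel URL [MAX_SECONDS]
# Succeeds only if URL answers with HTTP status < 400 within MAX_SECONDS.
# A half-dead tunnel (NAT dropped the connection) shows up here as a
# timeout or connection error even while the tunnel process is running.
check_tunnel() {
  local url=$1 max=${2:-10}
  # -f: non-zero exit on HTTP >= 400; -sS: silent except errors; -o: discard body
  curl -fsS --max-time "$max" -o /dev/null "$url"
}

# watch_tunnel URL INTERVAL RESTART_CMD...
# Polls forever; runs RESTART_CMD after 3 consecutive failed checks.
watch_tunnel() {
  local url=$1 interval=$2 failures=0
  shift 2
  while true; do
    if check_tunnel "$url" 10; then
      failures=0
    else
      failures=$((failures + 1))
      echo "$(date) health check failed ($failures) for $url" >&2
      if [ "$failures" -ge 3 ]; then
        "$@"            # caller-supplied restart command
        failures=0
      fi
    fi
    sleep "$interval"
  done
}
```

For example: `watch_tunnel https://mcp.mycompany.com/health 30 systemctl --user restart my-tunnel.service`. Here `my-tunnel.service` stands for whatever unit manages your tunnel. Requiring three consecutive failures avoids a restart on a single slow response.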
Symptoms: MCP server receiving unexpected requests; agents accessed by unauthorized users; tunnel access works from random IPs
Detection: Check tunnel access logs for unfamiliar source IPs; MCP server logs show tool calls you didn't make
Root cause: The tunnel was created without authentication, which is the default behavior for most tools
Fix: First, stop the exposure immediately by killing the tunnel process, or stop the session from the ngrok dashboard or API. Then restart with auth: `ngrok http 3001 --oauth=google`, Cloudflare Access, or SSH keys
Prevention: Never create production tunnels without auth; audit active tunnels weekly (`ngrok api tunnels list`, `cloudflared tunnel list`)
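A quick way to catch an unauthenticated tunnel is to request it the way a stranger would: no cookies, no credentials, and redirects not followed. This sketch assumes a protected endpoint answers such a request with 401/403 or with a redirect to a login page. ngrok's OAuth option and Cloudflare Access both send browsers to a login flow. Exact codes vary by provider, so treat any 2xx as the clear failure signal.

```bash
#!/usr/bin/env bash
# auth_enforced HTTP_CODE: succeeds if an anonymous client was challenged or redirected.
auth_enforced() {
  case "$1" in
    401|403)             return 0 ;;  # explicitly denied
    301|302|303|307|308) return 0 ;;  # redirected, usually to a login page; check Location by hand
    *)                   return 1 ;;  # 2xx (or anything else) means no challenge
  esac
}

# probe_auth URL: one anonymous request, no cookies, redirects not followed.
probe_auth() {
  local url=$1 code
  # curl prints 000 when it gets no HTTP response; "|| true" keeps set -e scripts alive
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url") || true
  if [ "$code" = "000" ]; then
    echo "DOWN $url (no response)"
    return 2
  elif auth_enforced "$code"; then
    echo "OK   $url -> $code (auth challenge)"
  else
    echo "OPEN $url -> $code (reachable without credentials)"
    return 1
  fi
}
```

Running `probe_auth https://mcp.mycompany.com/` against each URL found in the weekly audit turns the audit into a pass/fail list.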
Symptoms: Large agent responses (100K+ tokens) time out; the tunnel works for small requests but fails for file uploads/downloads; sporadic 503 errors
Detection: Monitor tunnel bandwidth usage; large payloads consistently fail while small ones succeed
Root cause: Free-tier bandwidth limits were hit, and the tunnel provider is throttling or cutting connections
Fix: Upgrade to a paid tier (ngrok) or switch to a provider without a bandwidth cap (Cloudflare/Tailscale)
Prevention: Test with realistic payload sizes; set up bandwidth monitoring alerts
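A back-of-the-envelope budget shows how quickly large agent responses consume a capped plan such as the 1 GB/month free tier in the table. The sketch assumes about 4 bytes per token and a decimal gigabyte. Both are approximations, and JSON framing, request bodies, and retries add more. `make_payload` writes random bytes, so compression along the path cannot make a test payload look smaller than real traffic.

```bash
#!/usr/bin/env bash
# responses_per_cap CAP_BYTES TOKENS_PER_RESPONSE [BYTES_PER_TOKEN]
# Estimates how many responses of a given size fit under a transfer cap.
responses_per_cap() {
  local cap=$1 tokens=$2 bytes_per_token=${3:-4}   # 4 bytes/token is an assumption
  local per_response=$((tokens * bytes_per_token))
  echo $((cap / per_response))
}

# make_payload BYTES FILE
# Writes exactly BYTES random (incompressible) bytes for load-testing a tunnel.
make_payload() {
  head -c "$1" /dev/urandom > "$2"
}
```

`responses_per_cap 1000000000 100000` prints 2500. Under these assumptions, about 2,500 responses of 100K tokens exhaust 1 GB. To test a realistic size end to end, use something like `make_payload 400000 /tmp/payload.bin && curl --data-binary @/tmp/payload.bin https://YOUR-TUNNEL/upload`. The URL and the `/upload` endpoint are placeholders for your own service.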
Symptoms: Multiple tunnel URLs for same service; confusion about which tunnel is "live"; security team finds unknown exposed services
Detection: ps aux | grep -E "(ngrok|cloudflared|bore)" shows multiple tunnel processes; netstat -tlnp shows unexpected listening ports
Root cause: Starting new tunnels without killing old ones; no tunnel lifecycle management
Fix: Kill all tunnel processes, audit exposed services, restart only needed tunnels with proper process management
Prevention: Use systemd/launchd for persistent tunnels; document active tunnels in project README; automated tunnel inventory
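An automated inventory can start from the process table. This is a minimal sketch. It reads `PID COMMAND` lines, the format `ps -eo pid=,comm=` prints. macOS may print full paths, which the sketch strips. It then flags any tunnel tool running more than once. The tool list is an assumption, so extend it with whatever your team runs.

```bash
#!/usr/bin/env bash
# tunnel_inventory
# stdin: "PID COMMAND" lines; stdout: "tool count [MULTIPLE]", sorted by tool name.
tunnel_inventory() {
  awk '{
         cmd = $2
         sub(/.*\//, "", cmd)                     # /usr/local/bin/ngrok -> ngrok
         if (cmd ~ /^(ngrok|cloudflared|bore|autossh)$/) n[cmd]++
       }
       END {
         for (t in n) {
           flag = (n[t] > 1) ? " MULTIPLE" : ""
           printf "%s %d%s\n", t, n[t], flag
         }
       }' | sort
}
```

`ps -eo pid=,comm= | tunnel_inventory` prints one line per tool. `MULTIPLE` is a prompt to check the processes against the project README, not proof of a problem. Two autossh tunnels to different hosts can be intentional.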
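Symptoms: A tunnel or agent fails to start with "port already in use"; multiple tools fight over the same port; agents can't bind to their expected ports
Detection: `lsof -i :PORT` shows which process already holds the port; tunnel startup logs show bind errors
Root cause: Several tools or service instances are configured for the same local port. Examples include two services mapped to 443 with `tailscale serve`, or a second copy of an agent on its usual port. ngrok and cloudflared connect outbound to their edges and do not hold 443 locally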
Fix: Stop all tunnel processes, assign unique ports per service, use port mapping in tunnel config
Prevention: Standardize on one tunnel tool per use case; document port assignments; use high ports (8000+) for local services
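Probe a port before assigning it to a new local service. This minimal sketch uses bash's built-in `/dev/tcp` redirection, which is bash-only and not available in `sh` or `zsh`. A port counts as taken if something on 127.0.0.1 accepts the connection. The check does not see services bound only to other interfaces. Another process can also grab the port between the check and the bind, so treat the result as a hint.

```bash
#!/usr/bin/env bash
# port_in_use PORT: succeeds if a listener on 127.0.0.1:PORT accepts a TCP connection.
port_in_use() {
  # Subshell, so the opened file descriptor 3 is closed automatically.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# next_free_port [START] [COUNT]: prints the first unused port in [START, START+COUNT).
next_free_port() {
  local port=${1:-8000} count=${2:-100} i
  for ((i = 0; i < count; i++)); do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1   # every port in the range answered
}
```

For example, `PORT=$(next_free_port 8000) && echo "agent API on $PORT"` follows the high-port convention above and records the choice at startup.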
Scenario: Need to expose a local MCP server running file management tools to Claude Desktop, with authentication to prevent unauthorized access.
Agent Decision Process:
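- `cloudflared tunnel create mcp-tunnel` creates a persistent, named tunnel

```bash
# Step 1: Agent creates tunnel infrastructure
cloudflared tunnel create mcp-tunnel
# Output: Tunnel ID abc123-def456-ghi789

# Step 2: Configure tunnel routing
# Unquoted EOF: the shell expands $HOME, giving cloudflared an absolute credentials path
cat > ~/.cloudflared/config.yml << EOF
tunnel: abc123-def456-ghi789
credentials-file: $HOME/.cloudflared/abc123-def456-ghi789.json
ingress:
  - hostname: mcp.mycompany.com
    service: http://localhost:3001
    originRequest:
      connectTimeout: 30s
  # Catch-all: cloudflared requires the last rule to match all traffic
  - service: http_status:404
EOF

# Step 3: Create DNS record
cloudflared tunnel route dns mcp-tunnel mcp.mycompany.com

# Step 4: Test tunnel before adding auth
cloudflared tunnel run mcp-tunnel &
sleep 5   # give the connector a few seconds to register with Cloudflare's edge
curl -v https://mcp.mycompany.com/health
# Should reach local MCP server

# Step 5: Add Cloudflare Access auth (via dashboard)
# Create application for mcp.mycompany.com
# Add policy: Allow emails ending in @mycompany.com
# Test: the browser should be redirected to the Access login page
# (your configured identity provider, e.g. Google) before reaching MCP

# Step 6: Configure Claude Desktop
# macOS path shown; on Windows the file is %APPDATA%\Claude\claude_desktop_config.json.
# This overwrites any existing config; merge by hand if you already have servers.
# "mcp-server", file-tools.json and MCP_TUNNEL_URL are placeholders for your own
# local launcher that connects to the tunnel URL.
cat > "$HOME/Library/Application Support/Claude/claude_desktop_config.json" << 'EOF'
{
  "mcpServers": {
    "file-tools": {
      "command": "mcp-server",
      "args": ["--config", "file-tools.json"],
      "env": {
        "MCP_TUNNEL_URL": "https://mcp.mycompany.com"
      }
    }
  }
}
EOF
```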
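Expert insight: Notice the tunnel health test (Step 4) before adding authentication. Novices often add auth first and then can't tell an auth problem from a connectivity problem. Also note that `connectTimeout: 30s` (cloudflared's default) only bounds establishing the TCP connection to the local origin. It does not extend how long a slow MCP tool call may take. Cloudflare's edge stops waiting for an origin response after about 100 seconds on most plans, which surfaces as HTTP 524. Long-running tools should therefore stream output or return early.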
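The same test-before-auth idea extends to a two-point check that separates origin failures from tunnel or edge failures. This is a minimal sketch. It assumes `curl` and a health path on the MCP server; the `/health` path in Step 4 is an assumption about your server. Run it before enabling Access, because afterwards the public probe is redirected to login.

```bash
#!/usr/bin/env bash
# diagnose_path ORIGIN_URL PUBLIC_URL
# Checks the local origin first, then the public tunnel hostname, to localize a failure.
# Exit codes: 0 = both fine, 1 = tunnel/edge problem, 2 = origin problem.
diagnose_path() {
  local origin=$1 public=$2
  if ! curl -fsS --max-time 5 -o /dev/null "$origin" 2>/dev/null; then
    echo "origin down: fix the local MCP server first"
    return 2
  fi
  if ! curl -fsS --max-time 15 -o /dev/null "$public" 2>/dev/null; then
    echo "tunnel or edge problem: origin is fine, check cloudflared and DNS"
    return 1
  fi
  echo "ok: reachable locally and through the tunnel"
}
```

`diagnose_path http://localhost:3001/health https://mcp.mycompany.com/health` returns 2 for an origin failure and 1 for a tunnel or edge failure, so an agent can branch on the exit code.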
Scenario: 3-machine agent fleet (dev laptop, cloud VM, CI runner) where cloud VM becomes unreachable via Tailscale. Need to diagnose and restore mesh connectivity.
Agent Decision Process:
- `tailscale status` to see peer connectivity
- `tailscale ping` to isolate connectivity issues
- `tailscale up --force-reauth`

```bash
# Step 1: Check mesh status from laptop
tailscale status
# Output shows:
# laptop.tail123.ts.net 100.64.0.1 online
# vm.tail123.ts.net     100.64.0.2 offline last seen: 2h ago
# ci.tail123.ts.net     100.64.0.3 online

# Step 2: Test specific connectivity
tailscale ping vm.tail123.ts.net
# Fails with "no route to host"

# Step 3: Check if VM can reach other peers
# The tailnet path to the VM is down, so connect over its public address or the
# cloud provider's console (user and VM_PUBLIC_IP are placeholders)
ssh "user@${VM_PUBLIC_IP}"
tailscale status
# Shows: "Not connected" or "Authentication required"

# Step 4: Re-authenticate VM to tailnet
sudo tailscale up --force-reauth
# Opens browser for re-auth, or shows device authorization URL

# Step 5: Verify mesh restored from laptop
tailscale ping vm.tail123.ts.net
# Success: pong from 100.64.0.2

# Step 6: Test agent-to-agent communication
curl http://vm.tail123.ts.net:8001/health
# Agent API on VM now reachable from laptop agent

# Step 7: Set up monitoring to catch this early
cat > monitor-mesh.sh << 'EOF'
#!/bin/bash
for peer in vm.tail123.ts.net ci.tail123.ts.net; do
  if ! tailscale ping --timeout=5s "$peer" >/dev/null 2>&1; then
    echo "ALERT: $peer unreachable in Tailscale mesh"
    # Send to monitoring system
  fi
done
EOF
chmod +x monitor-mesh.sh
# Run via cron every 5 minutes
```
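Expert insight: The key diagnostic is where each status appears. "offline" in the laptop's `tailscale status` only says the VM hasn't checked in recently, which usually points to a network issue. "Not connected" or an authentication prompt in the VM's own `tailscale status` means its login has expired. That is a policy issue, and re-authentication fixes it.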
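The monitoring script in Step 7 pings a fixed list of peers. This sketch asks Tailscale which peers it currently marks offline. It assumes `jq` is installed and relies on the `Peer` map and the per-peer `Online` and `DNSName` fields of `tailscale status --json`. Check those fields against your Tailscale version.

```bash
#!/usr/bin/env bash
# offline_peers
# stdin: output of `tailscale status --json`; stdout: one DNS name per offline peer.
offline_peers() {
  # .Peer may be absent on a fresh node; DNS names end with a trailing dot, trimmed here.
  jq -r '(.Peer // {})[] | select(.Online != true) | .DNSName | rtrimstr(".")'
}
```

`tailscale status --json | offline_peers` complements the ping loop. It covers every peer in the tailnet without a hard-coded list, while `tailscale ping` still tests the actual data path.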
Scenario: Local agent needs to access a production database that's only reachable through a bastion host, with connection persistence across laptop sleep/wake cycles.
```bash
# Step 1: Set up persistent SSH tunnel with autossh
brew install autossh   # macOS; on Debian/Ubuntu: sudo apt install autossh

# Step 2: Create SSH config for connection reuse
cat >> ~/.ssh/config << 'EOF'
Host bastion
    HostName bastion.company.com
    User tunnel-user
    IdentityFile ~/.ssh/tunnel-key
    ServerAliveInterval 30
    ServerAliveCountMax 3

# For when db-server.internal itself accepts SSH: `ssh -N db-tunnel` hops via bastion
Host db-tunnel
    HostName db-server.internal
    ProxyJump bastion
    LocalForward 5432 localhost:5432
EOF

# Step 3: Start persistent tunnel
# "bastion" is the Host alias above, so user, key, and keepalives come from ~/.ssh/config
autossh -M 20000 -fNL 5432:db-server.internal:5432 bastion

# Step 4: Test database connectivity
psql -h localhost -p 5432 -U agent_user production_db
# Should connect to remote database via tunnel

# Step 5: Agent uses local database connection
export DATABASE_URL="postgresql://agent_user:password@localhost:5432/production_db"
python agent.py
# Agent queries database as if it were local

# Step 6: Set up systemd service for persistence (Linux)
# Stop the Step 3 autossh first, or both will compete for local port 5432
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/db-tunnel.service << 'EOF'
[Unit]
Description=Database SSH Tunnel
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -NL 5432:db-server.internal:5432 bastion
Restart=always
RestartSec=10

[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable db-tunnel.service
systemctl --user start db-tunnel.service
```
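Expert insight: The SSH config keeps the bastion's user, key, and keepalive settings in one place. The `db-tunnel` entry shows how ProxyJump removes manual tunnel chaining when the database host itself accepts SSH. Autossh's monitoring port (`-M 20000`) checks tunnel health by sending test data out on port 20000 and reading it back on port 20001, both forwarded through the same SSH connection. `-M 0`, as used in the systemd unit, disables this and relies on ServerAliveInterval/ServerAliveCountMax to detect a dead link.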
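Autossh connects in the background, so an agent started immediately afterwards (Step 5) can race the tunnel. This is a minimal readiness gate using bash's `/dev/tcp` (bash-only). It assumes the forward listens on 127.0.0.1. It only proves the local end of the SSH forward accepts connections, not that PostgreSQL behind the bastion answers. `pg_isready` from the PostgreSQL client tools is a stronger follow-up check.

```bash
#!/usr/bin/env bash
# wait_for_port HOST PORT TIMEOUT_SECONDS
# Polls once per second until HOST:PORT accepts a TCP connection, or gives up.
wait_for_port() {
  local host=$1 port=$2 timeout=$3 elapsed=0
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

`wait_for_port 127.0.0.1 5432 30 && python agent.py` keeps Step 5 from racing the tunnel after boot or a laptop wake.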
- Test with `curl -v tunnel-url` before sharing the URL
- Audit active tunnels regularly (`ngrok api tunnels list`, `cloudflared tunnel list`)

Do NOT use tunnels for:
Delegate to other skills when: