skills/domain-tls-routing/SKILL.md
Diagnose domain resolution, TLS certificate provisioning, and reverse proxy routing issues. Use when a domain is not resolving, TLS certificates fail, proxy returns 502/503/504, or custom domains are stuck in pending status.
The user knows ONLY that they have "a server" with deployed apps and domains. They must NEVER learn about:
Banned phrases in user-facing output: "edge proxy", "VM proxy", "VM", "virtual machine", "infra host", "guest IP", "edge Caddy", "upstream dial", "Caddy admin API", "reconciler", "Cloudflare zone", "two-layer", "proxy layer", "on-demand TLS", "ACME challenge on", "registration queue", "server block", "route matcher".
Always say: "your server", "your domain", "DNS settings", "TLS certificate", "routing configuration", "proxy service".
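A minimal sketch of how a draft reply could be screened against the banned phrases listed above before it is sent. The function name `find_banned_phrases` is hypothetical and not part of any Nixopus tooling. Matching is case-insensitive and on whole words, which is an assumption about how strict the rule is meant to be.

```python
import re

# Phrases copied from the banned list above.
BANNED_PHRASES = [
    "edge proxy", "VM proxy", "VM", "virtual machine", "infra host",
    "guest IP", "edge Caddy", "upstream dial", "Caddy admin API",
    "reconciler", "Cloudflare zone", "two-layer", "proxy layer",
    "on-demand TLS", "ACME challenge on", "registration queue",
    "server block", "route matcher",
]


def find_banned_phrases(message: str) -> list[str]:
    """Return every banned phrase that appears in the message, in list order."""
    hits = []
    for phrase in BANNED_PHRASES:
        # Lookarounds stop "VM" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, message, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits
```

An empty result means the draft uses none of the listed phrases. It does not prove the draft avoids describing the architecture in other words.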
Requests reach the user's app through a routing chain. The agent needs to understand this to diagnose issues, but must never describe the architecture to the user.
The routing chain has an outer layer and an inner layer. The outer layer receives public traffic and forwards it to the correct server. The inner layer runs on the user's server and routes to the correct application container.
The outer layer handles wildcard TLS for *.nixopus.ai subdomains and forwards custom domain traffic. The inner layer handles per-application routing and TLS for application-specific domains.
DNS records (A and wildcard A) are managed by the system for *.nixopus.ai subdomains. Custom domains require the user to set up a CNAME pointing to their assigned subdomain.nixopus.ai.
When diagnosing, check from outside in: public reachability first, then server-level proxy config, then container-level app health. If the outer layer is misconfigured, the user can't fix it — escalate internally. If the inner layer is misconfigured, use proxy_config and domain tools to fix it.
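The outside-in order described above can be sketched as a small helper that reports the outermost failing check. The layer labels and the function name `first_failing_layer` are illustrative assumptions. A check with no recorded result is treated as failing so that it is not skipped.

```python
from typing import Optional

# Checks in the order the text prescribes: outermost first.
CHECK_ORDER = [
    "public reachability",
    "server-level proxy config",
    "container-level app health",
]


def first_failing_layer(results: dict[str, bool]) -> Optional[str]:
    """Return the outermost layer whose check failed, or None if all passed."""
    for layer in CHECK_ORDER:
        # A missing entry counts as a failure: the check has not been run yet.
        if not results.get(layer, False):
            return layer
    return None
```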
| Type | Example | How it works |
|---|---|---|
| Auto-generated subdomain | a1b2c3d4.example.nixopus.ai | Created during app deployment; DNS is pre-configured |
| Custom domain | app.userdomain.com | User adds CNAME pointing to their subdomain.nixopus.ai; requires DNS verification |
- generate_random_subdomain creates an 8-char prefix + org domain
- add_application_domain
- *.subdomain.nixopus.ai

Failure points: step 3 (route registration fails), step 5 (TLS provisioning fails if DNS doesn't resolve to the server).
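To illustrate the "8-char prefix + org domain" shape, here is a sketch under stated assumptions. The real generate_random_subdomain tool may use a different alphabet or source of randomness. The hex alphabet is a guess that is consistent with the a1b2c3d4 example, and `random_subdomain` is a hypothetical name.

```python
import secrets


def random_subdomain(org_domain: str) -> str:
    """Build '<8 hex chars>.<org_domain>' (assumed format, not the real tool)."""
    prefix = secrets.token_hex(4)  # 4 random bytes encode to 8 hex characters
    return f"{prefix}.{org_domain}"
```

For example, `random_subdomain("example.nixopus.ai")` yields a name with the same shape as the auto-generated subdomain in the table above.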
- app.userdomain.com → subdomain.nixopus.ai
- _nixopus-verify.app.userdomain.com → verification token
- dns_verified, routing configured

Failure points: step 3 (user misconfigures DNS), step 4 (DNS propagation delay), step 5 (routing registration fails), step 7 (ACME challenge fails because DNS doesn't resolve correctly).
Check domain status
- get_domains to find the domain and its current status
- pending_dns: DNS not yet configured or verified; guide the user through DNS setup
- dns_verified: DNS is good, problem is downstream

Check DNS resolution
- network_diagnostics with type dns targeting the domain
- subdomain.nixopus.ai

Check reachability
- http_probe the domain on port 443

Tell the user: "Your domain's DNS is not pointing to the correct server" or "DNS is configured correctly but there's a routing issue on the server."
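The choice between the two user-facing messages can be sketched as follows. The boolean inputs stand in for the outcomes of network_diagnostics and http_probe, and `resolution_message` is a hypothetical helper name. How those outcomes are derived from the real tool output is not shown here.

```python
from typing import Optional

DNS_MESSAGE = "Your domain's DNS is not pointing to the correct server"
ROUTING_MESSAGE = (
    "DNS is configured correctly but there's a routing issue on the server."
)


def resolution_message(dns_points_to_server: bool,
                       https_reachable: bool) -> Optional[str]:
    """Pick the user-facing message; None means neither problem was found."""
    if not dns_points_to_server:
        # DNS is checked first: reachability is meaningless until it resolves.
        return DNS_MESSAGE
    if not https_reachable:
        return ROUTING_MESSAGE
    return None
```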
Verify DNS first: TLS provisioning requires DNS to resolve to the server
- network_diagnostics type dns on the domain

Check proxy config
- proxy_config for the application
- tls_enabled is false: TLS not configured for this route
- tls_enabled is true but cert errors persist: certificate provisioning may have failed

Check HTTP vs HTTPS
- http_probe on port 80 (HTTP): if it works but 443 doesn't, TLS provisioning failed
- http_probe on port 443 (HTTPS): if cert error, the certificate is invalid or missing

Common TLS failure causes
| Symptom | Cause | Fix |
|---|---|---|
| ERR_CERT_AUTHORITY_INVALID | Certificate not yet provisioned or provisioning failed | Verify DNS points to the server; wait a few minutes for automatic provisioning |
| ERR_CERT_COMMON_NAME_INVALID | Certificate issued for wrong domain | Check the domain binding matches the actual domain name |
| SSL_ERROR_RX_RECORD_TOO_LONG | App serving plain HTTP on the HTTPS port | The app should not handle TLS itself; the server's proxy handles TLS termination |
| ERR_TOO_MANY_REDIRECTS | Both app and proxy redirect HTTP→HTTPS | Disable the app's own HTTPS redirect; the proxy already handles this |
| ERR_CONNECTION_REFUSED on 443 | TLS not enabled or proxy not listening | Check proxy config and that the proxy service is running on the server |
| Certificate expired | Auto-renewal failed | Check proxy health; renewal needs DNS to resolve correctly and ports 80/443 accessible |
Tell the user: "The TLS certificate hasn't been provisioned yet because your DNS isn't pointing to the server" or "There's a certificate mismatch for your domain."
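The HTTP vs HTTPS comparison above can be expressed as a small decision function. This is a sketch: the three booleans are assumed summaries of the http_probe results, `interpret_tls_probes` is a hypothetical name, and the last two return values are additions for completeness rather than rules stated in this skill.

```python
def interpret_tls_probes(http_ok: bool, https_ok: bool,
                         https_cert_error: bool) -> str:
    """Map probe outcomes on ports 80 and 443 to a likely TLS diagnosis."""
    if https_cert_error:
        return "certificate is invalid or missing"
    if http_ok and not https_ok:
        return "TLS provisioning failed"
    if https_ok:
        # Assumption: a clean HTTPS response means TLS is not the problem.
        return "TLS looks healthy"
    # Assumption: nothing answers at all, so TLS is not yet the question.
    return "neither port responds; check DNS and the proxy service first"
```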
Diagnose from outside in:
External probe
- http_probe the public URL

Check proxy config
- proxy_config for the application
- upstream matches the expected host:port
- domain matches the requested domain

Check container reachability from inside
- container_exec ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", "localhost:PORT"]

Check port alignment
All four must agree:
| Layer | Check with |
|---|---|
| App listen port | container_exec ["ss", "-tlnp"] |
| Container published port | container_inspect → ports |
| Proxy upstream port | proxy_config → upstream |
| Application config port | get_application → port |
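Once the four values in the table have been collected, the agreement check is a simple comparison. In this sketch the layer keys and the name `find_port_mismatch` are hypothetical. Flagging the layers that differ from the most common value is a heuristic assumption, not a rule from this skill.

```python
from collections import Counter

# One key per row of the table above (names are illustrative).
LAYERS = (
    "app_listen",
    "container_published",
    "proxy_upstream",
    "application_config",
)


def find_port_mismatch(ports: dict[str, int]) -> list[str]:
    """Return layers whose port differs from the most common one; [] if all agree."""
    counts = Counter(ports[layer] for layer in LAYERS)
    majority_port, _ = counts.most_common(1)[0]
    return [layer for layer in LAYERS if ports[layer] != majority_port]
```

A non-empty result names the layer to inspect first. With a two-against-two split the heuristic cannot tell which side is wrong, so both values need checking by hand.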
Interpret the status code
| Code | Meaning | Likely cause |
|---|---|---|
| 502 Bad Gateway | Proxy can't connect to the app | Container not running, wrong port, or app crashed |
| 503 Service Unavailable | App not ready | App still starting, container in crash loop, or resource exhaustion |
| 504 Gateway Timeout | App didn't respond in time | App hanging, database connection timeout, or infinite loop |
| 521 | Server is down | The proxy service itself is not running on the server |
| 522 | Connection timed out | Network issue preventing the request from reaching the app |
| 523 | Origin is unreachable | The container or server network is down |
Tell the user: "Your app isn't responding on the expected port" or "There's a port mismatch in the routing configuration."
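The status code interpretations can be kept as a lookup table so that a probe result maps directly to a likely cause. This is a sketch: the strings are copied from the table in this section, while the dictionary, the function name `explain_status`, and the fallback text are hypothetical.

```python
# code -> (meaning, likely cause), taken from the status code table.
STATUS_HINTS = {
    502: ("Proxy can't connect to the app",
          "Container not running, wrong port, or app crashed"),
    503: ("App not ready",
          "App still starting, container in crash loop, or resource exhaustion"),
    504: ("App didn't respond in time",
          "App hanging, database connection timeout, or infinite loop"),
    521: ("Server is down",
          "The proxy service itself is not running on the server"),
    522: ("Connection timed out",
          "Network issue preventing the request from reaching the app"),
    523: ("Origin is unreachable",
          "The container or server network is down"),
}


def explain_status(code: int) -> str:
    """Return 'meaning: likely cause' for a known code, else a fallback note."""
    if code not in STATUS_HINTS:
        return f"{code} is not covered by the routing status table"
    meaning, cause = STATUS_HINTS[code]
    return f"{meaning}: {cause}"
```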
pending_dns

Get the domain details
- get_domains filtering for the custom domain
- target_subdomain (the CNAME target)

Check what DNS records exist
- network_diagnostics type dns on the custom domain
- {target_subdomain}.nixopus.ai or A record to server IP

Common causes
| Issue | Diagnosis | Fix |
|---|---|---|
| No CNAME record | DNS lookup returns NXDOMAIN or wrong IP | User needs to add CNAME record at their DNS provider |
| CNAME points to wrong target | DNS lookup shows wrong value | User needs to update CNAME to the correct subdomain.nixopus.ai |
| Proxied through Cloudflare (orange cloud) | DNS resolves to Cloudflare IP, not server IP | User should disable Cloudflare proxy (grey cloud) or use DNS-only mode |
| TXT verification missing | CNAME exists but verification fails | User needs to add _nixopus-verify.domain TXT record |
| DNS propagation delay | Records just added | Wait up to 48 hours; most providers propagate within 5 minutes |
| CAA record blocking Let's Encrypt | TLS fails even after DNS verified | User needs to add CAA record allowing letsencrypt.org |
Tell the user: "Your DNS CNAME isn't set up correctly" or "DNS changes can take some time to propagate."
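A sketch of how the first few rows of the table could be told apart once the lookup results are known. The inputs are assumed to come from network_diagnostics and get_domains, and `classify_pending_dns` is a hypothetical name. Falling back to "DNS propagation delay" when the records look correct is an assumption, and the Cloudflare and CAA rows are not covered.

```python
from typing import Optional


def _normalize(name: str) -> str:
    """Compare hostnames without case or a trailing dot."""
    return name.rstrip(".").lower()


def classify_pending_dns(cname_target: Optional[str],
                         expected_target: str,
                         txt_record_present: bool) -> str:
    """Return the matching 'Issue' label from the common causes table."""
    if cname_target is None:
        return "No CNAME record"
    if _normalize(cname_target) != _normalize(expected_target):
        return "CNAME points to wrong target"
    if not txt_record_present:
        return "TXT verification missing"
    # Records look right but the status is still pending.
    return "DNS propagation delay"
```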
The domain resolves, TLS works, but the app returns errors or a wrong page.
Verify domain binding
- get_application to check the application's domain list

Check proxy config
- proxy_config to verify the route exists and upstream is correct

Check for domain conflicts
- get_domains to see if the domain is bound to multiple applications

Check compose service routing
- get_application → check compose service configuration

Tell the user: "The domain isn't linked to your application" or "The routing points to a different service in your app."
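The domain conflict check amounts to grouping bindings by domain and looking for domains with more than one application. In this sketch the `(domain, application)` pairs are an assumed simplification of what get_domains returns, and `find_domain_conflicts` is a hypothetical name.

```python
from collections import defaultdict


def find_domain_conflicts(
    bindings: list[tuple[str, str]],
) -> dict[str, list[str]]:
    """Map each domain bound to more than one application to those applications."""
    apps_by_domain: dict[str, set[str]] = defaultdict(set)
    for domain, app in bindings:
        # Hostnames are case-insensitive, so normalize before grouping.
        apps_by_domain[domain.lower()].add(app)
    return {
        domain: sorted(apps)
        for domain, apps in apps_by_domain.items()
        if len(apps) > 1
    }
```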
When guiding users through DNS setup:
| Provider | CNAME path | Notes |
|---|---|---|
| Cloudflare | DNS → Add Record → CNAME | Disable proxy (grey cloud icon) for TLS to work |
| Route 53 | Hosted Zone → Create Record → CNAME | Use simple routing |
| Vercel | Domains → Add DNS Record | May conflict with Vercel's own DNS |
| Namecheap | Advanced DNS → Add CNAME | Host field is the subdomain only, not FQDN |
| GoDaddy | DNS Management → Add CNAME | Remove trailing dot if added automatically |
| Google Domains | DNS → Custom Records → CNAME | FQDN for target |
| DigitalOcean | Networking → Domains → Add Record | CNAME with trailing dot |
For A records (alternative to CNAME): the user needs the server's public IP. Use get_servers to find it, then tell the user "your server's IP address is X.X.X.X."
When proxy-level issues are suspected but no specific domain is failing:
Proxy health: host_exec to check if the proxy service is running
- host_exec ["systemctl", "status", "nixopus-caddy", "--no-pager"]
- host_exec ["systemctl", "restart", "nixopus-caddy"]

Proxy config validation
- host_exec ["curl", "-s", "localhost:2019/config/"] to check the proxy can load its config

Domain re-sync: if multiple domains are misconfigured, re-check each domain's binding and proxy config individually using the tools above
Tell the user: "The proxy service on your server needed a restart" or "I've refreshed the routing configuration." Never expose the internal details of what was checked or fixed.
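For the config validation step, one way to judge the curl output is to check that it parses as a non-empty JSON object. This is a sketch under assumptions: `proxy_config_loaded` is a hypothetical helper, and treating an empty body, `null`, or non-JSON text as "not loaded" is an interpretation rather than documented behavior of this skill.

```python
import json


def proxy_config_loaded(body: str) -> bool:
    """Return True if the config endpoint output is a non-empty JSON object."""
    try:
        config = json.loads(body)
    except json.JSONDecodeError:
        # Empty output or an error message from curl ends up here.
        return False
    return isinstance(config, dict) and len(config) > 0
```

The result is for internal diagnosis only. What the user hears stays limited to the approved wording above.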
failure-diagnosis: For container-level failures (build errors, crashes, exit codes) that may underlie routing issues