skills/render-debug/SKILL.md
Debug failed Render deployments by analyzing logs, metrics, and database state. Identifies errors (missing env vars, port binding, OOM, etc.) and suggests fixes. Use when deployments fail, services won't start, or users mention errors, logs, or debugging.
npx skillsauth add render-oss/skills render-debug
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Analyze deployment failures using logs, metrics, and database queries. Identify root causes and apply fixes.
Activate this skill when deployments fail, services won't start, or users mention errors, logs, or debugging.
MCP tools (preferred): Test with list_services() - provides structured data
CLI (fallback): render --version - use if MCP tools unavailable
Authentication: For MCP, use an API key (set in the MCP config or via the RENDER_API_KEY env var, depending on tool). For CLI, verify with render whoami -o json.
Workspace: get_selected_workspace() or render workspace current -o json
Note: MCP tools require the Render MCP server. If unavailable, use the CLI for logs and deploy status; metrics and structured database queries require MCP.
If list_services() fails, set up the Render MCP server. For detailed per-tool walkthroughs, see render-mcp.
Quick setup: Add the Render MCP server to your AI tool's MCP config:
Server URL: https://mcp.render.com/mcp
Header: Authorization: Bearer <YOUR_API_KEY>
API keys: https://dashboard.render.com/u/*/settings#api-keys
After configuring, restart your tool and retry list_services(). Then set your workspace with list_workspaces() / get_selected_workspace().
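For reference, the server URL and Authorization header above might be combined into a config entry roughly like the one this sketch prints. The `mcpServers` layout and the `url` / `headers` key names are assumptions borrowed from tools that use that JSON shape, and the server name `render` is arbitrary; check your tool's MCP documentation for the exact format.

```python
import json
import os


def build_mcp_config(api_key: str) -> dict:
    """Build an MCP config entry for the Render MCP server.

    Assumption: the tool reads an "mcpServers" map whose entries take
    "url" and "headers" keys. Other tools may use a different layout.
    """
    return {
        "mcpServers": {
            "render": {  # arbitrary server name
                "url": "https://mcp.render.com/mcp",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # Falls back to the placeholder if RENDER_API_KEY is not set.
    key = os.environ.get("RENDER_API_KEY", "<YOUR_API_KEY>")
    print(json.dumps(build_mcp_config(key), indent=2))
```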
list_services()
If MCP isn't configured, ask whether to set it up (preferred) or continue with CLI. Then proceed.
Look for services with failed status. Get details:
get_service(serviceId: "<id>")
Build/Deploy Logs (most failures):
list_logs(resource: ["<service-id>"], type: ["build"], limit: 200)
Runtime Error Logs:
list_logs(resource: ["<service-id>"], level: ["error"], limit: 100)
Search for Specific Errors:
list_logs(resource: ["<service-id>"], text: ["KeyError", "ECONNREFUSED"], limit: 50)
HTTP Error Logs:
list_logs(resource: ["<service-id>"], statusCode: ["500", "502", "503"], limit: 50)
Match log errors against known patterns:
| Error | Log Pattern | Common Fix |
|-------|-------------|------------|
| MISSING_ENV_VAR | KeyError, not defined | Add to render.yaml or update_environment_variables |
| PORT_BINDING | EADDRINUSE | Use 0.0.0.0:$PORT |
| MISSING_DEPENDENCY | Cannot find module | Add to package.json/requirements.txt |
| DATABASE_CONNECTION | ECONNREFUSED :5432 | Check DATABASE_URL, DB status |
| HEALTH_CHECK | Health check timeout | Add /health endpoint, check port binding |
| OUT_OF_MEMORY | heap out of memory, exit 137 | Optimize memory or upgrade plan |
| BUILD_FAILURE | Command failed | Fix build command or dependencies |
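As a minimal sketch, the table can be applied to raw log lines with a small classifier. The patterns are copied from the table; the function name `classify` and the first-match-wins ordering are illustrative choices, not part of the skill.

```python
import re
from typing import Optional

# Patterns taken from the table above. Order matters: the first match wins.
ERROR_PATTERNS = [
    ("MISSING_ENV_VAR", re.compile(r"KeyError|not defined")),
    ("PORT_BINDING", re.compile(r"EADDRINUSE")),
    ("MISSING_DEPENDENCY", re.compile(r"Cannot find module")),
    ("DATABASE_CONNECTION", re.compile(r"ECONNREFUSED.*:5432")),
    ("HEALTH_CHECK", re.compile(r"Health check timeout", re.IGNORECASE)),
    ("OUT_OF_MEMORY", re.compile(r"heap out of memory|exit 137")),
    ("BUILD_FAILURE", re.compile(r"Command failed")),
]


def classify(line: str) -> Optional[str]:
    """Return the first error class whose pattern appears in the log line."""
    for name, pattern in ERROR_PATTERNS:
        if pattern.search(line):
            return name
    return None
```

A line that matches nothing returns `None`, which is a cue to consult the full catalog rather than guess.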
Full error catalog: references/error-patterns.md
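Two of the fixes in the table, binding to 0.0.0.0:$PORT and adding a /health endpoint, can be sketched with the Python standard library. This is an illustration under assumptions: the fallback port 10000 and the plain-text "ok" response are arbitrary choices, and a real service would use its own framework.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def resolve_bind_address(default_port: int = 10000) -> Tuple[str, int]:
    """Bind on all interfaces and take the port from the PORT env var.

    The fallback port is an assumption used for local runs.
    """
    return "0.0.0.0", int(os.environ.get("PORT", default_port))


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def main() -> None:
    # Call main() from your entry point; it blocks while serving.
    server = HTTPServer(resolve_bind_address(), HealthHandler)
    server.serve_forever()
```

Binding to 127.0.0.1 or a hard-coded port is a common cause of the PORT_BINDING and HEALTH_CHECK rows above, which is why the address is resolved in one place.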
If errors repeat across deploys: Switch from incremental fixes to a broader sweep. Scan the codebase/config for all likely causes in that error class (related env vars, build config, dependencies, or type errors) and address them together before the next redeploy.
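One way to decide when to switch to a broader sweep is to track which error classes appear in consecutive deploys. The sketch below assumes you keep one set of error classes per deploy, oldest first; the function name and the `min_repeats` threshold are hypothetical.

```python
from typing import List, Set


def repeated_error_classes(deploys: List[Set[str]], min_repeats: int = 2) -> Set[str]:
    """Return error classes present in each of the last `min_repeats` deploys."""
    if min_repeats < 1 or len(deploys) < min_repeats:
        return set()
    recent = deploys[-min_repeats:]
    # Intersect the sets so only classes seen in every recent deploy remain.
    return set.intersection(*recent)
```

A non-empty result suggests the last fix did not cover the whole error class, so a sweep of related config is likely cheaper than another single-fix redeploy.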
For crashes, slow responses, or resource issues:
get_metrics(
resourceId: "<service-id>",
metricTypes: ["cpu_usage", "memory_usage", "memory_limit"]
)
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_latency"],
httpLatencyQuantile: 0.95
)
Detailed metrics guide: references/metrics-debugging.md
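To relate memory_usage to memory_limit, a rough headroom check like the following may help. The shape of the get_metrics response is not specified here, so the sketch assumes you have already extracted usage samples and the limit as plain numbers in the same unit; the 0.9 threshold is an arbitrary illustrative value.

```python
from typing import Sequence


def peak_memory_ratio(usage: Sequence[float], limit: float) -> float:
    """Return peak usage as a fraction of the limit (same unit for both)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not usage:
        return 0.0
    return max(usage) / limit


def near_oom(usage: Sequence[float], limit: float, threshold: float = 0.9) -> bool:
    """Flag services whose peak memory is at or above the threshold."""
    return peak_memory_ratio(usage, limit) >= threshold
```

A service that stays near its limit and shows exit 137 in the logs fits the OUT_OF_MEMORY row of the error table.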
For database-related errors:
# Check database status
list_postgres_instances()
# Check connections
get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])
# Query directly
query_render_postgres(
postgresId: "<postgres-id>",
sql: "SELECT state, count(*) FROM pg_stat_activity GROUP BY state"
)
Detailed database guide: references/database-debugging.md
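The pg_stat_activity query above returns one row per connection state. A small helper can turn those rows into a summary; this sketch assumes the rows arrive as (state, count) pairs and that you supply `max_connections` yourself, for example from `SHOW max_connections`. Both the input shape and the helper name are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple


def summarize_connections(
    rows: Iterable[Tuple[Optional[str], int]], max_connections: int
) -> Dict[str, float]:
    """Summarize (state, count) rows from pg_stat_activity."""
    # state is NULL for some backends, so map None to a readable label.
    counts = {state or "(none)": n for state, n in rows}
    total = sum(counts.values())
    return {
        "total": total,
        "utilization": total / max_connections,
        "idle_in_transaction": counts.get("idle in transaction", 0),
    }
```

High utilization points toward pool exhaustion, while a large "idle in transaction" count usually points to application code that opens transactions and does not close them.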
For environment variables:
update_environment_variables(
serviceId: "<service-id>",
envVars: [{"key": "MISSING_VAR", "value": "value"}]
)
For code changes:
# Check deploy status
list_deploys(serviceId: "<service-id>", limit: 1)
# Check for new errors
list_logs(resource: ["<service-id>"], level: ["error"], limit: 20)
# Check metrics
get_metrics(resourceId: "<service-id>", metricTypes: ["http_request_count"])
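The three checks above (deploy status, new error logs, request metrics) can be folded into one verdict. The sketch assumes the latest deploy reports a status string with "live" meaning success, and that you have already counted error log lines and requests; the function name and inputs are hypothetical.

```python
from typing import List


def post_deploy_issues(deploy_status: str, error_log_count: int, request_count: int) -> List[str]:
    """Return a list of problems found after a deploy; empty means all clear."""
    issues = []
    if deploy_status != "live":  # assumption: "live" marks a successful deploy
        issues.append(f"deploy status is {deploy_status!r}, expected 'live'")
    if error_log_count > 0:
        issues.append(f"{error_log_count} error log line(s) since deploy")
    if request_count == 0:
        issues.append("no HTTP requests recorded yet")
    return issues
```

A zero request count is not necessarily a failure on a low-traffic service, so treat that item as a prompt to send a test request rather than as proof of a problem.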
Pre-built debugging sequences for common scenarios:
| Scenario | Workflow |
|----------|----------|
| Deploy failed | list_deploys → list_logs(type: build) → fix → redeploy |
| App crashing | list_logs(level: error) → get_metrics(memory) → fix |
| App slow | get_metrics(http_latency) → get_metrics(cpu) → query_postgres |
| DB connection | list_postgres → get_metrics(connections) → query_postgres |
| Post-deploy check | list_deploys → list_logs(error) → get_metrics |
Detailed workflows: references/quick-workflows.md
# Service Discovery
list_services()
get_service(serviceId: "<id>")
list_postgres_instances()
# Logs
list_logs(resource: ["<id>"], level: ["error"], limit: 100)
list_logs(resource: ["<id>"], type: ["build"], limit: 200)
list_logs(resource: ["<id>"], text: ["search"], limit: 50)
# Metrics
get_metrics(resourceId: "<id>", metricTypes: ["cpu_usage", "memory_usage"])
get_metrics(resourceId: "<id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)
# Database
query_render_postgres(postgresId: "<id>", sql: "SELECT ...")
# Deployments
list_deploys(serviceId: "<id>", limit: 5)
# Environment Variables
update_environment_variables(serviceId: "<id>", envVars: [{"key": "<name>", "value": "<value>"}])
render services -o json
render logs -r <service-id> --level error -o json
render logs -r <service-id> --tail -o text
render deploys create <service-id> --wait
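When working from a script in CLI fallback mode, the error-log command above can be wrapped as follows. The flags are the ones shown in this section; the helper names are hypothetical, and the structure of the JSON output is not assumed, so the raw text is returned.

```python
import subprocess
from typing import List


def build_logs_command(service_id: str, level: str = "error") -> List[str]:
    """Build the argv for `render logs` using the flags shown above."""
    return ["render", "logs", "-r", service_id, "--level", level, "-o", "json"]


def fetch_logs(service_id: str) -> str:
    """Run the CLI and return its raw stdout; raises if the command fails."""
    result = subprocess.run(
        build_logs_command(service_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```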