skills/render-monitor/SKILL.md
Monitor Render services in real-time. Check health, performance metrics, logs, and resource usage. Use when users want to check service status, view metrics, monitor performance, or verify deployments are healthy.
Real-time monitoring of Render services including health checks, performance metrics, and logs.
Activate this skill when users want to check service status, view metrics, monitor performance, or verify that deployments are healthy.
MCP tools (preferred): Test with list_services() - provides structured data
CLI (fallback): render --version - use if MCP tools unavailable
Authentication: For MCP, use an API key (set in the MCP config or via the RENDER_API_KEY env var, depending on tool). For CLI, verify with render whoami -o json.
Workspace: get_selected_workspace() or render workspace current -o json
Note: MCP tools require the Render MCP server. If unavailable, use the CLI for status and logs; metrics and database queries require MCP.
If list_services() fails, set up the Render MCP server. For detailed per-tool walkthroughs, see render-mcp.
Quick setup: Add the Render MCP server to your AI tool's MCP config:
Server URL: https://mcp.render.com/mcp
Header: Authorization: Bearer <YOUR_API_KEY>
API keys: https://dashboard.render.com/u/*/settings#api-keys
After configuring, restart your tool and retry list_services(). Then set your workspace with list_workspaces() / get_selected_workspace().
Run these 5 checks to assess service health:
# 1. Check service status
list_services()
# 2. Check latest deploy
list_deploys(serviceId: "<service-id>", limit: 1)
# 3. Check for errors
list_logs(resource: ["<service-id>"], level: ["error"], limit: 20)
# 4. Check resource usage
get_metrics(resourceId: "<service-id>", metricTypes: ["cpu_usage", "memory_usage"])
# 5. Check latency
get_metrics(resourceId: "<service-id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)
list_services()
get_service(serviceId: "<id>")
list_deploys(serviceId: "<service-id>", limit: 5)
| Status | Meaning |
|--------|---------|
| live | Deployment successful |
| build_in_progress | Building |
| build_failed | Build failed |
| deactivated | Replaced by newer deploy |
list_logs(resource: ["<service-id>"], level: ["error"], limit: 50)
list_logs(resource: ["<service-id>"], statusCode: ["500", "502", "503"], limit: 50)
get_metrics(
resourceId: "<service-id>",
metricTypes: ["cpu_usage", "memory_usage", "cpu_limit", "memory_limit"]
)
| Metric | Healthy | Warning | Critical |
|--------|---------|---------|----------|
| CPU | <70% | 70-85% | >85% |
| Memory | <80% | 80-90% | >90% |
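The CPU and memory bands above can be applied mechanically once usage is expressed as a percentage of the limit. The following is a minimal Python sketch under that assumption; the names `classify_usage`, `classify_cpu`, and `classify_memory` are hypothetical helpers, not part of any Render tooling, and treating each upper bound (85%, 90%) as still "warning" is an interpretation of the table.

```python
# Thresholds copied from the CPU / Memory table above, in percent of the limit.
CPU_BANDS = {"warning": 70.0, "critical": 85.0}
MEMORY_BANDS = {"warning": 80.0, "critical": 90.0}


def classify_usage(percent: float, warning: float, critical: float) -> str:
    """Map a usage percentage onto the healthy / warning / critical bands.

    Assumption: values exactly on the upper bound count as "warning".
    """
    if percent < warning:
        return "healthy"
    if percent <= critical:
        return "warning"
    return "critical"


def classify_cpu(percent: float) -> str:
    """Classify CPU usage (hypothetical helper)."""
    return classify_usage(percent, **CPU_BANDS)


def classify_memory(percent: float) -> str:
    """Classify memory usage (hypothetical helper)."""
    return classify_usage(percent, **MEMORY_BANDS)
```

To get a percentage, one option is to divide a `cpu_usage` or `memory_usage` sample by the matching `cpu_limit` or `memory_limit` sample and multiply by 100; the exact shape of the `get_metrics` response is not shown here, so that step is left out.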
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_latency"],
httpLatencyQuantile: 0.95
)
| p95 Latency | Status |
|-------------|--------|
| <200ms | Excellent |
| 200-500ms | Good |
| 500ms-1s | Concerning |
| >1s | Problem |
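The p95 latency bands above translate into a small lookup. This is a sketch, assuming the latency value is already in milliseconds; `classify_p95_latency` is a hypothetical name, and assigning the shared boundaries (500 ms, 1 s) to the lower band is an assumption, since the table leaves them ambiguous.

```python
def classify_p95_latency(latency_ms: float) -> str:
    """Map a p95 latency in milliseconds onto the bands from the table above.

    Assumption: a value exactly on a shared boundary falls in the lower band.
    """
    if latency_ms < 200:
        return "excellent"
    if latency_ms <= 500:
        return "good"
    if latency_ms <= 1000:
        return "concerning"
    return "problem"
```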
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_request_count"]
)
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_latency"],
httpPath: "/api/users"
)
Detailed metrics guide: references/metrics-guide.md
list_postgres_instances()
get_postgres(postgresId: "<postgres-id>")
get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])
query_render_postgres(
postgresId: "<postgres-id>",
sql: "SELECT state, count(*) FROM pg_stat_activity GROUP BY state"
)
query_render_postgres(
postgresId: "<postgres-id>",
sql: "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10"
)
list_key_value()
get_key_value(keyValueId: "<kv-id>")
list_logs(resource: ["<service-id>"], limit: 100)
list_logs(resource: ["<service-id>"], level: ["error"], limit: 50)
list_logs(resource: ["<service-id>"], text: ["timeout", "error"], limit: 50)
list_logs(
resource: ["<service-id>"],
startTime: "2024-01-15T10:00:00Z",
endTime: "2024-01-15T11:00:00Z"
)
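The startTime and endTime values in the call above are UTC timestamps in ISO 8601 form. A minimal Python sketch for producing such a pair for "the last N hours" might look like this; `log_window` is a hypothetical helper, and the `Z`-suffixed format is taken from the example values rather than from a documented requirement.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def log_window(hours_back: float, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (startTime, endTime) strings covering the last `hours_back` hours.

    `now` defaults to the current UTC time; pass a timezone-aware datetime
    to get a reproducible window.
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(hours=hours_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same shape as the example timestamps above
    return start.strftime(fmt), end.strftime(fmt)
```

The two returned strings can then be passed as the startTime and endTime arguments of list_logs.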
render logs -r <service-id> --tail -o text
# Services
list_services()
get_service(serviceId: "<id>")
list_deploys(serviceId: "<id>", limit: 5)
# Logs
list_logs(resource: ["<id>"], level: ["error"], limit: 100)
list_logs(resource: ["<id>"], text: ["search"], limit: 50)
# Metrics
get_metrics(resourceId: "<id>", metricTypes: ["cpu_usage", "memory_usage"])
get_metrics(resourceId: "<id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)
get_metrics(resourceId: "<id>", metricTypes: ["http_request_count"])
# Database
list_postgres_instances()
get_postgres(postgresId: "<id>")
query_render_postgres(postgresId: "<id>", sql: "SELECT ...")
get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])
# Key-Value
list_key_value()
get_key_value(keyValueId: "<id>")
Use these if MCP tools are unavailable:
# Service status
render services -o json
render services instances <service-id>
# Deployments
render deploys list <service-id> -o json
# Logs
render logs -r <service-id> --tail -o text # Stream logs
render logs -r <service-id> --level error -o json # Error logs
render logs -r <service-id> --type deploy -o json # Build logs
# Database
render psql <database-id> # Connect to PostgreSQL
# SSH for live debugging
render ssh <service-id>
| Indicator | Healthy | Warning | Critical |
|-----------|---------|---------|----------|
| Deploy Status | live | update_in_progress | build_failed |
| Error Rate | <0.1% | 0.1-1% | >1% |
| p95 Latency | <500ms | 500ms-2s | >2s |
| CPU Usage | <70% | 70-90% | >90% |
| Memory Usage | <80% | 80-95% | >95% |
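The indicator table above can be folded into a single verdict by taking the worst band across all rows. The sketch below is one way to do that in Python; every name in it is hypothetical, the error rate is assumed to be errors divided by total requests, and any deploy status not listed in the table is treated as "warning", which is an assumption rather than documented behavior.

```python
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

# Deploy statuses taken from the "Deploy Status" row of the table above.
DEPLOY_BANDS = {
    "live": "healthy",
    "update_in_progress": "warning",
    "build_failed": "critical",
}


def band(value: float, warning: float, critical: float) -> str:
    """Classify a numeric indicator; upper bounds count as "warning" (assumption)."""
    if value < warning:
        return "healthy"
    if value <= critical:
        return "warning"
    return "critical"


def error_rate_percent(error_count: int, request_count: int) -> float:
    """Error rate as a percentage; 0.0 when there was no traffic."""
    if request_count == 0:
        return 0.0
    return 100.0 * error_count / request_count


def overall_health(deploy_status: str, error_rate_pct: float,
                   p95_ms: float, cpu_pct: float, memory_pct: float) -> str:
    """Return the worst band across the five indicators in the table."""
    bands = [
        DEPLOY_BANDS.get(deploy_status, "warning"),  # unlisted status: assumed warning
        band(error_rate_pct, 0.1, 1.0),
        band(p95_ms, 500.0, 2000.0),
        band(cpu_pct, 70.0, 90.0),
        band(memory_pct, 80.0, 95.0),
    ]
    return max(bands, key=SEVERITY.get)
```

Taking the maximum severity keeps the summary conservative: a single critical indicator marks the whole service as critical, even if everything else is healthy.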