plugins/render/skills/render-monitor/SKILL.md
Monitor Render services in real-time. Check health, performance metrics, logs, and resource usage. Use when users want to check service status, view metrics, monitor performance, or verify deployments are healthy.
npx skillsauth add openai/plugins render-monitorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Real-time monitoring of Render services including health checks, performance metrics, and logs.
Activate this skill when users want to:
MCP tools (preferred): Test with list_services() - provides structured data
CLI (fallback): render --version - use if MCP tools unavailable
Authentication: For MCP, use an API key (set in the MCP config or via the RENDER_API_KEY env var, depending on tool). For CLI, verify with render whoami -o json.
Workspace: get_selected_workspace() or render workspace current -o json
Note: MCP tools require the Render MCP server. If unavailable, use the CLI for status and logs; metrics and database queries require MCP.
If list_services() fails because MCP isn't configured, guide the user to set up the hosted Render MCP server. Ask which AI tool they're using, then provide the matching instructions below. Always use their API key.
Walk the user through these steps:
https://dashboard.render.com/u/*/settings#api-keys
~/.cursor/mcp.json (replace <YOUR_API_KEY>):{
"mcpServers": {
"render": {
"url": "https://mcp.render.com/mcp",
"headers": {
"Authorization": "Bearer <YOUR_API_KEY>"
}
}
}
}
list_services().Walk the user through these steps:
https://dashboard.render.com/u/*/settings#api-keys
export RENDER_API_KEY="<YOUR_API_KEY>"
codex mcp add render --url https://mcp.render.com/mcp --bearer-token-env-var RENDER_API_KEY
list_services().If the user is on another AI app, direct them to the Render MCP docs for that tool's setup steps and install method.
After MCP is configured, have the user set the active Render workspace with a prompt like:
Set my Render workspace to [WORKSPACE_NAME]
Run these 5 checks to assess service health:
# 1. Check service status
list_services()
# 2. Check latest deploy
list_deploys(serviceId: "<service-id>", limit: 1)
# 3. Check for errors
list_logs(resource: ["<service-id>"], level: ["error"], limit: 20)
# 4. Check resource usage
get_metrics(resourceId: "<service-id>", metricTypes: ["cpu_usage", "memory_usage"])
# 5. Check latency
get_metrics(resourceId: "<service-id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)
list_services()
get_service(serviceId: "<id>")
list_deploys(serviceId: "<service-id>", limit: 5)
| Status | Meaning |
|--------|---------|
| live | Deployment successful |
| build_in_progress | Building |
| build_failed | Build failed |
| deactivated | Replaced by newer deploy |
list_logs(resource: ["<service-id>"], level: ["error"], limit: 50)
list_logs(resource: ["<service-id>"], statusCode: ["500", "502", "503"], limit: 50)
get_metrics(
resourceId: "<service-id>",
metricTypes: ["cpu_usage", "memory_usage", "cpu_limit", "memory_limit"]
)
| Metric | Healthy | Warning | Critical | |--------|---------|---------|----------| | CPU | <70% | 70-85% | >85% | | Memory | <80% | 80-90% | >90% |
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_latency"],
httpLatencyQuantile: 0.95
)
| p95 Latency | Status | |-------------|--------| | <200ms | Excellent | | 200-500ms | Good | | 500ms-1s | Concerning | | >1s | Problem |
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_request_count"]
)
get_metrics(
resourceId: "<service-id>",
metricTypes: ["http_latency"],
httpPath: "/api/users"
)
Detailed metrics guide: references/metrics-guide.md
list_postgres_instances()
get_postgres(postgresId: "<postgres-id>")
get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])
query_render_postgres(
postgresId: "<postgres-id>",
sql: "SELECT state, count(*) FROM pg_stat_activity GROUP BY state"
)
query_render_postgres(
postgresId: "<postgres-id>",
sql: "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10"
)
list_key_value()
get_key_value(keyValueId: "<kv-id>")
list_logs(resource: ["<service-id>"], limit: 100)
list_logs(resource: ["<service-id>"], level: ["error"], limit: 50)
list_logs(resource: ["<service-id>"], text: ["timeout", "error"], limit: 50)
list_logs(
resource: ["<service-id>"],
startTime: "2024-01-15T10:00:00Z",
endTime: "2024-01-15T11:00:00Z"
)
render logs -r <service-id> --tail -o text
# Services
list_services()
get_service(serviceId: "<id>")
list_deploys(serviceId: "<id>", limit: 5)
# Logs
list_logs(resource: ["<id>"], level: ["error"], limit: 100)
list_logs(resource: ["<id>"], text: ["search"], limit: 50)
# Metrics
get_metrics(resourceId: "<id>", metricTypes: ["cpu_usage", "memory_usage"])
get_metrics(resourceId: "<id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)
get_metrics(resourceId: "<id>", metricTypes: ["http_request_count"])
# Database
list_postgres_instances()
get_postgres(postgresId: "<id>")
query_render_postgres(postgresId: "<id>", sql: "SELECT ...")
get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])
# Key-Value
list_key_value()
get_key_value(keyValueId: "<id>")
Use these if MCP tools are unavailable:
# Service status
render services -o json
render services instances <service-id>
# Deployments
render deploys list <service-id> -o json
# Logs
render logs -r <service-id> --tail -o text # Stream logs
render logs -r <service-id> --level error -o json # Error logs
render logs -r <service-id> --type deploy -o json # Build logs
# Database
render psql <database-id> # Connect to PostgreSQL
# SSH for live debugging
render ssh <service-id>
| Indicator | Healthy | Warning | Critical |
|-----------|---------|---------|----------|
| Deploy Status | live | update_in_progress | build_failed |
| Error Rate | <0.1% | 0.1-1% | >1% |
| p95 Latency | <500ms | 500ms-2s | >2s |
| CPU Usage | <70% | 70-90% | >90% |
| Memory Usage | <80% | 80-95% | >95% |
tools
Top-level workflow skill for USD performance diagnosis and optimization. Use for slow loading, high memory, low FPS, or 'optimize my scene' requests; delegates auth/runtime setup to Phase 0 owners.
data-ai
Use when the user mentions MagicPath, designs, UI components, themes, canvas selections, or repo-to-canvas UI work; run magicpath-ai to search, inspect, install, or author components.
documentation
Use as the top-level router for Omniverse Realtime Viewer USD app requests and focused viewer reference documents.
tools
Turn Notion specs into implementation plans, tasks, and progress tracking; use when implementing PRDs/feature specs and creating Notion plans + tasks from them.