skills/rai-health-skill/SKILL.md
Guides diagnosis of RAI engine performance issues and recommends remediation. Use when an engine is slow, unresponsive, or needs scaling.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add RelationalAI/rai-agent-skills rai-health-skill
What: A process skill for setting up RAI observability, reading the three core reasoner metrics (memory, CPU, demand), interpreting utilization patterns, and prescribing the correct remediation action.
When to use:
When NOT to use:
- rai-pyrel-coding
- rai-configuration
- rai-prescriptive-solver-management

Overview (process steps):
Metric views (schema: OBSERVABILITY_PREVIEW)

| View | Key Column | Healthy Signal |
|------|-----------|----------------|
| logic_reasoner__memory_utilization | MEMORY_UTILIZATION (0.0–1.0) | < 0.80 on most runs |
| logic_reasoner__cpu_utilization | CPU_UTILIZATION (0.0–1.0) | < 0.85 sustained; < 0.95 peak |
| logic_reasoner__demand | DEMAND (0.0+) | ≤ 1.0 (> 1.0 = queuing) |
Quickest health check — all three metrics joined, last hour:
SELECT
m.REASONER_NAME,
m.TIMESTAMP,
m.MEMORY_UTILIZATION,
c.CPU_UTILIZATION,
d.DEMAND,
d.REASONER_CAPACITY
FROM relationalai.observability_preview.logic_reasoner__memory_utilization m
JOIN relationalai.observability_preview.logic_reasoner__cpu_utilization c
ON m.REASONER_ID = c.REASONER_ID AND m.TIMESTAMP = c.TIMESTAMP
JOIN relationalai.observability_preview.logic_reasoner__demand d
ON m.REASONER_ID = d.REASONER_ID AND m.TIMESTAMP = d.TIMESTAMP
WHERE m.TIMESTAMP >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY m.TIMESTAMP DESC;
Always include a time-range filter. Querying without `WHERE timestamp >= ...` scans the entire Event Table and incurs high Snowflake compute costs.
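One way to make the time-range rule hard to forget is to build metric queries through a small helper that always emits the filter. The sketch below is only an illustration: `bounded_metric_query` is a hypothetical helper name, not part of any RelationalAI or Snowflake API, and it assumes the views live under `relationalai.observability_preview` as in the query above.

```python
def bounded_metric_query(view: str, hours: int) -> str:
    """Return a SELECT over an observability view limited to the last `hours` hours.

    Hypothetical helper: it only builds the SQL text and does not run it.
    """
    if hours <= 0:
        raise ValueError("hours must be a positive integer")
    # Accept only plain identifiers so the view name cannot smuggle in extra SQL.
    if not view.replace("_", "").isalnum():
        raise ValueError(f"unexpected view name: {view!r}")
    return (
        f"SELECT * FROM relationalai.observability_preview.{view} "
        f"WHERE TIMESTAMP >= DATEADD(hour, -{hours}, CURRENT_TIMESTAMP()) "
        "ORDER BY TIMESTAMP DESC"
    )


# Example: memory utilization for the last hour.
print(bounded_metric_query("logic_reasoner__memory_utilization", 1))
```

Because the helper refuses to produce a query without a positive window, an unbounded scan cannot be issued through it by accident.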
Before querying metrics, confirm the events view is registered and data is flowing.
CALL relationalai.app.CHECK_EVENTS_VIEW_STATUS();
| Status | Meaning | Action |
|--------|---------|--------|
| Events view active | Healthy, events flowing | None |
| No events view registered | Setup not done | Follow setup in references/setup-guide.md |
| ERROR | Configuration broken | Fix per error message reported |
Run `CHECK_EVENTS_VIEW_STATUS()` whenever observability views return unexpected or empty results; it diagnoses most configuration issues automatically.
SELECT REASONER_NAME,
AVG(MEMORY_UTILIZATION) AS avg_mem,
MAX(MEMORY_UTILIZATION) AS peak_mem
FROM relationalai.observability_preview.logic_reasoner__memory_utilization
WHERE timestamp >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
GROUP BY REASONER_NAME;
SELECT REASONER_NAME,
AVG(CPU_UTILIZATION) AS avg_cpu,
MAX(CPU_UTILIZATION) AS peak_cpu
FROM relationalai.observability_preview.logic_reasoner__cpu_utilization
WHERE timestamp >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
GROUP BY REASONER_NAME;
SELECT REASONER_NAME,
AVG(DEMAND) AS avg_demand,
MAX(DEMAND) AS peak_demand,
REASONER_CAPACITY
FROM relationalai.observability_preview.logic_reasoner__demand
WHERE timestamp >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
GROUP BY REASONER_NAME, REASONER_CAPACITY;
Interpret patterns, not isolated spikes. Utilization is naturally spiky. The key question is: does almost every workload run exceed the threshold? Isolated peaks are normal; consistent exceedance across runs is the signal to act.
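The "pattern, not spike" rule can be made concrete by measuring how many workload runs exceed a threshold, rather than looking at the single highest value. The following is a minimal sketch under two assumptions: that you have already reduced the metric view to one peak value per workload run (the views themselves are keyed by timestamp, not by run), and that the sample numbers are purely illustrative, not measurements.

```python
def exceedance_fraction(run_peaks, threshold):
    """Fraction of workload runs whose peak utilization exceeded `threshold`."""
    if not run_peaks:
        raise ValueError("need at least one run")
    over = sum(1 for peak in run_peaks if peak > threshold)
    return over / len(run_peaks)


# Hypothetical peak MEMORY_UTILIZATION per workload run (illustrative values only).
isolated_spike = [0.55, 0.60, 0.92, 0.58, 0.61]
consistent_pressure = [0.84, 0.88, 0.91, 0.79, 0.86]

print(exceedance_fraction(isolated_spike, 0.80))       # 0.2
print(exceedance_fraction(consistent_pressure, 0.80))  # 0.8
```

Both series contain values above 0.90, but only the second exceeds 0.80 on most runs, which is the condition the guidance treats as a signal to act.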
Signals: MEMORY_UTILIZATION < 0.80, CPU_UTILIZATION < 0.85, DEMAND ≤ 1.0 on most runs.
Signals (any of):
- MEMORY_UTILIZATION > 0.80 on most workload runs
- CPU_UTILIZATION consistently > 0.95
- CPU_UTILIZATION = 1.0 AND MEMORY_UTILIZATION = 1.0 AND DEMAND > 1.0 (all hard limits hit)

Action: Upgrade reasoner size. No in-place resize exists; delete and recreate the reasoner.
rai reasoners:suspend --type Logic --name <name>
rai reasoners:delete --type Logic --name <name>
rai reasoners:create --type Logic --name <name> --size <larger-size>
Signals: CPU_UTILIZATION consistently 0.85–0.95 (below critical but limited headroom for bursts).
Action: Schedule a resize during a low-traffic window before the next traffic spike. No immediate action required.
Signals: DEMAND consistently > 1.0 (more jobs than available queue slots).
Action:
Signals: CPU_UTILIZATION < 0.30 AND MEMORY_UTILIZATION never exceeds 0.30 across workload runs.
Action: Downgrade to a smaller reasoner — you are paying for unused capacity.
rai reasoners:suspend --type Logic --name <name>
rai reasoners:delete --type Logic --name <name>
rai reasoners:create --type Logic --name <name> --size <smaller-size>
Signals: DEMAND = 0 for extended periods.
Action: Suspend the reasoner or reduce its auto_suspend threshold to stop billing for idle time.
rai reasoners:suspend --type Logic --name <name>
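The signal rules above can be read as a simple decision procedure. The sketch below is one possible encoding, not an official classifier: the state names, the `ReasonerSummary` type, the evaluation order, and the cutoffs for "most" (more than half of runs) and "consistently" (at least 80% of runs) are all assumptions made for illustration. It covers the first two upgrade signals and uses peak CPU as a conservative reading of the low-utilization rule.

```python
from dataclasses import dataclass

MOST = 0.5        # assumed meaning of "on most runs"
CONSISTENT = 0.8  # assumed meaning of "consistently"


@dataclass
class ReasonerSummary:
    mem_over_080: float   # fraction of runs with MEMORY_UTILIZATION > 0.80
    cpu_over_095: float   # fraction of runs with CPU_UTILIZATION > 0.95
    cpu_over_085: float   # fraction of runs with CPU_UTILIZATION > 0.85
    demand_over_1: float  # fraction of samples with DEMAND > 1.0
    peak_cpu: float
    peak_mem: float
    peak_demand: float


def classify(s: ReasonerSummary) -> str:
    """Map summarized metrics to a hypothetical state label."""
    if s.mem_over_080 > MOST or s.cpu_over_095 >= CONSISTENT:
        return "undersized"        # upgrade reasoner size
    if s.demand_over_1 >= CONSISTENT:
        return "queuing"           # demand consistently above 1.0
    if s.cpu_over_085 >= CONSISTENT:
        return "limited_headroom"  # schedule a resize
    if s.peak_demand == 0:
        return "idle"              # suspend or lower auto_suspend
    if s.peak_cpu < 0.30 and s.peak_mem <= 0.30:
        return "oversized"         # downgrade to a smaller size
    return "healthy"


busy = ReasonerSummary(mem_over_080=0.9, cpu_over_095=0.1, cpu_over_085=0.4,
                       demand_over_1=0.0, peak_cpu=0.97, peak_mem=0.95,
                       peak_demand=0.8)
print(classify(busy))  # undersized
```

Checking the capacity-pressure states first means a reasoner that is both memory-bound and queuing is reported as undersized; a different priority order may suit other workloads.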
Two application roles control who can configure and who can read observability data:
| Role | Capabilities | Grant To |
|------|-------------|----------|
| observability_admin | Register/unregister events view; call CHECK_EVENTS_VIEW_STATUS() | Small trusted ops group |
| observability_viewer | Read-only on all observability views | Engineering and operations users |
GRANT APPLICATION ROLE relationalai.observability_viewer TO ROLE <your_role>;
GRANT APPLICATION ROLE relationalai.observability_admin TO ROLE <your_role>;
Observability views are non-materialized — every query scans the Snowflake Event Table in real time. No extra storage cost, but Snowflake compute credits are consumed on every query.
Cost scales with: event volume × time range × query complexity.
| Rule | Detail |
|------|--------|
| Always filter by time | WHERE timestamp >= DATEADD(hour, -24, ...) — never query without bounds |
| Monitor query costs | SELECT query_id, total_elapsed_time, credits_used_cloud_services FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY WHERE query_text ILIKE '%observability%' |
| Prefer hourly/daily aggregations for dashboards | Avoid raw per-minute scans in scheduled jobs |
| Mistake | Cause | Fix |
|---------|-------|-----|
| No data in metric views | Events view not registered or setup incomplete | Run CHECK_EVENTS_VIEW_STATUS(); complete setup if needed |
| Registration fails: "CHANGE_TRACKING" error | Change tracking not enabled on view or event table | Add CHANGE_TRACKING = TRUE to view definition and underlying table |
| Registration fails: "A view is already registered" | Prior view still bound | Call UNREGISTER_EVENTS_VIEW() first, then re-register |
| Queries are slow or expensive | Missing time-range filter scans full event table | Always add WHERE timestamp >= DATEADD(...) |
| DEMAND > 1.0 but CPU is low | Job routing: one reasoner saturated, others idle | Redistribute jobs across reasoner instances |
| Isolated spikes look alarming | Normal: spiky workloads cause transient peaks | Focus on pattern across runs, not individual data points |
| observability_viewer cannot see views | Role not granted | Run GRANT APPLICATION ROLE ... TO ROLE ... as admin |
| Reference | Description | File |
|-----------|-------------|------|
| Setup guide | Full 6-step observability setup | setup-guide.md |
| Metric schemas | All metric view schemas | metric-schemas.md |