plugins/rai/skills/rai-discovery/SKILL.md
Translation, ideation, and routing layer between an ontology and the RAI reasoners. Surfaces questions the data can answer, classifies them by reasoner family (prescriptive, graph, predictive, rules), and translates user-facing problem framings into the technical implementation hints the downstream reasoner skills need. Use before choosing a reasoner workflow or when scoping what to build next.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add RelationalAI/rai-agent-skills rai-discovery
What: Multi-reasoner question discovery from ontology models. Acts as the translation, ideation, and routing layer between the ontology and the reasoners — surfaces what the data can answer, classifies by reasoner family, and translates user-facing problem framings into the technical implementation hints the downstream coding skills consume.
When to use:
When NOT to use:
- rai-prescriptive-problem-formulation
- rai-pyrel-coding
- rai-ontology-design
- rai-prescriptive-solver-management
- rai-prescriptive-results-interpretation

Overview:
- inspect.schema(model): concepts, properties with real types, relationships, data sources

| Signal in Ontology | Reasoner | Question Pattern |
|--------------------|----------|-----------------|
| Constrained resources, costs, capacities | Prescriptive | "What should we do?" — allocate, schedule, route. Within prescriptive, formulation splits along a style axis: MIP-style (Problem(model, Float) + HiGHS/Gurobi — continuous-friendly) vs CSP-style (Problem(model, Integer) + MiniZinc — all-integer with globals, multi-solution enumeration, audit/witness). See prescriptive.md § Formulation Style Detection. |
| Network topology, graph structure | Graph | "What patterns exist?" — centrality, clusters, paths |
| Labels/values per entity, historical pair data, graph topology | Predictive | "What will happen?" / "Which Y for each X?" — node classification, node regression, link prediction |
| Threshold/status fields, business rules | Rules | "Is this valid?" — compliance, classification |
| Feasibility | Meaning | Next Step |
|-------------|---------|-----------|
| READY | All data in model | Proceed to reasoner workflow |
| MODEL_GAP | Data in schema, not mapped | Enrich ontology first |
| DATA_GAP | Data doesn't exist | Blocks the question |
Question discovery is the analyst's springboard into data-driven reasoning. The ontology reveals what questions the data can answer -- the analyst learns what's possible before choosing what to pursue.
1. Ground in the real model first. Before enumerating opportunities from memory or source files, run inspect.schema(model) to see what's actually registered: concepts, properties (including inherited), types (enriched from the backing TableSchema where available), relationships, and both model.tables and inline model.data_items sources. Discovery suggestions are only useful if they're grounded in what the data can actually support; guessing from partial reads produces confident-but-wrong recommendations. See rai-querying/references/inspect-module.md.

   ```python
   from relationalai.semantics import inspect

   schema = inspect.schema(model)
   # Now enumerate signals from real state: concepts with network topology,
   # temporal properties, constrained-resource concepts, boolean flags, etc.
   ```

   When to skip: if the user is asking "what can I do with this new dataset" and the model is still greenfield (no concepts yet), start at Step 2 and return to inspect later.
2. Analyze the ontology: what concepts, relationships, and data exist? Look for network topology (graph), temporal patterns (predictive), constrained decisions (prescriptive), threshold/status fields (rules).
3. Classify by reasoner: for each opportunity, determine which reasoner(s) apply (→ Reasoner Classification). Tag with reasoners field.
4. Identify chains: where one reasoner's output enables another (→ Multi-Reasoner Chaining, Cumulative Discovery).
5. Assess feasibility: READY / MODEL_GAP / DATA_GAP for each suggestion (→ Feasibility Framework).
6. Generate ranked suggestions: with implementation hints per reasoner type (→ reference files: prescriptive.md, graph.md, predictive.md, rules.md).
7. User selects: route to the appropriate reasoner workflow (→ Post-Discovery Routing). If MODEL_GAP, enrich first (→ Enrichment Handoff).
- statement field as a business question the analyst can evaluate -- not a technical formulation
- implementation_hint fields -- these drive downstream workflow and must reference actual concept/property names

Present suggestions as a landscape of what the data can answer -- across reasoner types -- not as a menu of one kind of question.
Frame suggestions by the type of question they answer:
Connect each suggestion to the decision or insight it enables, not just the analysis it performs:
Each suggestion must be tagged with one or more reasoner types. Use these signals to classify:
| Signal | Primary Reasoner | Question Pattern |
|--------|-----------------|------------------|
| Optimizing decisions over constrained resources | Prescriptive | "What should we do?" — allocate, schedule, route, price. Cross-cutting style choice within prescriptive: MIP-style (continuous-friendly) vs CSP-style (all-integer + globals). See prescriptive.md § Formulation Style Detection. |
| Understanding structure, connectivity, influence | Graph | "What patterns exist?" — who is central, what clusters exist, shortest path |
| Predicting node labels/values or future links from features and graph topology | Predictive | "What will happen?" / "Which Y for each X?" — node classification, node regression, link prediction |
| Enforcing business rules and logical constraints | Rules | "Is this valid?" — compliance, classification, derivation |
Disambiguation rules:
- competing_objectives in the prescriptive hint. See prescriptive.md § Multi-Objective Detection for the checklist.
- scenario_parameter in the prescriptive hint. See prescriptive.md § Scenario Detection for the checklist.
- csp_style_witness_enumeration in the prescriptive hint: a style choice within an existing problem type, not a new problem type. See prescriptive.md § Formulation Style Detection for the checklist.

For detailed question types, classification signals, and structural checklists per reasoner, see the reasoner-specific reference files: prescriptive.md, graph.md, predictive.md, rules.md.
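The signal-to-reasoner classification above can be sketched as a simple lookup. This is only an illustration in plain Python: the signal tag names are hypothetical shorthand for the rows of the classification table, and the function is not part of any RAI API. The returned list is in first-seen order, which is not necessarily the execution sequence of a chain.

```python
# Hypothetical signal tags (assumption); reasoner names come from the table above.
SIGNAL_TO_REASONER = {
    "constrained_resources": "prescriptive",
    "network_topology": "graph",
    "labels_or_history": "predictive",
    "thresholds_or_business_rules": "rules",
}


def classify_reasoners(signals):
    """Return the reasoner tags suggested by the observed signals, without duplicates."""
    reasoners = []
    for signal in signals:
        reasoner = SIGNAL_TO_REASONER.get(signal)
        if reasoner is not None and reasoner not in reasoners:
            reasoners.append(reasoner)
    return reasoners
```

Unknown signals are ignored rather than raising, so a partially tagged ontology still yields a usable (if incomplete) classification.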
Some questions require multiple reasoners in sequence. Each stage's output enriches the ontology, enabling the next stage.
| Chain | Pattern | What flows between stages |
|-------|---------|---------------------------|
| Predictive → Prescriptive | Predict parameters, then optimize | Forecasted values become constraint/objective data |
| Graph → Prescriptive | Discover structure, then optimize over it | Centrality scores, cluster labels become weights/filters |
| Rules → Prescriptive | Validate/classify, then optimize given compliance | Flags and classifications constrain the feasible set |
| Rules → Graph | Flag entities, then analyze their structural role | Flagged nodes become the focus of graph analysis |
| Graph → Predictive | Extract structural features, then predict | Centrality, component membership become prediction features |
| Predictive → Rules | Predict outcomes, then enforce thresholds | Predicted scores are evaluated against business rules |
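As a rough sketch, the chain table can be encoded as a lookup keyed by (upstream, downstream) pairs, so that an ordered reasoners list can be checked for a known handoff. The names CHAIN_HANDOFFS and describe_handoffs are hypothetical; only the table contents come from this document.

```python
# (upstream, downstream) -> what flows between stages, copied from the chain table.
CHAIN_HANDOFFS = {
    ("predictive", "prescriptive"): "Forecasted values become constraint/objective data",
    ("graph", "prescriptive"): "Centrality scores, cluster labels become weights/filters",
    ("rules", "prescriptive"): "Flags and classifications constrain the feasible set",
    ("rules", "graph"): "Flagged nodes become the focus of graph analysis",
    ("graph", "predictive"): "Centrality, component membership become prediction features",
    ("predictive", "rules"): "Predicted scores are evaluated against business rules",
}


def describe_handoffs(reasoners):
    """Describe each stage-to-stage handoff for an ordered reasoners list.

    Pairs missing from the table map to None so the caller can flag an unclear handoff.
    """
    pairs = zip(reasoners, reasoners[1:])
    return [(up, down, CHAIN_HANDOFFS.get((up, down))) for up, down in pairs]
```

A None in the result does not mean the chain is invalid, only that the table gives no ready-made description of what flows between those two stages.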
- statement field: "Forecast appointment volume per clinic (predictive), then assign staff to shifts to meet expected demand (prescriptive)"
- reasoners: ["predictive", "prescriptive"] (ordered by execution sequence)

Each stage enriches the shared ontology with new properties. Downstream stages consume those properties as if they were base data.
- Property or Relationship on an existing concept via model.define(). Downstream stages reference it like any other property.
- model.data() and bind with model.define().
- | operator: allows downstream stages to degrade gracefully when an upstream enrichment is missing for some entities, e.g., Entity.predicted_value | Entity.current_value.

Each reasoner adds new concepts and properties to the ontology. Discovery should surface not just what's answerable now, but what becomes answerable after earlier stages run.
| Stage 1 Output | What It Adds to Ontology | Stage 2 Questions Unlocked |
|----------------|--------------------------|---------------------------|
| Graph centrality | node.centrality_score | Predictive: centrality as feature. Prescriptive: weight allocation by node importance. |
| Graph reachability | impact_count, affected flags | Prescriptive: minimize disruption to high-impact nodes. Rules: alert on critical dependencies. |
| Graph WCC / community | WCC: (node, component_id_node) membership (access .id to get its identifying value; cast to int only for integer-identified nodes); community: node.community_label (int) | Prescriptive: optimize within-cluster vs cross-cluster. Rules: flag isolated components. |
| Predictive node classification | Entity.predictions with .probs, .predicted_labels | Rules: flag above threshold. Prescriptive: incorporate risk/class as constraint. |
| Predictive node regression | Entity.predictions.predicted_value (incl. per-period forecasts) | Prescriptive: optimize against predicted values, often via aggregation/bridge concept. |
| Predictive link prediction | User.predictions with .rank, .scores, .predicted_<target> | Prescriptive: top-K predicted pairs as candidate edges in assignment/matching. Rules: flag pairs above score threshold. |
When generating suggestions:
Second-order questions are expansion opportunities, not alternatives. The analyst sees: "Here's what you can do now. Here's what opens up if you also run graph analysis."
The ontology grows through use:
Each layer makes the next more powerful. Question discovery should convey this progression — show users what opens up after each stage.
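One way to picture cumulative discovery is to compare each question's required properties against the base ontology and against the ontology after each upstream stage. The sketch below is plain Python with hypothetical names (partition_by_stage, and any property names other than centrality_score); it is an illustration, not how the skill itself computes this.

```python
def partition_by_stage(questions, base_properties, stage_outputs):
    """Split questions into those answerable now and those unlocked by each stage.

    questions: {question: set of required property names}
    base_properties: set of property names already in the ontology
    stage_outputs: {stage name: set of property names that stage adds}
    """
    answerable_now = [q for q, needs in questions.items() if needs <= base_properties]
    unlocked = {}
    for stage, added in stage_outputs.items():
        enriched = base_properties | added
        unlocked[stage] = [
            q for q, needs in questions.items()
            if not needs <= base_properties and needs <= enriched
        ]
    return answerable_now, unlocked


# Hypothetical example data: only centrality_score comes from the table above.
now, later = partition_by_stage(
    questions={
        "Allocate budget across nodes": {"Node.capacity", "Node.cost"},
        "Weight allocation by node importance": {"Node.capacity", "Node.centrality_score"},
    },
    base_properties={"Node.capacity", "Node.cost"},
    stage_outputs={"graph centrality": {"Node.centrality_score"}},
)
```

Here `now` holds the question that can proceed immediately, and `later` lists the second-order question under the stage that unlocks it, which mirrors the "what you can do now" versus "what opens up" framing.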
Shared across all reasoners. Classify each suggestion's data readiness:
- enrich_ontology. Each gap should reference a specific source_table and source_column; without these, the enrichment tool cannot generate the correct define() rule.

Order suggestions by feasibility: READY first, then MODEL_GAP. Prefer suggestions that can proceed without manual data collection.

- model_gap_fixes with source_table/source_column)

If the schema shows NO unmapped columns, there are no model_gaps: all suggestions should be READY. Decision variables, cross-product concepts, and computed expressions are created during formulation; they are NOT model_gaps.
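The feasibility ordering can be sketched as a stable sort over suggestion dicts. The FEASIBILITY_RANK mapping and the function name are hypothetical; the suggestion shape follows the statement/feasibility fields used elsewhere in this document.

```python
# READY first, then MODEL_GAP; DATA_GAP last because it blocks the question.
FEASIBILITY_RANK = {"READY": 0, "MODEL_GAP": 1, "DATA_GAP": 2}


def order_by_feasibility(suggestions):
    """Return suggestions sorted by feasibility, keeping the input order within each class."""
    return sorted(suggestions, key=lambda s: FEASIBILITY_RANK[s["feasibility"]])
```

Because Python's sorted is stable, any prior ranking (for example by business value) is preserved among suggestions with the same feasibility.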
inspect.schema()

The trichotomy (in model / mappable from schema / needs new data) is a set-difference over three inputs:
```python
from relationalai.semantics import inspect

schema = inspect.schema(model)
info = schema[concept_name]  # concept_name: the concept the question is about

# 1. What's currently mapped on this concept (identity + properties)
mapped = {f.name for f in info.identify_by} | {p.name for p in info.properties}

# 2. What columns the backing source exposes
#    - For model.Table(): use table.to_schema() or INFORMATION_SCHEMA.COLUMNS
#    - For model.data(df): df.columns
source_cols = set(...)

# 3. For a business question requiring specific fields
#    (needed: the set of field names the question requires; a placeholder, like source_cols)
needed = set(...)
in_model = needed & mapped
mappable = (needed - mapped) & source_cols
data_gap = needed - mapped - source_cols
```
in_model → READY. mappable → MODEL_GAP (each entry becomes a model_gap_fixes item with source_table/source_column). data_gap → DATA_GAP (blocks the question).
The set-difference is stable because inspect.schema() walks the whole model in one call — run it once, reuse for every candidate suggestion in a discovery pass.
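The same set arithmetic can be wrapped in a small helper that works on plain sets, independent of inspect.schema(). This is a sketch under one stated assumption: a question's overall status is taken to be its worst field status (any data_gap field makes it DATA_GAP, otherwise any mappable field makes it MODEL_GAP). The function name is hypothetical.

```python
def classify_feasibility(needed, mapped, source_cols):
    """Classify a question from the fields it needs, the mapped fields, and the source columns."""
    in_model = needed & mapped
    mappable = (needed - mapped) & source_cols
    data_gap = needed - mapped - source_cols
    # Assumption: the worst field status decides the question's status.
    if data_gap:
        status = "DATA_GAP"
    elif mappable:
        status = "MODEL_GAP"
    else:
        status = "READY"
    return status, {"in_model": in_model, "mappable": mappable, "data_gap": data_gap}
```

Returning the three sets alongside the status keeps the mappable entries available for building model_gap_fixes items.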
Model gaps are ONLY for data in the schema but not in the model. Decision variables, cross-products, and computed expressions are NOT model gaps (formulation layer). Check "Available for enrichment" columns in schema info. Each gap must specify source_table and source_column.
For full gap classification rules (property vs relationship gaps, boundary between base model and reasoner workflow), see rai-ontology-design § Model Gap Identification.
Selecting the right question is "Phase 0" -- before any reasoner workflow begins. A poor choice wastes all downstream effort.
Focus on questions at the intersection of available data (feasible) and useful answers (valuable). Work forwards from what data exists and backwards from what decisions/insights matter.
Data feasibility green flags:
Data feasibility red flags:
Answer value green flags:
Answer value red flags:
For each candidate, score on a 1-5 scale:
| Criterion | What to assess |
|-----------|---------------|
| Data availability | How much required data exists today? |
| Data quality | How reliable is the data? |
| Decision/insight frequency | How often is this needed? |
| Impact | What is the cost of not having this answer? |
| Implementation path | Can the answer be acted upon? |
Prioritize questions scoring high on BOTH data AND value dimensions.
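A minimal sketch of the "high on BOTH dimensions" rule: average the data criteria and the value criteria separately, then take the weaker of the two. The grouping of criteria into data and value dimensions, the averaging, and all names below are assumptions made for illustration; the document only specifies the five criteria and the 1-5 scale.

```python
# Assumed grouping of the five criteria into the two dimensions.
DATA_CRITERIA = ("data_availability", "data_quality")
VALUE_CRITERIA = ("frequency", "impact", "implementation_path")


def priority(scores):
    """Score a candidate by its weaker dimension, so it must be strong on both to rank high."""
    for name in DATA_CRITERIA + VALUE_CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be on the 1-5 scale")
    data = sum(scores[c] for c in DATA_CRITERIA) / len(DATA_CRITERIA)
    value = sum(scores[c] for c in VALUE_CRITERIA) / len(VALUE_CRITERIA)
    return min(data, value)
```

Using min rather than a sum means excellent data cannot compensate for a low-value question, and vice versa.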
Before starting any reasoner workflow, confirm:
When suggesting questions, explore different aspects of the domain where the data supports it:
Cross-domain coverage: If the model's concepts span multiple distinct business domains, spread suggestions across them rather than clustering in one area. Identify domains semantically from concept names and relationships (e.g., concepts prefixed with "Jira" vs "GitHub" vs "RAI" suggest different domains). Aim for at least one suggestion per domain before doubling up on any.
If the domain is narrow (e.g., only budget allocation data), it's fine to suggest variations on the same theme with different objectives or constraints -- as long as each is grounded in the actual ontology and represents a meaningfully different business question.
Vary the business question itself, not just constraints -- use different objectives that reference different properties from the model.
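The cross-domain coverage rule ("at least one suggestion per domain before doubling up") can be sketched as a round-robin pick over domains. The function name and the (domain, statement) pair shape are hypothetical, and the input is assumed to be ranked best-first already.

```python
from collections import defaultdict, deque


def spread_across_domains(suggestions, limit):
    """Pick up to `limit` suggestions, taking one per domain before doubling up.

    suggestions: list of (domain, statement) pairs, already ranked best-first.
    """
    by_domain = defaultdict(deque)
    for domain, statement in suggestions:
        by_domain[domain].append(statement)
    picked = []
    # Each pass over the domains takes at most one suggestion from each.
    while len(picked) < limit and any(by_domain.values()):
        for domain in list(by_domain):
            if by_domain[domain] and len(picked) < limit:
                picked.append((domain, by_domain[domain].popleft()))
    return picked
```

For a narrow, single-domain model this degrades to taking the top suggestions in order, which matches the guidance that variations on one theme are acceptable there.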
When the user selects a question with MODEL_GAP feasibility:
- enrich_ontology, not the reasoner workflow
- model_gap_fixes

For graph questions, enrichment may also include constructing derived relationships needed for graph edges (e.g., a connects_to relationship derived from Activity data linking source/target nodes).
After the user selects a question, route to the appropriate reasoner workflow based on the reasoners tag.
Discovery output serves two audiences: the user (who evaluates which questions to pursue) and the downstream reasoner workflow (which needs structured routing metadata). Keep these separate:
Each suggestion includes a reasoners field — an ordered list specifying the execution sequence. Single-reasoner questions have one entry; chained questions list stages in order.
```json
{
  "statement": "Identify which hubs are critical connectors in the network",
  "reasoners": ["graph"],
  "feasibility": "READY",
  "implementation_hint": {
    "algorithm": "eigenvector_centrality",
    "graph_construction": {
      "node_concept": "Node",
      "directed": false,
      "weighted": true,
      "edge_definition": "Activity linking source_node to target_node"
    },
    "output_binding": "(node, centrality_score)"
  }
}
```
Implementation hint fields vary by reasoner:
| Reasoner | Fields |
|----------|--------|
| prescriptive | decision_scope, forcing_requirement, objective_property, decision_variable, scenario_parameter, competing_objectives, csp_style_witness_enumeration |
| graph | algorithm, graph_construction (node_concept, directed, weighted, edge_definition), target_filter, output_binding |
| rules | rule_type, source_concept, condition_properties, join_path, threshold, output_type, output_property, downstream_use |
| predictive | User-facing: type (node_classification \| node_regression \| link_prediction), mode (pre_computed \| rai_predictive). Concept routing: target_concept, target_property (classification/regression), link_target_concept (link prediction only), feature_properties, output_concept, pre_computed_table. GNN task routing (for rai_predictive mode): task_type (binary_classification \| multiclass_classification \| multilabel_classification \| regression \| link_prediction \| repeated_link_prediction), eval_metric, has_time_column, temporal_column (when has_time_column=True). See predictive.md for the user-type → task_type translation rules. |
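A lightweight check against the field table above might look like the following sketch. The field names are copied from the table; the HINT_FIELDS name and the unknown_hint_fields helper are hypothetical, and the check is meant for single-reasoner hints only.

```python
# Field names per reasoner, copied from the implementation hint table.
HINT_FIELDS = {
    "prescriptive": {
        "decision_scope", "forcing_requirement", "objective_property", "decision_variable",
        "scenario_parameter", "competing_objectives", "csp_style_witness_enumeration",
    },
    "graph": {"algorithm", "graph_construction", "target_filter", "output_binding"},
    "rules": {
        "rule_type", "source_concept", "condition_properties", "join_path",
        "threshold", "output_type", "output_property", "downstream_use",
    },
    "predictive": {
        "type", "mode", "target_concept", "target_property", "link_target_concept",
        "feature_properties", "output_concept", "pre_computed_table",
        "task_type", "eval_metric", "has_time_column", "temporal_column",
    },
}


def unknown_hint_fields(reasoner, hint):
    """Return the hint keys that the field table does not list for this reasoner."""
    return sorted(set(hint) - HINT_FIELDS[reasoner])
```

An empty result means every key in the hint is recognized; it says nothing about whether required fields are present, since the table does not mark any field as mandatory.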
For chained questions, use a stages array in implementation_hint:
```json
{
  "statement": "Identify critical nodes, then optimize allocation weighted by importance",
  "reasoners": ["graph", "prescriptive"],
  "feasibility": "READY",
  "implementation_hint": {
    "stages": [
      {
        "reasoner": "graph",
        "algorithm": "eigenvector_centrality",
        "graph_construction": { "node_concept": "Node", "directed": false, "weighted": true },
        "output_binding": "Node.centrality_score"
      },
      {
        "reasoner": "prescriptive",
        "decision_scope": "Node.allocation_quantity",
        "objective_property": "maximize weighted_allocation (centrality_score * quantity)"
      }
    ]
  }
}
```
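A chained suggestion like the one above can be sanity-checked with a short sketch. Two assumptions are baked in and are not stated by the source: the stages' reasoner values should appear in the same order as the reasoners list, and every non-final stage should declare at least one key starting with output_ (such as output_binding, output_property, or output_concept) so the next stage has something to consume. The function name is hypothetical.

```python
def check_stages(suggestion):
    """Return a list of problems with a chained suggestion; an empty list means consistent."""
    problems = []
    reasoners = suggestion["reasoners"]
    stages = suggestion["implementation_hint"].get("stages", [])
    stage_reasoners = [stage.get("reasoner") for stage in stages]
    # Assumption: stage order must mirror the ordered reasoners list.
    if len(reasoners) > 1 and stage_reasoners != reasoners:
        problems.append(f"stages {stage_reasoners} do not match reasoners {reasoners}")
    # Assumption: every upstream stage names an output for the next stage.
    for stage in stages[:-1]:
        if not any(key.startswith("output_") for key in stage):
            problems.append(f"{stage.get('reasoner')} stage declares no output for the next stage")
    return problems
```

Single-reasoner suggestions without a stages array pass trivially, since neither check applies to them.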
After discovery, load the per-reasoner execution skills:
| Selected reasoner | Skills to load (in order) |
|-------------------|---------------------------|
| prescriptive | rai-prescriptive-problem-formulation → rai-prescriptive-solver-management → rai-prescriptive-results-interpretation |
| graph | rai-graph-analysis |
| predictive | rai-predictive-modeling → rai-predictive-training |
| rules | rai-rules-authoring |
For all reasoners, also load rai-querying + rai-pyrel-coding for v1 syntax, imports, and query patterns. If the selected question is MODEL_GAP, load rai-ontology-design first to enrich the ontology before the reasoner skill runs (see Enrichment Handoff above).
Discovery covers what to ask. The reasoner-specific reference files in this skill (prescriptive.md / graph.md / predictive.md / rules.md) translate the user's framing into the technical fields each downstream skill consumes (problem_type / algorithm / task_type / rule_type). The downstream coding skills cover how to write the PyRel. Skipping the coding-skill load leads to hallucinated APIs and wrong imports.
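The routing rules in this section can be summarized in a short sketch. The skill names come from the routing table above; the function name and the relative order of the shared skills versus the reasoner-specific ones are assumptions, since the text only says the shared skills are "also" loaded.

```python
# Per-reasoner execution skills, in load order, copied from the routing table.
REASONER_SKILLS = {
    "prescriptive": [
        "rai-prescriptive-problem-formulation",
        "rai-prescriptive-solver-management",
        "rai-prescriptive-results-interpretation",
    ],
    "graph": ["rai-graph-analysis"],
    "predictive": ["rai-predictive-modeling", "rai-predictive-training"],
    "rules": ["rai-rules-authoring"],
}
COMMON_SKILLS = ["rai-querying", "rai-pyrel-coding"]


def skills_to_load(reasoners, feasibility):
    """List the skills to load for a selected question, in order."""
    skills = []
    if feasibility == "MODEL_GAP":
        # Enrich the ontology before any reasoner skill runs.
        skills.append("rai-ontology-design")
    skills.extend(COMMON_SKILLS)  # assumed position: before the reasoner skills
    for reasoner in reasoners:
        skills.extend(REASONER_SKILLS[reasoner])
    return skills
```

For a chained question, passing the ordered reasoners list yields the per-stage skills in execution sequence.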
| Mistake | Cause | Fix |
|---------|-------|-----|
| Suggesting questions with no data backing | Skipping feasibility check before proposing | Use READY/MODEL_GAP/DATA_GAP classification; verify data exists before suggesting |
| All suggestions are the same reasoner type | Only considering optimization use cases | Check ontology for graph structure, temporal features, rule patterns -- not just optimization |
| Chained question with unclear handoff | Missing interface specification between stages | Each stage must define inputs and outputs explicitly |
| Missing forcing requirement (prescriptive) | Overlooking mandatory constraint in prescriptive questions | See prescriptive.md for forcing constraint and implementation hint guidance |
| All suggestions cluster in one domain | Not surveying the full concept space | Spread across distinct business domains present in concept names |
| Confusing model gaps with reasoner-layer constructs | Treating computed outputs as missing data | Decision variables, predictions, graph metrics have no source table -- they're not model gaps |
| Suggesting DATA_GAP questions as top choices | Prioritizing novelty over feasibility | Order by feasibility: READY first, MODEL_GAP second, DATA_GAP only if domain is very narrow |
| Reference | Description | File |
|-----------|-------------|------|
| Prescriptive | Optimization problem types (resource allocation, network flow, routing, scheduling, pricing) → translate into formulation parameters for rai-prescriptive-problem-formulation | prescriptive.md |
| Graph | Graph question types (centrality, community, reachability, distance, similarity) → translate into RAI Graph algorithms for rai-graph-analysis | graph.md |
| Predictive | User-facing predictive types (node classification, node regression, link prediction) → translate into GNN task_type / eval_metric / has_time_column for rai-predictive-modeling and rai-predictive-training | predictive.md |
| Rules | Rule question types (validation, classification, derivation, alerting, reconciliation) → translate into rule_type and PyRel patterns for rai-rules-authoring | rules.md |
| Pattern | Description | File |
|---------|-------------|------|
| Prescriptive routing | Discovery scenario walkthrough for optimization problems | prescriptive_routing.md |
| Graph routing | Discovery scenario walkthrough for graph analytics | graph_routing.md |
| Rules routing | Discovery scenario walkthrough for classification, validation, and derivation rules | rules_routing.md |
| Predictive routing | Discovery walkthroughs for node classification, node regression, link prediction (GNN mode) and pre-computed predictions | predictive_routing.md |
| Chained routing | Discovery scenario walkthrough for multi-reasoner pipelines | chained_routing.md |