skills/rai-querying/SKILL.md
Covers query construction in PyRel v1 including aggregation, derived concepts, filtering, ordering, multi-concept joins, and data export. Use when building queries or extracting data from RAI models.
npx skillsauth add RelationalAI/rai-agent-skills rai-queryingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
What: Query construction — producing actual table output via select().to_df(). Covers aggregation, filtering, joins, ordering, and data export in PyRel v1.
Scope boundary: A "query" in this skill means code that produces a DataFrame output via select().to_df(). Model logic (definitions, computed properties, derived relationships, entity creation) belongs in rai-pyrel-coding — see its Definitions section. Queries should be simple. Most logic should live in definitions; queries should primarily select and aggregate values that definitions have already computed. If a query has complex logic, consider extracting it into a definition instead.
When to use:
select, where, to_df)count, sum, min, max, avg with per/where)in_, not_, union, rank/top/bottom).ref() in select contextsrai-prescriptive-results-interpretation)When NOT to use:
rai-pyrel-codingrai-prescriptive-problem-formulationrai-prescriptive-results-interpretationOverview: Reference skill. Key lookup areas: Query Basics (where/select/to_df), Aggregation (sum/count/per), Filtering (in_/not_/union), Multi-Concept Joins, Data Export.
from relationalai.semantics.std import aggregates as aggs
# Basic query — prefer model.where/model.select for multi-model safety
result = model.where(Order.status == "active").select(Order.id, Order.total).to_df()
# Grouped aggregation
revenue = model.where(Order).select(
aggs.sum(Order.total).per(Order.customer).alias("revenue"),
Order.customer.name.alias("customer")
).to_df()
# Conditional aggregation
delayed = aggs.count(Shipment).per(Supplier).where(Shipment.is_delayed())
# Distinct wrapper — everything inside distinct()
unique = model.select(distinct(Product.category.alias("cat"))).to_df()
Core pattern: model.where(conditions).select(expressions) chained with .to_df() for execution.
Best practice: Use model.where() / model.select() instead of standalone where() / select(). The standalone forms are convenience wrappers that only work when exactly one Model exists in the process. With two or more models, they raise "Multiple Models have been defined." Using model.where()/model.select() guarantees compatibility in multi-model scenarios, makes code portable when copying between scripts, and reduces import requirements.
# Preferred: model method form
result = model.where(
Business.is_high_value_customer(),
Business.receives_shipment(Shipment),
Shipment.is_delayed()
).select(
Business.name.alias("customer_name"),
Shipment.delay_days.alias("delay_days")
).to_df()
.alias("name") — rename output columns for clean DataFrames:
model.where(
Shipment.supplier(Shipment, Supplier)
).select(
Supplier.name.alias("supplier_name"),
Supplier.id.alias("supplier_id"),
)
.to_df() — execute query and return a pandas DataFrame.
print(expr) — shows the PyRel structure of an expression, concept, or relationship without executing a query. Use to verify expression composition before running:
print(Order.customer) # → Order.customer
print(aggs.sum(Order.total).per(Customer)) # → (sum Order.total (per Customer))
print(model.where(Order.status == "active")) # → (where Order.status == 'active')
.inspect() — convenience for debugging; executes the query and prints the DataFrame to stdout. Equivalent to print(rel.to_df()).
Shipment.supplier.inspect() # Quick check of a relationship's contents
distinct(...) — deduplicate result rows. Frequently needed with aggregations to prevent validation errors.
All columns in a select() must be either all inside distinct() or all outside — mixing causes a runtime error. Grouped aggregation queries (grouping by a property value, not an entity) require distinct() to get one row per group; without it, PyRel returns one row per entity with duplicated aggregation values.
When to use distinct():
weather_condition, category): use distinct() — without it you get duplicated rows.per(Customer)): usually not needed, since each entity is already unique.per(Supplier, Dest.region)): treat as property-value grouping — use distinct()distinct()For detailed distinct() code examples (multi-column, grouped aggregation, common mistakes), see distinct-patterns.md.
Import aggregates via module alias to avoid shadowing Python builtins:
from relationalai.semantics.std import aggregates as aggs
# Use aggs.count, aggs.sum, aggs.max, aggs.min, aggs.avg
from relationalai.semantics.std.aggregates import string_join, rank, desc, asc
Available aggregates: count, sum, min, max, avg, string_join.
avg — available in queries, NOT in solver contexts:
# Query: average order value per customer
aggs.avg(Order.total).per(Customer).alias("avg_order_value")
# WILL RAISE NotImplementedError if used in problem.satisfy() or problem.minimize/maximize
# Solver-supported aggregates: sum, min, max, count only
.per(group) for group-by semantics:
# Count shipments per supplier
aggs.count(Shipment).per(Shipment.supplier).alias("total_shipments")
# Sum with arithmetic in aggregate
aggs.sum(Shipment.quantity * Shipment.delay_days).per(Shipment.supplier).alias("qty_delay_impact")
# Sum per entity
aggs.sum(Shipment.delay_days).per(Shipment.supplier).alias("total_delay_days")
Multi-key grouping:
aggs.sum(customer.receives_shipment.quantity).per(customer, SKU).alias("quantity_at_risk")
Aggregation with unary relationship filter:
# Count only delayed shipments (where is_delayed is a unary relationship)
aggs.count(Shipment.is_delayed()).per(Shipment.supplier).alias("delayed_shipments")
Full query with mixed aggregations:
return model.where(
Business.is_high_value_customer(),
Business.receives_shipment(Shipment),
Shipment.is_delayed()
).select(
aggs.count(Business).alias("high_value_customers"),
aggs.count(Shipment).alias("total_shipments"),
aggs.sum(Shipment.quantity).alias("delayed_quantity")
)
Advanced aggregation: Conditional .where() on aggregates (subquery filters), per(group).sum(expr) standalone form, and conditional count(X, condition).
See aggregation-advanced.md for examples.
Note: These
define()/where()patterns are model logic, not queries. They are included here as a quick reference because they often precede query construction. For the full treatment of definitions — the core of PyRel coding — seerai-pyrel-coding§ Definitions.
model.define(...).where(...) — derive from conditions:
# Unary relationship as boolean flag
model.define(Shipment.is_delayed()).where(Shipment.delay_days > 0)
# Named intermediate relationship for staged computation
relevant_shipment = model.Relationship(f"Delayed Shipment in Last Quarter: {Shipment}")
model.define(relevant_shipment(Shipment)).where(
Shipment.fiscal_quarter == aggs.max(Shipment.fiscal_quarter)
)
model.where(...).define(...) — both directions are valid:
# Define a target subset
target_supplier = model.Relationship(f"Target Supplier: {Business}")
model.define(target_supplier(Business)).where(
Business.name == supplier_name
)
Computed properties from aggregation:
# Computed metric: count of operations per site
model.where(
Operation.destination_site(op, site),
Operation.type(op, "SHIP")
).define(Site.count_is_destination(site, aggs.count(op).per(site)))
For multi-concept joins, reusable query fragments, and Snowflake export patterns, see joins-and-export.md.
Subtypes inherit parent properties, but you MUST bind the subtype to its parent and access properties through the PARENT concept. Accessing properties directly on the subtype causes TyperError.
# CORRECT — bind subtype to parent, access properties via parent
results = model.where(
HighChurnRiskCustomer(Customer),
).select(
Customer.full_name.alias("name"),
Customer.churn_score.alias("churn_score"),
).to_df()
# WRONG — accessing properties on subtype causes TyperError
# results = model.where(HighChurnRiskCustomer).select(
# HighChurnRiskCustomer.full_name.alias("name"), # FAILS!
# ).to_df()
Counting subtypes:
# CORRECT — count parent concept, filter by subtype binding
results = model.select(
aggs.count(FoodTruck).alias("count"),
).where(
UnderutilizedFoodTruck(FoodTruck),
).to_df()
# WRONG — counting subtype directly causes TyperError
# results = model.select(rai.count(UnderutilizedFoodTruck).alias("count")).to_df()
Boolean relationships as filters (not selectable):
Boolean relationships (unary flags like is_event_day, is_active) can ONLY be used as filters in .where(). They CANNOT be placed in select() with .alias().
# CORRECT — boolean flag as filter in where()
results = model.where(
DailyDeployment.is_event_day(),
).select(
DailyDeployment.deployment_date.alias("date"),
).to_df()
# WRONG — boolean relationship in select() causes TyperError
# results = model.select(DailyDeployment.is_event_day.alias("flag")).to_df()
To project a boolean flag as a column, use the two-query pandas pattern: query all entities, query the flagged subset, then merge with df["flag"] = df["id"].isin(flagged_ids).
Chain .where() conditions to filter results. Each .where() adds an AND condition and can introduce new concept refs.
# Numeric and string filtering
model.where(Product.price > 100, Product.category("electronics")).select(Product.name)
# Sorting: order_by() is NOT yet available in v1.
# Use std.aggregates for ranking/ordering:
from relationalai.semantics.std.aggregates import rank, rank_asc, rank_desc, desc, asc, top, bottom, limit
# Rank by descending value
model.where(Order.status("open")).select(
rank(desc(Order.value)).alias("rank"),
Order.id, Order.value
)
# Top-N / Bottom-N
top(3, Order.value) # Top 3 by value
bottom(5, Product.price) # Bottom 5 by price
Set membership with .in_():
model.where(LineItem.ship_mode.in_(["AIR", "AIR REG"])).select(...)
model.where(Part.container.in_(["SM CASE", "SM BOX", "SM PACK"])).select(...)
Negation with model.not_():
# Entities that DON'T have a relationship
model.where(
order := Order.ref(),
model.not_(order.customer), # orders without a customer
).select(order.id.alias("orphan_order_id"))
# Negated condition in aggregation
avg_without = aggs.avg(order.total).per(product).where(
model.not_(order_has_product(order, orderitem, product))
)
not_() — grouped vs. separate negation:
# NOT (A AND B):
model.not_(Person.pets, Person.pets.name == "boots")
# (NOT A) AND (NOT B):
model.not_(Person.pets), model.not_(Person.pets.name == "boots")
model.union() vs |: model.union() collects ALL matching branches (set union / OR-filter). | evaluates left-to-right and picks the first that succeeds (ordered fallback / if-then-else). Use | for defaults and case-when chains; use model.union() for OR-filtering. For full semantics, see rai-pyrel-coding/expression-rules.md.
For extended not_() examples, union() patterns, and the HAVING equivalent, see filtering-advanced.md.
Dynamic query construction: Build a base query and conditionally append .where() clauses. Each additional .where() narrows the result set without modifying prior conditions. This is useful when filter criteria come from user input or runtime parameters.
Requirements are reusable checks that express what must be true in your domain. They fire at materialization time (to_df() or exec()). A failing requirement makes all queries on that model fail.
Three forms with distinct semantics:
# 1. Global — passes if AT LEAST ONE entity satisfies expr
model.require(Order.status == "pending")
# 2. Per-entity — EVERY instance must satisfy; fails if any violates
Order.require(Order.amount >= 0)
# Equivalent to:
model.where(Order).require(Order.amount >= 0)
# 3. Scoped — every match in the where scope must satisfy
model.where(Order.amount > 0).require(Order.customer)
Key distinction: model.require(expr) is an existence check ("at least one"); Concept.require(expr) is a universal check ("all of them"). Choose based on your invariant.
Requirements fire at to_df() / exec() time — not at definition time. There is no .check() or .validate() method.
In solver context: problem.satisfy(model.require(expr)) promotes the requirement to a solver constraint:
problem.satisfy(model.require(supply >= demand)) # hard constraint in the solver
relationalai.semantics.inspect (v1.0.14+) is the recommended public API for discovering model structure at runtime:
inspect.schema(model) — full frozen ModelSchema with concepts, properties (including inherited), relationships, tables, inline data sources, enums, and rules. Supports ms["Person"] dict-style access and ms.to_dict() for JSON-safe serialization.inspect.fields(rel) — tuple[FieldRef, ...] usable directly in select(*inspect.fields(rel)); handles inheritance and alt readings.inspect.to_concept(obj) — resolve any DSL handle (Chain, Ref, FieldRef, Expression) to its underlying Concept.Use inspect-before-authoring to catch duplicate/hallucinated properties, inspect-after-scaffolding to verify what actually registered, and re-inspect after long sessions or /compact to avoid acting on stale mental models. See inspect-module.md.
Lower-level model.concepts / model.relationships / model.tables / model.data_items remain available as a fallback — see model-introspection.md.
| Mistake | Cause | Fix |
|---------|-------|-----|
| Dot-chain returns ALL related entities, not filtered ones | .uses.id in select creates independent lookup, ignoring where bindings | Use bound concept in select (Asset.id) or chain through application (Employee.uses(Asset).id) |
| Empty DataFrame from query | Missing define() for computed values, or .new() matched no rows | Verify entities exist and derived values are define()d before querying |
| ValidationError: Unused variable | Same concept ref reused in independent aggregation contexts | Use separate named refs (Customer.ref("t1"), Customer.ref("t2")) |
| Duplicate rows in aggregation results | Missing distinct() wrapper | Wrap select(distinct(...)) — ALL columns must be inside distinct() |
| Grouped aggregation returns N rows instead of grouped rows | Grouping by a property value without distinct() | Use select(distinct(property.alias(...), agg.per(property).alias(...))) — see Grouped Aggregation pattern above |
| Subtype query returns TypeError | Accessing properties directly on subtype (model.Subtype.prop) | Bind subtype to parent: model.where(model.Subtype(model.Parent)).select(model.Parent.prop.alias(...)) |
| rai.count(model.SubtypeName) in bare select fails | Counting subtype directly without parent binding | Use model.select(rai.count(model.Parent).alias("n")).where(model.Subtype(model.Parent)) |
| Mixing bare select with distinct | select(X.name, distinct(X.cat)) — can't mix | Either wrap ALL columns in distinct() or none |
| .where() on wrong target | Calling .where() directly on a Concept (Site.where(...)) | .where() goes on aggregations, constraints, definitions, or queries — not on bare concepts |
| Using standalone where()/select() with multiple models | "Multiple Models have been defined." error | Use model.where()/model.select() instead of standalone imports |
| Aggregation returns empty DataFrame instead of zero | aggs.count(X).where(no_match) returns no rows, not a row with 0 | Use \| 0 default: aggs.count(Shipment).where(Shipment.supplier.name == "foo") \| 0 |
| .exists() on Properties raises error | Concept.prop.exists() — RAIException: Cannot access relationships on core concept 'Float'. | Use ref binding: r = Float.ref(); model.where(Concept.prop(r)).select(r.alias("val")) |
| Inflated aggregation from multi-relationship select | Binding two relationships through the same concept in one select (e.g., A.r1(C) and B.r2(C)) creates a cartesian product of matching pairs — aggregation counts silently wrong | Split into separate queries per relationship, or pre-aggregate with derived properties. This is distinct from prescriptive cross-product concepts — it applies to any select with multiple join paths through a shared concept |
| Silent column renaming (_2, _3 suffixes) | Two concepts in a query share a property name (e.g., both have .name) — to_df() silently appends _2 to disambiguate | Always apply .alias() to every selected property. Without explicit aliases, downstream code referencing the original column name gets wrong data or KeyError |
| Pattern | Description | File |
|---|---|---|
| Aggregation queries | model.select(), .alias(), sum/count with .per(), multi-hop joins, .to_df() | examples/aggregation_queries.py |
| Computed properties | std.datetime arithmetic, enum-subconcept segmentation, argmax with tiebreaker | examples/datetime_argmax_segmentation.py |
| inspect.schema() summary | Dump registered concepts/properties, dict-style access, does-property-exist check, JSON-safe to_dict() | examples/inspect_schema_summary.py |
| inspect.fields() unpack | select(*inspect.fields(rel)) canonical idiom + include_owner=True variant | examples/inspect_fields_unpack.py |
| Reference | Description | File |
|-----------|-------------|------|
| Joins & export | Multi-concept joins, reusable query fragments, Snowflake write-back | joins-and-export.md |
| Distinct patterns | Code examples for distinct() usage in select queries | distinct-patterns.md |
| Advanced aggregation | Conditional .where() on aggregates, per().sum() standalone, conditional count | aggregation-advanced.md |
| Advanced filtering | Extended not_() examples, union() patterns, HAVING equivalent | filtering-advanced.md |
| inspect module | Public model-introspection API: inspect.schema, inspect.fields, inspect.to_concept | inspect-module.md |
| Model introspection (lower level) | model.concepts/relationships/tables/data_items fallback API | model-introspection.md |
data-ai
Configure and train graph neural network (GNN) models, generate predictions, evaluate results, and manage trained models. Use when ready to train, generate predictions, evaluate, or manage models; for concepts, data loading, edges, and feature configuration, see `rai-predictive-modeling`.
development
Build graph neural network (GNN) models — concepts, Snowflake data loading, task relationships, graph edges, and PropertyTransformer features. Use for node classification, regression, and link prediction tasks; for training, predictions, and evaluation, see `rai-predictive-training`.
development
Setup and configuration for RelationalAI — first-time install walkthrough and all raiconfig.yaml tuning. Use when installing RAI, connecting to Snowflake, or editing raiconfig.yaml. Not for writing PyRel model code (see rai-pyrel-coding) or solver usage and diagnostics (see rai-prescriptive-solver-management).
testing
Converts natural language business rules into PyRel derived properties — validation, classification, derivation, alerting, and reconciliation. Use whenever a task assigns each entity a new tier, segment, score, or flag, or derives a new property; author it here as a derived property, then query it with rai-querying.