skills/rai-prescriptive-problem-formulation/SKILL.md
Formulates optimization problems from ontology models, covering decision variables, constraints, objectives, and common patterns. Use when building, reviewing, or debugging a formulation.
What: Optimization formulation — decision variables, constraints, objectives, and common problem patterns. Assumes a problem has already been selected via discovery.
When to use:
When NOT to use:
rai-prescriptive-results-interpretation, rai-discovery, rai-pyrel-coding, rai-ontology-design, rai-prescriptive-solver-management, rai-querying

Overview:
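inspect.schema(model): concepts, properties, types, relationships you're about to reference

```python
from relationalai.semantics import Float
from relationalai.semantics.reasoners.prescriptive import Problem

# model: the existing RelationalAI model whose ontology the formulation references
problem = Problem(model, Float)
```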
| Method | Signature | Purpose |
|--------|-----------|---------|
| solve_for | (expr, where=, populate=True, name=, type=, lower=, upper=, start=) | Declare decision variable. Returns ProblemVariable (a Concept usable in model.define(), model.select(), .ref()). type: "cont", "int", "bin" |
| satisfy | (expr, name=) | Add constraint. Returns ProblemConstraint (a Concept). |
| minimize | (expr, name=) | Set minimization objective. Returns ProblemObjective (a Concept). |
| maximize | (expr, name=) | Set maximization objective. Returns ProblemObjective (a Concept). |
| solve | (solver, time_limit_sec=, ...) | Execute solve. Solvers: "highs", "minizinc", "ipopt", "gurobi" |
| verify | (*fragments) | Post-solve constraint verification |
| Variable.values | (sol_index, value_ref) | Property on ProblemVariable. Extracts solution values at sol_index (0-based), binding each value to value_ref (a Float.ref() or Integer.ref()). Use inside model.select(...).where(var.values(sol_index, value_ref)). Primary pattern for populate=False workflows. |
| display | (part=) | Print formulation summary |
Interaction mode: Before starting, ask the user which mode they prefer:
After a question is selected (from question discovery) and the ontology is enriched (if needed), build the formulation in this order:
Prescriptive formulations reference concepts, properties, and relationships from an existing model. Before writing solve_for / satisfy / minimize / maximize, confirm every name and type you're about to use against the real schema:
```python
from relationalai.semantics import inspect

schema = inspect.schema(model)

# referenced_concepts: the concept names your formulation is about to use
# Every concept referenced in the formulation
for name in referenced_concepts:
    assert name in schema, f"Concept {name} not in model"
    props = schema[name].properties
    # surface any properties used in bounds, constraints, or the objective
```
Catches two silent-failure modes that account for most prescriptive errors:

- Referencing Customer.tier in a constraint when the real property is Customer.category. The solver happily runs on the wrong variables and returns nonsense.
- Using an Integer property as if it were Float (or vice versa) now that the type propagates from TableSchema. Silent coercion masks incorrect bound derivation.

When to skip: this step is cheap but not free. Skip it on small greenfield models or one-shot formulations where the model fits in a single code block you just wrote.
What decisions are being made? What can the solver control?
What rules must the solution satisfy?
What are we optimizing?
Is the formulation complete and correct?
.where() clauses connect to actual data

Pre-solver audit: before calling problem.solve(...), run a two-step check.
(a) Registration. solve_for / satisfy / minimize / maximize register concepts named Variable, Constraint, Objective (plus a per-solve Variable_<id> subconcept for each decision variable). They appear in inspect.schema(model).concepts:
```python
from relationalai.semantics import inspect

schema = inspect.schema(model)
variables = [c for c in schema.concepts if "Variable" in c.extends]
constraints = [c for c in schema.concepts if c.name == "Constraint" or "Constraint" in c.extends]
objectives = [c for c in schema.concepts if c.name == "Objective" or "Objective" in c.extends]
# Confirm one Variable_<id> per solve_for call, one Constraint_<id> per satisfy,
# one Objective_<id> per minimize/maximize.
```
(b) Binding cardinality. Registration does NOT mean the variable binds to any rows. A solve_for(..., where=[always_false]) still registers a Variable_<id> subconcept but has zero bindings — the solver will run on an empty decision set. Check each Variable_<id> for non-empty binding:
```python
# `variables` comes from the registration check above
for var_concept in variables:
    resolved = model.concept_index[var_concept.name]
    n = len(model.select(resolved).to_df())
    if n == 0:
        # The where= clause excluded every row. Fix the predicate
        # (wrong property name, wrong threshold, missing join) and re-check.
        raise ValueError(f"{var_concept.name} has 0 bindings")
```
Together, (a) and (b) are the downstream complement to Step 1's base-ontology grounding: Step 1 verifies that the inputs to the formulation exist; Step 5 verifies that the outputs registered correctly and bound to data.
Can we reduce complexity without losing correctness?
Is the formulation complete — including constraints the user couldn't articulate upfront?
Steps 1-6 produce the best formulation you can build from what the user has told you. Step 7 discovers what they couldn't tell you until they saw a concrete result. Most real-world formulations require at least one pass through Step 7.
For detailed patterns for each step, see variable-formulation.md, constraint-formulation.md, and objective-formulation.md.
These are overarching principles that apply to all optimization formulations regardless of problem type or solver.
When presenting variables, constraints, and objectives to the user, describe them in business terms first ("ensure each customer's demand is met," "don't exceed warehouse capacity"), then provide the technical formulation. The analyst selects based on business understanding; the code is generated behind the scenes. Never force users to think in mathematical terms -- business language in, business language out, with valid PyRel as the executable bridge.
Natural language rule for all user-facing text: Use domain-natural language in every description, rationale, business_mapping, problem statement, and explanation field. Technical Concept.property references confuse business users — translate them to readable phrases:
- Operation.cost_per_unit -> "cost per unit for each operation"
- sum(Shipment.quantity) -> "total shipment volume"
- Site.capacity -> "each site's available capacity"
- UnmetDemand.x_slack -> "unmet demand quantity"
- sum(Assignment.x_assigned).per(Worker) -> "number of assignments per worker"

Code snippets in solver_registration, expression, and entity_creation fields remain technical (valid PyRel). But every field the user reads should sound like a business analyst wrote it, not a database query.
Constraints are rarely handed to you complete. They emerge through two complementary phases: asking the right questions before solving, and using results to surface preferences the user couldn't articulate in the abstract. This section covers user-facing elicitation techniques. For model-structural constraint discovery (boundary probes, structural probes, multi-concept probes), see constraint-formulation.md > Constraint Discovery Patterns.
Non-OR users rarely describe their problem in terms of "constraints" and "objectives." Use these diagnostic questions to surface the formulation elements:
| Question to ask | What it surfaces |
|----------------|-----------------|
| "What limits must the solution respect?" | Capacity constraints (budget, headcount, storage, time) |
| "What must every solution achieve?" | Forcing/requirement constraints (meet all demand, cover all shifts) |
| "What would you prefer if possible, but could live without?" | Soft goals → objective terms, not hard constraints |
| "What makes a solution completely unacceptable?" | Hard constraint violations (safety, regulatory, contractual) |
| "Are there minimum service or coverage levels?" | Lower-bound forcing constraints |
Technique: Start with "What makes a solution unacceptable?" — this reliably surfaces hard constraints. Then ask "What would make one acceptable solution better than another?" — this surfaces objective terms.
Common business phrases are ambiguous between constraint and objective. Always clarify before formulating.
| Business phrase | Interpretation A (constraint) | Interpretation B (objective) |
|----------------|-------------------------------|------------------------------|
| "Keep costs under $X" | Hard budget: total_cost <= X | Minimize cost (no hard cap) |
| "Each store should get at least 100 units" | Hard minimum: supply[s] >= 100 | Soft target: penalize shortfall in objective |
| "Try to balance across regions" | Hard fairness: max - min <= threshold | Minimize imbalance in objective |
| "We need to cover all shifts" | Hard coverage: sum(assign[s,w]) >= 1 for all s | Maximize coverage (allow gaps) |
| "Don't use more than 3 suppliers" | Hard cardinality: sum(use[s]) <= 3 | Minimize number of active suppliers |
Decision rule: If violating it makes the solution invalid or unacceptable → constraint. If it is a preference or "nice to have" → objective term. When unclear, default to soft (objective) and ask the user: "If the optimizer found a solution that violates this but saves 20% on cost, would that be acceptable?"
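To make the hard/soft distinction concrete, here is a minimal plain-Python sketch (not PyRel; the function names, costs, and penalty weight are hypothetical). It scores one candidate plan for the phrase "each store should get at least 100 units" in two ways. In the first, the minimum is a hard constraint that rejects any shortfall. In the second, it is a soft target whose shortfall is priced into the objective.

```python
from typing import Dict, Optional


def hard_min_cost(supply: Dict[str, float], unit_cost: float, minimum: float) -> Optional[float]:
    """Interpretation A (hypothetical helper): any store below the minimum makes the plan infeasible."""
    if any(qty < minimum for qty in supply.values()):
        return None  # infeasible: the plan is rejected outright
    return unit_cost * sum(supply.values())


def soft_min_cost(supply: Dict[str, float], unit_cost: float, minimum: float, penalty: float) -> float:
    """Interpretation B (hypothetical helper): shortfall is allowed but costs `penalty` per missing unit."""
    shortfall = sum(max(0.0, minimum - qty) for qty in supply.values())
    return unit_cost * sum(supply.values()) + penalty * shortfall


plan = {"store_a": 120.0, "store_b": 90.0}
print(hard_min_cost(plan, unit_cost=2.0, minimum=100.0))               # None: store_b is 10 units short
print(soft_min_cost(plan, unit_cost=2.0, minimum=100.0, penalty=5.0))  # 2*210 + 5*10 = 470.0
```

The penalty weight answers the decision-rule question numerically. Under the soft interpretation, the optimizer accepts a shortfall only when each missing unit saves more than `penalty` elsewhere in the objective.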
Pre-solve elicitation has a fundamental limit: users cannot always articulate preferences until they see a concrete result that violates them. "No constraints" often means "I can't think of any right now," not "anything goes." Preferences may be real but latent — only surfaceable through confrontation with a specific proposal.
Principle: Use results as an elicitation tool. The first solve with minimal constraints is diagnostic, not prescriptive. Its purpose is to provoke reactions that reveal the real formulation. This is Step 7 of the Formulation Workflow.
The refinement loop:
Skill routing within the loop: Steps 1 and 5 involve solving — use rai-prescriptive-solver-management for solver execution. If the result is technically wrong (infeasible, all-zero, solver error), route to rai-prescriptive-results-interpretation for diagnosis — it will route back here once the issue is classified as a missing constraint or incomplete formulation. This skill governs steps 2-4: the result is optimal and technically valid, but the user's reaction reveals the formulation is incomplete.
Presenting results to surface latent preferences:
Don't just show optimal values. Frame them to make implicit preferences visible:
Post-solve reaction questions:
| Question | What it surfaces |
|----------|-----------------|
| "Does anything in this result feel wrong or surprising?" | Latent hard constraints |
| "Which value would you change first?" | The tightest latent preference |
| "Would you be comfortable acting on this / presenting this?" | Social, organizational, or reputational constraints beyond personal preference |
| "If this were the only feasible solution, would you change your requirements?" | Whether the discomfort is a hard constraint or a negotiable preference |
Disambiguating the rejection:
When a user rejects an aspect of the result, the rejection is ambiguous. Before adding a constraint, determine:
When to stop iterating:
Feasibility pressure: If repeated rejections shrink the feasible region toward infeasibility, pause and present the tension explicitly: these preferences conflict, and the user must prioritize. This is itself a form of constraint elicitation — forcing a ranking among competing bounds.
Documenting the trail: Keep a running log of each constraint added and the user reaction that motivated it. This captures the "why" behind bounds that would otherwise look arbitrary in the final formulation — valuable for model maintenance and stakeholder review.
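A running log needs nothing more than a list of records. The sketch below shows one hypothetical shape for it in plain Python. The class and field names are illustrative and not part of any RAI API. Each entry pairs the constraint, phrased in business terms, with the user reaction that prompted it and records whether the constraint was added as hard or soft.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RefinementEntry:
    """One refinement step (hypothetical structure, not a RAI API)."""
    iteration: int
    constraint: str      # business-language description of what was added
    user_reaction: str   # the reaction to a solved result that motivated it
    hard: bool           # True: hard constraint; False: penalty term in the objective


@dataclass
class RefinementLog:
    entries: List[RefinementEntry] = field(default_factory=list)

    def record(self, constraint: str, user_reaction: str, hard: bool = True) -> RefinementEntry:
        entry = RefinementEntry(len(self.entries) + 1, constraint, user_reaction, hard)
        self.entries.append(entry)
        return entry

    def summary(self) -> str:
        # One line per step, so reviewers can see why each bound exists
        return "\n".join(
            f"{e.iteration}. [{'hard' if e.hard else 'soft'}] {e.constraint} (because: {e.user_reaction})"
            for e in self.entries
        )


log = RefinementLog()
log.record("No warehouse ships more than 60% of total volume",
           "Result routes almost everything through one site")
print(log.summary())
```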
Derive the mapping from the ontology structure and the user's stated goals. The ontology's concepts, properties, and relationships tell you what can be controlled (variables), what has limits (constraints), and what the user wants to achieve (objective). Steps 1-3 of the Formulation Workflow provide the process for this.
If you suggest MULTIPLE cross-product/junction concepts, coordinate them as follows:
1. Flow Networks -- If concepts represent flow at different stages:
These typically need conservation constraints: inflow = outflow at pure transshipment nodes, or inventory balance at storage nodes. In rationale: Note which base entity will need a balance constraint.
2. Selection + Quantity -- If one concept is binary (use/don't use) and another is continuous quantity on related entities:
These typically need linking: quantity <= capacity * selection. In rationale: Note the linking relationship needed.
3. Shared Base Entities -- If multiple decision concepts connect to the SAME base entity (e.g., both touch Site via relationships):
These often need a balance/conservation constraint at that entity. In rationale: Explicitly state "Links to [OtherConcept] via [SharedBase]"
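Each of these coordination patterns reduces to a simple check on a candidate solution. The plain-Python sketch below uses hypothetical helper names. It audits numbers after the fact and does not build PyRel constraints. It shows what a conservation constraint at a pure transshipment node and a selection/quantity linking constraint each require.

```python
from typing import Dict, Tuple

Arc = Tuple[str, str]  # (from_node, to_node)


def net_flow(node: str, flow: Dict[Arc, float]) -> float:
    """Inflow minus outflow at a node (hypothetical helper).

    A pure transshipment node must net to 0. At a storage node the same
    quantity would instead equal the change in inventory.
    """
    inflow = sum(q for (src, dst), q in flow.items() if dst == node)
    outflow = sum(q for (src, dst), q in flow.items() if src == node)
    return inflow - outflow


def linking_holds(quantity: float, capacity: float, selected: int) -> bool:
    """Selection + quantity link (hypothetical helper): quantity <= capacity * selection, selection in {0, 1}."""
    return selected in (0, 1) and 0.0 <= quantity <= capacity * selected


flow = {("plant", "hub"): 40.0, ("hub", "store_1"): 25.0, ("hub", "store_2"): 15.0}
print(net_flow("hub", flow))          # 0.0: the hub conserves flow
print(linking_holds(30.0, 50.0, 1))   # True
print(linking_holds(30.0, 50.0, 0))   # False: quantity at a site that was not selected
```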
NOTE: Without linking constraints, multiple decision concepts may produce:
This is often unintended, but not always wrong — the user may intentionally leave variables unlinked. Flag it as something to verify, not as an error.
RECOMMENDED in rationale for multi-concept suggestions:
Users often propose formulations that seem natural from a business perspective but create unnecessary complexity. Key simplification heuristics:
For detailed heuristics, examples, and the over-specification recognition table, see formulation-simplification.md.
| Mistake | Cause | Fix |
|---------|-------|-----|
| All-zero solution on minimize | Missing forcing constraints (demand satisfaction, coverage) | Add sum(x).per(Entity) >= Entity.demand or equivalent |
| Infeasible after adding constraints | Conflicting bounds or over-specified assignments | Organize constraints into essential/full tiers; add incrementally |
| Variables created but unused in objective | solve_for registered but objective references different properties | Verify objective expression includes all decision variable properties |
| Wrong aggregation scope | .per(Y) but Y not joined to the summed concept | Add explicit relationship join in .where() |
| Big-M too loose -> slow solve | Using arbitrary 999999 instead of data-driven bound | Use M = capacity or M = max_demand from entity properties |
| Missing forcing requirement | MINIMIZE objective with no forcing constraint yields zero | Always identify what real-world requirement forces positive activity |
| Constraint references unwired relationship | Relationship declared but no define() data binding | Verify all relationships in .where() joins have define() rules. Unwired relationships cause TyperError or silently match zero entities. |
| problem.satisfy() or model.define() in a Python loop | Defining constraints per entity in a for loop instead of declaratively | Use vectorized .where().define() or problem.satisfy() with .per(). See rai-pyrel-coding Common Pitfalls for before/after examples |
| Duplicate relationship / FDError on re-solve | Solving multiple scenarios with populate=True (default) writes conflicting results to the graph | Use populate=False + Variable.values() to extract results. Create a fresh Problem per loop iteration. See known-limitations.md > Re-Solve Behavior. |
| TyperError at solve time with concept-type identify_by | Cross-product concept using identify_by={"a": ConceptA, "b": ConceptB} passes queries but fails during problem.solve() type inference | Use flat identity keys (String or Integer). Encode composite keys as strings (e.g., f"{a_id}_{b_id}") or use separate primitive properties for each dimension |
| .per(Concept.property) silently ignored in solver constraints | Property-value grouping (e.g., .per(Slot.group_name)) doesn't translate to solver constraints — produces all-zero "optimal" solution | Use entity-level .per(ParentConcept) with a relationship join: create a parent concept for the grouping dimension and link via Relationship, then group with .per(Parent).where(Child.parent(Parent)) |
| Forcing constraint added when objective already penalizes inaction | Adding >= 1 forcing alongside a cost-penalty objective over-constrains the problem — turns an OPTIMAL-with-cost-tradeoff into INFEASIBLE. Distinct from rows above where forcing IS needed (no penalty mechanism) | Check: does the objective already penalize zero activity? If yes, forcing is redundant. Only add forcing constraints explicitly required by the problem statement |
| Infeasible but not caught before solve | Feasibility arithmetic not validated, e.g., 50 entities need service but 4 periods at max 5/period give only 20 slots < 50 needed | Before formulating, verify that periods * capacity_per_period >= entity_count. If not, adjust parameters or confirm the problem allows partial coverage |
| Linear objective over continuous decision variables collapses to one entity | LP pushes to the boundary — without a per-entity upper cap the max-coefficient entity absorbs all budget/weight. Symptom: "+X% lift" headlines masking a single-winner solution. | Add a per-entity upper cap (e.g., w_i <= 3 * current_i), switch to a concave objective (sqrt, log), or piecewise-linear saturation curves. |
| solve_for(where=expr) raises [Invalid operator] Cannot use python's 'bool check' | where argument is iterated as a tuple; passing a bare expression triggers PyRel's __bool__ guard | Wrap in a list: where=[Concept.prop >= threshold, ...] |
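The single-winner collapse described in the table is easy to reproduce without a solver. Consider a continuous problem with one budget row and per-entity bounds, where every unit of weight draws equally on the budget. For that problem, filling entities in descending-coefficient order is optimal. The plain-Python sketch below uses that rule, with hypothetical names and numbers. It shows the uncapped LP putting the whole budget on one entity, and a `w_i <= 3 * current_i` cap forcing a spread.

```python
from typing import Dict, Optional


def allocate(budget: float, coef: Dict[str, float],
             cap: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Optimal solution of: max sum(coef[i] * w[i]) s.t. sum(w) <= budget, 0 <= w[i] <= cap[i].

    Hypothetical helper. With a single unit-weight budget row and box bounds,
    greedily filling the highest coefficient first is optimal.
    """
    alloc = {name: 0.0 for name in coef}
    remaining = budget
    for name in sorted(coef, key=coef.get, reverse=True):
        if coef[name] <= 0 or remaining <= 0:
            break  # no remaining entity improves the objective, or the budget is spent
        upper = cap[name] if cap is not None else float("inf")
        alloc[name] = min(upper, remaining)
        remaining -= alloc[name]
    return alloc


lift = {"north": 1.30, "south": 1.25, "west": 1.10}
print(allocate(300.0, lift))
# {'north': 300.0, 'south': 0.0, 'west': 0.0}: one entity absorbs the whole budget
current = {"north": 50.0, "south": 80.0, "west": 70.0}
capped = allocate(300.0, lift, cap={k: 3 * v for k, v in current.items()})
print(capped)
# {'north': 150.0, 'south': 150.0, 'west': 0.0}
```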
For detailed unwired relationship symptoms, checks, and code examples, see constraint-formulation.md > Unwired Relationships (Detailed).
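Two rows of the table above, the feasibility-arithmetic row and the Big-M row, reduce to arithmetic on input data. That arithmetic can run before any formulation code is written. The plain-Python sketch below uses hypothetical helper names. In practice, the counts and capacities would come from querying the model.

```python
import math
from typing import Dict


def coverage_fits(entity_count: int, periods: int, capacity_per_period: int) -> bool:
    """Hypothetical helper: the total number of slots must cover every entity that needs service."""
    return periods * capacity_per_period >= entity_count


def min_periods_needed(entity_count: int, capacity_per_period: int) -> int:
    """Hypothetical helper: the smallest horizon that could serve every entity."""
    return math.ceil(entity_count / capacity_per_period)


def big_m(values: Dict[str, float]) -> float:
    """Hypothetical helper: a data-driven Big-M, the largest value the linked quantity can take.

    A per-entity M (each entity's own capacity) is tighter still.
    """
    return max(values.values())


print(coverage_fits(50, 4, 5))                 # False: 4 * 5 = 20 slots < 50 entities
print(min_periods_needed(50, 5))               # 10
print(big_m({"dc_1": 1200.0, "dc_2": 800.0}))  # 1200.0, rather than an arbitrary 999999
```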
For all example problems and the patterns they demonstrate, see examples-index.md.
When reviewing an existing formulation, see formulation-analysis-context.md.
model.union()

Do not use + to combine cost terms from independent concept groups. This causes AssertionError: Union outputs must be Vars. Use model.union() instead.
Critical: Each branch of model.union() must be a per-entity expression (bound to a concept), NOT a fully-aggregated scalar. Keep costs at concept level and let the outer sum() aggregate:
```python
# CORRECT: per-entity cost expressions inside model.union()
problem.minimize(sum(model.union(
    FreightGroup.holding_cost * sum(x_inv).per(FreightGroup).where(...),  # per-FreightGroup
    Arc.transport_cost * Arc.x_flow,                                      # per-Arc
    Factory.unit_cost * Factory.x_production,                             # per-Factory
)))
```

```python
# WRONG: scalar sums inside model.union()
problem.minimize(sum(model.union(
    sum(x * FreightGroup.cost),   # scalar: causes AssertionError
    sum(Arc.x_flow * Arc.cost),   # scalar: causes AssertionError
)))
```
For parametric (time-indexed) variables, use sum(var).per(Concept).where(...) to aggregate over time while keeping per-entity:
```python
prod_cost = ProdCapacity.production_cost * sum(x_prod).per(ProdCapacity).where(
    ProdCapacity.x_production(t, x_prod))
```
model.union() collects ALL matching values from each branch (set union semantics). This is distinct from | (pipe), which picks the first successful branch (ordered fallback).
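The difference matters when summing cost terms. Below is a loose plain-Python analogy. It does not call PyRel, and the branch functions are hypothetical stand-ins for concept-level cost expressions. Union keeps the values from every branch. An ordered fallback stops at the first branch that yields anything, so a cost sum built with the wrong operator silently drops terms.

```python
from typing import Callable, Iterable, List

Branch = Callable[[], List[float]]  # a branch yields per-entity cost terms


def union_all(branches: Iterable[Branch]) -> List[float]:
    """Analogy for model.union(): keep every value from every branch."""
    out: List[float] = []
    for branch in branches:
        out.extend(branch())
    return out


def first_success(branches: Iterable[Branch]) -> List[float]:
    """Analogy for `|`: use only the first branch that produces anything."""
    for branch in branches:
        values = branch()
        if values:
            return values
    return []


def holding_terms() -> List[float]:
    return [10.0, 4.0]  # hypothetical: one holding-cost term per FreightGroup


def transport_terms() -> List[float]:
    return [7.5]  # hypothetical: one transport-cost term per Arc


print(sum(union_all([holding_terms, transport_terms])))      # 21.5: all cost terms
print(sum(first_success([holding_terms, transport_terms])))  # 14.0: transport terms dropped
```

The list concatenation stands in for union because each term belongs to a different entity. Real relational union also removes exact duplicate tuples, which this sketch does not model.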
Additional v1 pitfalls with parametric variables:
- name=[] must NOT traverse relationships: use identity fields (e.g., ProdCapacity.site_id), not ProdCapacity.site.name (causes FD violation)
- site_id as identify_by, rename one (e.g., wk_site_id) to avoid ambiguity
- minimize()/maximize() calls

For constraint naming with lists, re-solve behavior (multi-scenario patterns), | 0 fallback limitation, and numpy type casting, see known-limitations.md.
PyRel's model and problem APIs are append-only. Every call to model.define(), model.Property(), model.Concept(), problem.solve_for(), problem.satisfy(), problem.minimize()/problem.maximize() adds to the model or problem. There is no API to remove, replace, or modify any existing element.
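The consequence is easiest to see on a toy model. The plain-Python class below is a hypothetical stand-in, not the PyRel `Problem` API. It mimics only the append-only behavior described here, modeling constraints as upper bounds on a single quantity. A "corrected" bound does not replace the original. Both stay active, and the tighter one binds, so the only way to drop a bound is to start a new problem.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyProblem:
    """Hypothetical append-only problem: upper bounds on a single quantity."""
    upper_bounds: List[float] = field(default_factory=list)

    def satisfy_at_most(self, bound: float) -> None:
        self.upper_bounds.append(bound)  # appended; never replaces an earlier bound

    def effective_upper_bound(self) -> float:
        # Every bound stays active, so the tightest one binds
        return min(self.upper_bounds, default=float("inf"))


p = ToyProblem()
p.satisfy_at_most(80.0)    # original bound, later judged too tight
p.satisfy_at_most(100.0)   # "corrected" bound intended to replace it
print(p.effective_upper_bound())      # 80.0: the original still binds

fresh = ToyProblem()       # rebuilding is the only way to drop the old bound
fresh.satisfy_at_most(100.0)
print(fresh.effective_upper_bound())  # 100.0
```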
This applies to the entire stack:
- model.Property() or model.Relationship() grows the model. You cannot delete or rename an existing property.
- model.Concept() calls add concepts. Existing concepts cannot be removed.
- problem.solve_for() registers an additional decision variable. You cannot unregister one.
- problem.satisfy() accumulates. Adding a "corrected" version does not replace the original: both remain active, and the tighter one binds.
- problem.minimize() or problem.maximize() per Problem.

Practical impact:
Problem per scenario.

| Reference | Description | File |
|-----------|-------------|------|
| Variable formulation | Types, bounds, scope, entity creation, slack variables, context integration | variable-formulation.md |
| Constraint formulation | Forcing, capacity, balance, linking, .where() scoping, parameter derivation | constraint-formulation.md |
| Objective formulation | Direction, multi-component, penalty terms, scenario formulation | objective-formulation.md |
| Problem patterns & validation | Common patterns (assignment, flow, knapsack) and the validation checklist | problem-patterns-and-validation.md |
| Global constraints | all_different, implies, SOS1/SOS2 syntax, solver requirements, CP vs MIP guide | global-constraints.md |
| Scenario analysis | Scenario Concept vs Loop + where= patterns, decision matrix, code examples | scenario-analysis.md |
| Formulation simplification | Static vs dynamic parameters, goals vs constraints, grouped constraints, over-specification | formulation-simplification.md |
| Multi-objective formulation | Approach selection, epsilon constraint method, tension heuristics, pitfalls | multi-objective-formulation.md |
| Examples index | All example problems with patterns demonstrated | examples-index.md |
| Formulation analysis context | Naming conventions, alias handling, expression parsing, aggregation patterns for review | formulation-analysis-context.md |
| Known limitations (secondary) | Constraint naming, re-solve behavior, \| 0 fallback limitation, numpy type casting | known-limitations.md |