skills/dspy-infer-rules/SKILL.md
Use when you want to extract interpretable decision logic from labeled examples — generating explicit rules that explain patterns in your data. Common scenarios - extracting business rules from labeled classification examples, understanding why a model makes certain predictions, generating human-readable decision criteria from data, building interpretable classifiers, or documenting implicit labeling logic from annotators. Related - ai-following-rules, ai-sorting. Also used for dspy.InferRules, extract rules from examples, interpretable AI decisions, understand classification logic, generate decision rules from labels, explainable AI with DSPy, turn labeled data into explicit rules, human-readable classification rules, rule extraction from training data, when you need to explain why AI decided, interpretable model logic, audit AI decision process, regulatory compliance explainability, extract patterns from labeled data.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-infer-rules
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through using dspy.InferRules to discover explicit, human-readable rules from labeled examples and inject them into program instructions.
dspy.InferRules is a DSPy optimizer that analyzes your training examples and extracts natural-language rules describing the decision patterns it finds. These rules are then appended to the instructions of each predictor in your program. The result is a compiled program whose prompts contain explicit, interpretable decision logic -- not opaque few-shot examples.
It inherits from BootstrapFewShot, so it first bootstraps demonstrations and then goes further by inducing rules from those demonstrations.
Key properties:
Use dspy.InferRules when:
Do not use InferRules when a different optimizer is the better fit; the alternatives named here are dspy.BootstrapFewShot, dspy.MIPROv2, and dspy.BootstrapFinetune.
Three things are needed: a DSPy program, a metric function, and a training set.
import dspy
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini")) # or any LiteLLM-supported provider
# 1. Define a program
classify = dspy.ChainOfThought("text -> label")
# 2. Define a metric
def exact_match(example, pred, trace=None):
    return pred.label.strip().lower() == example.label.strip().lower()
# 3. Prepare training data
trainset = [
    dspy.Example(text="Server is down again", label="urgent").with_inputs("text"),
    dspy.Example(text="Update my billing info", label="normal").with_inputs("text"),
    dspy.Example(text="Site is completely broken", label="urgent").with_inputs("text"),
    dspy.Example(text="How do I change my password?", label="normal").with_inputs("text"),
    # ... more labeled examples
]
# 4. Compile with InferRules
optimizer = dspy.InferRules(metric=exact_match, num_rules=10)
compiled = optimizer.compile(classify, trainset=trainset)
# 5. Use the compiled program -- instructions now contain discovered rules
result = compiled(text="Database connection pool exhausted")
print(result.label)
After compilation, inspect the rules that were injected:
# View the enhanced instructions for each predictor
for name, predictor in compiled.named_predictors():
    print(f"Predictor: {name}")
    print(f"Instructions: {predictor.signature.instructions}")
    print()
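Once the instructions are printed, pruning a bad rule is plain string editing. The sketch below is a minimal illustration, assuming the injected rules appear one per line inside the instructions; the actual layout InferRules produces may differ, so check the printed text first. `drop_rules` and the sample strings are hypothetical and not part of DSPy.

```python
def drop_rules(instructions: str, unwanted: list[str]) -> str:
    """Return instructions with every line containing an unwanted phrase removed.

    Assumption: each induced rule sits on its own line. Matching is
    case-insensitive substring matching, which is deliberately crude.
    """
    needles = [phrase.lower() for phrase in unwanted]
    kept = [
        line
        for line in instructions.splitlines()
        if not any(needle in line.lower() for needle in needles)
    ]
    return "\n".join(kept)


# Hypothetical instructions, shaped like the example rules in this guide.
sample = (
    "Classify the text as urgent or normal.\n"
    "If the text mentions system failures, outages, or data loss, classify as urgent.\n"
    "If the text mentions billing, classify as urgent.\n"
    "If the text is a routine account or billing question, classify as normal."
)

# Remove the rule that contradicts the last one.
cleaned = drop_rules(sample, ["mentions billing"])
print(cleaned)
```

To apply the edited text to a compiled predictor, something like `predictor.signature = predictor.signature.with_instructions(cleaned)` should work in recent DSPy versions. Re-run your validation set after any manual edit.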
The compilation process has five stages:
1. Split trainset 50/50 into training and validation sets (unless you provide valset separately)
2. Run BootstrapFewShot.compile() to collect successful input-output demonstrations
3. Feed the demonstrations to a RulesInductionProgram that generates natural-language rules describing the patterns
4. Repeat the induction num_candidates times with different samples to produce diverse rule sets
5. Evaluate each candidate on the validation set and keep the best-scoring program
The induced rules look like plain English statements, for example:
"If the text mentions system failures, outages, or data loss, classify as urgent." "If the text is a routine account or billing question, classify as normal."
These rules are appended to the predictor's existing instructions, giving the LM explicit decision logic to follow.
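The stages above can be sketched as a small selection loop. This is a toy model, not DSPy's implementation: `induce_rules` and `score` are hypothetical stand-ins for the LM-backed rule induction and the metric-based evaluation, and the way rules are joined to the instructions is an assumption.

```python
import random


def induce_rules(demos, num_rules, rng):
    """Stand-in for the LM call: sample demos and phrase each as a rule."""
    picked = rng.sample(demos, k=min(num_rules, len(demos)))
    return [f"If the text resembles '{text}', classify as {label}." for text, label in picked]


def score(instructions, valset):
    """Stand-in for evaluation: fraction of validation labels the rules mention."""
    hits = sum(1 for _, label in valset if f"classify as {label}" in instructions)
    return hits / len(valset)


def infer_rules_sketch(base_instructions, examples, num_candidates=3, num_rules=2, seed=0):
    rng = random.Random(seed)
    # Stage 1: split 50/50 into training and validation halves.
    half = len(examples) // 2
    train, val = examples[:half], examples[half:]
    # Stage 2 (bootstrapping) is skipped here: every training example counts as a demo.
    best_score, best_instructions = float("-inf"), base_instructions
    for _ in range(num_candidates):
        # Stage 3: induce rules; stage 4: repeat with a different sample each time.
        rules = induce_rules(train, num_rules, rng)
        candidate = base_instructions + "\n\n" + "\n".join(rules)
        # Final stage: evaluate on the validation half and keep the best candidate.
        candidate_score = score(candidate, val)
        if candidate_score > best_score:
            best_score, best_instructions = candidate_score, candidate
    return best_instructions, best_score


examples = [
    ("Server is down again", "urgent"),
    ("Update my billing info", "normal"),
    ("Site is completely broken", "urgent"),
    ("How do I change my password?", "normal"),
]
instructions, val_score = infer_rules_sketch("Classify the text.", examples)
print(val_score)
print(instructions)
```

The original instructions stay at the top of every candidate, so the rules only add to what the predictor was already told.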
dspy.InferRules(
    num_candidates=10,      # Number of candidate programs to evaluate
    num_rules=10,           # Number of rules to induce per predictor
    num_threads=None,       # Thread count for parallel evaluation
    teacher_settings=None,  # Config for the teacher model
    metric=...,             # Evaluation metric (required, via kwargs)
    max_errors=...,         # Max allowed errors during evaluation (optional, via kwargs)
)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| num_candidates | int | 10 | Number of candidate rule-enhanced programs to generate. More candidates increase the chance of finding better rules but cost more LM calls |
| num_rules | int | 10 | Number of rules to induce per predictor. More rules capture finer patterns but risk overfitting or exceeding context limits |
| num_threads | int | None | Number of threads for parallel evaluation. None uses the default |
| teacher_settings | dict | None | Configuration for the teacher model used during bootstrapping |
| metric | Callable | -- | Evaluation function (example, prediction, trace) -> float. Passed via kwargs |
| max_errors | int | -- | Maximum errors allowed before stopping evaluation. Passed via kwargs |
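The table gives the metric's shape as `(example, prediction, trace) -> float`. Below is a minimal sketch of a metric that awards partial credit, assuming a task with an ordered severity scale. `SEVERITY`, `graded_match`, the extra `low` label, and the `SimpleNamespace` stand-ins for DSPy's example and prediction objects are all illustrative, not part of the library.

```python
from types import SimpleNamespace

# Hypothetical ordered label scale for a ticket-triage task.
SEVERITY = {"low": 0, "normal": 1, "urgent": 2}


def graded_match(example, pred, trace=None):
    """Return 1.0 for an exact label match, 0.5 for an adjacent label, else 0.0."""
    gold = example.label.strip().lower()
    guess = pred.label.strip().lower()
    if guess not in SEVERITY or gold not in SEVERITY:
        return 0.0
    distance = abs(SEVERITY[gold] - SEVERITY[guess])
    score = {0: 1.0, 1: 0.5}.get(distance, 0.0)
    # Assumption about desired bootstrapping behavior: when a trace is passed,
    # demand an exact match so only fully correct demonstrations are kept.
    if trace is not None:
        return score == 1.0
    return score


# Stand-ins with the same attribute access as an example and a prediction.
gold = SimpleNamespace(label="urgent")
print(graded_match(gold, SimpleNamespace(label=" Urgent ")))
print(graded_match(gold, SimpleNamespace(label="normal")))
print(graded_match(gold, SimpleNamespace(label="low")))
```

A graded score mainly helps when candidates are compared on the validation set; the strict branch keeps near-misses out of the demonstrations that rules are induced from.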
compiled_program = optimizer.compile(
    student,            # Your DSPy program to optimize (required)
    trainset=trainset,  # Training examples (required)
    valset=None,        # Validation examples (optional -- auto-split if not provided)
)
If valset is not provided, compile automatically splits trainset 50/50 into training and validation sets. Providing your own valset gives you more control over evaluation.
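If the automatic split follows list order (an assumption worth checking in your DSPy version), a training list sorted by label could leave one half without a class. The sketch below builds an explicit, label-balanced pair to pass as `trainset` and `valset`. It works on plain `(text, label)` tuples for brevity, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(examples, label_of, val_fraction=0.5, seed=0):
    """Split examples so each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * val_fraction)
        val.extend(group[:cut])
        train.extend(group[cut:])
    return train, val


data = [
    ("Server is down again", "urgent"),
    ("Site is completely broken", "urgent"),
    ("Update my billing info", "normal"),
    ("How do I change my password?", "normal"),
]
train, val = stratified_split(data, label_of=lambda ex: ex[1])
print(len(train), len(val))
print(sorted(label for _, label in val))
```

With `dspy.Example` objects, `label_of=lambda ex: ex.label` would play the same role, and the two lists go to `optimizer.compile(classify, trainset=train, valset=val)`.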
InferRules stands apart from other optimizers because its output is human-readable:
| Optimizer | Output | Interpretable? |
|-----------|--------|----------------|
| BootstrapFewShot | Few-shot examples in the prompt | Somewhat -- you can read the examples |
| MIPROv2 | Optimized instructions + few-shot | Partially -- instructions are readable but auto-generated |
| BootstrapFinetune | Updated model weights | No -- weights are opaque |
| InferRules | Explicit natural-language rules | Yes -- you can read, audit, and edit the rules |
This makes InferRules a good fit for:
num_candidates controls how many different rule sets are generated and compared:
| Value | Use case |
|-------|----------|
| 3-5 | Quick iteration, prototyping |
| 10 (default) | Good balance of quality and cost |
| 15-20 | High-stakes applications, when you need the best possible rules |
num_rules controls how many rules are induced per predictor:
| Value | Use case |
|-------|----------|
| 3-5 | Simple binary tasks (spam/not-spam) |
| 10 (default) | Multi-class tasks, moderate complexity |
| 15-20 | Tasks with many edge cases or subtle distinctions |
More rules is not always better. Too many rules can overwhelm the LM's context or introduce contradictions. Start with the defaults and adjust based on validation scores.
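Adjusting based on validation scores amounts to a small sweep. The sketch below assumes you supply two callables: `compile_fn(n)` returns a program compiled with `num_rules=n`, and `score_fn(program)` returns its validation score. Both names are hypothetical, and the fake versions exist only so the block runs without an LM.

```python
def sweep_num_rules(values, compile_fn, score_fn):
    """Compile once per setting and return (best_value, all_scores).

    Ties go to the earliest value in `values`, so list smaller settings
    first to prefer shorter prompts.
    """
    scores = {}
    for n in values:
        scores[n] = score_fn(compile_fn(n))
    best = max(scores, key=scores.get)
    return best, scores


# Fake stand-ins with made-up numbers: pretend 15 rules adds nothing over 10.
fake_scores = {5: 0.71, 10: 0.83, 15: 0.83}
best, scores = sweep_num_rules(
    [5, 10, 15],
    compile_fn=lambda n: {"num_rules": n},
    score_fn=lambda program: fake_scores[program["num_rules"]],
)
print(best, scores)
```

In real use, `compile_fn` could wrap `dspy.InferRules(metric=exact_match, num_rules=n).compile(classify, trainset=train_examples, valset=val_examples)`. Each call spends LM budget, so keep the list of values short.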
For more control, provide your own validation set:
optimizer = dspy.InferRules(metric=exact_match, num_rules=10, num_candidates=10)
compiled = optimizer.compile(
    classify,
    trainset=train_examples,
    valset=val_examples,
)
This is recommended when:
# Save the compiled program (includes the discovered rules in instructions)
compiled.save("compiled_with_rules.json")
# Load it later
from your_module import YourProgram
loaded = YourProgram()
loaded.load("compiled_with_rules.json")
# The loaded program has the same enhanced instructions
result = loaded(text="New input here")
- After optimizer.compile(), always print the enhanced instructions with predictor.signature.instructions. InferRules can generate incorrect or contradictory rules. Read them, and edit or remove bad ones before deploying.
- Do not set num_rules too high. Too many rules can overwhelm the LM's context window or introduce contradictions. Start with 10 (the default) and only increase if validation scores improve. Reduce if you see contradictory behavior.
- Watch the automatic split on small datasets. Without a valset, half of trainset goes to validation: from 40 examples, InferRules uses only 20 for training, often too few for good rule induction. Pass valset explicitly to control the split.
Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- /ai-improving-accuracy
- /dspy-evaluate
- /dspy-data
- /dspy-signatures
- Install /ai-do if you do not have it; it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do