skills/generative-ontology-structured-knowledge/SKILL.md
Constrain LLM generation with executable Pydantic schemas and multi-agent pipelines to produce structurally valid, domain-rich artifacts. Uses ontology-as-grammar to eliminate hallucinated structures while preserving creative output. Trigger phrases: "generate a valid game design", "schema-constrained generation", "build a multi-agent pipeline with Pydantic validation", "ontology-driven content generation", "structured creative generation with DSPy", "generate artifacts that pass domain validation".
This skill enables Claude to build systems where domain ontologies encoded as Pydantic schemas constrain LLM generation, while multi-agent specialization drives creative quality. The core insight from the paper "Generative Ontology" (Cheung, 2026) is that ontology provides the grammar and the LLM provides the creativity — combining executable type constraints with specialized agent roles eliminates structural hallucinations (effect size d=4.78) while producing the largest gains in output quality (fun d=1.12, depth d=1.59).
Generative Ontology treats domain knowledge as executable code rather than passive documentation. A domain ontology (the valid concepts, relationships, and constraints of a field) is encoded as nested Pydantic BaseModel classes with Literal types, Enum constraints, min_length validators, and cross-field validators. These schemas become the output specification for DSPy signatures, meaning the LLM must produce output that parses and validates — or the framework retries with error feedback. This eliminates an entire class of failures: mechanisms without components, goals without end conditions, recipes without cooking methods.
Multi-agent specialization is the second pillar. Rather than one LLM call doing everything, a sequential pipeline assigns distinct professional roles — each with a defined "anxiety" (a persistent concern that prevents shallow agreement). For example, a Balance Critic asks "What breaks when optimized?" and a Theme Weaver asks "Does the theme feel alive in every mechanism?" The ablation study showed that schema validation alone eliminates structural errors but does not improve creative quality; multi-agent specialization produces the largest creative gains. Both are needed together.
RAG grounding forms the third pillar. Existing exemplars (e.g., published board games from BoardGameGeek) are embedded and indexed. Retrieval uses a two-phase strategy: first filter by ontology categories (e.g., matching mechanism types), then rank by semantic similarity to the theme. Retrieved exemplars demonstrate successful patterns — how mechanisms combine, which themes pair with which structures — giving the LLM concrete precedents rather than generating from scratch.
Identify the domain ontology. List the core concepts, their valid values, and the relationships between them. Ask: what would a domain expert say makes an artifact structurally valid? For a board game: mechanisms, components, victory conditions, player dynamics. For a recipe: ingredients, techniques, equipment, timing constraints.
Encode the ontology as Pydantic schemas. Create a hierarchy of BaseModel classes. Use Literal types and Enum classes to restrict values to the domain vocabulary. Add Field(min_length=...) constraints to prevent empty placeholders. Use nested models to enforce hierarchical coherence (e.g., a ComponentSet inside a GameOntology).
```python
from enum import Enum
from typing import List, Literal

from pydantic import BaseModel, Field


class MechanismType(str, Enum):
    WORKER_PLACEMENT = "worker_placement"
    DECK_BUILDING = "deck_building"
    AREA_CONTROL = "area_control"
    ENGINE_BUILDING = "engine_building"
    # ... domain-specific vocabulary


class ComponentSet(BaseModel):
    card_types: List[str] = Field(default_factory=list)
    board_description: str = Field(default="", min_length=0)
    tokens: List[str] = Field(default_factory=list)


class GameOntology(BaseModel):
    title: str = Field(min_length=3)
    theme: str = Field(min_length=20)
    game_type: Literal["cooperative", "competitive", "semi-cooperative"]
    goal: str = Field(min_length=20, description="Victory condition")
    end_condition: str = Field(min_length=10, description="When the game terminates")
    # Pydantic v2 spells list-size limits min_length/max_length (min_items is deprecated).
    primary_mechanisms: List[MechanismType] = Field(min_length=2, max_length=4)
    components: ComponentSet
```
Add cross-field validators to enforce domain coherence — e.g., if DECK_BUILDING is a mechanism, card_types must be non-empty. These catch semantic inconsistencies that type constraints alone miss.
```python
from pydantic import model_validator


class GameOntology(BaseModel):
    # ... fields above ...

    @model_validator(mode="after")
    def check_mechanism_components(self):
        requirements = {
            MechanismType.DECK_BUILDING: ("card_types", "Deck building requires card types"),
            MechanismType.AREA_CONTROL: ("board_description", "Area control requires a board"),
            MechanismType.WORKER_PLACEMENT: ("tokens", "Worker placement requires tokens"),
        }
        for mech in self.primary_mechanisms:
            if mech in requirements:
                field, msg = requirements[mech]
                if not getattr(self.components, field, None):
                    raise ValueError(msg)
        return self
```
Define DSPy signatures with the schema as the output type. The signature's docstring becomes the system prompt; field descriptions guide generation. Use dspy.ChainOfThought to add reasoning before structured output.
```python
import dspy


class DesignSignature(dspy.Signature):
    """You are an expert tabletop game designer. Produce complete,
    specific, mechanically coherent designs."""

    # Explicit str annotations make the input types visible next to the typed output.
    theme: str = dspy.InputField(desc="The thematic concept for the game")
    constraints: str = dspy.InputField(desc="Design requirements", default="")
    game_design: GameOntology = dspy.OutputField()
```
Build the multi-agent pipeline with professional anxieties. Define 3-5 specialized agents, each with a distinct domain concern. Wire them sequentially so each agent receives the cumulative output of prior agents.
| Agent | Responsibility | Professional Anxiety |
|-------|----------------|----------------------|
| Mechanics Architect | Core systems, turn structure | "Is there meaningful player agency?" |
| Theme Weaver | Narrative integration | "Does theme feel alive in every mechanism?" |
| Component Designer | Physical elements | "Can players manipulate this smoothly?" |
| Balance Critic | Exploit detection | "What breaks when optimized?" |
| Fun Factor Judge | Engagement assessment | "Would I want to play this again?" |
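One way to picture the sequential wiring is the minimal sketch below. It is not the paper's implementation: `Agent`, `build_prompt`, `run_sequential`, and `fake_llm` are hypothetical names, and the model call is replaced by a stub that only records the prompts it receives. The point it illustrates is that each agent's prompt carries its own anxiety plus the cumulative output of the agents before it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Agent:
    role: str
    responsibility: str
    anxiety: str  # the persistent question this agent keeps asking


# Phase-1 roles taken from the table above.
AGENTS = [
    Agent("Mechanics Architect", "Core systems, turn structure",
          "Is there meaningful player agency?"),
    Agent("Theme Weaver", "Narrative integration",
          "Does theme feel alive in every mechanism?"),
    Agent("Component Designer", "Physical elements",
          "Can players manipulate this smoothly?"),
]


def build_prompt(agent: Agent, theme: str, prior: List[str]) -> str:
    """Combine the agent's role and anxiety with everything written so far."""
    history = "\n".join(prior) if prior else "(nothing yet)"
    return (
        f"You are the {agent.role}. Responsibility: {agent.responsibility}.\n"
        f"Keep asking: {agent.anxiety}\n"
        f"Theme: {theme}\n"
        f"Design so far:\n{history}"
    )


def run_sequential(agents: List[Agent], theme: str,
                   llm: Callable[[str], str]) -> List[str]:
    """Call each agent in order, passing along all earlier contributions."""
    contributions: List[str] = []
    for agent in agents:
        reply = llm(build_prompt(agent, theme, contributions))
        contributions.append(f"[{agent.role}] {reply}")
    return contributions


# Hypothetical stand-in for a real model call; it records what it was asked.
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"notes #{len(seen_prompts)}"


design = run_sequential(AGENTS, "Egyptian tomb raiders", fake_llm)
```

In a real pipeline `fake_llm` would be swapped for a DSPy module or another model client; the accumulation logic stays the same.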
Implement RAG grounding. Embed existing exemplars using a sentence-transformer model, store in a vector database (ChromaDB works well). At generation time, filter by ontology categories first (mechanism types, domain tags), then rank by semantic similarity to the input theme. Inject top-k exemplars into agent context.
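The two-phase retrieval can be sketched without any vector database, as below. This is an illustration under stated assumptions: the exemplar names and two-dimensional embeddings are made up, "matching" is taken to mean sharing at least one mechanism, and `Exemplar`, `cosine`, and `retrieve` are hypothetical helpers. In practice the embeddings would come from a sentence-transformer model and the index would live in a store such as ChromaDB.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class Exemplar:
    name: str
    mechanisms: Set[str]          # ontology categories used for filtering
    embedding: Sequence[float]    # toy vector standing in for a real embedding


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(exemplars: List[Exemplar], wanted: Set[str],
             theme_embedding: Sequence[float], k: int = 2) -> List[Exemplar]:
    # Phase 1: keep only exemplars that share at least one ontology category.
    candidates = [e for e in exemplars if e.mechanisms & wanted]
    # Phase 2: rank the survivors by semantic similarity to the theme.
    candidates.sort(key=lambda e: cosine(e.embedding, theme_embedding),
                    reverse=True)
    return candidates[:k]


# Made-up corpus for illustration only.
CORPUS = [
    Exemplar("exemplar_a", {"deck_building"}, [1.0, 0.0]),
    Exemplar("exemplar_b", {"area_control"}, [0.9, 0.1]),
    Exemplar("exemplar_c", {"deck_building", "engine_building"}, [0.6, 0.8]),
    Exemplar("exemplar_d", {"worker_placement"}, [1.0, 0.0]),
]

top = retrieve(CORPUS, {"deck_building"}, [1.0, 0.0])
print([e.name for e in top])
```

Note that `exemplar_d` is never returned for a deck-building query even though its embedding matches the theme exactly: the category filter runs before similarity ranking.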
Wire the pipeline: sequential generation → critical evaluation → refinement. Phase 1: Mechanics Architect → Theme Weaver → Component Designer (sequential, each builds on prior output). Phase 2: Balance Critic evaluates the assembled design. Phase 3: If issues found, a refinement pass addresses them. Phase 4: Fun Factor Judge scores the result.
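The four phases reduce to a small piece of control flow. The sketch below assumes the design is passed around as a plain dictionary and uses trivial stub functions for every agent; `run_phases` and the stubs are hypothetical names, and the stub logic (including the 0/1 score) exists only to exercise the branches.

```python
from typing import Callable, Dict, List, Tuple

Design = Dict[str, object]


def run_phases(theme: str,
               builders: List[Callable[[Design], Design]],
               critic: Callable[[Design], List[str]],
               refine: Callable[[Design, List[str]], Design],
               judge: Callable[[Design], float]) -> Tuple[Design, List[str], float]:
    design: Design = {"theme": theme}
    # Phase 1: sequential generation, each builder extends the prior output.
    for build in builders:
        design = build(design)
    # Phase 2: critical evaluation of the assembled design.
    issues = critic(design)
    # Phase 3: refinement runs only when the critic found something.
    if issues:
        design = refine(design, issues)
    # Phase 4: final scoring.
    return design, issues, judge(design)


# Stub agents (assumptions for illustration, not real model calls).
def mechanics_architect(d: Design) -> Design:
    return {**d, "mechanisms": ["deck_building"]}


def theme_weaver(d: Design) -> Design:
    return {**d, "narrative": f"{d['theme']} told through {d['mechanisms'][0]}"}


def component_designer(d: Design) -> Design:
    return {**d, "card_types": []}  # deliberately incomplete


def balance_critic(d: Design) -> List[str]:
    return [] if d["card_types"] else ["deck_building has no card types"]


def refinement_pass(d: Design, issues: List[str]) -> Design:
    return {**d, "card_types": ["starter", "upgrade"]}


def fun_factor_judge(d: Design) -> float:
    return 1.0 if d["card_types"] else 0.0


final, found, score = run_phases(
    "Egyptian tomb raiders",
    [mechanics_architect, theme_weaver, component_designer],
    balance_critic, refinement_pass, fun_factor_judge,
)
```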
Add retry logic with error feedback. When Pydantic validation fails, capture the ValidationError, include it in the retry prompt, and re-run. DSPy's Assert mechanism automates this. Typically 1-2 retries suffice.
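A hand-rolled version of the retry loop looks roughly like the sketch below. To stay dependency-free it swaps Pydantic for a tiny `validate` function and a `SchemaError` class standing in for `pydantic.ValidationError`; those names, `generate_with_retry`, and `fake_llm` are all hypothetical. The 20-character minimum on `goal` mirrors the schema earlier in this document.

```python
import json
from typing import Callable


class SchemaError(ValueError):
    """Stand-in for pydantic.ValidationError in this dependency-free sketch."""


def validate(raw: str) -> dict:
    """Parse the model output and enforce one schema rule."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")
    goal = data.get("goal")
    if not isinstance(goal, str) or len(goal) < 20:
        raise SchemaError("goal: must be a string of at least 20 characters")
    return data


def generate_with_retry(llm: Callable[[str], str], prompt: str,
                        max_attempts: int = 3) -> dict:
    feedback = ""
    last_error = None
    for _ in range(max_attempts):
        raw = llm(prompt + feedback)
        try:
            return validate(raw)
        except SchemaError as err:
            last_error = err
            # Feed the exact failure back so the next attempt can correct it.
            feedback = (f"\n\nYour previous output was rejected: {err}\n"
                        "Fix this and answer again.")
    raise RuntimeError(
        f"no valid output after {max_attempts} attempts: {last_error}")


# Hypothetical model stub: fails once, then succeeds after seeing the feedback.
def fake_llm(prompt: str) -> str:
    if "rejected" in prompt:
        return json.dumps({"goal": "Collect three cursed artifacts first"})
    return json.dumps({"goal": "Win"})


result = generate_with_retry(fake_llm, "Design a game.")
```

Raising after the final attempt, rather than returning a partial result, keeps invalid artifacts from leaking downstream.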
Validate the final output end-to-end. Run the complete schema validation plus cross-field checks. Log any warnings. Return the validated, typed artifact.
Evaluate with domain-specific metrics. Define 5-9 evaluation dimensions relevant to the domain (for games: fun, strategic depth, thematic cohesion, elegance, tension, social interaction, player agency, replayability). Use LLM-as-judge with test-retest reliability checks (ICC > 0.75 is the target).
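A test-retest check can be approximated with an intraclass correlation over repeated judge runs. The text above does not say which ICC form is intended, so the sketch below assumes the one-way random-effects form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), where each row holds the k repeated scores one artifact received. The function name `icc_oneway` and the sample scores are illustrative only.

```python
from typing import List


def icc_oneway(scores: List[List[float]]) -> float:
    """One-way random-effects ICC(1,1).

    scores[i][j] is the j-th repeated rating of artifact i.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 artifacts, each with the same >= 2 ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-artifact and within-artifact mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means)
              for x in row) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (msb - msw) / denom


# Made-up scores: three designs, each judged twice on the same dimension.
fun_scores = [[7, 8], [5, 5], [9, 8]]
reliable = icc_oneway(fun_scores) > 0.75  # compare against the 0.75 target
```

If the evaluation design treats the repeated runs as fixed raters, a two-way form would be the more appropriate choice.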
Example 1: Board Game Generation Pipeline
User: "Build a Python system that generates complete board game designs from a theme prompt, validated against a game design ontology."
Approach:
- Define a GameOntology Pydantic schema with enums for MechanismType and GameType, plus nested ComponentSet and PlayerDynamics models
- On ValidationError, feed the error message back to the LLM and retry (max 3 attempts)

Output structure:
```python
# Usage
pipeline = GameGenerationPipeline(model="claude-sonnet-4-20250514")
result = pipeline.generate(
    theme="Ancient Egyptian tomb raiders competing to assemble cursed artifacts",
    constraints="2-4 players, 60-90 minutes, medium complexity"
)
# result is a validated GameOntology instance
print(result.title)                  # "Curse of the Pharaohs"
print(result.primary_mechanisms)     # [DECK_BUILDING, SET_COLLECTION, HIDDEN_INFO]
print(result.components.card_types)  # ["Artifact Fragment", "Curse", "Tool", "Guardian"]
# Validation guarantees: no mechanisms without components, no empty goals
```
Example 2: Recipe Generation with Culinary Ontology
User: "I want to generate recipes that always have valid technique-equipment pairings and proper timing sequences."
Approach:
- CulinaryOntology schema: TechniqueType enum (sauté, braise, ferment...), EquipmentSet, IngredientList with dietary constraint flags, TimingSequence with step ordering

Output:
```python
class CulinaryOntology(BaseModel):
    title: str = Field(min_length=3)
    cuisine: Literal["french", "japanese", "mexican", "indian", "italian"]
    # Pydantic v2 spells list-size limits min_length/max_length (min_items is deprecated).
    techniques: List[TechniqueType] = Field(min_length=1, max_length=4)
    equipment: EquipmentSet
    ingredients: List[Ingredient]
    steps: List[Step]  # ordered, with time estimates

    @model_validator(mode="after")
    def check_technique_equipment(self):
        requirements = {
            TechniqueType.BRAISE: ("dutch_oven", "Braising requires a Dutch oven"),
            TechniqueType.DEEP_FRY: ("thermometer", "Deep frying requires a thermometer"),
        }
        # ... validation logic
```
Example 3: Adapting to Software Architecture Domain
User: "Generate microservice architecture designs that are structurally valid — no service can depend on a capability that doesn't exist."
Approach:
- ArchitectureOntology: ServiceType enum (api_gateway, event_bus, data_store...), CommunicationPattern (sync_rest, async_event, grpc), Service model with depends_on and provides capability lists
- Every depends_on entry must match a provides entry from another service in the architecture

- ValidationError on LLM output: Capture the error and include the specific field failures in the retry prompt. DSPy's Assert automates this. Set max retries to 3; if it still fails, the schema constraint likely conflicts with the prompt.
- Extend the Enum class rather than loosening the field to str.
- The MechanismType enum needs periodic updates as new patterns emerge. Build a process for ontology maintenance.

Paper: Cheung, B. (2026). "Generative Ontology: When Structured Knowledge Learns to Create." arXiv:2602.05636v2. https://arxiv.org/abs/2602.05636v2
Code: https://github.com/bennycheung/GameGrammarCLI
What to look for: Section 3 for the schema architecture and DSPy integration, Section 4 for the multi-agent pipeline and professional anxiety definitions, Section 5 for the ablation study proving that schema validation and multi-agent specialization address different failure modes (structural vs. creative), and Table 8 for the generalization checklist to new domains.