plugins/yzmir-morphogenetic-rl/skills/using-morphogenetic-rl/SKILL.md
Use when an RL controller decides whether/when/how to mutate a network's topology during training - growing seeds, grafting modules, retiring underperformers - and you need controller action/observation/reward design, governor and safety-gate discipline, rollback-as-RL-signal shaping, deterministic replay across topology change, or growth-aware ablation/evaluation.
npx skillsauth add tachyon-beep/skillpacks using-morphogenetic-rl

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
A morphogenetic controller is an RL agent acting on a non-stationary environment whose state space includes its own past structural decisions. Its mistakes change the shape of the network it is trying to optimize.
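That is the whole problem. It implies four load-bearing properties that this pack designs into the system: counterfactual-aware reward, a non-policy governor, determinism across topology change, and evaluation under topology change.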
Key tensions: exploration vs. stability, controller autonomy vs. governor veto, reward density vs. reward honesty, determinism vs. performance. Every sheet in the pack resolves one or more of these.
Use this pack when:
Do not use this pack when:
- yzmir-dynamic-architectures. That pack covers the network being grown; this pack covers the agent doing the growing.
- yzmir-deep-rl. This pack assumes you already chose your RL algorithm; it covers its morphogenesis-specific application.
- yzmir-pytorch-engineering:debug-nan.
- yzmir-simulation-foundations:check-determinism. For architecture-level determinism in any system that must be re-runnable from inputs, see axiom-determinism-and-replay — this pack's deterministic-morphogenesis cross-links there for the cross-machine and floating-point details.

If your input is a greenfield morphogenetic experiment and you have not run this pack before:
1. deterministic-morphogenesis.md — separate RNG streams, replay log shape, multi-rank sync. Foundation, not optional.
2. growth-telemetry-and-ablation.md — the two-table (step + event) schema design that survives topology change. Foundation; everything downstream depends on it.
3. rl-controller-for-morphogenesis.md — define action / observation / reward before you wire any RL framework.
4. governor-and-safety-gates.md — wrap the controller in a veto layer before any catastrophic action can land.
5. rollback-as-rl-signal.md — wire governor decisions back into the training signal.
6. evaluation-under-topology-change.md — plan the four required baselines (off-switch, static-initial, static-final, fixed-schedule) before running anything.

Steps 1–2 are the spike: if determinism and telemetry are wrong, no later sheet can save you. Steps 3–5 design the agent. Step 6 is what proves the agent did anything at all.
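
The "separate RNG streams" idea from step 1 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the sheet's actual implementation: the helper `make_stream`, the stream names `"data"` and `"controller"`, and the hash-based seed derivation are all assumptions made for the example. The point it shows is that extra draws by the controller do not shift the data order.

```python
import hashlib
import random


def make_stream(root_seed: int, name: str) -> random.Random:
    """Derive an independent RNG for one named purpose from a single root seed.

    Hypothetical helper: hashing (root_seed, name) gives each consumer its own
    seed, so one consumer's draws never advance another consumer's state.
    """
    digest = hashlib.sha256(f"{root_seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def run(root_seed: int, extra_controller_draws: int) -> list[float]:
    """Return the first three 'data' draws after some controller activity."""
    data_rng = make_stream(root_seed, "data")
    controller_rng = make_stream(root_seed, "controller")
    # Simulate a controller that samples a different number of times per run,
    # e.g. because the topology (and so the action set) changed.
    for _ in range(extra_controller_draws):
        controller_rng.random()
    return [data_rng.random() for _ in range(3)]


# The data stream is identical no matter how often the controller sampled.
same_data_order = run(1234, 0) == run(1234, 50)
```

With a single shared RNG, the second call would see a shifted data order, which is one way "same seed, same data, different grow events on rerun" can arise.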
All reference sheets are in the same directory as this SKILL.md. When you see a link like [rl-controller-for-morphogenesis.md](rl-controller-for-morphogenesis.md), read the file from the same directory.
yzmir-dynamic-architectures (the network)             yzmir-morphogenetic-rl (the controller)
growable substrates: FSM, gradient    ←-cross-ref-→   action/observation/reward,
isolation, alpha blending, lifecycle                  governor/safety, rollback
                                                      as RL signal, evaluation
                                                      under topology change
─────────────────────────────────────────────────────────────────────
                                  ↓
yzmir-deep-rl provides the algorithm (PPO / SAC / DQN);
axiom-determinism-and-replay provides cross-machine
determinism semantics this pack's deterministic-morphogenesis
sheet cross-links to
The boundary with yzmir-dynamic-architectures is sharp:
| Question | Pack |
|----------|------|
| "How does the growable network train?" | yzmir-dynamic-architectures |
| "How does the controller decide to grow it?" | this pack |
| "What state machine governs the lifecycle?" | yzmir-dynamic-architectures/ml-lifecycle-orchestration |
| "What action/observation/reward space drives the policy?" | this pack |
| "How do I freeze gradients for a new module?" | yzmir-dynamic-architectures/gradient-isolation-techniques |
| "How does an RL signal feed back into the controller when isolation fails?" | this pack |
| "How do I blend a seed in with alpha?" | yzmir-dynamic-architectures/gradient-isolation-techniques |
| "How does the controller learn to set alpha?" | this pack |
This pack ships eight novel sheets and two bridge sheets. Numbered artifacts are not used (controllers are designed, not assembled from a fixed numbered set); the catalog is grouped by concern.
Foundations (read first when starting greenfield):
| Sheet | Concern |
|-------|---------|
| deterministic-morphogenesis | Separate RNG streams, replay logs, multi-rank sync, validating determinism in CI |
| growth-telemetry-and-ablation | Two-table (step + event) schema design that survives topology change |
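
As a rough sketch of what a two-table (step + event) layout might look like: one table holds fixed-shape per-step scalars, the other holds one row per topology event. The field names below (`total_params`, `params_delta`, `kind`, `target`) and the helper `params_at` are illustrative assumptions, not the schema the sheet defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRow:
    """One row per training step; columns never depend on network shape."""
    step: int
    loss: float
    total_params: int


@dataclass(frozen=True)
class EventRow:
    """One row per topology event (assumed kinds: grow, graft, retire, rollback)."""
    step: int
    kind: str
    target: str
    params_delta: int


def params_at(step: int, initial_params: int, events: list[EventRow]) -> int:
    """Reconstruct the parameter count at a step from the event table alone."""
    return initial_params + sum(e.params_delta for e in events if e.step <= step)


events = [
    EventRow(step=10, kind="grow", target="slot0", params_delta=300),
    EventRow(step=25, kind="retire", target="slot0", params_delta=-300),
]
steps = [StepRow(step=s, loss=1.0, total_params=params_at(s, 4000, events))
         for s in (5, 10, 30)]
```

Because shape-dependent facts live only in the event table, the step table keeps the same columns before and after a grow event.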
Controller and governor:
| Sheet | Concern |
|-------|---------|
| rl-controller-for-morphogenesis | Action / observation / reward design, counterfactual baselines, algorithm choice |
| governor-and-safety-gates | Non-policy veto layer, panic detection, NaN/Inf gates, gate-gaming resistance |
| rollback-as-rl-signal | Reward shaping when the governor reverts a decision; conservative-collapse avoidance |
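
A non-policy veto layer with a NaN/Inf gate and a simple panic rule could look roughly like this. It is a sketch under stated assumptions: the function `govern`, the `Verdict` type, the `"noop"` action name, and the `spike_ratio` threshold are hypothetical, and losses are assumed to be positive. The governor is plain code with no learned parameters, so the policy has nothing to optimize against directly.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def govern(action: str, recent_losses: list[float], spike_ratio: float = 2.0) -> Verdict:
    """Decide whether a proposed structural action may land.

    Runs outside the policy: the controller proposes, this function disposes.
    """
    if action == "noop":
        return Verdict(True, "noop always allowed")
    if not recent_losses:
        return Verdict(False, "no loss history")
    # NaN/Inf gate: never mutate topology while training is numerically broken.
    if any(not math.isfinite(x) for x in recent_losses):
        return Verdict(False, "non-finite loss")
    # Panic rule: veto growth while the latest loss is spiking above the
    # mean of the earlier window.
    if len(recent_losses) > 1:
        baseline = sum(recent_losses[:-1]) / (len(recent_losses) - 1)
        if recent_losses[-1] > spike_ratio * baseline:
            return Verdict(False, "loss spike")
    return Verdict(True, "ok")
```

Returning a reason alongside the verdict gives the event log something to record, which is what later lets a rollback or veto be fed back as a training signal.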
Coordination and evaluation:
| Sheet | Concern |
|-------|---------|
| multi-seed-coordination-rl | Slot contention, simultaneous actions, credit assignment, factored joint actions |
| evaluation-under-topology-change | The four required baselines, per-FLOP/per-param normalization, multi-seed reporting |
| when-not-to-grow | Off-switch baseline, six failure modes where morphogenesis hurts, the discipline of stopping |
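
One way to enforce "the four required baselines" is a pre-report check that refuses to proceed while any baseline is missing or under-seeded. The sketch below assumes results are stored as a mapping from baseline name to a list of per-seed scores; the function name `missing_baselines` and the `min_seeds` default are illustrative assumptions.

```python
# Baseline names as listed in this pack.
REQUIRED_BASELINES = ("off-switch", "static-initial", "static-final", "fixed-schedule")


def missing_baselines(results: dict[str, list[float]], min_seeds: int = 3) -> list[str]:
    """Return the required baselines that have fewer than min_seeds runs."""
    return [name for name in REQUIRED_BASELINES
            if len(results.get(name, [])) < min_seeds]


results = {
    "off-switch": [0.90, 0.91, 0.90],
    "static-initial": [0.95, 0.96, 0.94],
    "static-final": [0.88],  # only one seed so far
}
todo = missing_baselines(results)
```

An evaluation script could stop with an error whenever `todo` is non-empty, so a comparison is never reported against an incomplete baseline set.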
Bridges to yzmir-dynamic-architectures:
| Sheet | Concern |
|-------|---------|
| safety-gated-seed-fsm | Governor verdicts as FSM transitions; defers FSM mechanics to the sibling pack |
| rl-driven-alpha-blending | α as a learned controller output rather than fixed schedule; defers blending mechanics |
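
To make "α as a learned controller output" concrete, here is a minimal sketch in which the controller emits an unbounded logit that is squashed into (0, 1) and rate-limited before being used in a standard convex blend. The per-update rate limit (`max_step`) and both function names are assumptions for illustration; the blending mechanics themselves belong to the sibling pack.

```python
import math


def alpha_from_logit(logit: float, prev_alpha: float, max_step: float = 0.1) -> float:
    """Map a controller logit to the next alpha, moving at most max_step per update."""
    logit = max(-60.0, min(60.0, logit))  # keep math.exp in a safe range
    target = 1.0 / (1.0 + math.exp(-logit))
    step = max(-max_step, min(max_step, target - prev_alpha))
    return prev_alpha + step


def blend(host_out: list[float], seed_out: list[float], alpha: float) -> list[float]:
    """Convex blend: alpha = 0 is host only, alpha = 1 is seed only."""
    return [(1.0 - alpha) * h + alpha * s for h, s in zip(host_out, seed_out)]


alpha = alpha_from_logit(10.0, prev_alpha=0.0)  # controller wants the seed fully in
mixed = blend([1.0, 2.0], [3.0, 4.0], 0.5)
```

The rate limit is one possible design choice: it keeps a single bad controller output from swapping the seed in or out within one step.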
| Symptom or question | Primary sheet |
|---------------------|---------------|
| "What should my controller's action space look like?" | rl-controller-for-morphogenesis |
| "Reward function rewards loss-down even when growth hurt" | rl-controller-for-morphogenesis |
| "Controller is growing during loss spikes" | governor-and-safety-gates |
| "Policy is learning to game the gates" | governor-and-safety-gates |
| "How do I shape reward when the governor reverts a decision?" | rollback-as-rl-signal |
| "PPO is ignoring rollback events because they're rare" | rollback-as-rl-signal |
| "Two seeds want the same slot — who wins?" | multi-seed-coordination-rl |
| "Same seed, same data, different grow events on rerun" | deterministic-morphogenesis |
| "Logged metrics break when shape changes" | growth-telemetry-and-ablation |
| "How do I compare a 4M-param checkpoint to a 4.3M-param checkpoint?" | evaluation-under-topology-change |
| "Tried morphogenesis, it made things worse" | when-not-to-grow |
| "My seed lifecycle FSM needs safety overrides" | safety-gated-seed-fsm |
| "Controller should learn α, not have it scheduled" | rl-driven-alpha-blending |
| "Set up the seed lifecycle FSM itself" | → yzmir-dynamic-architectures/ml-lifecycle-orchestration |
| "Implement gradient detach/freezing for the new module" | → yzmir-dynamic-architectures/gradient-isolation-techniques |
| "Pick a PPO/SAC implementation" | → yzmir-deep-rl/policy-gradient-methods |
- agent: morphogenesis-reviewer — Reviews morphogenetic-RL designs and code for the disciplines this domain demands: separate RNG streams, ablation-friendly schemas, governor independence, baselines run, replay log completeness. Invoked via Task tool.
- agent: governor-design-reviewer — Critiques a governor design for non-policy independence, gate completeness, hysteresis, gate-gaming resistance, and rollback-path coverage. Invoked via Task tool.
- /scaffold-morphogenetic-experiment — Drops in determinism log, two-table telemetry, controller skeleton, governor skeleton, and the four-baseline evaluation harness for a greenfield experiment.
- /diagnose-growth-pathology — Runs the triage sequence against an existing morphogenetic system that is misbehaving (catastrophic actions, conservative collapse, non-reproducibility, or unfair evaluation).

Agents vs. skills: Skills design the controller and its surrounding discipline. Agents audit or critique an existing design. Load a skill when designing; dispatch an agent when reviewing.
1. deterministic-morphogenesis — RNG streams, replay log, before any other infrastructure
2. growth-telemetry-and-ablation — Two-table schemas before any other logging
3. rl-controller-for-morphogenesis — Define action / observation / reward
4. governor-and-safety-gates — Wrap the controller in a veto layer
5. rollback-as-rl-signal — Wire governor decisions back into PPO
6. evaluation-under-topology-change — Plan the four baselines before running anything

Then for the host-side: yzmir-dynamic-architectures/ml-lifecycle-orchestration (FSM) and yzmir-dynamic-architectures/gradient-isolation-techniques (training mechanics).
- deterministic-morphogenesis — Is the run reproducible at all?
- growth-telemetry-and-ablation — Are the schemas intact?
- governor-and-safety-gates — Are gates wired up; are panic rules complete?
- when-not-to-grow — Has the off-switch baseline been run?
- rl-controller-for-morphogenesis — Audit reward shape
- rollback-as-rl-signal — Are rollbacks reaching the policy?

- rollback-as-rl-signal — Probably asymmetric reward → conservative collapse
- rl-controller-for-morphogenesis — Reward shaping audit
- when-not-to-grow — Confirm Failure Mode 6: conservative collapse
- yzmir-deep-rl/exploration-strategies for general exploration deficit

- growth-telemetry-and-ablation — Schemas first; everything downstream depends on them
- deterministic-morphogenesis — Replay log; counterfactual replay capability
- evaluation-under-topology-change — The four required baselines and how to compare
- multi-seed-coordination-rl — If the experiment has K-slot decisions

- multi-seed-coordination-rl — The "everyone grows at once" failure mode
- governor-and-safety-gates — Multi-action pre-flight, priority veto
- rl-controller-for-morphogenesis — Factored joint action space audit

- when-not-to-grow — Is the off-switch baseline run? If not, stop.
- evaluation-under-topology-change — Are all four baselines run? Multi-seed?
- growth-telemetry-and-ablation — Are schemas additive across the runs being compared?

Designing the controller from scratch?
├─ Yes → deterministic-morphogenesis → growth-telemetry-and-ablation
│        → rl-controller → governor → rollback-as-rl-signal
│        → evaluation-under-topology-change (plan baselines)
└─ No  → continue

Controller takes catastrophic actions?           → governor-and-safety-gates
Controller has stopped exploring (always-roll)?  → rollback-as-rl-signal,
                                                   rl-controller (reward audit),
                                                   when-not-to-grow (confirm FM6)
Multiple seeds want the same slot?               → multi-seed-coordination-rl
Cannot reproduce a topology / controller call?   → deterministic-morphogenesis
Logged metrics break across grow events?         → growth-telemetry-and-ablation
Comparing checkpoints with different shapes?     → evaluation-under-topology-change
Morphogenesis making things worse — or nothing?  → when-not-to-grow
FSM design with safety overrides?                → safety-gated-seed-fsm
                                                   (then dynamic-architectures/
                                                   ml-lifecycle-orchestration)
α as a learned controller output?                → rl-driven-alpha-blending
                                                   (then dynamic-architectures/
                                                   gradient-isolation-techniques)
| Rationalization | Reality | Counter-guidance |
|-----------------|---------|------------------|
| "The controller will learn to avoid catastrophic actions" | Some catastrophic actions destroy training before any gradient feedback arrives | Add a governor — see governor-and-safety-gates |
| "Loss went down after grow, so growth was good" | Counterfactual: loss may have gone down anyway. Network grew → optimization improved → confound | See rl-controller-for-morphogenesis on counterfactual baselines |
| "I'll let the controller decide whether to roll back" | A controller given veto over its own gates will eventually disable them | Governor must be outside policy — see governor-and-safety-gates |
| "Rollbacks are rare, no need to weight them" | Rare events with large effect dominate true return; vanilla PPO underweights them | See rollback-as-rl-signal on advantage normalization |
| "Determinism is a nice-to-have, I'll add it later" | Without it you can't reproduce any failure, can't ablate, can't debug | See deterministic-morphogenesis — pay the cost upfront |
| "I'll compare runs by final loss" | Different shapes, different parameter counts, comparison is meaningless | See evaluation-under-topology-change for fair comparison protocols |
| "Morphogenesis is always worth trying" | Many domains have a known small optimal architecture; morphogenesis adds variance for no return | See when-not-to-grow first |
| "A bigger reward signal will fix the controller" | Larger reward magnitudes amplify shaping bugs; honest sparse signal beats dense distorted signal | See rl-controller-for-morphogenesis on reward decomposition |
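
The "loss went down after grow" confound in the table above can be illustrated with a small sketch. It assumes a deterministic replay is available that re-runs the same steps without the grow action, yielding `loss_after_noop`; both function names are hypothetical, and the sheet's actual counterfactual baseline may be defined differently.

```python
def naive_reward(loss_before: float, loss_after_grow: float) -> float:
    """Credits the controller with any loss decrease, including one that
    would have happened anyway."""
    return loss_before - loss_after_grow


def counterfactual_reward(loss_after_grow: float, loss_after_noop: float) -> float:
    """Credits only the improvement over the no-grow replay of the same steps."""
    return loss_after_noop - loss_after_grow


# Loss fell from 2.0 to 1.5 after growing, but the no-grow replay also reached 1.5.
naive = naive_reward(2.0, 1.5)
honest = counterfactual_reward(1.5, 1.5)
```

Here the naive signal pays the controller 0.5 for a grow action that contributed nothing, while the counterfactual signal pays 0.0.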
Watch for these signs of an unsafe morphogenetic system:
yzmir-dynamic-architectures: That pack covers everything about the network being grown — FSM, gradient isolation, alpha blending mechanics. This pack covers everything about the agent doing the growing. Bridge sheets safety-gated-seed-fsm and rl-driven-alpha-blending cross-link rather than duplicate.

yzmir-deep-rl: This pack assumes you already chose your RL algorithm. For PPO/SAC implementation, general reward shaping, exploration strategies, counterfactual reasoning, and multi-agent primitives → yzmir-deep-rl.

axiom-determinism-and-replay: deterministic-morphogenesis cross-links to that pack for cross-machine determinism, floating-point policy, GPU determinism, and the canonical-state-encoding overlap when morphogenesis output must be replay-comparable.
| Request | Primary pack |
|---------|--------------|
| Continual learning, catastrophic forgetting | yzmir-dynamic-architectures/continual-learning-foundations |
| Determinism in physics simulation | yzmir-simulation-foundations:check-determinism |
| Debug NaN in PyTorch | yzmir-pytorch-engineering:debug-nan |
| Train PPO faster (FSDP, FP8) | yzmir-training-optimization |
| Deploy a morphogenetic model in production | yzmir-ml-production |
| Need | Use this |
|------|----------|
| Foundations: determinism + telemetry | deterministic-morphogenesis, growth-telemetry-and-ablation |
| Controller spaces (action / observation / reward) | rl-controller-for-morphogenesis |
| Non-policy veto layer | governor-and-safety-gates |
| Wire governor decisions into PPO | rollback-as-rl-signal |
| Slot contention / simultaneous actions | multi-seed-coordination-rl |
| Compare checkpoints with different shapes | evaluation-under-topology-change |
| Decide whether to grow at all | when-not-to-grow |
| Bridge to FSM mechanics | safety-gated-seed-fsm (→ dynamic-architectures) |
| Bridge to alpha-blending mechanics | rl-driven-alpha-blending (→ dynamic-architectures) |
| Scaffold a greenfield experiment | command: /scaffold-morphogenetic-experiment |
| Diagnose an existing system that is misbehaving | command: /diagnose-growth-pathology |
| Review a design for the seven disciplines | agent: morphogenesis-reviewer |
| Critique a governor design | agent: governor-design-reviewer |
A morphogenetic controller is an RL agent whose mistakes change the shape of the network it is optimizing. That is the whole problem. The pack designs the four properties that make such a system survivable: counterfactual-aware reward, non-policy governor, determinism across topology change, and evaluation under topology change. Skip any of the four and the system will eventually grow itself into a worse state that you cannot reproduce, ablate, or compare.
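After routing, load the appropriate specialist sheet:

- Foundations (read first when starting greenfield): deterministic-morphogenesis, growth-telemetry-and-ablation
- Controller and governor: rl-controller-for-morphogenesis, governor-and-safety-gates, rollback-as-rl-signal
- Coordination and evaluation: multi-seed-coordination-rl, evaluation-under-topology-change, when-not-to-grow
- Bridges to yzmir-dynamic-architectures: safety-gated-seed-fsm, rl-driven-alpha-blending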