plugins/agent-agentic-os/skills/self-evolution/SKILL.md
Self-healing and self-evolving pattern for agents operating against external systems (CDP automation, DOM-dependent tooling, web APIs). Classifies failures into three tiers — Gap / Failure / Regression — applies repo-profile-gated edits with appropriate autonomy, verifies the fix, and updates domain reference files ("The Map, not the Diary"). Invoke whenever a tool call or subprocess returns a failure that a patched helper could fix.
Trigger: Any tool call or subprocess returns a failure that may be caused by a stale selector, missing helper, or broken script — and the fix is within the agent's allowed edit boundaries.
Core principle: The agent does not just retry — it learns. Every fix either patches a helper (so the failure can't recur) or updates a reference file (so future agents avoid the same dead end). Fixes that aren't recorded are not fixes; they are patches waiting to become the same bug again.
Before doing anything else, locate and read the repo's self-evolution profile:
<repo-root>/plugins/<plugin>/references/self-evolution-profile.md
If no profile exists for the current repo/plugin, create a minimal one now using the template in Phase 0.1 below, then continue.
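The profile defines the allowed edit directories, the error pattern classification (pattern to tier), the domain playbook location, and the evolution log path.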
If no profile exists, write one at <plugin>/references/self-evolution-profile.md:
# Self-Evolution Profile — <Plugin Name>
## Allowed Edit Directories
# List dirs the agent may edit autonomously (relative to repo root)
## Error Pattern Classification
| Pattern | Tier |
|---------|------|
| element not found / selector missing | Regression |
| function not exported / module not found | Gap |
| TypeError / syntax error | Failure |
| subprocess timeout | Regression |
| JSON parse error | Failure |
## Domain Playbook Location
<path to references/ folder>
## Evolution Log
<path>/evolution-log.md
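The "Allowed Edit Directories" entry in the profile implies a path check before any autonomous edit. A minimal sketch of such a check follows, assuming the directories have already been read from the profile into an array; the helper name `isEditAllowed` and the example paths are hypothetical, not part of the skill.

```javascript
const path = require("path");

// Hypothetical helper: returns true when `target` is a path strictly inside
// one of `allowedDirs`. Both are interpreted relative to `repoRoot`.
function isEditAllowed(repoRoot, allowedDirs, target) {
  const absTarget = path.resolve(repoRoot, target);
  return allowedDirs.some((dir) => {
    const absDir = path.resolve(repoRoot, dir);
    const rel = path.relative(absDir, absTarget);
    // The target escapes the directory when the relative path climbs out
    // of it (".." segment) or cannot be expressed relatively at all.
    const escapes =
      rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return rel !== "" && !escapes;
  });
}

// Example with made-up paths:
console.log(isEditAllowed("/repo", ["plugins/demo/scripts"], "plugins/demo/scripts/helper.js"));
```

Resolving both paths first means a target such as `scripts/../secrets.js` is judged by where it actually lands rather than by its prefix.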
Using the error message, stack trace, and context, classify into exactly one tier:
Gap: "The capability doesn't exist yet."
Signals: `function not found`, `is not a function`, `module has no export`
Response: Build the missing piece. No evidence collection needed.
Failure: "Code exists but is broken."
Signals: `TypeError`, `SyntaxError`, `ReferenceError` inside our own code
Response: Debug the code. Read the relevant source, identify the bug, patch it.
Regression: "This worked before. Something external changed."
Signals: `element not found`, `cannot read property of null` on a well-used DOM path
Response: Collect evidence first (screenshot + DOM snapshot), then patch with a fallback selector or updated timing. Document the change in The Map.
If ambiguous between Failure and Regression: default to Regression and collect evidence — the cost of a screenshot is lower than patching the wrong layer.
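The classification rules above can be sketched as a small pattern table. The regular expressions are illustrative and would normally come from the repo profile. Two choices here are assumptions rather than something the skill states: a Gap signal takes priority over the other tiers, and an unrecognised error returns `null` instead of a guessed tier.

```javascript
// Illustrative patterns only; a real profile would supply its own.
const TIER_PATTERNS = [
  { tier: "Gap", pattern: /function not found|is not a function|has no export|module not found/i },
  { tier: "Failure", pattern: /TypeError|SyntaxError|ReferenceError|JSON parse error/i },
  { tier: "Regression", pattern: /element not found|selector missing|cannot read propert(y|ies) of null|timeout/i },
];

function classifyTier(errorText) {
  const matched = TIER_PATTERNS
    .filter(({ pattern }) => pattern.test(errorText))
    .map(({ tier }) => tier);

  // Assumption: a missing capability outranks the other signals.
  if (matched.includes("Gap")) return "Gap";
  // Ambiguous between Failure and Regression: default to Regression.
  if (matched.includes("Failure") && matched.includes("Regression")) return "Regression";
  return matched[0] || null;
}

console.log(classifyTier("TypeError: Cannot read properties of null (reading 'click')"));
```

The null-property error matches both the Failure and Regression patterns, so it exercises the "default to Regression" rule.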
Evidence collection is tier-dependent:
| Tier | Evidence to collect |
|------|-------------------|
| Gap | None — log the capability boundary in the evolution log |
| Failure | Error message + stack trace (last 20 lines of stderr) + relevant source lines |
| Regression | Screenshot of current UI state + DOM snapshot of the failing selector area + git log --oneline -5 on the affected file |
For Regression, run the DOM snapshot before touching any code:
// Page-context snippet (document is not defined in plain Node): dump every data-name value on the page
const els = document.querySelectorAll('[data-name]');
console.log(JSON.stringify([...els].map(e => e.getAttribute('data-name')).filter(Boolean)));
Save evidence to temp/self-evolution/<timestamp>-evidence/.
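The evidence table and the save location can be expressed as data plus two small helpers. This is a sketch under assumptions: the skill does not specify a `<timestamp>` format, so the ISO-based stamp below is only one option, and the evidence item labels are made-up identifiers.

```javascript
const path = require("path");

// Evidence to collect per tier, mirroring the table above (labels are hypothetical).
const EVIDENCE_BY_TIER = {
  Gap: [],
  Failure: ["error-message", "stderr-tail", "source-lines"],
  Regression: ["screenshot", "dom-snapshot", "git-log"],
};

// Assumed timestamp format: ISO 8601 without milliseconds, with ":" replaced
// by "-" so the name is also valid on Windows file systems.
function evidenceDir(now = new Date()) {
  const stamp = now.toISOString().replace(/\.\d+Z$/, "Z").replace(/:/g, "-");
  return path.posix.join("temp", "self-evolution", `${stamp}-evidence`);
}

// Keep only the last `n` lines of captured stderr (20 for the Failure tier).
function lastLines(text, n = 20) {
  return text.replace(/\n$/, "").split("\n").slice(-n).join("\n");
}

console.log(EVIDENCE_BY_TIER.Regression, evidenceDir());
```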
Based on tier and evidence:
Gap: Identify the exact file and function to create. Check the allowed edit directories from the profile. If the target file is outside those dirs, escalate to the user (Phase 6).
Failure: Read the failing function. Identify the minimal fix. Prefer adding a guard or correcting an argument over rewriting logic.
Regression: Identify the old selector/timing from git history or the domain playbook. Find a new stable selector from the DOM snapshot. Plan a two-path patch: primary (new selector) + fallback (broader query with filter).
Write the plan as 3–5 bullet points before touching any file.
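For the Regression case, the two-path patch (primary selector, then a broader query with a filter) might look like the sketch below. The function name, the selectors, and the filter are all hypothetical. `root` is anything exposing `querySelector` and `querySelectorAll`, such as a `Document` in the page context or a test double.

```javascript
// Try the new stable selector first; fall back to a broader query whose
// results are narrowed by a predicate. Reports which path matched so the
// caller can log when the fallback starts being used.
function findWithFallback(root, primarySelector, fallbackSelector, matches) {
  const primary = root.querySelector(primarySelector);
  if (primary) return { element: primary, path: "primary" };

  const candidate = Array.from(root.querySelectorAll(fallbackSelector)).find(matches);
  return candidate
    ? { element: candidate, path: "fallback" }
    : { element: null, path: "none" };
}

// Usage with a stand-in root object (no browser required):
const fakeButton = { textContent: "Indicators" };
const fakeRoot = {
  querySelector: () => null, // primary selector no longer matches
  querySelectorAll: () => [{ textContent: "Alerts" }, fakeButton],
};
console.log(
  findWithFallback(fakeRoot, "[data-name='x']", "button", (el) => el.textContent === "Indicators").path
);
```

Returning the matched path, not just the element, gives the evolution log something concrete to record when the primary selector goes stale again.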
Check the edit type before writing:
| Edit type | Gate |
|-----------|------|
| Add new function / export | Auto-approved: proceed |
| Add new selector / fallback path | Auto-approved: proceed |
| Modify existing function logic | Auto-approved: append git diff to evolution log after edit |
| Rename or move a file | Confirm with user: "About to rename X → Y. Confirm?" |
| Delete any file or function | Hard stop: always confirm with user before proceeding |
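The gate table can be encoded as a lookup so the check runs before any write. The keys and gate labels below are hypothetical names for the rows above. Treating an unknown edit type as a hard stop is an assumption, chosen as the most conservative default.

```javascript
// Hypothetical identifiers for the edit types and gates in the table.
const EDIT_GATES = {
  "add-function": "auto",
  "add-selector": "auto",
  "modify-function": "auto-log-diff",
  "rename-or-move": "confirm",
  "delete": "hard-stop",
};

function gateFor(editType) {
  // Own-property check so inherited keys such as "toString" never match.
  return Object.prototype.hasOwnProperty.call(EDIT_GATES, editType)
    ? EDIT_GATES[editType]
    : "hard-stop"; // assumption: unknown edit types get the strictest gate
}

function mayProceed(editType, userConfirmed = false) {
  const gate = gateFor(editType);
  return gate === "auto" || gate === "auto-log-diff" || userConfirmed;
}

console.log(gateFor("modify-function"), mayProceed("delete"));
```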
Steps:
- Run `git diff <file>` and save the output to the evolution log.
Re-run the exact operation that originally failed:
# Re-run the specific command / test that triggered self-evolution
Pass: Proceed to Phase 6.
Fail (attempt 1): Return to Phase 3, reconsider the diagnosis. Try a different repair approach.
Fail (attempt 2): Return to Phase 3, broaden evidence collection.
Fail (attempt 3 — final): Escalate to user (Phase 6, escalation path). Do not make further edits. Present the full evidence bundle and the three approaches tried.
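The pass/fail rules above amount to a bounded retry loop. The sketch below is synchronous for brevity and assumes each repair approach is an object with hypothetical `name` and `apply` fields, while `runOperation` re-runs the original failing operation and returns `true` on success.

```javascript
// Apply one repair per attempt, re-run the failed operation, and stop at
// the first pass. After `maxAttempts` failures no further edits are made
// and the caller escalates with the list of approaches tried.
function verifyWithRetries(runOperation, repairs, maxAttempts = 3) {
  const tried = [];
  const limit = Math.min(maxAttempts, repairs.length);
  for (let attempt = 1; attempt <= limit; attempt++) {
    const repair = repairs[attempt - 1];
    repair.apply();
    const passed = runOperation();
    tried.push({ attempt, approach: repair.name, passed });
    if (passed) return { outcome: "FIXED", tried };
  }
  return { outcome: "ESCALATED", tried };
}

// Usage with stand-in repairs: the second approach makes the check pass.
let fixedLevel = 0;
const demo = verifyWithRetries(
  () => fixedLevel >= 2,
  [
    { name: "new selector", apply: () => { fixedLevel = 1; } },
    { name: "longer wait", apply: () => { fixedLevel = 2; } },
    { name: "broader query", apply: () => { fixedLevel = 3; } },
  ]
);
console.log(demo.outcome, demo.tried.length);
```

The `tried` array doubles as the "three approaches tried" section of an escalation message.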
Whether or not the fix succeeded, update the domain reference files:
If fix succeeded:
- Update references/*.md with the new selector and a note: `<!-- updated <date>: old=[...] new=[...] TV regression -->`
- Create or update the playbook under <playbook-location> from the profile (see Playbook Format below)
If escalating to user:
- Record the playbook status as UNRESOLVED together with the three approaches tried, so the next agent doesn't repeat the same dead ends
Playbook Format: create <playbook-location>/<topic>-playbook.md with the following structure:
# Playbook: <Topic>
**Status:** ACTIVE | UNRESOLVED
**Last verified:** YYYY-MM-DD
**Relevant files:** list of files
## What This Covers
One sentence.
## The Mechanics
Step-by-step: what works, what the exact selectors/timing are, why.
## Known Failure Modes
| Symptom | Tier | Fix applied |
|---------|------|-------------|
## Change History
| Date | What changed | Tier | Outcome |
|------|--------------|------|---------|
Append one row to the evolution log (evolution-log.md from profile):
| <date> | <tier> | <what failed (one line)> | <what was patched> | <edit type> | <outcome: FIXED/ESCALATED> |
If the log file doesn't exist yet, create it with the header:
# Evolution Log
| Date | Tier | Failure | Patch | Edit Type | Outcome |
|------|------|---------|-------|-----------|---------|
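Appending to the log, including the create-if-missing step, could be sketched as follows. The function name and the shape of the `row` object are assumptions. Escaping `|` inside cells is an addition the skill does not mention, included so a failure message cannot break the table.

```javascript
const fs = require("fs");
const path = require("path");

const LOG_HEADER =
  [
    "# Evolution Log",
    "| Date | Tier | Failure | Patch | Edit Type | Outcome |",
    "|------|------|---------|-------|-----------|---------|",
  ].join("\n") + "\n";

// Hypothetical helper: creates the log with its header when absent, then
// appends exactly one table row.
function appendEvolutionRow(logPath, row) {
  if (!fs.existsSync(logPath)) {
    fs.mkdirSync(path.dirname(logPath), { recursive: true });
    fs.writeFileSync(logPath, LOG_HEADER);
  }
  // Keep each entry on one line and escape pipes inside cell text.
  const cell = (value) => String(value).replace(/\r?\n/g, " ").replace(/\|/g, "\\|");
  const cells = [row.date, row.tier, row.failure, row.patch, row.editType, row.outcome].map(cell);
  fs.appendFileSync(logPath, `| ${cells.join(" | ")} |\n`);
}
```

A call would pass the log path from the profile and one object per fix, for example `{ date, tier: "Regression", failure, patch, editType, outcome: "FIXED" }`.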
When escalating to the user after 3 failed attempts:
Self-Evolution Escalation — [Tier: Regression/Failure/Gap]
Operation that failed: <command>
Error: <error message>
Evidence: temp/self-evolution/<timestamp>-evidence/
Three approaches tried:
1. <approach 1> → <result>
2. <approach 2> → <result>
3. <approach 3> → <result>
What I need from you: <specific question, e.g., "What is the new selector for the Indicators dialog?">
Once you provide it, I will apply the fix and update The Map.