plugins/agent-agentic-os/skills/os-improvement-report/SKILL.md
Trigger with "show me the improvement chart", "how are we improving", "progress report", "graph the eval scores", "show cycle of improvement", "what's the trend", "are we getting better". Produces a visual/text summary of how the agentic loop is improving across cycles. Do NOT use this to run the learning loop or evaluate a specific skill change.
npx skillsauth add richfrem/agent-plugins-skills os-improvement-report
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill requires Python 3.8+, pandas, and matplotlib.
To install this skill's dependencies:
pip-compile ./requirements.in
pip install -r ./requirements.txt
See ./requirements.txt for the dependency lockfile.
Visual and text reporting on the agentic loop improvement cycle — across any plugin that
maintains an improvement-ledger.md and results.tsv per skill.
The reference output is the autoresearch progress chart: green KEEP dots on a timeline, gray DISCARD dots, running-best step line, annotations showing what each improvement was. This skill produces the same chart for agentic-os and exploration-cycle-plugin improvement cycles.
| Source | Priority | Content |
|--------|----------|---------|
| context/experiment-log/index.md | Primary | All logged runs; filter result_type: numeric for KEEP/DISCARD/score data from orchestrator runs |
| context/memory/improvement-ledger.md | Legacy fallback | Eval score progression written by os-improvement-loop Stage 4.7; used if experiment log has no numeric entries |
| .agents/skills/*/evals/results.tsv | Supplement | Per-skill detailed eval score history |
The experiment log is the unified source of truth for numeric results. The improvement ledger is a legacy format maintained for backward compatibility with older loop runs.
| Output | Description |
|--------|-------------|
| context/memory/reports/progress_YYYYMMDD_HHMM.png | Progress chart: KEEP/DISCARD timeline, running-best step line, change annotations |
| context/memory/reports/summary_YYYYMMDD_HHMM.md | Text summary: baseline vs best, top hits by delta, survey effectiveness, north star trend |
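The timestamped names in the table above can be built with standard `strftime` formatting. This is a minimal sketch only: the helper name `report_paths` is hypothetical and is not taken from generate_report.py, which may construct its paths differently.

```python
from datetime import datetime
from pathlib import Path


def report_paths(project_dir, now=None):
    """Return (chart_path, summary_path) under context/memory/reports."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M")  # YYYYMMDD_HHMM, as in the table
    reports = Path(project_dir) / "context" / "memory" / "reports"
    return reports / f"progress_{stamp}.png", reports / f"summary_{stamp}.md"
```

Passing `now` explicitly keeps the chart and summary on the same timestamp even if the two files are written a minute apart.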
python3 plugins/agent-agentic-os/scripts/experiment_log.py summary
Then read context/experiment-log/index.md and filter for rows where the Result Type
column is numeric. For each matching row, read the linked .md file and extract from
its YAML header:
keeps: (integer — from verdict string "NNK/NND ...")
discards: (integer)
baseline: (float)
best_score: (float)
delta: (float, signed)
target: (string — the skill/agent under test)
date: (string)
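The filtering and extraction steps above could be sketched as follows. Two layouts are assumed here because the source does not show them: that index.md is a pipe-delimited markdown table with a `Result Type` header cell, and that each linked file opens with a `---`-delimited header of plain `key: value` lines. The function names `numeric_rows` and `read_header` are hypothetical.

```python
def numeric_rows(index_text):
    """Return table rows (as dicts) whose 'Result Type' cell is 'numeric'."""
    header = None
    rows = []
    for line in index_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not part of a pipe table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table line is assumed to be the header
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        row = dict(zip(header, cells))
        if row.get("Result Type", "").lower() == "numeric":
            rows.append(row)
    return rows


def read_header(md_text):
    """Parse simple 'key: value' lines between the first two '---' lines."""
    lines = md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header block
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Values come back as strings; converting `baseline`, `best_score`, and `delta` to floats is left to the caller. Nested or multi-line YAML would need a real YAML parser rather than this line-based reader.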
Parse the verdict string with this pattern:
(\d+)K/(\d+)D baseline=([0-9.]+) best=([0-9.]+) delta=([+-][0-9.]+)
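As an illustration, the pattern above can be applied with Python's `re` module. The function name `parse_verdict` and the sample verdict strings are hypothetical; only the regular expression comes from this document.

```python
import re

# Pattern copied from the skill text; delta must carry an explicit sign.
VERDICT_RE = re.compile(
    r"(\d+)K/(\d+)D baseline=([0-9.]+) best=([0-9.]+) delta=([+-][0-9.]+)"
)


def parse_verdict(verdict):
    """Return the parsed fields as a dict, or None if the pattern is absent."""
    match = VERDICT_RE.search(verdict)
    if match is None:
        return None
    keeps, discards, baseline, best, delta = match.groups()
    return {
        "keeps": int(keeps),
        "discards": int(discards),
        "baseline": float(baseline),
        "best_score": float(best),
        # [0-9.]+ also admits a sentence-ending period after the last
        # number, so strip it before converting.
        "delta": float(delta.rstrip(".")),
    }
```

A verdict whose delta has no leading `+` or `-` does not match at all, so callers should treat `None` as "not a numeric entry" rather than as an error.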
If at least one numeric entry exists, use the numeric entries as the primary data source for the chart. If none exist, fall through to Phase 1 (legacy ledger).
Bridge step: If the legacy generate_report.py script is being used, write the
extracted numeric data into improvement-ledger.md Section 1 format so the script
can consume it. Each numeric experiment log entry maps to one row:
| <date> | <target> | <baseline> | <best_score> | <delta> | <keeps> KEEP, <discards> DISCARD |
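The row mapping above can be sketched as a small formatter. The function name `ledger_row` and the dict keys are hypothetical, and writing the delta with an explicit sign is an assumption based on the "float, signed" note; check references/memory/improvement-ledger-spec.md for the exact number format the ledger expects.

```python
def ledger_row(entry):
    """Format one numeric experiment-log entry as a ledger table row."""
    return (
        f"| {entry['date']} | {entry['target']} | {entry['baseline']} | "
        f"{entry['best_score']} | {entry['delta']:+} | "
        f"{entry['keeps']} KEEP, {entry['discards']} DISCARD |"
    )
```

For example, an entry with baseline 0.62, best score 0.81, delta 0.19, 3 keeps, and 2 discards yields a row ending in `| 0.62 | 0.81 | +0.19 | 3 KEEP, 2 DISCARD |`.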
LEDGER="${CLAUDE_PROJECT_DIR}/context/memory/improvement-ledger.md"
if [ ! -f "$LEDGER" ]; then
echo "No improvement ledger found. Run at least one full loop cycle first."
echo "The ledger is created at Stage 4.7 of os-improvement-loop."
exit 0
fi
wc -l "$LEDGER"
If the ledger exists but Section 1 table is empty (no rows beyond the header), inform the user that no cycles have been completed yet and the first loop run will establish the baseline. Do not run the report script on an empty ledger — it will produce an empty chart.
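One way to detect the empty-ledger case before running the script is to count data rows in the Section 1 table. The sketch below assumes Section 1 is the first pipe table in the file and that it has one header row and one separator row; neither detail is confirmed by this document, and the function name is hypothetical.

```python
def first_table_data_rows(ledger_text):
    """Count rows after the header and separator of the first pipe table."""
    table = []
    for line in ledger_text.splitlines():
        if line.strip().startswith("|"):
            table.append(line.strip())
        elif table:
            break  # the first table has ended
    # Drop the header row and the |---| separator row.
    return max(len(table) - 2, 0)
```

A result of 0 corresponds to the "no cycles completed yet" message described above.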
PLUGIN_DIR="${CLAUDE_PLUGIN_ROOT:-$(pwd)/.agents/skills/agent-agentic-os}"
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$(pwd)}"
python "${PLUGIN_DIR}/skills/os-improvement-report/scripts/generate_report.py" \
--project-dir "$PROJECT_DIR" \
--plugin-dir "$PLUGIN_DIR"
# Optional: append --skill SESSION-MEMORY-MANAGER to filter to one skill
The script exits 0 on success and prints the chart path and text summary to stdout.
After the script completes:
context/memory/reports/progress_[TIMESTAMP].png

If the user wants improvement tracking across both agent-agentic-os AND exploration-cycle-plugin,
run the report twice — once per plugin — passing each plugin's project dir:
# agentic-os cycles
python "$SCRIPT" --project-dir "$AGENTIC_OS_PROJECT" --plugin-dir "$AGENTIC_OS_PLUGIN"
# exploration-cycle cycles
python "$SCRIPT" --project-dir "$EXPLORATION_PROJECT" --plugin-dir "$EXPLORATION_PLUGIN"
Both plugins write to context/memory/improvement-ledger.md in their respective project dirs.
Each produces its own chart. The text summaries can be concatenated for a combined view.
The chart mirrors the autoresearch progress.png:
A flat or declining step line = the loop is not improving the skill. Frequent DISCARD clusters = hypothesis quality needs work (check test scenarios seed). Steep step-line rises = the survey-to-action trace is working.
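The running-best step line that these readings refer to can be computed as below. This is a sketch under two assumptions not confirmed by generate_report.py: that higher scores are better, and that only KEEP runs can raise the best-so-far value. The function name `running_best` is hypothetical.

```python
def running_best(runs):
    """Best-so-far KEEP score after each run.

    runs: iterable of (score, status) pairs, status 'KEEP' or 'DISCARD'.
    Returns one value per run; None until the first KEEP is seen.
    """
    best = None
    line = []
    for score, status in runs:
        if status == "KEEP" and (best is None or score > best):
            best = score  # a KEEP that beats the current best steps the line up
        line.append(best)
    return line
```

Under these assumptions the computed line never decreases: a flat line is one whose last value equals its first non-None value, and each distinct value marks one step up.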
Any plugin that runs eval cycles can plug into this report by maintaining
context/memory/improvement-ledger.md with the three-section format
(see references/memory/improvement-ledger-spec.md, which includes a bash init snippet).

The generate_report.py script works on any ledger with this format; it is not
tied to agent-agentic-os specifically.