phrazzld/agent-readiness

Name: agent-readiness
Author: phrazzld

skills/agent-readiness/SKILL.md

npx skillsauth add phrazzld/spellbook agent-readiness

/agent-readiness

Assess how well this codebase supports autonomous AI coding agents. Then fix the highest-impact gaps.

Target: $ARGUMENTS

Execution Stance

You are the executive orchestrator.

Keep prioritization, remediation approval, and final readiness judgment on the lead model.
Delegate pillar assessments and bounded fixes to focused subagents.
Use parallel fanout by default for independent pillars.

Delegation Judgment

Delegate on judgment per the shared Roster contract: use native subagents by default; add cross-model critics, roster providers, or sprite lanes (/sprites) only when they answer a distinct question. See harnesses/shared/AGENTS.md (Roster).

Local lane guidance: Use specialized assessor lanes per readiness pillar and separate remediation-builder lanes for approved fixes. Split by pillar or ecosystem, prefer provider diversity for architecture/security/CI judgment, and record waivers when profile CRUD is purely mechanical.

Core Insight

The agent is not broken. The environment is. A codebase with fast feedback loops and clear instructions makes any agent dramatically more effective. A codebase with poor feedback loops defeats any agent you throw at it.

Profile Contract

Assessment mode scores the repo. Profile mode manages the durable SDLC contract at .harness-kit/agent-readiness.yaml. Route profile create, profile read, profile update, profile delete, and profile validate to cargo run --locked -p harness-kit-checks -- agent-readiness-profile; the schema lives in references/profile-schema.yaml. Missing profiles degrade to assessment-only behavior, but stale, expired, blank, or placeholder-only waivers are invalid once a profile exists.
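
As a rough illustration of the waiver rule above, the sketch below flags waivers that are blank, placeholder-only, or expired. The field names (`reason`, `expires`) and the placeholder list are assumptions made for this example; the real schema lives in references/profile-schema.yaml and may differ. "Stale" is not modeled because the text does not define it.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder strings; the real profile schema may define its own.
PLACEHOLDERS = {"todo", "tbd", "n/a", "placeholder", "..."}


@dataclass
class Waiver:
    reason: str    # assumed field: why the check is waived
    expires: date  # assumed field: the date the waiver stops applying


def waiver_problems(waiver: Waiver, today: date) -> list:
    """Return the reasons a waiver would be rejected (empty list means valid)."""
    problems = []
    text = waiver.reason.strip()
    if not text:
        problems.append("blank")
    elif text.lower() in PLACEHOLDERS:
        problems.append("placeholder-only")
    # A waiver must expire in the future to count as valid.
    if waiver.expires <= today:
        problems.append("expired")
    return problems
```

A profile validator along these lines would reject the whole profile if any waiver returned a non-empty list.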

Workflow

Phase 0: Profile CRUD

If the user asks to create, read, update, delete, or validate the readiness profile, run the deterministic profile script first and stop after reporting the result. Do not run a pillar assessment just to mutate the profile.

Phase 1: Assess

Dispatch parallel subagents — one per pillar — to evaluate the codebase. Each subagent runs the checks from references/pillar-checks.md for its assigned pillar and returns a structured verdict.

Launch all pillar assessments simultaneously:

| Subagent | Pillar | What it checks |
|----------|--------|----------------|
| 1 | Style & Validation | Linters, formatters, type checkers, pre-commit hooks |
| 2 | Build & CI | Build commands, dependency pinning, CI config, feedback speed |
| 3 | Testing | Coverage, speed, local execution, unit/integration/E2E layers |
| 4 | Documentation | CLAUDE.md/AGENTS.md, README, setup guide, architecture docs, ADRs |
| 5 | Dev Environment | Reproducibility, env templates, devcontainers, isolated workspaces |
| 6 | Code Quality | Modularity, complexity, file organization, coupling, dead code |
| 7 | Observability | Structured logging, error handling, metrics, tracing |
| 8 | Security & Governance | Branch protection, CODEOWNERS, secret scanning, dependency audit |
| 9 | Agent-Legible Source Of Truth | External state inventory, agent access path, verification command, waiver expiry |

Each subagent uses the Explore agent type and reads references/pillar-checks.md for its pillar's specific pass/fail criteria. Output format per subagent:

## [Pillar Name]
Score: X/Y checks passed
Maturity: L1-L5

### Passing
- [check]: [evidence]

### Failing
- [check]: [what's missing] → [recommended fix]

### Highest-Impact Fix
[The single change that would most improve this pillar]
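
To make the fanout and the per-pillar output format concrete, here is a minimal sketch. It assumes a hypothetical `assess_pillar` function standing in for a real subagent, and a `PillarVerdict` record invented for this example; neither name comes from the skill itself.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class PillarVerdict:
    pillar: str
    passing: list = field(default_factory=list)  # (check, evidence) pairs
    failing: list = field(default_factory=list)  # (check, missing, fix) triples
    maturity: str = "L1"
    highest_impact_fix: str = ""

    def render(self) -> str:
        """Format the verdict in the per-subagent layout shown above."""
        total = len(self.passing) + len(self.failing)
        lines = [
            f"## {self.pillar}",
            f"Score: {len(self.passing)}/{total} checks passed",
            f"Maturity: {self.maturity}",
            "",
            "### Passing",
            *[f"- {check}: {evidence}" for check, evidence in self.passing],
            "",
            "### Failing",
            *[f"- {check}: {missing} → {fix}" for check, missing, fix in self.failing],
            "",
            "### Highest-Impact Fix",
            self.highest_impact_fix,
        ]
        return "\n".join(lines)


def assess_pillar(pillar: str) -> PillarVerdict:
    """Hypothetical stand-in for one subagent; a real one would run the pillar's checks."""
    return PillarVerdict(pillar=pillar)


def assess_all(pillars: list) -> list:
    # Independent pillars are assessed concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max(1, len(pillars))) as pool:
        return list(pool.map(assess_pillar, pillars))
```

Keeping each failing entry as a (check, missing, fix) triple mirrors the requirement that every failure carries a recommended fix.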

Phase 2: Report

Synthesize subagent results into a single readiness report:

# Agent Readiness Report: [project-name]

## Overall: Level X — [Maturity Name] (XX%)

| Pillar | Score | Level | Top Fix |
|--------|-------|-------|---------|
| Style & Validation | 4/6 | L3 | Add pre-commit hooks |
| Testing | 2/7 | L1 | Add unit test runner |
| ... | ... | ... | ... |

## Maturity Levels
- L1 Functional: Code runs, but agents need hand-holding
- L2 Documented: Processes written down, some automation
- L3 Standardized: Enforced automation, agents handle routine work
- L4 Optimized: Fast feedback, data-driven improvement
- L5 Autonomous: Self-improving, sophisticated orchestration

## Top 5 Recommendations (by impact)
1. [highest impact fix across all pillars]
2. ...

## Detailed Findings
[per-pillar sections from subagents]

Maturity level is gated: a codebase must pass 80% of criteria at the current level and all previous levels before advancing. This prevents cherry-picking.
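
The gating rule can be sketched as follows. This reads the rule as "every level from L1 up to the reported level must reach the 80% threshold", which is one interpretation of the sentence above. The function name and input shape are assumptions, and the text does not say how to report a codebase that misses the L1 threshold, so the sketch returns 0 in that case.

```python
def gated_level(pass_rates: dict, threshold: float = 0.8) -> int:
    """Return the highest level N such that every level 1..N meets the threshold.

    pass_rates maps a level number (1-5) to the fraction of that level's
    criteria that passed. Returns 0 if level 1 itself misses the threshold.
    """
    level = 0
    for n in range(1, 6):
        # A missing level counts as 0% passed, so it blocks advancement.
        if pass_rates.get(n, 0.0) < threshold:
            break
        level = n
    return level


# Strong L4 and L5 results do not help when L3 falls short.
print(gated_level({1: 1.0, 2: 0.9, 3: 0.5, 4: 1.0, 5: 1.0}))
```

Stopping at the first level below the threshold is what prevents cherry-picking: high scores at later levels cannot compensate for a gap lower down.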

Phase 3: Clarify

Present the report and top 5 recommendations to the user. Ask:

Which recommendations should we execute now?
Any pillars to skip or deprioritize?
Any constraints (don't change CI provider, keep current test framework, etc.)?

One question at a time. Don't dump all three at once.

Phase 4: Fix

For each approved recommendation, spawn a builder subagent (or use worktrees for parallel fixes on disjoint files). Each fix follows the project's existing patterns — don't introduce new tools the team hasn't chosen.

Typical fix categories:

| Fix type | Example | Subagent approach |
|----------|---------|-------------------|
| Config addition | Add .editorconfig, pre-commit hooks | Single builder, quick |
| Documentation | Create/update CLAUDE.md, AGENTS.md | Single builder |
| Test infrastructure | Add test runner, coverage config | Builder + TDD cycle |
| CI enhancement | Add lint/typecheck/test to CI | Single builder, verify locally |
| Architecture doc | Create ADR, architecture overview | Builder reads codebase first |
| Security hardening | Add CODEOWNERS, branch protection | Builder + gh CLI |

After fixes: re-run the failing checks to verify improvement. Report the before/after delta.

Phase 5: Re-assess (optional)

If --fix was used, re-run the full assessment to show the improved score. Present the before/after comparison.

Phase 6: Persist Contract Delta

When fixes or approved waivers change the durable readiness contract, update the profile with harness-kit-checks agent-readiness-profile and re-run profile validate. Report whether the profile was improved, preserved, or regressed. Regressions require an explicit contract-change note and waiver expiry.

Routing

| Argument | Behavior |
|----------|----------|
| (none) | Full assess → report → clarify → fix cycle |
| --assess-only | Assess and report only, no fixes |
| --fix | Skip clarification, fix all top 5 recommendations |
| --pillar <name> | Assess only the named pillar |
| profile <create\|read\|update\|delete\|validate> | Manage .harness-kit/agent-readiness.yaml through harness-kit-checks agent-readiness-profile |
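
The routing table could be modeled roughly as below. This is only a sketch of how the argument string might map to a mode. The `Route` record and `route` function are hypothetical, and how flags combine (for example `--fix` with `--pillar`) is not specified in the table, so the sketch simply records both.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

PROFILE_ACTIONS = {"create", "read", "update", "delete", "validate"}


@dataclass
class Route:
    mode: str                             # "full", "assess-only", "fix", or "profile"
    pillar: Optional[str] = None          # set when --pillar <name> is given
    profile_action: Optional[str] = None  # set when mode is "profile"


def route(arguments: str) -> Route:
    """Map a raw argument string to a route (illustrative, not the skill's own parser)."""
    tokens = shlex.split(arguments)
    if tokens and tokens[0] == "profile":
        if len(tokens) < 2 or tokens[1] not in PROFILE_ACTIONS:
            raise ValueError("profile needs one of: " + ", ".join(sorted(PROFILE_ACTIONS)))
        return Route(mode="profile", profile_action=tokens[1])

    result = Route(mode="full")
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--assess-only":
            result.mode = "assess-only"
        elif token == "--fix":
            result.mode = "fix"
        elif token == "--pillar":
            if i + 1 >= len(tokens):
                raise ValueError("--pillar needs a name")
            result.pillar = tokens[i + 1]
            i += 1  # skip the pillar name
        # Any other token is ignored here and left for the caller to interpret.
        i += 1
    return result
```

Checking for `profile` first matches Phase 0, where profile operations run and stop without triggering a pillar assessment.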

Pillar Check Reference

All specific checks are in references/pillar-checks.md. Each check is:

Binary: pass or fail (no subjective scoring)
Evidence-based: the subagent must cite the file/config/output that proves pass/fail
Actionable: every failing check has a concrete remediation

See references/agent-readiness-principles.md for the deeper "why" behind each pillar — useful when explaining recommendations to the user.

Gotchas

Scoring without fixing is theater. The report is only useful if it leads to action. Always push toward Phase 4.
Don't introduce tools the team hasn't chosen. If they use Jest, don't suggest Vitest. If they use ESLint, don't add Biome. Work within existing choices.
Pre-commit hooks are the highest-leverage single fix for most codebases. They give agents instant feedback instead of waiting for CI.
CLAUDE.md/AGENTS.md is the highest-leverage documentation fix. It's the file agents actually read. A perfect README that agents ignore is worth less than a scrappy CLAUDE.md they always load.
Hidden source-of-truth is readiness debt. CMS-only, admin-UI-only, or unknown state must be exposed through code, local files, MCP, CLI, API, or skill, or carry a future-expiring waiver. Route external-system remediation through meta/INTEGRATION_GUIDE.md before choosing a mechanism.
Don't conflate coverage percentage with test quality. 80% coverage with shallow tests is worse than 50% coverage with behavior-focused tests. Check for assertion density, not just line coverage.
Monorepos need per-app assessment. A monorepo score is the floor of its worst app, not the average.
Speed matters as much as existence. A test suite that takes 20 minutes is nearly as bad as no tests for agent workflows. Measure execution time.
The codebase is the product, not the agent. Every fix here improves the experience for ALL agents and ALL developers, not just one tool.
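
The monorepo rule in the list above (the score is the floor of the worst app, not the average) can be sketched as follows. The function name and the mapping from app name to maturity level are assumptions made for this example.

```python
def monorepo_level(app_levels: dict) -> tuple:
    """Return (level, weakest_apps) where level is the minimum across all apps."""
    if not app_levels:
        raise ValueError("need at least one app to score a monorepo")
    floor = min(app_levels.values())
    # Report every app sitting at the floor so remediation can target them first.
    weakest = sorted(name for name, level in app_levels.items() if level == floor)
    return floor, weakest


# Averaging would give about 3; the floor rule reports 1.
print(monorepo_level({"web": 4, "api": 4, "legacy-admin": 1}))
```

Returning the weakest apps alongside the level keeps the report actionable, in line with the push toward fixing rather than only scoring.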

Assess and improve codebase readiness for AI coding agents across style, tests, docs, architecture, CI, observability, security, and dev setup. Produces a scored report, prioritizes remediation, then executes the highest-impact fixes. Use when: "agent readiness", "is this codebase agent-ready", "readiness report", "make this codebase agent-friendly", "agent-ready assessment", "readiness audit", "prepare for agents". Trigger: /agent-readiness, /readiness.

13 stars

development

Updated Jun 10, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status | Percentage |
|---------|---------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 10, 2026, 6:23 AM (50.5s, 4 files scanned)

SKILL.md

name: agent-readiness
description: |
  Use when: "agent readiness", "is this codebase agent-ready",
  Trigger: /agent-readiness, /readiness.
argument-hint: [--assess-only] [--fix] [--pillar <name>] [profile create|read|update|delete|validate]

Install manually

# Clone the repo
git clone https://github.com/phrazzld/spellbook.git

# Copy into Claude Code skills folder (global)
cp -r spellbook/skills/agent-readiness ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

phrazzld/spellbook

13 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT