skills/local/cali-agents-md-validator/SKILL.md
[Cali] Validate project AGENTS.md files against best practices and the canonical template. Use when: user says 'validate agents md', 'check agents md quality', 'audit my agents md', or after creating/updating AGENTS.md. Checks: structure, size, content quality, template compliance, internal-reference consistency, and provides portable fix recommendations. Works standalone across agent ecosystems (pi, Claude Code, OpenAI Codex, custom harnesses) — no agent-harness-specific tool dependencies.
Validates AGENTS.md files against 13 criteria drawn from industry research and the canonical cali-agents-md-generator template. The checks need only bash and standard command-line tools (grep, awk, wc, sort, tr), with no agent-harness dependencies.
This skill makes no assumptions about the agent harness. It does not call
ask_user_question, /skill:, or any harness-specific API. All checks are
pure bash / grep / awk. The skill body instructs the agent in prose; the
agent decides how to surface questions to the user based on its own tools.
Severity legend: FAIL = must fix, WARN = should fix, INFO = observation.
test -f AGENTS.md && echo "pass" || echo "fail"
lines=$(wc -l < AGENTS.md)
if [ "$lines" -gt 150 ]; then echo "fail"; else echo "pass"; fi
Source: ETH Zurich research found that files over 150 lines cause "silent rule dropout". 150 lines is the hard safety limit; see R13 for the soft target.
grep -qE "^##[[:space:]]+Commands" AGENTS.md && echo "pass" || echo "fail"
Source: OpenAI Codex, GitHub analysis — copy-pasteable commands are essential.
grep -qE "^##[[:space:]]+Don'ts" AGENTS.md && echo "pass" || echo "fail"
Source: GitHub 2500+ repos analysis — explicit prohibitions prevent common errors.
grep -qiE "(api_key|secret|password|token|credential).*=.*['\"][a-zA-Z0-9]{20,}" AGENTS.md && echo "fail" || echo "pass"
Source: GitHub analysis: a rule against committing secrets is the most common constraint in AGENTS.md files.
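To see what the R5 pattern flags, you can run it against sample lines instead of a file. This is a sketch. `check_secrets` is a hypothetical helper name, and the values are fake. The last case shows a gap that follows from the regex itself: the run of 20+ letters or digits must start immediately after the opening quote. As a result, a prefixed value containing a hyphen is not flagged, and neither is an unquoted value.

```bash
# Hypothetical helper: the exact R5 check, reading text from stdin.
check_secrets() {
  grep -qiE "(api_key|secret|password|token|credential).*=.*['\"][a-zA-Z0-9]{20,}" \
    && echo "fail" || echo "pass"
}

# A quoted value of 20+ letters/digits after a key name is flagged.
echo 'API_KEY = "abcdefghijklmnopqrstuvwx"' | check_secrets    # fail
# An environment-variable reference has no literal value, so it passes.
echo 'API_KEY=$API_KEY' | check_secrets                        # pass
# Known gap: the hyphen after "sk" breaks the 20+ character run.
echo 'token = "sk-abcdefghijklmnopqrstuvwx"' | check_secrets   # pass
```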
# Catches: TypeScript 5.7, Go 1.26, vitest ^3.1.1, python 3.12
# Misses: bare major versions without a dot, e.g. "Node >=20" or "React 19"
grep -qE "[A-Za-z][A-Za-z0-9.+_-]*[[:space:]]*[v^~>=<]*[0-9]+\.[0-9]+" AGENTS.md && echo "pass" || echo "warn"
False-positive risk: moderate. The pattern has no word-boundary check. It only needs a word starting with a letter followed by a dotted number, so prose such as "step 3.2" or "section 2.1" also matches. A false match makes R6 pass even when no stack versions are listed. This is tolerable because R6 is WARN-level and never fails the run.
Source: Augment Code, Inngest repo — "React 19" not just "React". Applies to any language: TypeScript, Go, Node, Python, Rust, Swift, Kotlin, etc.
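The same approach shows where the R6 pattern is too loose and where it misses versions. `has_version` is a hypothetical wrapper around the exact grep used for R6. It checks one line of text instead of AGENTS.md.

```bash
# Hypothetical wrapper: the exact R6 pattern, applied to one line of text.
has_version() {
  printf '%s\n' "$1" \
    | grep -qE "[A-Za-z][A-Za-z0-9.+_-]*[[:space:]]*[v^~>=<]*[0-9]+\.[0-9]+" \
    && echo "pass" || echo "warn"
}

has_version "TypeScript 5.7"    # pass: dotted version after a name
has_version "vitest ^3.1.1"     # pass: range operators are allowed
has_version "React 19"          # warn: no dot, so a bare major version is missed
has_version "Node >=20"         # warn: same reason
has_version "see section 2.1"   # pass: prose also matches (false positive)
```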
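# Detects sections with > 20 lines of prose (tables, fenced code, and blank lines excluded)
awk '/^```/{code=!code; next}
code || /^[|]/ || /^[[:space:]]*$/ {next}
/^##[^#]/{if(lines>20) print section": "lines" lines"; section=$0; lines=0; next}
{lines++}
END{if(lines>20) print section": "lines" lines"}' AGENTS.md | head -3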
Source: ETH Zurich — arch sections increase cost without improving success.
Move deep details to README or docs/.
head -50 AGENTS.md | grep -qE "^##[[:space:]]+(Don'ts|Commands|Rule)" && echo "pass" || echo "warn"
Source: "Lost in the middle" phenomenon — LLMs attend most to first/last 25%.
# Detects kebab-case backtick strings (works for any skill naming convention):
# `my-team-workflow`, `cali-agents-md-validator`, `user-defined-skill`
# Pattern: 3+ words separated by hyphens, lowercase + digits
grep -oE '`[a-z][a-z0-9]+(-[a-z0-9]+){2,}`' AGENTS.md | sort -u | head -5
If at least one match: pass. Otherwise warn.
Why this works for any ecosystem:
- Matches: `cali-...` (Cali ecosystem), `my-org-...` (other org ecosystems), `user-defined-...` (any custom prefix)
- Does not match: `step-1` (only 2 words), `myvar` (no hyphens), `/path/to/file.md` (paths)
has_commands=$(grep -qE "^##[[:space:]]+Commands" AGENTS.md && echo 1 || echo 0)
has_donts=$(grep -qE "^##[[:space:]]+Don'ts" AGENTS.md && echo 1 || echo 0)
has_arch=$(grep -qE "^##[[:space:]]+(Architecture|Stack)" AGENTS.md && echo 1 || echo 0)
[ "$has_commands" -eq 1 ] && [ "$has_donts" -eq 1 ] && [ "$has_arch" -eq 1 ] && echo "pass" || echo "warn"
Source: cali-agents-md-generator template — Commands + Don'ts + Architecture are the core sections.
# Check that files referenced in AGENTS.md actually exist
# Pattern: `path/to/file.md` or `path/to/file.ts` (any relative file mention)
missing=$(grep -oE '`[a-zA-Z0-9._/-]+\.(md|ts|tsx|js|sh|json|yml|yaml)`' AGENTS.md \
| tr -d '`' | sort -u \
| while IFS= read -r f; do [ -f "$f" ] || printf ' %s' "$f"; done)  # paths resolve from the current directory
[ -z "$missing" ] && echo "pass" || echo "fail: missing:$missing"
Why this rule: Agents reading AGENTS.md may follow file references. A broken reference wastes context and erodes trust. This rule catches stale mentions after renames or moves.
Source: Microsoft Learn skills guidance — "verify references resolve".
# For every local file reference in AGENTS.md, check it's also documented
# in a "References" or "Index" section (if present). If no such section,
# this check is a no-op.
if grep -qE "^##[[:space:]]+(References|Index|Related)" AGENTS.md; then
body_refs=$(grep -oE '`[a-zA-Z0-9._/-]+\.(md|ts|tsx|js|sh|json|yml|yaml)`' AGENTS.md \
| tr -d '`' | sort -u)
index_section=$(awk '/^##[[:space:]]+(References|Index|Related)/{f=1; next} /^##[^#]/{f=0} f' AGENTS.md)
missing_in_index=""
for ref in $body_refs; do
echo "$index_section" | grep -qF "$ref" || missing_in_index="$missing_in_index $ref"
done
[ -z "$missing_in_index" ] && echo "pass" || echo "warn: not in index:$missing_in_index"
else
echo "skip: no References section"
fi
Why this rule: If AGENTS.md has a References section, every file mentioned in the body should be listed there. Otherwise the section is incomplete and misleading. Skip if no References section.
lines=$(wc -l < AGENTS.md)
if [ "$lines" -gt 150 ]; then echo "fail (see R2)"; \
elif [ "$lines" -gt 100 ]; then echo "warn: over soft target"; \
elif [ "$lines" -lt 20 ]; then echo "info: minimal-viable file"; \
else echo "pass: in sweet spot"; fi
Source: Augment Code study (100-150 line AGENTS.md + reference docs = 10-15% improvement), aihackers.net (start with 10-15 lines, iterate), Anthropic docs (progressive disclosure), ETH Zurich (rule dropout >150 lines).
Sweet spot: 70-100 lines, per the sources above. Trimming to 20-30 lines is too aggressive for real projects; more than 150 lines causes rule dropout.
bash references/validate-agents-md.sh [path-to-AGENTS.md]
Default: ./AGENTS.md. Script outputs ✅/❌/⚠️/ℹ️ per rule + summary.
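This is not the actual `references/validate-agents-md.sh`. It is a rough sketch of how such a runner could be organized. The hypothetical `report` helper prints the ✅/⚠️/❌ marker and counts results. The hypothetical `validate_agents_md` function covers only R1-R4 and R11. One choice differs from the one-liners above: references are resolved relative to the directory that contains the AGENTS.md file. This way R11 still works when the path argument points outside the current directory.

```bash
# Hypothetical skeleton, NOT the actual references/validate-agents-md.sh.
# Implements only R1-R4 and R11 to show the reporting pattern.
report() {  # usage: report pass|warn|fail "message"
  case $1 in
    pass) echo "✅ $2"; pass=$((pass + 1)) ;;
    warn) echo "⚠️ $2"; warn=$((warn + 1)) ;;
    fail) echo "❌ $2"; fail=$((fail + 1)) ;;
  esac
}

validate_agents_md() {
  local file=${1:-./AGENTS.md} lines dir missing
  pass=0 warn=0 fail=0
  echo "Checking: $file"

  if [ ! -f "$file" ]; then
    report fail "R1: File does not exist"
  else
    report pass "R1: File exists"

    lines=$(wc -l < "$file")
    lines=$((lines))  # drop the leading spaces some wc implementations print
    if [ "$lines" -gt 150 ]; then report fail "R2: $lines lines (>150 hard limit)"
    else report pass "R2: $lines lines (≤150 hard limit)"; fi

    if grep -qE "^##[[:space:]]+Commands" "$file"; then report pass "R3: Commands section"
    else report fail "R3: No Commands section"; fi

    if grep -qE "^##[[:space:]]+Don'ts" "$file"; then report pass "R4: Don'ts section"
    else report fail "R4: No Don'ts section"; fi

    # R11: resolve references relative to the file's own directory.
    dir=$(dirname "$file")
    missing=$(grep -oE '`[a-zA-Z0-9._/-]+\.(md|ts|tsx|js|sh|json|yml|yaml)`' "$file" \
      | tr -d '`' | sort -u \
      | while IFS= read -r ref; do [ -f "$dir/$ref" ] || printf ' %s' "$ref"; done)
    if [ -z "$missing" ]; then report pass "R11: All internal references resolve"
    else report fail "R11: Missing:$missing"; fi
  fi

  echo "Result: $pass passed, $warn warnings, $fail failures"
  [ "$fail" -eq 0 ]  # return status: 0 only when nothing failed
}
```

You call it as `validate_agents_md path/to/AGENTS.md`. It prints one line per rule plus a summary and returns non-zero when any check fails, so it could also gate a CI job.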
Show the user a clear report. Use this format (works in any harness):
📊 AGENTS.md Validation Report
Checking: ./AGENTS.md
✅ R1: File exists
✅ R2: 88 lines (≤150 hard limit)
✅ R3: Commands section
✅ R4: Don'ts section
✅ R5: No secrets
⚠️ R6: No exact versions — add TypeScript 5.7, Vitest 3.1.1
✅ R7: No arch bloat
✅ R8: Critical rules in first 50 lines
⚠️ R9: No skill references — add `my-skill-name` for extended docs
✅ R10: Template compliance
✅ R11: All internal references resolve
⚠️ R12: Index drift — `docs/foo.md` mentioned in body, not in References
✅ R13: 88 lines (in 70-100 sweet spot)
Result: 10/13 passed, 3 warnings
Ask the user how to proceed. Phrase as plain prose — the agent decides how
to surface this question based on its harness (some have ask_user_question,
some use a confirmation prompt, some just describe in text).
Suggested choices:
For each warning/failure, generate a fix:
| Rule | Fix |
|------|-----|
| R2 (too long) | Trim sections, move content to docs/ |
| R3 (no Commands) | Generate Commands section from package.json / Makefile |
| R4 (no Don'ts) | Generate Don'ts section from common project errors |
| R5 (secrets) | Immediately remove and warn |
| R6 (no versions) | Detect versions from go.mod / package.json |
| R7 (arch bloat) | Suggest moving to README or docs/ |
| R8 (rules buried) | Reorder sections (Don'ts + Commands to top) |
| R9 (no skill refs) | Add skill references for related workflows |
| R10 (template) | Restructure to match template |
| R11 (broken refs) | Rename or remove stale file references |
| R12 (index drift) | Add missing entries to References section |
| R13 (off-target) | Move content to docs/ (if over) or expand (if under) |
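Two of these fixes can be partly automated. The helpers below are hypothetical sketches. `make_targets_to_rows` turns Makefile targets into Commands table rows for R3, though the descriptions still have to be written by hand. `go_version` reads the `go` directive from go.mod for R6. Target detection is a simple heuristic. It skips `:=` assignments and special targets such as `.PHONY`, but it does not list pattern rules or targets from included files.

```bash
# Hypothetical helper for R3: list Makefile targets as Commands table rows.
# Heuristic only: skips ":=" assignments and targets starting with "." or "%".
make_targets_to_rows() {
  local makefile=${1:-Makefile}
  grep -v ':=' "$makefile" \
    | grep -oE '^[A-Za-z0-9_-]+:' | tr -d ':' | sort -u \
    | while IFS= read -r target; do
        printf '| `make %s` | TODO: describe |\n' "$target"
      done
}

# Hypothetical helper for R6: read the Go version from the go.mod "go" directive.
go_version() {
  awk '$1 == "go" { print "Go " $2; exit }' "${1:-go.mod}"
}
```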
Re-run validation. Confirm all FAILs resolved and WARNs reduced.
After validation, suggest adding a Last validated: YYYY-MM-DD line near the
top of AGENTS.md. Format:
<!-- Last validated: 2026-06-04 by cali-agents-md-validator -->
The agent (or user) notices when the timestamp is stale and re-runs validation. No telemetry required — this is a passive nudge, not tracking.
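You can also script adding or refreshing the comment. This is a minimal sketch with a hypothetical `stamp_validated` function. It assumes the first line of the file is the `# AGENTS.md` title, as in the template, and inserts the stamp directly after it. If a stamp already exists, it is replaced rather than duplicated.

```bash
# Hypothetical helper: add or refresh the "Last validated" comment.
# Assumes line 1 is the "# AGENTS.md" title, as in the template.
stamp_validated() {
  local file=${1:-AGENTS.md} stamp tmp
  stamp="<!-- Last validated: $(date +%Y-%m-%d) by cali-agents-md-validator -->"
  tmp=$(mktemp)
  if grep -q '<!-- Last validated:' "$file"; then
    # Replace the existing stamp instead of adding a second one.
    awk -v s="$stamp" '/<!-- Last validated:/ { print s; next } { print }' "$file" > "$tmp"
  else
    # Insert the stamp directly after the title line.
    awk -v s="$stamp" '{ print } NR == 1 { print s }' "$file" > "$tmp"
  fi
  # Copy back (rather than mv) so the file keeps its original permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```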
Use this decision tree to decide where new content goes. The validator may suggest moves based on these criteria when AGENTS.md grows past 100 lines.
Is it triggered by explicit user intent ("validate", "create", "fix")?
YES → SKILL (workflow-shaped, has trigger description)
NO ↓
Is it a multi-step workflow with decision points?
YES → SKILL
NO ↓
Is it a rule the agent must never violate?
YES → AGENTS.md (Don'ts section)
NO ↓
Is it a command the agent runs frequently?
YES → AGENTS.md (Commands section)
NO ↓
Is it static knowledge consumed by skills on demand?
YES → REFERENCE (docs/<topic>.md or skill's references/)
NO ↓
If unsure: start in AGENTS.md, move to reference when AGENTS.md > 100 lines,
move to skill when it becomes a multi-step workflow.
Sources: Anthropic skill authoring best practices, Microsoft Learn (adding skills), AgentSkills.io (progressive disclosure), AgentPatterns.ai (skill design patterns).
The canonical template structure (from cali-agents-md-generator):
# AGENTS.md
<!-- Last validated: YYYY-MM-DD by cali-agents-md-validator -->
## Project Overview
[Brief description with stack + versions]
## Commands
| Command | Description |
|---------|-------------|
| `npm test` | Run all tests |
## Architecture
[1-3 line summary — link to `docs/architecture.md` for details]
## Don'ts
- Things to never do
## References
- `docs/<topic>.md` — description
- `my-skill-name` — when to use
If AGENTS.md is missing, report R1 as a fail. Suggest running the cali-agents-md-generator skill if it is available in the user's ecosystem. Describe this in prose; do not hardcode a slash command.
R2 passes, R13 may warn. No action needed for R2; consider trimming for R13.
R10 still passes (any Don'ts is fine). Suggest consolidation in the report.
Paths built from environment variables, such as `${HOME}/.config/foo`, may cause R11 false positives. Skip any reference that contains `${`.
- `references/validate-agents-md.sh`: Bash validation script (R1-R13)
- `cali-agents-md-generator`: canonical template and scaffolding skill
- `cali-skill-validator`: validates skills themselves (covers this skill)
- agents.md: AGENTS.md standard (github.com/agentsmd/standard)