skills/frontmatter-guard/SKILL.md
Validate and auto-repair YAML frontmatter on brain pages. Catches malformed pages before they enter the brain (missing closing ---, nested quotes, slug mismatches, null bytes, empty frontmatter, YAML parse failures). Wraps the `gbrain frontmatter` CLI for agent-driven workflows.
Convention: see `skills/conventions/quality.md` for citation rules; this skill is structural validation, not citation auditing.
This skill guarantees:

- Mechanical issues (missing closing `---`, null bytes, slug mismatch) are auto-repairable on demand with `.bak` backups
- Validation is shared with `gbrain doctor`'s `frontmatter_integrity` subcheck: a single source of truth

Brain pages pile up over months. Agents write them with malformed frontmatter:

- Missing closing `---` (entity detector bugs)
- Nested quotes in titles (`title: "Phil "Nick" Last"`)

Without a guard, these accumulate silently until `gbrain sync` chokes or search returns garbage. The guard makes the failure visible at audit time and trivially fixable.
| Code | Meaning | Auto-fixable? |
|------|---------|---------------|
| MISSING_OPEN | File doesn't start with --- | No (needs human) |
| MISSING_CLOSE | No closing --- before first heading | Yes |
| YAML_PARSE | YAML failed to parse | Sometimes (depends on cause) |
| SLUG_MISMATCH | Frontmatter slug: differs from path-derived slug | Yes (removes the field) |
| NULL_BYTES | Binary corruption (\x00) | Yes |
| NESTED_QUOTES | title: "outer "inner" outer" shape | Yes |
| EMPTY_FRONTMATTER | Open + close present but nothing between | No (needs human) |
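To make the structural codes in the table concrete, here is a minimal sketch of how four of them could be detected on raw page text. The function name `detectStructuralCodes` and its heuristics are assumptions for illustration; this is not gbrain's implementation, and it ignores `YAML_PARSE`, `SLUG_MISMATCH`, and `NESTED_QUOTES`.

```typescript
// Illustrative sketch only: simplified checks for four of the codes in the table.
// Names and heuristics are assumptions, not gbrain's actual validator.
type StructuralCode = "MISSING_OPEN" | "MISSING_CLOSE" | "NULL_BYTES" | "EMPTY_FRONTMATTER";

function detectStructuralCodes(text: string): StructuralCode[] {
  const codes: StructuralCode[] = [];
  if (text.includes("\u0000")) codes.push("NULL_BYTES");

  const lines = text.split(/\r?\n/);
  if ((lines[0] ?? "").trim() !== "---") {
    codes.push("MISSING_OPEN");
    return codes; // without an opening marker the remaining checks do not apply
  }

  // Look for a closing --- before the first Markdown heading.
  // Caveat: a YAML comment line ("# note") would be mistaken for a heading here.
  let closeIndex = -1;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim() === "---") {
      closeIndex = i;
      break;
    }
    if (/^#{1,6}\s/.test(line)) break;
  }

  if (closeIndex === -1) {
    codes.push("MISSING_CLOSE");
  } else if (lines.slice(1, closeIndex).every((l) => l.trim() === "")) {
    codes.push("EMPTY_FRONTMATTER");
  }
  return codes;
}
```

A real validator would parse the YAML rather than scan lines; the sketch only shows what each code is looking for.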
Run a read-only scan across all registered sources (or one with --source <id>).
gbrain frontmatter audit --json
Reports the total issue count, a breakdown by error code, and per-source results with sample paths. Output is JSON; agents parse `errors_by_code` and `per_source` to decide next steps.
Validate a single file or directory (does not require source registration):
gbrain frontmatter validate <path> --json
Exit code 0 = clean; 1 = errors found. Use this in CI pipelines or pre-commit hooks.
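For a CI step, the exit-code contract above might be wrapped as follows. This is a sketch under assumptions: `interpretExitCode` and `validatePath` are hypothetical helper names, and treating any status other than 0 or 1 as a tool failure is a choice made here, not documented gbrain behavior.

```typescript
import { spawnSync } from "node:child_process";

type ValidateOutcome = "clean" | "errors" | "tool-failure";

// Maps the documented exit codes (0 = clean, 1 = errors found) to an outcome.
// Any other status, including null when the process was killed, is treated
// as a tool failure; that mapping is an assumption of this sketch.
function interpretExitCode(status: number | null): ValidateOutcome {
  if (status === 0) return "clean";
  if (status === 1) return "errors";
  return "tool-failure";
}

// Hypothetical CI helper: runs the documented command and classifies the result.
function validatePath(path: string): ValidateOutcome {
  const result = spawnSync("gbrain", ["frontmatter", "validate", path, "--json"], {
    encoding: "utf8",
  });
  if (result.error) return "tool-failure"; // e.g. gbrain is not on PATH
  return interpretExitCode(result.status);
}
```

Separating the mapping from the process call keeps the decision logic testable without gbrain installed.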
When issues are found:
gbrain frontmatter validate <path> --fix
--fix writes <file>.bak for every modified file before mutating. The backup is the safety contract — works whether the brain is a git repo or a plain directory.
--dry-run previews without writing. Use this before applying fixes in batch.
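The backup-before-mutate contract described above can be sketched like this. `writeWithBackup` is a hypothetical helper written for illustration, not gbrain's code; it only shows the ordering that makes `.bak` a safety net (copy first, then write) and how a dry run reports a change without touching disk.

```typescript
import { copyFileSync, existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface WriteResult {
  changed: boolean;
  backupPath: string | null;
}

// Illustration of the contract only; the name and return shape are assumptions.
function writeWithBackup(filePath: string, newContent: string, dryRun = false): WriteResult {
  const current = readFileSync(filePath, "utf8");
  if (current === newContent) return { changed: false, backupPath: null };
  if (dryRun) return { changed: true, backupPath: null }; // preview only, nothing written

  const backupPath = `${filePath}.bak`;
  copyFileSync(filePath, backupPath); // back up first so the original survives a failed write
  writeFileSync(filePath, newContent, "utf8");
  return { changed: true, backupPath };
}

// Usage on a throwaway page in a temp directory.
const dir = mkdtempSync(join(tmpdir(), "fm-guard-"));
const page = join(dir, "jane.md");
const broken = "---\ntitle: Jane\n# Jane\n";
const repaired = "---\ntitle: Jane\n---\n# Jane\n";
writeFileSync(page, broken, "utf8");

const preview = writeWithBackup(page, repaired, true); // dry run
const applied = writeWithBackup(page, repaired); // real write
const backupExists = existsSync(`${page}.bak`);
```

Because the backup sits next to the file, the same recovery path works whether or not the brain is a git repo.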
For brain repos that ARE git repos, install the pre-commit hook to block malformed pages from being committed in the first place:
gbrain frontmatter install-hook [--source <id>]
The hook runs gbrain frontmatter validate against staged .md/.mdx files. Bypass with git commit --no-verify.
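As a rough illustration of the "staged .md/.mdx files" filter, the helper below picks the paths such a hook would hand to the validator. `selectStagedPages` is a hypothetical name, and the case-sensitive extension match is an assumption; the path list could come from something like `git diff --cached --name-only`.

```typescript
// Hypothetical helper: keeps only staged paths ending in .md or .mdx.
// Matching is case-sensitive here, which is an assumption of this sketch.
function selectStagedPages(stagedPaths: string[]): string[] {
  return stagedPaths.filter((p) => /\.mdx?$/.test(p));
}

const staged = ["people/jane.md", "docs/intro.mdx", "img/logo.png", "people/jane.md.bak"];
const pages = selectStagedPages(staged);
```

Note that `.bak` backups left by `--fix` fall outside the filter, so they would not be validated even if staged.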
When the user says any of these, route here:
Rules when running this skill:

- Run `gbrain frontmatter audit --json` first; never assume a brain is clean.
- Before `--fix` operations: state how many files will be modified BEFORE running, then confirm.
- `SLUG_MISMATCH` fixes remove the frontmatter `slug:` field, because gbrain derives the slug from the path. Mention this when the user's title is intentionally renamed.
- Never auto-fix `MISSING_OPEN` or `EMPTY_FRONTMATTER` without explicit user input; these usually mean a human author started a page and didn't finish.

Related commands and skills:

- `gbrain doctor`: the `frontmatter_integrity` subcheck reports the same counts as `audit`.
- `skills/maintain/SKILL.md`: broader brain health audit; chain after this skill if other classes of issue are suspected.
- `skills/lint/SKILL.md` (via `gbrain lint`): overlapping rules for skill-file lint; the `frontmatter-*` rule names in lint output come from this skill's validation surface.

Audit summary (terse, agent-friendly):
Frontmatter audit — 17 issue(s) across 1 source(s)
[default] /Users/me/brain
17 issue(s)
MISSING_CLOSE: 8
NESTED_QUOTES: 5
NULL_BYTES: 4
sample:
people/jane.md — MISSING_CLOSE
companies/acme.md — NESTED_QUOTES
(+ 12 more)
Fix with: gbrain frontmatter validate /Users/me/brain --fix
JSON envelope (when --json is passed):
{
  "ok": false,
  "total": 17,
  "errors_by_code": { "MISSING_CLOSE": 8, "NESTED_QUOTES": 5, "NULL_BYTES": 4 },
  "per_source": [
    {
      "source_id": "default",
      "source_path": "/Users/me/brain",
      "total": 17,
      "errors_by_code": { "MISSING_CLOSE": 8, "NESTED_QUOTES": 5, "NULL_BYTES": 4 },
      "sample": [{ "path": "people/jane.md", "codes": ["MISSING_CLOSE"] }]
    }
  ],
  "scanned_at": "2026-04-25T22:30:00.000Z"
}
gbrain frontmatter validate <path> --json returns a similar envelope keyed on per-file results instead of per-source.
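An agent consuming the audit envelope might triage it along these lines. The `AuditEnvelope` interface mirrors only the fields visible in the example above, and `triageAudit` is a hypothetical helper; routing `YAML_PARSE` to review (the table lists it as "Sometimes" fixable) is a conservative choice made for this sketch.

```typescript
// Shape copied from the example envelope; fields beyond it are not assumed.
interface AuditEnvelope {
  ok: boolean;
  total: number;
  errors_by_code: Record<string, number>;
  per_source: Array<{
    source_id: string;
    source_path: string;
    total: number;
    errors_by_code: Record<string, number>;
    sample: Array<{ path: string; codes: string[] }>;
  }>;
  scanned_at: string;
}

// Codes the error-code table marks as auto-fixable with a plain "Yes".
const AUTO_FIXABLE = new Set(["MISSING_CLOSE", "SLUG_MISMATCH", "NULL_BYTES", "NESTED_QUOTES"]);

interface Triage {
  autoFixable: number; // issue count, not file count: one file can carry several codes
  needsReview: number;
  sourcesWithIssues: string[];
}

function triageAudit(json: string): Triage {
  const audit = JSON.parse(json) as AuditEnvelope;
  let autoFixable = 0;
  let needsReview = 0;
  for (const [code, count] of Object.entries(audit.errors_by_code)) {
    if (AUTO_FIXABLE.has(code)) autoFixable += count;
    else needsReview += count;
  }
  const sourcesWithIssues = audit.per_source.filter((s) => s.total > 0).map((s) => s.source_id);
  return { autoFixable, needsReview, sourcesWithIssues };
}

// The example envelope from this document, as a JSON string.
const exampleAudit = JSON.stringify({
  ok: false,
  total: 17,
  errors_by_code: { MISSING_CLOSE: 8, NESTED_QUOTES: 5, NULL_BYTES: 4 },
  per_source: [
    {
      source_id: "default",
      source_path: "/Users/me/brain",
      total: 17,
      errors_by_code: { MISSING_CLOSE: 8, NESTED_QUOTES: 5, NULL_BYTES: 4 },
      sample: [{ path: "people/jane.md", codes: ["MISSING_CLOSE"] }],
    },
  ],
  scanned_at: "2026-04-25T22:30:00.000Z",
});
const exampleTriage = triageAudit(exampleAudit);
```

The counts are per issue, so they give an upper bound rather than the exact number of files a `--fix` run would modify; a `--dry-run` is the better source for that figure.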
This is the most important section. Fixing broken frontmatter is good. Not writing broken frontmatter in the first place is better.
# Correct: single-quoted YAML flow (canonical form gbrain emits)
tags: ['yc', 'w2025', 'ai']
# Correct: unquoted scalars (fine when values have no special chars)
tags: [yc, w2025, ai]
# Correct: block style
tags:
- yc
- w2025
# Tolerated post-v0.37.5.0 but non-canonical: JSON-style double quotes
tags: ["yc", "w2025"]
# Broken: mixed JSON objects and strings (invalid YAML)
tags: [{"name": "sports"}, "posterous"]
Why this used to break: before v0.37.5.0, the validator counted unescaped " characters and flagged any line with 3+. A flow sequence like tags: ["yc", "w2025"] has 4 unescaped " by design — it's valid YAML, but the dumb counter flagged it anyway. One brain saw 6,981 of these on a single doctor run. v0.37.5.0 parses suspicious values with js-yaml.safeLoad before flagging, so JSON-style arrays no longer trigger NESTED_QUOTES.
Why you should still write the canonical form: the auto-fix engine (gbrain frontmatter validate --fix) and the inferred-frontmatter serializer both emit single-quoted YAML for tags: / aliases:. Writing the canonical form in new content keeps the source files stylistically consistent and makes diffs against --fix runs empty.
The classic LLM trap: code like tags: [${items.map(t => JSON.stringify(t)).join(', ')}] produces tags: ["yc", "w2025"]. Use single quotes with an apostrophe fallback: tags: [${items.map(t => t.includes("'") ? JSON.stringify(t) : "'" + t + "'").join(', ')}]. Or use a YAML library that knows how to emit canonical YAML.
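Written out as a standalone function, the single-quote-with-fallback rule from the paragraph above could look like this. `formatTagsLine` is a hypothetical name; the quoting logic is the same expression quoted in the text, and it assumes single-line tag values.

```typescript
// Single quotes by default; JSON-style double quotes only when the tag
// itself contains an apostrophe (the fallback described in the text).
function formatTagsLine(items: string[]): string {
  const quoted = items.map((t) => (t.includes("'") ? JSON.stringify(t) : `'${t}'`));
  return `tags: [${quoted.join(", ")}]`;
}

const canonicalTags = formatTagsLine(["yc", "w2025", "ai"]);
const apostropheTags = formatTagsLine(["yc", "men's"]);
```

For anything beyond simple tags (multi-line values, control characters), a YAML library's serializer is the safer route.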
# Correct: single quotes for values with special chars
title: 'My "Quoted" Title'
# Correct: double quotes when value has apostrophes
title: "Men's Fashion Guide"
# Broken: double quotes wrapping inner double quotes
title: "My "Quoted" Title"
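The title examples above suggest a simple quoting rule, sketched here. `quoteTitle` is a hypothetical helper, it assumes single-line titles, and the handling of a title that contains both quote types (YAML's doubled `''` inside single quotes) is an addition of this sketch rather than something the examples show.

```typescript
// Chooses a YAML quoting style that never wraps double quotes in double quotes.
function quoteTitle(title: string): string {
  // No apostrophe: single quotes are safe, even with inner double quotes.
  if (!title.includes("'")) return `'${title}'`;
  // Apostrophes only: double quotes work, provided there is no backslash,
  // since backslash is an escape character inside YAML double quotes.
  if (!title.includes('"') && !title.includes("\\")) return `"${title}"`;
  // Both quote types (or a backslash): single-quote and double each apostrophe.
  return `'${title.replace(/'/g, "''")}'`;
}

const quotedInner = quoteTitle('My "Quoted" Title');
const quotedApostrophe = quoteTitle("Men's Fashion Guide");
const quotedBoth = quoteTitle(`Phil "Nick" O'Neil`);
```

The sketch always quotes, which is stricter than necessary for plain titles but keeps the output predictable.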
- Plain values without special characters can stay unquoted: `type: person`, `batch: w2025`
- Quote any value that contains `:` `"` `'` `#` `[` `]` `{` `}` `|` `>` `&` `*` `!` `?` `,` or starts with `@`

Don't auto-fix `MISSING_OPEN` or `EMPTY_FRONTMATTER` without user input. These usually mean a human author started a page and didn't finish; silently inserting `---` markers around an unfinished draft is wrong.
Don't use --fix to "make doctor green" without reading the audit first. SLUG_MISMATCH cases are surfaced for manual review specifically because gbrain derives the slug from path. A mismatch usually means the user renamed a file intentionally; auto-removing the slug field is the right outcome only when you've confirmed the rename was deliberate.
Don't skip the .bak backups. The .bak is the safety contract for non-git brain repos. If .bak files accumulate after a fix run, that's a feature, not a bug — the user can review the diffs and delete the backups when satisfied.
Don't run audit on a brain where sources aren't registered. The CLI returns "no registered sources to audit" gracefully, but the migration emits a skipped: no_sources phase result. Don't paper over this with a manual path-walk; the right fix is to register the source via gbrain sources add.
Don't install the pre-commit hook on non-git brain dirs. The install-hook command skips them automatically with a one-line note. If you see "skipped — not a git repo" and want validation at write time anyway, use the audit command on a cron schedule.