codex/skills/ghost/SKILL.md
Create a language-agnostic ghost package (spec + portable tests) from a repo: SPEC.md, exhaustive tests.yaml, INSTALL.md, README.md, VERIFY.md, and upstream LICENSE provenance/regeneration. Use for `$ghost`, ghostify, spec-ify/spec-package this library, ghost library, or portable spec/tests for libraries or tool-using agent loops. For Lean-aided/formal/proved/machine-checked ghost extraction, keep Ghost as artifact authority and route Lean modeling/proof through `$lean`; do not use for implementation or skill edits.
npx skillsauth add tkersey/dotfiles ghostInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Generate a ghost package (spec + tests + install prompt) from an existing repo.
Preserve behavior, not prose:
tests.yaml is the behavior contract (operation cases and/or scenarios)The output is language-agnostic so it can be implemented in any target language or harness.
When formal assurance is requested, use $lean as a proof workbench for the Ghost contract: formalize the portable model, prove selected invariants/case obligations, and feed any proof failures back into SPEC.md, tests.yaml, or VERIFY.md. Lean proof work strengthens extraction quality, but tests.yaml and upstream evidence remain the portable contract.
Scenario testing frame (for agentic systems / tool loops):
This approach works best when the system’s behavior can be expressed as deterministic data:
It also works for agentic systems when behavior can be expressed as controlled, replayable scenarios:
It also works for layered interface-heavy agentic systems when the source exposes named runtime seams that must survive extraction:
It gets harder (but is still possible) when the contract depends on time, randomness, IO, concurrency, global state, or platform details. In those cases, make assumptions explicit in SPEC.md + VERIFY.md, and normalize nondeterminism into explicit inputs/outputs.
LICENSE*.VERIFY.md (adapter runner preferred; sampling fallback allowed).VERIFY.md (upstream repo + revision, how artifacts were produced, and how to rerun verification).tests.yaml contract shape that matches the system style (functional API vs protocol/CLI vs scenario) and keep it consistent across SPEC.md, INSTALL.md, and VERIFY.md.tests.yaml harness schema when it is non-trivial (callbacks, mutation steps, warnings, multi-step protocol setup, etc.).
TESTS_SCHEMA.md.INSTALL.md MUST reference it when present.skip cases; only skip when deterministic setup is currently infeasible, and record why.tests.yaml and verification proof before calling extraction done.verification/evidence/ and fail extraction unless it passes uv run --with pyyaml -- python scripts/verify_evidence.py --bundle <ghost-repo>/verification/evidence.verification/evidence/inventory.json synchronized with tests.yaml: public_operations must match non-workflow operation ids and primary_workflows must match workflow/scenario ids (coverage_mode defaults to exhaustive; when sampled, include sampled_case_ids).traceability.csv and has at least one baseline (mutated=false) pass row in adapter_results.jsonl (all tests.yaml cases for exhaustive; inventory.json.sampled_case_ids for sampled).exhaustive; sampled ids for sampled), plus mutation sensitivity and independent regeneration parity passes.VERIFY.md: default exhaustive; sampled is allowed only when full adapter execution is infeasible and must list sampled case ids plus rationale (including inventory.json.sampled_case_ids).coverage_mode=exhaustive as "all required cases execute and pass"; if a case cannot run, move to coverage_mode=sampled with explicit sampled_case_ids or remove it from the required set instead of leaving it as an unresolved skip.Core Conformance, Extension Conformance, and Real Integration Profile.Conformance Profile, Validation Matrix, and Definition of Done sections in SPEC.md.Summary, Regenerate, Validation Matrix, Traceability Matrix, Mutation Sensitivity, Regeneration Parity, and Limitations sections in VERIFY.md.SPEC.md: State Model, Transition Triggers, Recovery/Idempotency, and Reference Algorithm.layered_agentic when the public contract materially depends on named runtime seams such as provider-specific interfaces, extension registries, execution environments, event surfaces, or persisted artifact contracts.verification/evidence/inventory.json.contract_class=layered_agentic for those extractions and keep default behavior unchanged for ordinary ghost packages.layered_agentic extractions to produce verification/evidence/interface_inventory.json and verification/evidence/contract_traceability.csv.layered_agentic extractions to map named surfaces, boundary invariants, and persisted artifact contracts to explicit case_id values; generated fallback case ids are not sufficient in this mode.--legacy-allow --legacy-reason "<rationale>") and never default.$lean as the formal proof partner when the user requests Lean-aided Ghost extraction, formal verification, proof of a Ghost spec, or machine-checked invariants; do not presume file-system access to another skill definition such as ../lean/SKILL.md.tests.yaml, upstream tests/traces, or adapter verification against the upstream implementation.$lean proof work to follow normal Lean proof discipline: pinned toolchain, project-aware build command, no unsolved goals, no sorry, no admit, no new axioms, no unexplained partial, and no unsound shortcuts.$lean proof pass/fail in VERIFY.md, including proof targets, source revision modeled, Ghost artifacts modeled, toolchain, commands, theorem/case counts, placeholder/axiom check, and limitations.Core Conformance:
Extension Conformance:
Real Integration Profile:
VERIFY.mdProfile usage rules:
SPEC.md and VERIFY.md must state which profile each validation requirement belongs to.Validation Matrix and Definition of Done must align with the selected profile labels.<repo-name>-ghost)For scenario-heavy (agentic) extractions, also collect:
For layered_agentic extractions, also collect:
tests.yaml organizes cases by operation ids (stable identifiers for public API entries). Use a naming scheme that survives translation across languages:
foo (top-level function)module.foo (namespaced function)Class#method (instance method)Class.method (static/class method)Avoid language-specific spellings in ids (e.g., avoid snake_case vs camelCase wars). Prefer the canonical name used by the source library’s docs.
For agentic scenario suites, operation ids SHOULD match tool names as the agent sees them (e.g. orders.lookup, tickets.create).
When using scenario testing, keep scenario ids stable and descriptive:
refund.create_ticket_with_guardrailscalendar.reschedule_with_rate_limitsecurity.prompt_injection_from_tool_outputEvery executable case SHOULD carry a stable case_id and use it as the primary key across evidence artifacts.
<operation-id>.<behavior> for operation cases.case_id is acceptable.traceability.csv and adapter_results.jsonl MUST use the same case_id tokens.layered_agentic extractions MUST provide explicit case_id on every executable case.Pick one schema and stay consistent:
{name,input,output|error} cases.meta + operations, where operation ids live under operations and cases include command/state assertions.meta + scenarios, where scenario ids live under scenarios and each scenario defines environment + tools + goal + oracles.tests.yaml versiontests.yaml MUST include a source version identifier that ties cases to upstream evidence.
git:<short-sha> or git describe).version.meta.version for test schema version and include meta.source_version for upstream evidence version.meta.version for schema version and include meta.source_version for upstream evidence version.$lean (optional)Use this mode only when the user asks for Lean-aided, formal, proved, machine-checked, theorem-backed, or high-assurance Ghost extraction. Do not make Lean a required part of ordinary Ghost packages.
The Ghost skill remains responsible for language-agnostic artifacts. Route Lean-specific design, theorem statements, proof repair, theorem search, Lake/toolchain handling, and placeholder checks through $lean. Do not assume local access to $lean source files; invoke/use the skill by name and apply its guidance.
A $lean pass should answer one or more of these questions:
SPEC.md rules internally coherent?tests.yaml case satisfy the formal model?A $lean pass must not override upstream evidence. If Lean, tests.yaml, and upstream adapter results disagree, fix the model/spec/test extraction or record a limitation; do not silently privilege the proof model.
Lean may certify:
Lean does not automatically certify:
Use precise wording: “Lean proved the Ghost model satisfies invariant X and all modeled tests.yaml cases conform to that model.” Do not write “Lean proved the upstream library correct” unless that is literally true.
When formalization is useful, create a temporary Lean workbench outside the Ghost repo, for example:
.ghost-lean-workbench/ # ignored / scratch
lean-toolchain
lakefile.lean
GhostFormal/
Portable.lean
Model.lean
Cases.lean
Invariants.lean
Workflows.lean
Soundness.lean
GhostFormal.lean
Do not ship this workbench in the Ghost repo by default. If the user explicitly requests retained formal files, put them in a separate adjunct package or clearly marked folder and explain that they are supporting formal evidence, not required implementation artifacts.
Lean types must mirror the portable Ghost contract, not source-language internals.
Prefer:
hex or base64 in tests.yaml)Except Error Output for deterministic success/error behaviorAvoid:
Float unless IEEE-754 behavior is truly contractualtests.yamlFor deterministic operations:
inductive ContractError
| invalidInput
| outOfRange
| unsupported
deriving DecidableEq, Repr
structure Input where
-- portable fields from tests.yaml
deriving DecidableEq, Repr
structure Output where
-- portable fields from tests.yaml
deriving DecidableEq, Repr
def spec (i : Input) : Except ContractError Output :=
-- declarative model or reference algorithm
For nondeterministic/spec-only behavior:
def SpecRel (i : Input) (o : Output) : Prop :=
-- allowed outputs under explicit assumptions
theorem deterministic_if_claimed :
∀ i o1 o2, SpecRel i o1 -> SpecRel i o2 -> o1 = o2 := by
-- only prove and claim this if true
For stateful workflows:
structure State where
-- portable state fields
deriving DecidableEq, Repr
structure StepResult where
output : Output
state' : State
deriving DecidableEq, Repr
def step (s : State) (i : Input) : Except ContractError StepResult :=
-- transition model
def StateInvariant (s : State) : Prop :=
-- lifecycle/continuity invariant
theorem step_preserves_invariant :
∀ s i r, StateInvariant s -> step s i = Except.ok r -> StateInvariant r.state' := by
-- proof via $lean
For agentic scenarios, model controlled traces and hard safety predicates:
inductive TraceEvent
| toolCall : String -> TraceEvent
| stateMutation : String -> TraceEvent
| userMessage : String -> TraceEvent
deriving DecidableEq, Repr
def NoForbiddenTools (tr : List TraceEvent) : Prop :=
∀ ev, ev ∈ tr -> ¬ match ev with
| TraceEvent.toolCall "payments.charge" => True
| _ => False
tests.yamlEvery modeled executable case should map to a stable Lean obligation. Generate or transcribe obligations from tests.yaml, not from a separate interpretation of the source tests.
Recommended theorem convention:
/-- case_id: normalize.trims_ascii_space -/
theorem case_normalize_trims_ascii_space :
normalize " abc " = "abc" := by
rfl
For finite ground cases, native_decide or decide is acceptable when all involved definitions are transparent, finite, and auditable. For laws and invariants, prefer structural proofs over proof-by-computation.
Select proof targets that materially improve reimplementation fidelity:
Do not promote an attractive theorem to SPEC.md unless upstream evidence or an explicit user requirement justifies making it normative.
Use failed proof attempts as extraction feedback. Classify each failure as:
SPEC.md overclaimtests.yaml contradictionResolution order:
SPEC.md if it overclaims.tests.yaml if the portable case is wrong or contradictory.VERIFY.md.Do not leave a claimed invariant unproved and still mark it verified.
VERIFY.md proof auditWhen $lean was used, add a formal proof audit under VERIFY.md Summary or Validation Matrix:
### Lean formal proof audit
- Skill used: `$lean`
- Workbench: scratch|retained adjunct|not retained
- Lean toolchain: ...
- Build/check command: `lake build`
- Source revision modeled: ...
- Ghost artifacts modeled: `SPEC.md`, `tests.yaml`
- Proof result: pass|fail
- Proof targets:
- case conformance: pass|fail|not attempted, N modeled cases
- model determinism: pass|fail|not claimed
- normalization idempotence: pass|fail|not claimed
- invariant preservation: pass|fail|not claimed
- workflow/trace safety: pass|fail|not claimed
- Placeholder/axiom check: no `sorry`, no `admit`, no new `axiom`
- Limitations: ...
In traceability.csv, use proof_artifact values like:
lean:GhostFormal.Soundness.case_parse_valid_minimallean:GhostFormal.Invariants.normalize_idempotentadapter:<run_id>manual:<source-ref> only when unavoidable and clearly limitedLean proof evidence and upstream adapter evidence are complementary; both should be visible in VERIFY.md when both were used.
tests.yaml layout (functional, protocol/CLI, or scenario) and keep it consistent across SPEC.md, INSTALL.md, and VERIFY.md.VERIFY.md.$lean proof targets up front (case conformance, invariant preservation, determinism, normalization, parser/serializer laws, state-machine safety, or trace safety) and keep those targets aligned with SPEC.md claims.For agentic systems, define success criteria as:
__all__pub items re-exported from lib.rsbuild.zig module graph (root_source_file, addModule, pub usingnamespace) is source of truth; defaults are often src/root.zig (library) and src/main.zig (exe) but repos vary; treat C ABI export as public only if documentedpublic types/members in the target package/namespaceFor agentic systems:
For layered_agentic systems:
contract_class=layered_agentic only when the contract materially depends on named interfaces, extension seams, or persisted artifacts.interface, provider_edge, extension_point, event_surface, or artifact_boundary.For scenario suites, also harvest:
For layered_agentic suites, also harvest:
SPEC.md (strict, language-agnostic)Conformance Profile, Validation Matrix, and Definition of Done sections.tests.yaml.$lean formalization is requested, write invariants as precise, evidence-backed rules with explicit preconditions so they can become theorem statements; do not include attractive but unproved laws as normative claims.layered_agentic specs, also include:
Interface Surfaces with named surfaces, kinds, layers, and source referencesBoundary Contracts describing the explicit cross-surface rules that implementations must preserveExtension Points describing registries, hooks, or provider-specific seams that are part of the public contractPersistent Artifacts describing contract-bearing files or event payloads and their required fieldsreferences/templates.md for structure.tests.yaml (exhaustive)version or meta.source_version).name, input, and a stable case_id (recommended)output or error: truemeta + operationscase_id, name, input, and deterministic expected outcomes (for example exit_code, machine-readable stdout assertions, and state assertions)yes, no, on, off, null) to avoid parser disagreementsTESTS_SCHEMA.md (root object, dot segments, [index] arrays, and "missing path fails assertion").tests.yaml includes harness directives beyond basic {name,input,output|error} (e.g. callbacks by label, mutation steps, warning sinks, setup scripts), document them in TESTS_SCHEMA.md.skip rare; every skip must include a concrete reason and be accounted for in VERIFY.md.layered_agentic, require explicit case_id on every executable case and map those ids into surface/invariant/artifact evidence; do not rely on verifier-generated fallback ids.output is exact.references/templates.md.$lean formalization is requested, ensure modeled cases have stable case_id values and portable inputs/outputs that can be translated directly into Lean obligations without depending on source-language internals.$lean$lean; do not read or assume local skill files.SPEC.md + tests.yaml), not from a separate, drifting interpretation.case_id.SPEC.md claim that fails under the formal model.VERIFY.md.sorry, admit, new axioms, unsolved goals, or unsound shortcuts before marking any formal target as passed.INSTALL.md + README.md + VERIFY.md + LICENSE*INSTALL.md: a short prompt for implementing the library in any language, referencing SPEC.md and tests.yaml.README.md: explain what the ghost library is, list operations, and describe the included files.TESTS_SCHEMA.md (when needed): define the tests.yaml harness schema and any callback catalogs or side-effect capture requirements.VERIFY.md: describe provenance + how the ghost artifacts were produced and verified against the source library (adapter-first; sampling fallback).
Summary, Regenerate, Validation Matrix, Traceability Matrix, Mutation Sensitivity, Regeneration Parity, and Limitations sectionslayered_agentic extractions, also include Normative Source Map, Surface Coverage Matrix, Boundary Invariants, and Artifact Contract Coverageverification/evidence/ and the verifier command/result$lean was used, include a Lean formal proof audit with the skill name $lean, toolchain, build command, modeled artifacts, proof targets, theorem/case counts, placeholder/axiom check, pass/fail result, and limitationsLICENSE*: preserve the upstream repo’s license files verbatim.
LICENSE, LICENSE.md, COPYING*LICENSE file stating that no upstream license was foundtests.yaml parses and case counts match or exceed the source tests covering the public API.skip) case unless infeasible, and list any exceptions in VERIFY.md.tests.yaml against the upstream system (library or agent).
VERIFY.mdverification/evidence/:
inventory.json (public operations + primary workflows, including reset requirements; optional coverage_mode, and sampled_case_ids when coverage_mode=sampled; optional contract_class=default|layered_agentic, defaulting to default)traceability.csv (operation/workflow -> case ids -> proof artifact -> adapter run id; when $lean is used, proof_artifact may cite a named Lean theorem such as lean:GhostFormal.Soundness.case_<id>)workflow_loops.json (loop cases + continuity assertions + reset assertions when required)adapter_results.jsonl (case-level results with run_id, case_id, status, and mutation marker)mutation_check.json (required mutation count + detected failures + pass/fail)parity.json (independent regeneration parity verdict + diff count)interface_inventory.json (layered_agentic only: surfaces, boundary_invariants, and artifact_contracts, each with source_refs and required_case_ids)contract_traceability.csv (layered_agentic only: target_type,target_id,case_id,proof_artifact,adapter_run_id, where target_type is surface|invariant|artifact)uv run --with pyyaml -- python scripts/verify_evidence.py --bundle <ghost-repo>/verification/evidence; non-zero exit means extraction is incomplete.--legacy-allow --legacy-reason "<rationale>" only for explicit manual break-glass migrations.VERIFY.mdreferences/verification.md for a checklist and VERIFY.md template.VERIFY.md instead.$lean was used, the regeneration recipe must say how to recreate the formal workbench or proof obligations, run the Lean build/check command, and reconcile proof failures with Ghost artifacts.Produce only these artifacts in the ghost repo:
README.mdSPEC.mdtests.yamlTESTS_SCHEMA.md (optional; include when tests.yaml has non-trivial harness semantics)INSTALL.mdVERIFY.mdverification/evidence/inventory.jsonverification/evidence/traceability.csvverification/evidence/workflow_loops.jsonverification/evidence/adapter_results.jsonlverification/evidence/mutation_check.jsonverification/evidence/parity.jsonverification/evidence/interface_inventory.json (layered_agentic only)verification/evidence/contract_traceability.csv (layered_agentic only)verification/evidence/structure_contract.json (optional, recommended for explicit structure policy)LICENSE* (copied from upstream).gitignore (optional, minimal)zig build test (if build.zig defines tests); otherwise zig test path/to/file.zig for the library root and any test entrypoints.self of type T/*T as an instance method (T#method); otherwise use T.method.comptime parameters: record allowed values in SPEC.md, and represent them as ordinary fields in tests.yaml inputs.std.mem.Allocator or caller-provided buffers, specify ownership and mutation rules; assume allocations succeed unless tests cover OOM.tests.yaml strict (error: true only); in a Zig adapter, treat "any error return" as a passing error case and rely on SPEC.md for exact conditions.tests.yaml to JSON (or JSONL) as an intermediate and have a Zig runner parse it via std.json.references/templates.md (artifact outlines and YAML format)references/verification.md (verification checklist + VERIFY.md template)tools
Convert markdown plans into beads with dependencies using br CLI. Use when creating task graphs, polishing beads before implementation, or bridging planning to agent swarm execution.
development
Orchestrate Codex skill optimization during active sessions through $cas goal control, $shadow single-session evidence, $tune diagnosis/refinement briefs, and the skill-optimizer custom subagent. Trigger for $opt, skill optimization loops, session-driven skill tuning, meta-skill audits, or explicit validated skill edits. Do not use for general code optimization, product optimization, or performance tuning.
development
Run a targeted fresh-eyes blunder pass over code, specs, plans, adjudications, closure gates, skill edits, or negative-evidence ledgers. Trigger when asked to reread with fresh eyes, find obvious bugs, catch mistakes/oversights/omissions, check for embarrassing misses, or perform a second independent blunder pass before closure. Do not use as a substitute for implementation, adjudication, or verification; use it as the final falsification/check pass for those workflows.
development
Explicitly shadow, tail, watch, follow, monitor, supervise, or companion exactly one Codex session id/path through `$seq`, then apply a named target skill as an interpretation/reporting/proposal/action lens until the watched session stops.