skills/harness/SKILL.md
Agent harness architecture — structure a project's agent context across layers for effective AI-assisted development. Covers CLAUDE.md, skills, design docs, hooks, and all artifacts that shape how an agent understands and operates in a codebase. Use when setting up or improving agent configuration, when agent context feels bloated or disorganized, when onboarding a project for AI-assisted development, or when the agent keeps losing architectural awareness mid-task. Triggers on "set up claude", "improve CLAUDE.md", "agent keeps forgetting", "context is too long", "harness setup", "organize agent context", "how should I structure my prompts". Args — `/harness audit` to evaluate an existing project's context architecture, `init` to set up harness from scratch.
`npx skillsauth add lidessen/skills harness`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
An agent's context window is its working memory — finite and precious. The craft of harness programming is migrating the right information to the right context layer, so the agent always has enough awareness to make good decisions without drowning in details it doesn't yet need.
Two concerns, one discipline: context architecture (what the agent knows) and agent lifecycle (how the agent works across time). They meet at artifacts — an artifact is both information (context) and a mechanism for continuity (lifecycle).
When invoked with an argument, dispatch to the corresponding file:
- `/harness audit` → Read and follow `commands/audit.md` in this skill directory. Evaluate an existing project's context architecture and suggest improvements.
- `/harness init` → Read and follow `commands/init.md` in this skill directory. First-time project setup: bootstrap a project's harness from scratch.

How to structure what the agent knows.
Every piece of information an agent might need belongs at one of three abstraction levels:
┌─────────────────────────────────────────────────────┐
│ L1 Architecture │
│ System shape, boundaries, invariants, principles │
│ Always in context. Small, stable, high-leverage. │
│ ≈ 100–500 tokens per artifact │
├─────────────────────────────────────────────────────┤
│ L2 Design │
│ Patterns, mechanisms, approach, task plan │
│ Loaded on activation. The working blueprint. │
│ ≈ 1000–5000 tokens per artifact │
├─────────────────────────────────────────────────────┤
│ L3 Implementation │
│ Concrete code, scripts, reference data, examples │
│ Loaded on demand. The raw material. │
│ Size varies — only what's needed right now │
└─────────────────────────────────────────────────────┘
The higher the layer, the smaller and more stable it is. L1 gives the agent orientation. L2 gives it a plan. L3 gives it the details to execute.
The key insight: most harness problems come from layer violations — L3 details polluting L1 (bloated CLAUDE.md full of implementation notes), or L1 context missing entirely (agent has no architectural awareness and makes decisions that break system boundaries).
| L1 (always present) | L2 (on activation) | L3 (on demand) |
|---------------------|--------------------|----------------|
| CLAUDE.md | Skill body (SKILL.md) | scripts/ |
| Skill metadata (name + description) | design/DESIGN.md | references/ |
| Hook triggers | blueprints/ | assets/ |
| Project-level invariants | Task plans | Code files |
| | Decision records | Test fixtures |
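The mapping above can be sketched as a small path classifier. This is only an illustration: the function name `layer_for` and the directory sets are hypothetical, it covers file-based artifacts only (skill metadata and hook triggers are not files), and a real project would adjust the sets to its own layout.

```python
from pathlib import PurePosixPath

# Hypothetical sets derived from the table above; adjust to your project.
L1_FILES = {"CLAUDE.md"}
L2_FILES = {"SKILL.md"}
L2_DIRS = {"design", "blueprints"}
L3_DIRS = {"scripts", "references", "assets"}


def layer_for(path: str) -> str:
    """Return 'L1', 'L2', or 'L3' for a repository-relative path."""
    p = PurePosixPath(path)
    dirs = set(p.parts[:-1])  # directory components only
    if dirs & L3_DIRS:
        return "L3"
    if p.name in L1_FILES:
        return "L1"
    if p.name in L2_FILES or dirs & L2_DIRS:
        return "L2"
    # Code files, test fixtures, and anything unrecognized count as on-demand detail.
    return "L3"
```

A classifier like this is mostly useful as a starting point for an audit: list every file the agent is told to load, group by layer, and look for L3 paths referenced from L1.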
CLAUDE.md is the most critical L1 artifact. It's always loaded, so every token must earn its place. A good CLAUDE.md contains:
A bad CLAUDE.md contains: file-by-file breakdowns (agent can read the tree), generic best practices (agent already knows), implementation details that change frequently (belongs in L2/L3).
Litmus test: if removing a line from CLAUDE.md wouldn't cause the agent to make a worse architectural decision, the line doesn't belong.
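The litmus test itself needs judgment, but the size side of it can be checked mechanically. The sketch below assumes a very rough heuristic of about 4 characters per token (an assumption, not a real tokenizer) and uses the upper end of the "≈ 100–500 tokens" guideline as the budget; `audit_l1` and `estimate_tokens` are hypothetical names.

```python
L1_BUDGET = 500  # upper end of the "≈ 100–500 tokens per artifact" guideline


def estimate_tokens(text: str) -> int:
    """Rough estimate only: assumes about 4 characters per token."""
    return len(text) // 4


def audit_l1(text: str, budget: int = L1_BUDGET) -> dict:
    """Report estimated size and the longest lines of an L1 artifact."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The longest lines are the first candidates to question with the litmus test.
    heaviest = sorted(lines, key=len, reverse=True)[:3]
    tokens = estimate_tokens(text)
    return {
        "tokens": tokens,
        "over_budget": tokens > budget,
        "heaviest_lines": heaviest,
    }
```

For each line it surfaces, ask the litmus question by hand: would removing it lead to a worse architectural decision?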
A skill naturally spans all three layers:
name + description in frontmatter (~100 tokens). Loaded at startup for all installed skills. This is how the agent decides whether to activate a skill, so make it precise.

Keep SKILL.md under 500 lines. If it's longer, something belongs in L3.
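The two size limits above can be checked with a short script. This is a minimal sketch under stated assumptions: frontmatter is taken to be the block between the first two `---` lines with single-line `key: value` entries (multi-line values are not handled), tokens are guessed at about 4 characters each, and `check_skill` is a hypothetical helper name.

```python
MAX_SKILL_LINES = 500      # "Keep SKILL.md under 500 lines"
MAX_METADATA_TOKENS = 100  # "~100 tokens" for name + description


def check_skill(text: str) -> list[str]:
    """Return a list of warnings for the contents of one SKILL.md file."""
    warnings = []
    lines = text.splitlines()
    if len(lines) >= MAX_SKILL_LINES:
        warnings.append(f"SKILL.md has {len(lines)} lines; move detail to L3 files")

    # Assumed layout: frontmatter sits between the first two '---' lines.
    meta = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()

    for field in ("name", "description"):
        if not meta.get(field):
            warnings.append(f"frontmatter is missing '{field}'")

    meta_chars = len(meta.get("name", "")) + len(meta.get("description", ""))
    if meta_chars // 4 > MAX_METADATA_TOKENS:  # rough 4-chars-per-token guess
        warnings.append("name + description exceed ~100 tokens")
    return warnings
```

An empty list means the file passes these mechanical checks; it says nothing about whether the description is precise enough to trigger the skill at the right time.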
Smallest effective context — Every token in L1 competes with the agent's working space for the current task. Write L1 artifacts ruthlessly — include only what changes the agent's decisions. Details that are nice-to-know but don't affect judgment belong in L2 or L3.
Stable layers, volatile details — L1 should change rarely (project architecture doesn't shift daily). L2 changes per-task (each blueprint is different). L3 changes constantly (code evolves). If you find yourself updating CLAUDE.md frequently, the information probably belongs at a lower layer.
Pointers over content — When L1 needs to reference complex information, point to it rather than inlining it. "See design/DESIGN.md for module boundaries" is better than copying the module list into CLAUDE.md. The agent loads L2/L3 when needed.
| Symptom | Likely cause | Fix |
|---------|-------------|-----|
| Agent forgets project architecture mid-task | L1 too thin or missing | Add architectural context to CLAUDE.md |
| Agent drowns in context, slow responses | L1 too thick (L3 details leaking up) | Audit CLAUDE.md, move details to L2/L3 files |
| Agent breaks module boundaries | No design docs or CLAUDE.md lacks boundaries | Add design/ or architectural section to CLAUDE.md |
| Agent loads unnecessary files | Skill body has too many inline references | Split into supporting files, load on demand |
| Agent repeats same mistakes | Missing hook or missing L1 principle | Add a hook (mechanical) or CLAUDE.md rule (judgment) |
How the agent works across time.
Every agent instance is ephemeral — it lives for one session, then its context is gone. Don't fight this. Design for succession: knowledge survives through artifacts, not through any single agent's memory.
The unit of continuity is the artifact chain, not the agent instance. L1 and L2 artifacts (CLAUDE.md, design docs, blueprints) are the institutional memory that outlives every session. Commit messages are the archaeological record. Blueprint State sections are handoff documents from one generation to the next. Verification criteria are how the next generation trusts the previous one's work.
To give an "agent" a longer effective lifecycle, don't extend the session — raise the abstraction level. An agent operating at L1 (architecture) spans the lifetime of the project. An agent operating at L3 (implementation details) lives and dies within one task. The layers aren't just about context efficiency — they're about temporal scope.
A single task should fit within one context window. If it can't, it's two tasks. This is the fundamental unit of agent work — each task gets a focused context with only the information it needs, preventing earlier work from polluting later decisions. When scoping tasks, ask: can the agent complete this without its context degrading?
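One way to make the scoping question concrete is a back-of-the-envelope budget. The sketch below is illustrative only: the window size, the per-artifact token estimates, and the share of the window reserved for working space are all assumptions you would supply, and `fits_in_one_window` is a hypothetical name.

```python
def fits_in_one_window(
    artifact_tokens: dict[str, int],
    window: int,
    working_share: float = 0.5,
) -> bool:
    """True if the loaded artifacts leave `working_share` of the window free.

    artifact_tokens maps each artifact the task needs (L1, L2, and the
    expected L3 files) to an estimated token count.
    """
    loaded = sum(artifact_tokens.values())
    return loaded <= window * (1 - working_share)


# Hypothetical numbers for a single task.
task = {"CLAUDE.md": 400, "blueprint.md": 3_000, "src files": 20_000}
print(fits_in_one_window(task, window=100_000))
```

If the check fails, the guidance above applies: split the work into two tasks rather than trimming the working space.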
Hooks shape agent behavior from outside the context window — always active, zero-cost in tokens. Two flavors:
When you change something that other files reference — a path, a name, a term, a structure — check every file that depends on it. Stale references are a common failure mode: you rename a directory but leave old paths in SKILL.md, change a convention but leave the old wording in CLAUDE.md. A prompt hook that reminds "did you update everything that references what you just changed?" is one of the highest-value hooks you can add to a project.
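The mechanical half of that check can be scripted. The following is a minimal sketch, assuming references are plain substrings in Markdown files; `find_stale_references` is a hypothetical helper, not part of any harness tooling.

```python
from pathlib import Path


def find_stale_references(
    root: str, old: str, suffixes: tuple[str, ...] = (".md",)
) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every line still mentioning `old`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if old in line:
                hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits
```

After renaming a directory, call it with the old path, for example `find_stale_references(".", "old/path/")`, and fix every hit before committing. A substring search will not catch a changed convention that is phrased differently in each file, so the reminder hook is still worth having.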
Agent throughput keeps rising; human review capacity doesn't. Once agents start producing more output than a human can scan in detail, the bottleneck shifts from "can the agent do it" to "can a human catch what matters before the wrong thing ships." Skills and outputs that ignore this constraint scale until exactly the moment they fail.
The principle: agent outputs intended for human review should be skeleton-grade, with details treated as cheaply replaceable. As with code architecture, the human's job is to ensure the skeleton holds, not to inspect every line. Modules can be rewritten at low cost as long as the architecture is right.
This is the meta-pattern the existing methodology skills already practice without naming it:
`<details>` blocks, references-on-demand. Detail is not the problem; detail forced into the human's primary view is.

If agent output volume grows linearly with agent work but the must-review volume grows sub-linearly, the design is on track. If they grow together, the skill is preparing the human for review failure: sooner or later the human starts skimming the architecture too, and the whole collaboration loses its safety net.
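The growth condition can be approximated with a crude measurement. The sketch below assumes you record, for each piece of agent work, the total output size and the part a human must actually read (both in lines); the function names are hypothetical, and comparing only the smallest and largest sample is a deliberate simplification.

```python
def review_share(samples: list[tuple[int, int]]) -> list[float]:
    """Must-review fraction for each (total_lines, must_review_lines) sample."""
    return [must / total for total, must in samples if total > 0]


def review_load_on_track(samples: list[tuple[int, int]]) -> bool:
    """True when the must-review share shrinks as total output grows."""
    ordered = sorted(samples)  # smallest total output first
    shares = review_share(ordered)
    return len(shares) >= 2 and shares[-1] < shares[0]
```

If must-review volume grows sub-linearly with output, its share of the total falls as output grows, which is what the check looks for. A flat or rising share means the two volumes are growing together.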
The skeleton/detail split has always existed in software architecture (interfaces vs implementations, public APIs vs internals, design docs vs code). What's new is the ratio. When humans wrote all the code, the ratio of skeleton-level decisions to detail-level decisions roughly matched what humans could review. With agents producing the detail, that ratio explodes — and a discipline that was implicit becomes load-bearing. Skills that don't internalize this principle will scale only until human review caves under the volume.
An agent that understands the reasoning behind a constraint exercises better judgment in novel situations than one following a rigid rule. When writing any harness artifact, explain the why — it costs a few extra tokens but compounds into better decisions across every task.
"If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize — to apply broad principles rather than mechanically following specific rules." — Anthropic's constitution
testing
Operational deployer for the lidessen skills collection — wires harness config (CLAUDE.md / AGENTS.md / .cursor/) in a target project, injects cross-cutting principles (e.g. principal contradiction first), and reconciles when lidessen evolves. Triggers on "/setup-lidessen-skills", "set up lidessen skills", "wire lidessen into this project", "sync lidessen principles", "install lidessen skills". Use after cloning or symlinking lidessen skills into a project, when adopting the collection, or when lidessen has new content the project hasn't picked up. Args — `init` to scaffold, `sync` to re-align with current lidessen, `audit` to check drift without writing. Pairs with harness (portable methodology); this is the lidessen-specific application layer.
development
Designing in territory where the industry is still groping for shape — AI-native systems, agent-first interfaces, any domain whose category is forming. Triggers on "AI native X", "agent-first X", "redefine X", "rebuild X from scratch under Y", "reframe X for Y", "what should X look like in the new paradigm", "design a system with no precedent", or the tension between "new shoes on the old path" and "a skeleton that holds on its own". Method — strip to 3-5 abstract functions, redraw the load-bearing skeleton from the new paradigm's primitives, stress-test without traditional crutches, then add familiar flesh as projection. Do NOT trigger for incremental redesigns within an existing paradigm (use design-driven), explanatory writing (use technical-article-writing), or vague "make it AI" requests. Pairs with design-driven (upstream) and goal-driven (parallel). Args — `/reframe init`, `close`, `explain [for <audience>]`.
development
Goal-driven methodology for multi-week initiatives where the destination is clearer than the path — GOAL.md as stable compass (General Line plus falsifiable success criteria), record captures what was tried and observed. Triggers on "set a goal", "track my progress on X", "this is exploratory", "I know the goal but not the path", or starting a months-long initiative without a clear technical shape. Use for research, exploratory features, learning projects with a shippable output, book/article series, job search, side-business launches. Do NOT trigger for single-task work, bug fixes, week-long features with a clear plan, vague aspirations ("be healthier"), habit tracking, or general life management. Pairs with design-driven (why/how-far vs what-shape) and runs parallel to reframe. Args — `/goal-driven set`, `review`, `close`.
development
Evidence-driven methodology for the execution layer — every claim of progress requires a falsifiable observation; "looks right to me" is rejected. Use for production code, regression-prone systems, or any task where build-time discipline materially affects outcome quality. Triggers on "set up TDD", "build discipline", "no progress without evidence", "test-first", "verify rigorously", "production code workflow". Do NOT trigger for prototypes, exploratory spikes, throwaway scripts, or doc-only changes. Pairs with design-driven (which defines what to verify; evidence-driven defines how) — each works alone. Args — `/evidence-driven init` to wire up agent configs and optional pre-commit hooks. No periodic-audit command; it's an always-on overlay.