lidessen/harness

Name: harness
Author: lidessen

skills/harness/SKILL.md

npx skillsauth add lidessen/skills harness

Harness Architecture

An agent's context window is its working memory — finite and precious. The craft of harness programming is migrating the right information to the right context layer, so the agent always has enough awareness to make good decisions without drowning in details it doesn't yet need.

Two concerns, one discipline: context architecture (what the agent knows) and agent lifecycle (how the agent works across time). They meet at artifacts — an artifact is both information (context) and a mechanism for continuity (lifecycle).

Commands

When invoked with an argument, dispatch to the corresponding file:

/harness audit → Read and follow commands/audit.md in this skill directory. Evaluate an existing project's context architecture and suggest improvements.
/harness init → Read and follow commands/init.md in this skill directory. First-time project setup — bootstrap a project's harness from scratch.
No argument → Continue with the methodology below.
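
The dispatch rule above can be sketched in a few lines of Python. This is only an illustration, not the skill's actual implementation: `resolve_command` and `COMMAND_FILES` are hypothetical names, and raising on an unknown argument is an assumption, since the source specifies only the three cases listed.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from argument to command file, mirroring the list above.
COMMAND_FILES = {
    "audit": "commands/audit.md",
    "init": "commands/init.md",
}


def resolve_command(skill_dir: Path, argument: Optional[str]) -> Optional[Path]:
    """Return the command file to follow, or None to continue with SKILL.md."""
    if argument is None or not argument.strip():
        return None  # no argument: continue with the methodology in SKILL.md
    name = argument.strip().lower()
    if name not in COMMAND_FILES:
        # Assumption: unknown arguments are rejected rather than ignored.
        raise ValueError(f"unknown /harness argument: {argument!r}")
    return skill_dir / COMMAND_FILES[name]
```

Calling `resolve_command(Path("skills/harness"), "audit")` yields the path to `commands/audit.md` inside the skill directory; passing `None` signals that the methodology in SKILL.md applies.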

Part I: Context Architecture

How to structure what the agent knows.

The Three Layers

Every piece of information an agent might need belongs at one of three abstraction levels:

┌─────────────────────────────────────────────────────┐
│  L1  Architecture                                   │
│  System shape, boundaries, invariants, principles   │
│  Always in context. Small, stable, high-leverage.   │
│  ≈ 100–500 tokens per artifact                      │
├─────────────────────────────────────────────────────┤
│  L2  Design                                         │
│  Patterns, mechanisms, approach, task plan          │
│  Loaded on activation. The working blueprint.       │
│  ≈ 1000–5000 tokens per artifact                    │
├─────────────────────────────────────────────────────┤
│  L3  Implementation                                 │
│  Concrete code, scripts, reference data, examples   │
│  Loaded on demand. The raw material.                │
│  Size varies: only what's needed right now          │
└─────────────────────────────────────────────────────┘

The higher the layer, the smaller and more stable it is. L1 gives the agent orientation. L2 gives it a plan. L3 gives it the details to execute.

The key insight: most harness problems come from layer violations — L3 details polluting L1 (bloated CLAUDE.md full of implementation notes), or L1 context missing entirely (agent has no architectural awareness and makes decisions that break system boundaries).
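
One way to make the layer budgets checkable is to encode them as data and flag artifacts that exceed their layer's upper bound. The sketch below assumes a rough four-characters-per-token heuristic rather than a real tokenizer; `Layer`, `LAYERS`, `estimate_tokens`, and `over_budget` are illustrative names, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


@dataclass(frozen=True)
class Layer:
    name: str
    loaded: str
    max_tokens: Optional[int]  # None means "size varies"


# Upper bounds taken from the diagram above.
LAYERS = {
    "L1": Layer("Architecture", "always", 500),
    "L2": Layer("Design", "on activation", 5000),
    "L3": Layer("Implementation", "on demand", None),
}


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounding up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def over_budget(layer: str, text: str) -> bool:
    """True when an artifact exceeds the upper bound for its layer."""
    limit = LAYERS[layer].max_tokens
    return limit is not None and estimate_tokens(text) > limit
```

A CLAUDE.md that trips `over_budget("L1", ...)` is a candidate for moving detail down to L2 or L3.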

Mapping Artifacts to Layers

L1 (always present)          L2 (on activation)         L3 (on demand)
─────────────────────        ──────────────────────      ──────────────
CLAUDE.md                    Skill body (SKILL.md)       scripts/
Skill metadata               design/DESIGN.md            references/
  (name + description)       blueprints/                 assets/
Hook triggers                Task plans                  Code files
Project-level invariants     Decision records            Test fixtures
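
The mapping above can be approximated with a small path classifier. This is a simplified sketch: the directory and file names come from the table, but the fallback to L3 for anything unrecognized is an assumption, and the function ignores the fact that a skill's frontmatter is L1 while its body is L2.

```python
from pathlib import PurePosixPath

L1_FILES = {"CLAUDE.md"}
L2_FILES = {"SKILL.md"}  # the body; its frontmatter is L1 (not modeled here)
L2_DIRS = {"design", "blueprints"}
L3_DIRS = {"scripts", "references", "assets"}


def layer_for(path: str) -> str:
    """Classify a repository path into L1, L2, or L3."""
    p = PurePosixPath(path)
    parents = set(p.parts[:-1])
    if p.name in L1_FILES:
        return "L1"
    if parents & L3_DIRS:
        return "L3"
    if p.name in L2_FILES or parents & L2_DIRS:
        return "L2"
    return "L3"  # assumption: code files and fixtures default to on-demand
```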

CLAUDE.md — the L1 anchor

CLAUDE.md is the most critical L1 artifact. It's always loaded, so every token must earn its place. A good CLAUDE.md contains:

What this system is — one sentence
How to build/test/run — the commands, nothing more
Architectural shape — module boundaries, data flow, key patterns (or a pointer to design/ if using design-driven)
Non-obvious conventions — things the agent can't derive from code

A bad CLAUDE.md contains file-by-file breakdowns (the agent can read the tree), generic best practices (the agent already knows them), and implementation details that change frequently (these belong in L2/L3).

Litmus test: if removing a line from CLAUDE.md wouldn't cause the agent to make a worse architectural decision, the line doesn't belong.
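
To make the four-part shape concrete, here is a small renderer that assembles a CLAUDE.md from exactly those parts. The section labels (`Commands:`, `Architecture:`, `Conventions:`) and the function name are assumptions for illustration; the source does not prescribe a layout.

```python
from typing import Mapping, Sequence


def render_claude_md(
    summary: str,
    commands: Mapping[str, str],
    shape: Sequence[str],
    conventions: Sequence[str],
) -> str:
    """Assemble a minimal CLAUDE.md from the four parts listed above."""
    lines = [summary.strip(), "", "Commands:"]
    lines += [f"- {name}: `{cmd}`" for name, cmd in commands.items()]
    lines += ["", "Architecture:"]
    lines += [f"- {item}" for item in shape]
    lines += ["", "Conventions:"]
    lines += [f"- {item}" for item in conventions]
    return "\n".join(lines) + "\n"


# Hypothetical project, used only to show the shape of the result.
example = render_claude_md(
    "A billing API.",
    {"test": "make test"},
    ["See design/DESIGN.md for module boundaries"],
    ["Money is stored as integer cents"],
)
```

Note that the architecture entry is a pointer rather than an inlined module list, which keeps the file small.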

Skills — L1 metadata, L2 body, L3 files

A skill naturally spans all three layers:

L1: name + description in frontmatter (~100 tokens). Loaded at startup for all installed skills. This is how the agent decides whether to activate a skill — make it precise.
L2: The markdown body of SKILL.md (<5000 tokens). Loaded when activated. Contains the methodology, the loop, the principles.
L3: Supporting files (commands/, scripts/, references/). Loaded only when the skill dispatches to them.

Keep SKILL.md under 500 lines. If it's longer, something belongs in L3.
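
The size limits above lend themselves to a mechanical check. The sketch below assumes the frontmatter is delimited by `---` lines and again uses a four-characters-per-token approximation; `check_skill` and the constants are hypothetical names.

```python
from typing import List, Tuple

CHARS_PER_TOKEN = 4      # assumption: crude heuristic, not a real tokenizer
MAX_LINES = 500
MAX_META_TOKENS = 100    # the "~100 tokens" guideline, treated as a soft cap
MAX_BODY_TOKENS = 5000


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split '---'-delimited frontmatter from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body


def check_skill(text: str) -> List[str]:
    """Return warnings for a SKILL.md that outgrows its layer."""
    meta, body = split_frontmatter(text)
    problems = []
    if len(text.splitlines()) >= MAX_LINES:
        problems.append("SKILL.md is not under 500 lines; move material to L3 files")
    if len(meta) // CHARS_PER_TOKEN > MAX_META_TOKENS:
        problems.append("frontmatter is well over ~100 tokens; tighten the description")
    if len(body) // CHARS_PER_TOKEN >= MAX_BODY_TOKENS:
        problems.append("body is not under ~5000 tokens; move detail to L3 files")
    return problems
```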

Context Principles

Smallest effective context — Every token in L1 competes with the agent's working space for the current task. Write L1 artifacts ruthlessly — include only what changes the agent's decisions. Details that are nice-to-know but don't affect judgment belong in L2 or L3.

Stable layers, volatile details — L1 should change rarely (project architecture doesn't shift daily). L2 changes per-task (each blueprint is different). L3 changes constantly (code evolves). If you find yourself updating CLAUDE.md frequently, the information probably belongs at a lower layer.

Pointers over content — When L1 needs to reference complex information, point to it rather than inlining it. "See design/DESIGN.md for module boundaries" is better than copying the module list into CLAUDE.md. The agent loads L2/L3 when needed.

Diagnosing Layer Problems

| Symptom | Likely cause | Fix |
|---------|-------------|-----|
| Agent forgets project architecture mid-task | L1 too thin or missing | Add architectural context to CLAUDE.md |
| Agent drowns in context, slow responses | L1 too thick, with L3 details leaking up | Audit CLAUDE.md, move details to L2/L3 files |
| Agent breaks module boundaries | No design docs, or CLAUDE.md lacks boundaries | Add design/ or architectural section to CLAUDE.md |
| Agent loads unnecessary files | Skill body has too many inline references | Split into supporting files, load on demand |
| Agent repeats same mistakes | Missing hook or missing L1 principle | Add a hook (mechanical) or CLAUDE.md rule (judgment) |

Part II: Agent Lifecycle

How the agent works across time.

Succession over persistence

Every agent instance is ephemeral — it lives for one session, then its context is gone. Don't fight this. Design for succession: knowledge survives through artifacts, not through any single agent's memory.

The unit of continuity is the artifact chain, not the agent instance. L1 and L2 artifacts (CLAUDE.md, design docs, blueprints) are the institutional memory that outlives every session. Commit messages are the archaeological record. Blueprint State sections are handoff documents from one generation to the next. Verification criteria are how the next generation trusts the previous one's work.

To give an "agent" a longer effective lifecycle, don't extend the session — raise the abstraction level. An agent operating at L1 (architecture) spans the lifetime of the project. An agent operating at L3 (implementation details) lives and dies within one task. The layers aren't just about context efficiency — they're about temporal scope.

One task, one context

A single task should fit within one context window. If it can't, it's two tasks. This is the fundamental unit of agent work — each task gets a focused context with only the information it needs, preventing earlier work from polluting later decisions. When scoping tasks, ask: can the agent complete this without its context degrading?
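
The scoping question can be made concrete with a rough planner that groups ordered subtasks so each group fits a token budget. This is a sketch under stated assumptions: the per-subtask token estimates and the budget are inputs you would have to supply, and greedy grouping is just one simple strategy, not something the source prescribes.

```python
from typing import List, Sequence, Tuple


def plan_sessions(subtasks: Sequence[Tuple[str, int]], budget: int) -> List[List[str]]:
    """Greedily group ordered subtasks so each group's estimated tokens fit the budget."""
    sessions: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, tokens in subtasks:
        if tokens > budget:
            # A subtask that cannot fit on its own is really several tasks.
            raise ValueError(f"{name!r} alone exceeds the budget; split it further")
        if current and used + tokens > budget:
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        sessions.append(current)
    return sessions
```

Each returned group corresponds to one task with its own fresh context.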

Hooks — lifecycle guardrails

Hooks shape agent behavior from outside the context window — always active, zero-cost in tokens. Two flavors:

Prompt hooks — inject a reminder, let the agent apply judgment. Best for checks that need context awareness (layer integrity, consistency, architectural boundaries).
Script hooks — run a command, pass or block mechanically. Best for checks that don't need judgment (linting, format validation, forbidden patterns).
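
A script hook of the second kind might look like the following sketch, which scans text for forbidden patterns and reports a pass-or-block result. The specific patterns are hypothetical examples, and treating a non-zero exit code as "block" is an assumption; how a hook is registered depends on the tool and is not shown.

```python
import re
from typing import List, Tuple

# Hypothetical forbidden patterns; a real project would choose its own.
FORBIDDEN_PATTERNS = [
    (re.compile(r"\bbreakpoint\(\)"), "leftover debugger call"),
    (re.compile(r"^(<<<<<<<|>>>>>>>)( |$)"), "unresolved merge conflict marker"),
]


def find_violations(text: str) -> List[Tuple[int, str]]:
    """Return (line number, reason) for every forbidden pattern found."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, reason in FORBIDDEN_PATTERNS:
            if pattern.search(line):
                violations.append((lineno, reason))
    return violations


def hook_exit_code(text: str) -> int:
    """0 lets the change through, 1 blocks it (assumed convention)."""
    return 1 if find_violations(text) else 0
```

No judgment is involved: the same input always produces the same verdict, which is what distinguishes this from a prompt hook.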

Consistency after change

When you change something that other files reference — a path, a name, a term, a structure — check every file that depends on it. Stale references are a common failure mode: you rename a directory but leave old paths in SKILL.md, change a convention but leave the old wording in CLAUDE.md. A prompt hook that reminds "did you update everything that references what you just changed?" is one of the highest-value hooks you can add to a project.
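
The mechanical half of this check, finding files that still mention an old path or name, can be sketched as a plain substring search. The function names are illustrative, and restricting the scan to `*.md` files is an assumption made for brevity.

```python
from pathlib import Path
from typing import Dict, List, Mapping, Tuple


def find_stale_references(files: Mapping[str, str], old: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line) for each line that still mentions `old`."""
    hits = []
    for name in sorted(files):
        for lineno, line in enumerate(files[name].splitlines(), start=1):
            if old in line:
                hits.append((name, lineno, line.strip()))
    return hits


def load_markdown(root: Path) -> Dict[str, str]:
    """Read every Markdown file under root, keyed by its relative path."""
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
        for p in root.rglob("*.md")
        if p.is_file()
    }
```

After renaming a directory, `find_stale_references(load_markdown(Path(".")), "old/path")` lists the lines that still need updating.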

Part III: Designing for finite human bandwidth

Agent throughput keeps rising; human review capacity doesn't. Once agents start producing more output than a human can scan in detail, the bottleneck shifts from "can the agent do it" to "can a human catch what matters before the wrong thing ships." Skills and outputs that ignore this constraint scale until exactly the moment they fail.

The principle: agent outputs intended for human review should be skeleton-grade, with details treated as cheaply replaceable. As with code architecture, the human's job is to ensure the skeleton holds, not to inspect every line. Modules can be rewritten at low cost as long as the architecture is right.

This is the meta-pattern the existing methodology skills already practice without naming it:

design-driven's 30/70 split: humans curate the 30% that defines shape; agents own the 70% within. The 70% is disposable if the shape holds.
reframe's skeleton vs flesh: review happens at the skeleton level (essence, primitives, projection logic); flesh details project from the skeleton and can be redrawn cheaply if user comprehension fails.
goal-driven's GOAL.md / record / stories asymmetry: humans line-edit the compass; the record is an append-only event log; stories are opt-in narrative. Volume is asymmetric, and review attention is asymmetric to match.
evidence-driven's State as evidence trail: auditable on demand, but agent-maintained at the execution layer, not human-reviewed line by line.

Practical consequences when designing a new skill or output

Identify the skeleton early. What 20% of the output, if wrong, would make the rest worthless? That's what humans must see prominently. Everything else is detail.
Make details collapsible, not omitted. Tables, append-only logs, <details> blocks, references-on-demand. Detail is not the problem — detail forced into the human's primary view is.
Make detail-level mistakes cheap. If a detail is wrong, the fix should be local. If correcting one detail forces re-review of the whole output, the architecture is wrong, not the detail.
Treat thoroughness as a non-strategy at scale. "The human should review everything" predicts review fatigue, then either rubber-stamping or bottleneck. Pick what humans must catch and design the output's shape to surface exactly that.
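
As an illustration of "collapsible, not omitted", the sketch below renders a review report with the skeleton up front and each detail section folded into a `<details>` block. The function name, the heading text, and the example content are all hypothetical.

```python
from html import escape
from typing import Mapping, Sequence


def render_review(skeleton: Sequence[str], details: Mapping[str, str]) -> str:
    """Put must-review decisions first; fold everything else into <details>."""
    lines = ["Decisions to review:"]
    lines += [f"- {item}" for item in skeleton]
    for title, body in details.items():
        lines += [
            "",
            "<details>",
            f"<summary>{escape(title)}</summary>",
            "",
            body.strip(),
            "",
            "</details>",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical report content, for illustration only.
report = render_review(
    ["Switch session store to Redis"],
    {"Files touched": "a.py\nb.py"},
)
```

The detail is still present and auditable, but it no longer competes with the skeleton for the reviewer's attention.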

The rule of thumb

If agent output volume grows linearly with agent work but the must-review volume grows sub-linearly, the design is on track. If they grow together, the skill is preparing the human for review failure: sooner or later the human starts skimming the architecture too, and the whole collaboration loses its safety net.
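
One rough way to test this rule against real numbers is to fit a growth exponent on a log-log scale: an exponent near 1 means the volume grows in step with the work, while a value clearly below 1 means sub-linear growth. The sketch assumes you have paired, positive measurements of agent work and of each volume; `growth_exponent` is an illustrative name.

```python
import math
from typing import Sequence


def growth_exponent(work: Sequence[float], volume: Sequence[float]) -> float:
    """Least-squares slope of log(volume) against log(work)."""
    if len(work) != len(volume) or len(work) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [math.log(w) for w in work]    # values must be positive
    ys = [math.log(v) for v in volume]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("work values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With work measured at 1, 4, and 16 units, total output of 100, 400, and 1600 lines gives an exponent of about 1, while must-review volume of 10, 20, and 40 lines gives about 0.5, which is the pattern the rule describes as on track.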

Why this matters more over time

The skeleton/detail split has always existed in software architecture (interfaces vs implementations, public APIs vs internals, design docs vs code). What's new is the ratio. When humans wrote all the code, the ratio of skeleton-level decisions to detail-level decisions roughly matched what humans could review. With agents producing the detail, that ratio explodes — and a discipline that was implicit becomes load-bearing. Skills that don't internalize this principle will scale only until human review caves under the volume.

Meta-principle

Understand why, not just what

An agent that understands the reasoning behind a constraint exercises better judgment in novel situations than one following a rigid rule. When writing any harness artifact, explain the why — it costs a few extra tokens but compounds into better decisions across every task.

"If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize — to apply broad principles rather than mechanically following specific rules." — Anthropic's constitution

Agent harness architecture — structure a project's agent context across layers for effective AI-assisted development. Covers CLAUDE.md, skills, design docs, hooks, and all artifacts that shape how an agent understands and operates in a codebase. Use when setting up or improving agent configuration, when agent context feels bloated or disorganized, when onboarding a project for AI-assisted development, or when the agent keeps losing architectural awareness mid-task. Triggers on "set up claude", "improve CLAUDE.md", "agent keeps forgetting", "context is too long", "harness setup", "organize agent context", "how should I structure my prompts". Args — `/harness audit` to evaluate an existing project's context architecture, `init` to set up harness from scratch.

development

Updated May 9, 2026

npx skillsauth add lidessen/skills harness

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: May 9, 2026, 8:09 AM (166.6s, 3 files scanned)

SKILL.md

name: harness
argument-hint: [audit | init]

Related Skills

lidessen/setup-lidessen-skills

testing

Verified, Trusted, Community

Operational deployer for the lidessen skills collection — wires harness config (CLAUDE.md / AGENTS.md / .cursor/) in a target project, injects cross-cutting principles (e.g. principal contradiction first), and reconciles when lidessen evolves. Triggers on "/setup-lidessen-skills", "set up lidessen skills", "wire lidessen into this project", "sync lidessen principles", "install lidessen skills". Use after cloning or symlinking lidessen skills into a project, when adopting the collection, or when lidessen has new content the project hasn't picked up. Args — `init` to scaffold, `sync` to re-align with current lidessen, `audit` to check drift without writing. Pairs with harness (portable methodology); this is the lidessen-specific application layer.

SKILL.md, updated May 9, 2026

lidessen/reframe

development

Verified, Trusted, Community

Designing in territory where the industry is still groping for shape — AI-native systems, agent-first interfaces, any domain whose category is forming. Triggers on "AI native X", "agent-first X", "redefine X", "rebuild X from scratch under Y", "reframe X for Y", "what should X look like in the new paradigm", "design a system with no precedent", or the tension between "new shoes on the old path" and "a skeleton that holds on its own". Method — strip to 3-5 abstract functions, redraw the load-bearing skeleton from the new paradigm's primitives, stress-test without traditional crutches, then add familiar flesh as projection. Do NOT trigger for incremental redesigns within an existing paradigm (use design-driven), explanatory writing (use technical-article-writing), or vague "make it AI" requests. Pairs with design-driven (upstream) and goal-driven (parallel). Args — `/reframe init`, `close`, `explain [for <audience>]`.

SKILL.md, updated May 9, 2026

lidessen/goal-driven

development

Verified, Trusted, Community

Goal-driven methodology for multi-week initiatives where the destination is clearer than the path — GOAL.md as stable compass (General Line plus falsifiable success criteria), record captures what was tried and observed. Triggers on "set a goal", "track my progress on X", "this is exploratory", "I know the goal but not the path", or starting a months-long initiative without a clear technical shape. Use for research, exploratory features, learning projects with a shippable output, book/article series, job search, side-business launches. Do NOT trigger for single-task work, bug fixes, week-long features with a clear plan, vague aspirations ("be healthier"), habit tracking, or general life management. Pairs with design-driven (why/how-far vs what-shape) and runs parallel to reframe. Args — `/goal-driven set`, `review`, `close`.

SKILL.md, updated May 8, 2026

lidessen/evidence-driven

development

Verified, Trusted, Community

Evidence-driven methodology for the execution layer — every claim of progress requires a falsifiable observation; "looks right to me" is rejected. Use for production code, regression-prone systems, or any task where build-time discipline materially affects outcome quality. Triggers on "set up TDD", "build discipline", "no progress without evidence", "test-first", "verify rigorously", "production code workflow". Do NOT trigger for prototypes, exploratory spikes, throwaway scripts, or doc-only changes. Pairs with design-driven (which defines what to verify; evidence-driven defines how) — each works alone. Args — `/evidence-driven init` to wire up agent configs and optional pre-commit hooks. No periodic-audit command; it's an always-on overlay.

SKILL.md, updated May 8, 2026

Install manually

# Clone the repo
git clone https://github.com/lidessen/skills.git

# Copy into Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r skills/skills/harness ~/.claude/skills/

Repository

lidessen/skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT