skills/harness-engineering/SKILL.md
Design and implement harnesses for AI-agent-driven software development. A harness is the scaffolding—context engineering, architectural constraints, feedback loops, and entropy management—that enables coding agents to produce reliable, maintainable software at scale. Use when assessing a codebase's agent-readiness, building repository knowledge bases, designing layered architectures with mechanical enforcement, setting up application legibility (observability, browser automation), implementing "garbage collection" agents, or adopting an agent-first development workflow. Triggers: "harness engineering", "agent-first development", "harness", "agent-readiness", "agent legibility", "repository knowledge base", "coding agent scaffolding", "AI-maintainable codebase", "garbage collection agents", "context engineering for agents", "architectural enforcement", "structural tests", "custom linters for agents", "agent autonomy", "harness assessment".
Install this skill globally with one command: `npx skillsauth add arisng/github-copilot-fc harness-engineering`. Works with Claude Code, Cursor, and Windsurf.
Design environments, feedback loops, and control systems that enable AI coding agents to produce reliable software at scale. A harness is everything around the code that keeps agents effective: structured knowledge, mechanical constraints, application legibility, and entropy management.
Core insight: When agents struggle, the fix is almost never "try harder." Ask: what capability is missing, and how do we make it both legible and enforceable for the agent?
Variant: harness-engineering-copilot — strategies and patterns specialized for GitHub Copilot agent customization.
Sources: OpenAI — Harness Engineering | Martin Fowler — Harness Engineering
| Component | What it does | Deterministic? |
|---|---|---|
| Context Engineering | Curated, structured knowledge base in the repo + dynamic context (observability, browser) | Mixed |
| Architectural Constraints | Layered architecture enforced by custom linters and structural tests | Deterministic |
| Garbage Collection | Periodic agents that detect drift, stale docs, and constraint violations | LLM-based |
Before building, audit what exists. Run through this checklist with the user:
- AGENTS.md or equivalent entry point? Is it a concise map (~100 lines) or a bloated monolith?
- docs/ directory with architecture, design docs, product specs, and plans? Are they versioned and cross-linked?

Rate each 0–2 (absent / partial / enforced). Total ≤ 5 = early stage; 6–8 = developing; 9–12 = mature.
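The scoring rule above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the function name `classify_readiness` and the item keys in the usage example are hypothetical, and only the 0–2 scale and the stage thresholds come from the text.

```python
def classify_readiness(scores: dict[str, int]) -> tuple[int, str]:
    """Sum per-item ratings (0 = absent, 1 = partial, 2 = enforced)
    and map the total to a maturity stage."""
    for item, score in scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{item}: rating must be 0, 1, or 2, got {score!r}")

    total = sum(scores.values())
    if total <= 5:
        stage = "early stage"
    elif total <= 8:
        stage = "developing"
    else:
        stage = "mature"
    return total, stage


# Hypothetical item keys; use whatever names match your own checklist.
print(classify_readiness({"entry_point": 1, "docs_directory": 2}))
```

Rejecting out-of-range ratings up front keeps a mistyped score from silently shifting a repository into the wrong stage.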
The repository is the agent's only world. Anything not accessible in-context effectively does not exist.
Anti-pattern: One giant AGENTS.md with everything. It crowds out task context, rots instantly, and is unverifiable.
Pattern: AGENTS.md as table of contents (~100 lines), pointing to structured docs/:
AGENTS.md ← map, not manual
ARCHITECTURE.md ← top-level domain + layer map
docs/
├── design-docs/ ← indexed, with verification status
│ ├── index.md
│ └── core-beliefs.md
├── exec-plans/ ← active, completed, tech-debt
│ ├── active/
│ └── completed/
├── product-specs/ ← requirements + acceptance criteria
├── references/ ← external docs, llms.txt files
├── DESIGN.md
├── FRONTEND.md
├── QUALITY_SCORE.md
├── RELIABILITY.md
└── SECURITY.md
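One way to keep AGENTS.md a map rather than a manual is to check it mechanically. The sketch below is an assumption-laden example: it treats the "~100 lines" guideline as a hard limit and checks that every relative markdown link resolves to a file in the repository. The function name `check_entry_point` and the message wording are hypothetical.

```python
import re
from pathlib import Path

MAX_LINES = 100  # assumption: the "~100 lines" guideline used as a hard limit
# Matches markdown links [text](target) and captures the target without any #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)[^)]*\)")


def check_entry_point(repo_root: Path, entry: str = "AGENTS.md") -> list[str]:
    """Return a list of problems found in the entry-point file (empty if clean)."""
    path = repo_root / entry
    if not path.is_file():
        return [f"{entry} is missing: create it as a short map that links into docs/."]

    text = path.read_text(encoding="utf-8")
    problems = []

    line_count = len(text.splitlines())
    if line_count > MAX_LINES:
        problems.append(
            f"{entry} has {line_count} lines (limit {MAX_LINES}): "
            "move detail into docs/ and link to it."
        )

    for target in LINK_RE.findall(text):
        if "://" in target or target.startswith("mailto:"):
            continue  # external links are out of scope for this check
        if not (repo_root / target).exists():
            problems.append(
                f"{entry} links to {target}, which does not exist: fix or remove the link."
            )
    return problems
```

Run from CI or a pre-commit hook, a non-empty result could fail the build and hand the messages back to the agent as context.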
Key principles:
Extend what agents can observe beyond static files:
Agents replicate patterns that exist in the repository — including bad ones. Mechanical enforcement prevents drift.
Divide each business domain into fixed layers with strictly validated dependency directions:
Types → Config → Repo → Service → Runtime → UI
↑
Providers (auth, connectors, telemetry, feature flags)
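Dependency direction can be validated with a structural test over import edges. The sketch below rests on two assumptions about the diagram: a layer may import only from itself or from layers to its left, and Providers may be used from any layer. The lowercase layer names and the helpers `is_allowed` and `find_violations` are hypothetical, and extracting the edges from real source files is left out.

```python
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}
CROSS_CUTTING = "providers"


def is_allowed(importer: str, imported: str) -> bool:
    """True if `importer` may depend on `imported` under the assumed rule."""
    if CROSS_CUTTING in (importer, imported):
        # Assumption: Providers is reachable from every layer; what Providers
        # itself may import is not modeled in this sketch.
        return True
    # A layer may import from itself or from any layer to its left.
    return RANK[imported] <= RANK[importer]


def find_violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """edges are (importer_layer, imported_layer) pairs; return the disallowed ones."""
    return [(src, dst) for src, dst in edges if not is_allowed(src, dst)]


edges = [("service", "repo"), ("repo", "service"), ("ui", "providers")]
print(find_violations(edges))  # only ("repo", "service") points the wrong way
```

An unknown layer name raises `KeyError`, which is deliberate here: a module that belongs to no declared layer is itself a finding.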
| Mechanism | Purpose | Examples |
|---|---|---|
| Custom linters | Catch violations with remediation instructions in error messages | Dependency direction, structured logging, naming conventions, file size limits |
| Structural tests | Validate architecture invariants | ArchUnit-style tests, import graph validation |
| Pre-commit hooks | Block violations before they enter the repo | Schema validation, boundary checks |
| CI validation | Catch what slips through locally | Full lint + test suite, doc freshness checks |
Critical detail: Write custom lint error messages that include remediation instructions. These messages become agent context when violations occur, teaching the agent how to fix issues.
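As an illustration of a lint message that carries its own remediation, here is a minimal file-size rule. The 400-line limit, the `LintError` shape, and the wording of the fix are all assumptions made for the example, not values from any particular linter.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FILE_LINES = 400  # hypothetical limit; pick one that suits the codebase


@dataclass(frozen=True)
class LintError:
    path: str
    rule: str
    message: str      # what is wrong
    remediation: str  # how the agent should fix it

    def render(self) -> str:
        return f"{self.path}: [{self.rule}] {self.message}\n  Fix: {self.remediation}"


def check_file_size(path: str, source: str, limit: int = MAX_FILE_LINES) -> Optional[LintError]:
    """Return a LintError if the file exceeds the line limit, else None."""
    line_count = len(source.splitlines())
    if line_count <= limit:
        return None
    return LintError(
        path=path,
        rule="file-size",
        message=f"{line_count} lines exceeds the {limit}-line limit.",
        remediation=(
            "Split the file by responsibility into smaller modules "
            "in the same layer, then update the imports."
        ),
    )
```

Keeping `message` and `remediation` as separate fields makes it harder to add a rule that reports a problem without saying how to fix it.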
Be prescriptive about boundaries (e.g., "parse data shapes at the boundary"), but not about specific tools. Specify the what; let agents decide the how within those boundaries.
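To make "parse data shapes at the boundary" concrete, the sketch below validates an untrusted payload once and returns a typed value that inner layers can rely on. The `CreateUser` type, its fields, and the validation rules are hypothetical; the rule prescribes the outcome, not this particular approach or any validation library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateUser:
    email: str
    age: int


def parse_create_user(raw: dict) -> CreateUser:
    """Validate an untrusted payload at the edge and return a typed value."""
    email = raw.get("email")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email: expected a string containing '@'")

    age = raw.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or age < 0:
        raise ValueError("age: expected a non-negative integer")

    return CreateUser(email=email, age=age)
```

Code past the boundary then accepts `CreateUser` rather than a raw dict, so shape checks do not have to be repeated in every layer.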
Full agent autonomy introduces entropy. Without active management, drift compounds.
Define opinionated, mechanical rules that keep the codebase legible:
Schedule recurring background tasks that:
Cadence: Daily or per-sprint. Treat like garbage collection — continuous small increments, not painful bursts.
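A doc-gardening task can start very small. The sketch below lists markdown files that have not been modified within a given window, so that a recurring job (or an agent) can review them. Using file modification time as the staleness signal and a 90-day default are assumptions; last-commit dates from version control would usually be a more reliable signal.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400


def find_stale_docs(
    docs_root: Path, max_age_days: int = 90, now: Optional[float] = None
) -> list[Path]:
    """Return markdown files under docs_root whose mtime is older than
    max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = [p for p in docs_root.rglob("*.md") if p.stat().st_mtime < cutoff]
    return sorted(stale, key=lambda p: p.stat().st_mtime)
```

Running this on each scheduled pass and opening one small review task per stale file fits the "continuous small increments" cadence better than a periodic bulk cleanup.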
Maintain a quality document that grades each product domain and architectural layer, tracking gaps over time. This gives both humans and agents a map of where debt lives.
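A quality document of this kind can be generated from structured grades so that it stays easy to diff over time. The sketch below renders a domain-by-layer markdown table. The letter grades, the `?` marker for ungraded cells, and the function name are assumptions; the text does not prescribe a format.

```python
def render_quality_table(grades: dict[str, dict[str, str]], layers: list[str]) -> str:
    """Render {domain: {layer: grade}} as a markdown table, one row per domain.
    Cells with no recorded grade are shown as '?'."""
    header = "| Domain | " + " | ".join(layers) + " |"
    divider = "|---|" + "---|" * len(layers)
    rows = [
        "| " + domain + " | "
        + " | ".join(by_layer.get(layer, "?") for layer in layers)
        + " |"
        for domain, by_layer in sorted(grades.items())
    ]
    return "\n".join([header, divider, *rows])


# Hypothetical domains and grades, for illustration only.
print(render_quality_table(
    {"billing": {"Types": "A", "Service": "C"}, "auth": {"Types": "B"}},
    ["Types", "Service", "UI"],
))
```

Showing ungraded cells explicitly keeps gaps in coverage visible instead of letting them read as passing.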
When adopting harness engineering, the development loop changes:
Human: Define task → Write acceptance criteria
↓
Agent: Validate codebase state → Implement → Self-review → Request agent reviews
↓
Agent: Respond to feedback → Iterate until reviewers satisfied
↓
Agent: Detect/remediate build failures → Escalate only when judgment required
↓
Merge (minimal blocking gates — corrections are cheap, waiting is expensive)
Humans work at a different abstraction layer: prioritize work, translate user feedback into acceptance criteria, validate outcomes. When agents struggle, identify what's missing and feed it back into the repo.
Not every technique applies to brownfield code. Assess feasibility:
| Technique | Retrofit difficulty | Start here |
|---|---|---|
| AGENTS.md + docs/ structure | Low | Create entry point, incrementally document |
| Custom linters | Medium | Start with 2–3 high-value rules |
| Structural tests | Medium | Enforce module boundaries first |
| App legibility (CDP, observability) | High | Requires infra investment |
| Full garbage collection agents | High | Start with doc-gardening only |
Pragmatic approach: Start with context engineering (cheapest, highest ROI), then add architectural constraints incrementally, and introduce garbage collection last.
For organizations with 2–3 main tech stacks, consider packaging harnesses as reusable templates:
This parallels "golden path" service templates, but optimized for agent-driven development.