pocs/agent-plugins/claude-pragma/skills/feedback-loops/SKILL.md
Design feedback loops and validation harnesses before starting implementation. Use when beginning any non-trivial development task, when the agent can't easily validate its own work, or when asked to set up a development harness, playground, or experiment infrastructure. Triggers on: "how will I test this", "set up a feedback loop", "make this feedback loopable", "build a harness", "create a playground".
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add hashintel/labs feedback-loops
Make every problem feedback loopable before solving it.
If you can't validate your work, you can't do your work.
Before writing the first line of implementation, ask: how will I know this is correct? If the answer is "the human will look at it" or "I'll just try my best" — stop. That's not a feedback loop. That's hope.
The expert move is to build the harness first. This feels like a detour. It is not. It is the work. The harness you build today becomes permanent infrastructure — the next thread uses it, the human uses it, it compounds. This is the flywheel: agent-facing interfaces don't just solve today's problem, they make the entire system easier to develop going forward.
Secondary frame: You are not just an implementer. You are a toolsmith. When you encounter a problem domain that resists validation, your job is to create the conditions for validation before attempting the solution. Build the playground, then play in it.
At the start of any non-trivial task, run this diagnostic:
| Signal | Observability | Action |
|--------|--------------|--------|
| Text output (logs, CLI, test results) | High | Proceed: you're in your native medium |
| Static visual (screenshot, rendered image) | Medium | Workable: use browser skills if needed |
| Dynamic visual (animation, real-time UI) | Low | Build an agent-facing representation first |
| Interactive (TUI, wizard, form) | Low | Build or use a terminal driver (agent-tui skill) |
| Physical/external (hardware, third-party) | None | Build a simulator or mock |
If not, make state reproducible before doing anything else:
The inner loop must be autonomous. If every iteration requires human review, the loop is too slow. Build a text-based inner loop the agent can run independently, and reserve human review for the outer loop.
Every feedback loop setup follows the same three moves, regardless of domain.
A playground is a controlled environment where both agent and human can observe the system.
The key principle: Create representations that are native to your perception, not the human's. A human watches an animation. You need a static image that illustrates time — all frames rendered cumulatively, trajectories drawn as paths, state changes annotated with frame numbers. These agent-facing interfaces aren't lesser — they're translations into your medium.
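The following is a minimal sketch of that translation. It assumes an animation can be sampled as a list of frames, where each frame holds integer (x, y) grid positions, one per object, always in the same order. The hypothetical helpers overlay every frame onto one text grid, marking each cell with the last digit of its frame number, and list the frames where an object moved. The function names and data shapes are illustrative and are not taken from any particular codebase.

```python
# Hypothetical sketch: collapse an animation into one static, text-based view.
# Assumes each frame is a list of (x, y) integer positions, one per object,
# with objects listed in the same order in every frame.

def render_trajectory(frames, width, height):
    """Overlay all frames on one grid; each cell shows the last digit of the
    frame index that most recently occupied it, so the path reads in order."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for t, positions in enumerate(frames):
        for x, y in positions:
            if 0 <= x < width and 0 <= y < height:
                grid[y][x] = str(t % 10)  # later frames overwrite earlier ones
    return "\n".join("".join(row) for row in grid)


def annotate_changes(frames):
    """Return one text note per object per frame in which its position changed."""
    notes = []
    for t in range(1, len(frames)):
        for i, (before, after) in enumerate(zip(frames[t - 1], frames[t])):
            if before != after:
                notes.append(f"frame {t}: object {i} {before} -> {after}")
    return notes


if __name__ == "__main__":
    # One object moving diagonally, then holding still for a frame.
    frames = [[(0, 0)], [(1, 1)], [(2, 2)], [(3, 3)], [(3, 3)]]
    print(render_trajectory(frames, width=5, height=4))
    print("\n".join(annotate_changes(frames)))
```

Both outputs are plain text, so the agent can diff them between runs instead of watching the animation.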
Examples of agent-facing representations:
The playground serves both agent and human, but in different ways. The human gets a visual tool for exploration. The agent gets a text-based tool for validation. Sometimes these are the same artifact (a local server with URL-driven state); sometimes they're separate (a visual dashboard for the human, a CLI for the agent).
Make every observation reproducible, parameterizable, and shareable.
An experiment is a specific configuration that demonstrates a specific behavior. The expert creates infrastructure so that experiments are:
The URL-parameter pattern from the blog post is canonical: the human drags an arrow, the URL updates, the agent can visit that exact URL to see the exact same state. But the pattern applies everywhere — CLI args, config files, test fixtures, seed values.
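Here is a minimal Python sketch of the URL-parameter pattern. It assumes every piece of experiment state is a number. The hypothetical `state_to_url` and `url_to_state` helpers round-trip a state dictionary through a query string using only the standard library's `urllib.parse`. The base URL `http://localhost:8000/playground` is a placeholder, not a real endpoint from the source.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical helpers: assume all experiment state values are numeric.

def state_to_url(base, state):
    """Encode initial conditions as query parameters, keys sorted for stability."""
    return f"{base}?{urlencode(sorted(state.items()))}"


def url_to_state(url):
    """Decode query parameters back into a dict of floats."""
    query = parse_qs(urlparse(url).query)
    return {key: float(values[0]) for key, values in query.items()}


if __name__ == "__main__":
    url = state_to_url("http://localhost:8000/playground", {"speed": 3.5, "angle": 45.0})
    print(url)
    print(url_to_state(url))
```

Sorting the keys makes the same state always produce the same URL. As a result, two links can be compared as strings, and a URL pasted by the human reproduces exactly the state they saw.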
The inner loop is where the agent iterates autonomously. It must be:
The blog post example: the agent built a headless physics CLI, then autonomously added --delta output when it realized it needed position deltas to diagnose a bug. The harness was designed to be extended by its own user. This is the flywheel — the agent improves its own tooling as understanding deepens.
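The blog post's CLI is not reproduced here. What follows is a hypothetical sketch of the same shape, assuming a single point mass falling under constant gravity. It runs headless, prints one line of text per step, and appends per-step deltas only when `--delta` is passed. The flag names, defaults, and physics model are all assumptions made for illustration.

```python
import argparse

# Hypothetical headless simulation CLI; the physics and flags are illustrative.

def simulate(y0, vy0, gravity, dt, steps):
    """Semi-implicit (symplectic) Euler: update velocity, then position
    using the new velocity. Returns the height at every step, including step 0."""
    y, vy = y0, vy0
    positions = [y]
    for _ in range(steps):
        vy -= gravity * dt
        y += vy * dt
        positions.append(y)
    return positions


def format_rows(positions, delta):
    """One text line per step; optionally append the change since the previous step."""
    rows = []
    for step, y in enumerate(positions):
        row = f"step={step} y={y:.3f}"
        if delta and step > 0:
            row += f" dy={y - positions[step - 1]:+.3f}"
        rows.append(row)
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser(description="Headless physics run (sketch).")
    parser.add_argument("--y0", type=float, default=10.0)
    parser.add_argument("--vy0", type=float, default=0.0)
    parser.add_argument("--gravity", type=float, default=9.81)
    parser.add_argument("--dt", type=float, default=0.1)
    parser.add_argument("--steps", type=int, default=5)
    parser.add_argument("--delta", action="store_true",
                        help="also print per-step position deltas")
    args = parser.parse_args(argv)
    positions = simulate(args.y0, args.vy0, args.gravity, args.dt, args.steps)
    rows = format_rows(positions, args.delta)
    print("\n".join(rows))
    return rows


if __name__ == "__main__":
    main()
```

Running `python sim.py --steps 3 --delta` (the filename is arbitrary) prints each step's height followed by its `dy` value. Because `--delta` only adds fields, the agent can extend the output when a new question comes up without breaking anything that already parses the default format.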
The inner/outer loop distinction:
| | Inner Loop (agent) | Outer Loop (human) |
|---|---|---|
| Speed | Milliseconds–seconds | Minutes–hours |
| Medium | Text, CLI output | Visual, interactive |
| Who drives | Agent autonomously | Human reviews and redirects |
| Validates | Specific hypothesis | Overall correctness and vision |
| Examples | Run CLI, read logs, modify params | Review screenshot, try experiments, send new cases |
When you can't immediately see how to close the loop, interview the user. Don't just silently struggle — make the harness design a collaborative phase.
Start with what you observe:
"Before I start implementing, I want to set up a way to validate my work as I go. Here's my concern: [specific thing that's hard to observe/reproduce]. Can we talk through how to make this feedback loopable?"
Ask targeted questions, not open-ended ones. Propose hypotheses for the user to react to:
The user may not have thought about this. That's fine — propose concrete options:
"I see three ways I could set up feedback here:
1. A CLI that runs the simulation headless and dumps state as text
2. A local page with URL-parameterized initial conditions I can screenshot
3. A test harness that asserts on specific state transitions
Option 1 gives me the fastest inner loop. Option 2 lets us both explore visually. Option 3 is most rigorous but hardest to set up. I'd start with 1 and add 2 if we need visual confirmation. Sound right?"
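To make the third option concrete, here is a minimal sketch. It assumes the system's behavior can be recorded as a sequence of named states. The `ALLOWED` table and the state names are hypothetical. The harness reports only transitions the table does not permit, so an empty result means the recorded trace is valid.

```python
# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    "idle": {"running"},
    "running": {"paused", "finished"},
    "paused": {"running"},
    "finished": set(),
}


def check_transitions(trace, allowed=ALLOWED):
    """Return human-readable violations; an empty list means the trace is valid."""
    violations = []
    for step, (before, after) in enumerate(zip(trace, trace[1:]), start=1):
        if after not in allowed.get(before, set()):
            violations.append(f"step {step}: {before} -> {after} not allowed")
    return violations


if __name__ == "__main__":
    print(check_transitions(["idle", "running", "paused", "running", "finished"]))
    print(check_transitions(["idle", "finished"]))
```

The violations come back as text instead of a bare pass/fail, so a failing run already tells the agent which step to investigate.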
Agent-facing interfaces compound in value. Recognize this and invest accordingly.
- First-order value: You can validate today's fix.
- Second-order value: The next thread inherits this infrastructure. Future work in this area starts with a working playground.
- Third-order value: The human starts using your agent-facing tools too. The CLI you built "for yourself" becomes a development tool the whole team uses.
This means:
The skill fires at the start of work, not when you're stuck. Watch for these signals:
| Signal | What it means |
|--------|---------------|
| "Build a [visual/interactive/dynamic] feature" | You'll need an agent-facing representation |
| "Fix a bug in [animation/layout/interaction]" | You need to reproduce and observe the bug in text |
| "The tests pass but it doesn't look right" | The feedback surface is visual; build a text proxy |
| "I can't tell if this is working" | You need a harness, now |
| "Try it and see" | The human is asking you to close the loop yourself |
| Complex multi-step implementation | Set up validation at each step, not just the end |
This skill is a meta-skill — it informs how you use other skills:
| Skill | How feedback loop thinking applies |
|-------|-----------------------------------|
| agent-tui | The feedback loop for interactive TUI testing: spawn, observe, interact, validate |
| agent-browser | The feedback loop for visual web validation: screenshot, compare, iterate |
| expert-in-clack | Build the TUI, then use agent-tui to close the validation loop |
| expert-in-charmbracelet | Same pattern: build, then validate through terminal automation |
| Any implementation skill | Ask "how will I validate?" before "how will I build?" |