plugins/plugin-creator/skills/arl/SKILL.md
Knowledge reference for Autonomous Refinement Loop research — pattern research into prerequisites for autonomous execution without synchronous human blocking gates. Defines failure categories, prerequisites, and conditions for replacing human judgment with machine-verifiable checks. Use when designing or evaluating autonomous agent loops, gate conditions, or HOOTL execution patterns.
Autonomous Refinement Loop (ARL) is pattern research into what an AI assistant needs — in information, tools, verification mechanisms, access to external resources, and knowledge of past failures — to produce outcomes that match the user's vision without requiring the human to be a synchronous blocking gate during execution.
The foundational question:
What determines whether an AI can produce a satisfactory outcome for a given piece of work, and how do we ensure those prerequisites are met before and during execution?
ARL is not a process to run. It is a body of research that informs how processes (like SAM) should be designed, and what conditions enable autonomous execution.
SOURCE: Autonomous Refinement Loop
This document provides:
When working on improvements, refinement, or autonomous execution tasks, consult this reference to understand what prerequisites must be in place, what could fail without them, and how the gates work together.
DESIGN GOAL — The concept describes the desired outcome of ARL research.
HOOTL means achieving human-in-the-loop outcome quality with human-out-of-the-loop execution.
Breaking this down:
The quality bar is HOOTL success: the artifact meets the same acceptance criteria as if a human reviewed every intermediate step, but the human did not have to be present synchronously to approve that work.
SOURCE: HOOTL: Human Out Of The Loop
DESIGN GOAL — This architecture describes how HOOTL execution is designed. The research body is observed and evidence-backed. The execution model and observation layer are design goals being researched.
The empirical findings from cross-framework analysis:
How HOOTL execution works in practice:
Passive agents monitoring execution in real time:
agentskill-kaizen is the current implementation of this layer in post-hoc mode (mining historical transcripts after sessions complete). The ARL vision extends it to real-time observation during execution.
SOURCE: Three Layers of ARL
The three together form a research cycle: ARL hypothesizes, SAM applies, agentskill-kaizen validates.
SOURCE: Relationship Triangle
ARL researches what it takes to move AI-human interactions from blocking, high-friction exchanges to asynchronous, low-friction ones. The spectrum, from worst to best:
The goal is moving as many interactions as possible to level 4. Level 4 is HOOTL: the human gets quality outcomes without being a synchronous blocking gate.
SOURCE: Interaction Spectrum
These gates formalize the machine-verifiable conditions that replace human judgment at key points in an iterative refinement loop.
| Gate | What It Checks | When It Fires | What Failure Looks Like |
|------|----------------|---------------|-------------------------|
| R1: Information Completeness | Sufficient context to operate loop without escalation | Loop entry, re-entry after escalation | Loop proceeds with gaps, agent hallucinate-fills missing information, produces fluent but wrong artifacts |
| R2: Loop Detection | Oscillating, stalling, or exceeding resource bounds | Start of each iteration before assessment | Loop runs indefinitely without converging, fix A breaks B repeatedly, same findings recurring |
| R3: Validity Filtering | Findings have verifiable evidence (file:line citations) | After assessment, before planning | False positives consume iteration budget, regressions introduced, phantom issues trigger changes |
| R4: Plan Quality | Plan internally consistent, addresses actual findings | After planning, before implementation | Inconsistent plan proceeds, addresses wrong findings, changes must be reverted |
| R5: Purpose Anchor | Artifact still serves original stated purpose | Captured at iteration 0, checked each iteration | After N iterations, artifact optimized for assessment metrics but no longer serves original use case |
| R6: Content-Loss Detection | All semantic units preserved after changes | After implementation, before next iteration | Refactoring removes sections deemed "redundant", no gate catches removal, human discovers loss later |
| R7: Convergence Tracking | Findings decreasing, stable, or alternating across iterations | Each iteration boundary after assessment | Loop cannot determine progress, fixes trivial issues indefinitely, or oscillates without converging |
| R8: Proportionality Check | Proposed fix proportional to finding severity | During plan quality gate (R4) | Low-severity finding triggers high-scope change that introduces risk without proportional benefit |
| R9: Downstream Impact | All references still resolve after changes | After implementation, alongside R6 | Refactoring renames a file, breaks three other components linking to old path, not detected until runtime |
| R10: Split Justification | New component independently viable, not just parent-dependent | When plan proposes splitting content into separate artifacts | Component split into three pieces, two only used from parent, adds navigation complexity without value |
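The "When It Fires" column implies an order of execution inside each iteration. The sketch below shows that order under stated assumptions. It is not taken from any surveyed framework. Each gate is a callable keyed `"R1"` to `"R10"` that returns `True` on pass. R3 is modeled as a per-finding evidence predicate. `run_loop`, `is_valid_finding`, and the state keys (`findings`, `stated_purpose`, `plan_proposes_split`) are hypothetical names. Every gate failure simply escalates; a real loop might re-plan instead. The table does not say when R5's per-iteration check runs, so placing it after implementation is an assumption.

```python
from typing import Callable, Dict, List

State = dict
Gate = Callable[[State], bool]   # hypothetical: returns True when the gate passes
Stage = Callable[[State], None]  # hypothetical: mutates the loop state in place


def run_loop(gates: Dict[str, Gate], stages: Dict[str, Stage], state: State,
             is_valid_finding: Callable[[object], bool],
             max_iterations: int = 5) -> List[str]:
    """Run one gated refinement loop and return a trace of what happened."""
    trace: List[str] = []

    # Loop entry: R1 checks context; R5 captures the purpose anchor once.
    if not gates["R1"](state):
        return trace + ["escalate:R1"]
    state["purpose_anchor"] = state.get("stated_purpose")
    trace.append("capture:R5")

    for i in range(max_iterations):
        # Start of each iteration, before assessment: loop detection.
        if not gates["R2"](state):
            return trace + ["escalate:R2"]
        stages["assess"](state)  # expected to set state["findings"]

        # Iteration boundary after assessment: convergence tracking.
        if not gates["R7"](state):
            return trace + ["escalate:R7"]

        # After assessment, before planning: R3 drops findings without evidence.
        state["findings"] = [f for f in state["findings"] if is_valid_finding(f)]
        if not state["findings"]:
            return trace + ["done"]

        stages["plan"](state)
        # R8 runs inside the plan-quality check; R10 only when a split is proposed.
        plan_gates = ["R4", "R8"] + (["R10"] if state.get("plan_proposes_split") else [])
        for name in plan_gates:
            if not gates[name](state):
                return trace + [f"escalate:{name}"]

        stages["implement"](state)
        # After implementation, before the next iteration (R5 placement assumed).
        for name in ("R6", "R9", "R5"):
            if not gates[name](state):
                return trace + [f"escalate:{name}"]
        trace.append(f"iteration:{i}")

    # Iteration budget exhausted without an empty, evidenced finding set.
    return trace + ["escalate:budget"]
```

With every gate passing, an assessor that reports the remaining to-do items, and an implementer that fixes one item per iteration, the trace ends in `"done"` once assessment returns no evidenced findings.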
SOURCE: The 10 Gates
| Coverage Level | R-Requirements | What Exists Today |
|---|---|---|
| Import directly | R1, R3, R4 | RT-ICA (SAM), GAN-inspired validation (Octocode/BMAD), 7-dimension plan checking (GSD) |
| Partial coverage | R2, R5, R9 | Bounded iteration count (GSD), objective injection (Ralph), downstream impact analysis (Octocode) |
| Build from scratch | R6, R7, R8, R10 | No framework provides content-loss detection, convergence tracking, proportionality checks, or split justification |
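No surveyed framework provides content-loss detection (R6), so here is one cheap starting point. It is a minimal sketch, assuming the artifact is Markdown and treating heading titles as the "semantic units". That is a deliberate simplification: real units might be paragraphs, rules, or examples. The check is a set difference between the units present before and after a change. `semantic_units`, `lost_units`, and the `declared_removals` parameter (removals the plan announced on purpose) are hypothetical names.

```python
import re
from typing import Iterable, Set

# ATX-style Markdown heading; [ \t]+ keeps the match on a single line.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def semantic_units(markdown: str) -> Set[str]:
    """Approximate semantic units as normalized Markdown heading titles."""
    return {m.group(1).strip().lower() for m in HEADING.finditer(markdown)}


def lost_units(before: str, after: str, declared_removals: Iterable[str] = ()) -> Set[str]:
    """Return units present before a change, absent after it, and not declared by the plan."""
    declared = {d.strip().lower() for d in declared_removals}
    return semantic_units(before) - semantic_units(after) - declared
```

The R6 gate passes when `lost_units(...)` is empty. The regex also matches `#` lines inside fenced code blocks, so a production check would need a real Markdown parser.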
The 4 build-from-scratch requirements all emerge specifically from iterative refinement — they are invisible to single-pass pipeline designs.
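Convergence tracking (R7) can be sketched as a classifier over the finding sets produced by successive assessments. This sketch makes several assumptions. Each finding is reduced to a stable fingerprint string (for example `"rule-id@path:line"`, an assumed format). A set that recurs after a different intermediate set counts as alternation ("fix A breaks B"). Only a strictly shrinking set counts as progress. `classify`, `convergence_gate`, and the label strings are hypothetical.

```python
from typing import FrozenSet, Sequence

Findings = FrozenSet[str]  # finding fingerprints, e.g. "rule-id@path:line" (assumed)


def classify(history: Sequence[Findings]) -> str:
    """Classify the latest iteration as decreasing, stable, alternating, or not_decreasing."""
    if len(history) < 2:
        return "insufficient"
    prev, curr = history[-2], history[-1]
    # A finding set that reappears from an earlier iteration, after a different
    # intermediate set, indicates oscillation rather than progress.
    if curr != prev and curr in history[:-2]:
        return "alternating"
    if curr == prev:
        return "stable"  # same findings recurring: the loop has stalled
    if len(curr) < len(prev):
        return "decreasing"
    return "not_decreasing"


def convergence_gate(history: Sequence[Findings]) -> str:
    """Map the trend to a loop decision: converged, continue, or escalate."""
    if history and not history[-1]:
        return "converged"
    return "continue" if classify(history) in ("decreasing", "insufficient") else "escalate"
```

Comparing whole sets instead of counts means that swapping one finding for another (same count, different content) is not mistaken for stability.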
SOURCE: Gate Coverage Across Existing Frameworks
A human gate can potentially be replaced by machine-verifiable conditions when ALL of the following hold:
When ANY of these conditions fails, the gate requires either human judgment or a more sophisticated verification mechanism (adversarial review, cross-examination between independent agents, or escalation).
Evidence status: These 4 conditions were synthesized from cross-framework evidence. They correlate with gates classified as eliminable, but have not been tested as a predictive model.
SOURCE: Decision Tree
Seven patterns discovered across all 6 frameworks that apply to any autonomous development system:
SOURCE: Universal Principles
The same type of human judgment can be eliminable in one context and irreducible in another. The determining factors are:
| Scope Clarity | Goal Measurability | Data Enumeration | Human Gates Eliminable? |
|---|---|---|---|
| High (specific tool, platform, use case) | High (binary pass/fail, checklist) | High (official docs, known examples) | Yes: autonomous loop feasible |
| Medium (domain-specific best practices) | Medium (scoring with weights) | Medium (reference examples, community patterns) | Partial: autonomous with periodic human checkpoints |
| Low (general improvement, meta-goals) | Low (subjective, emergent criteria) | Low (unknown what "complete" means) | No: requires human at each decision point |
This means ARL cannot be applied uniformly. A scope-classification step must precede any attempt at autonomous operation.
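The table only shows the three rows in which all factors agree. Below is a minimal scope-classification sketch that rests on an extrapolation, not a finding: the weakest of the three factors caps the achievable autonomy. `Level`, `OUTCOME`, and `classify_scope` are hypothetical names.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Outcomes copied from the aligned rows of the table above.
OUTCOME = {
    Level.HIGH: "Yes: autonomous loop feasible",
    Level.MEDIUM: "Partial: autonomous with periodic human checkpoints",
    Level.LOW: "No: requires human at each decision point",
}


def classify_scope(scope_clarity: Level, goal_measurability: Level,
                   data_enumeration: Level) -> str:
    """Map the three factors to an autonomy outcome.

    Assumption: the weakest factor dominates. Mixed combinations are
    extrapolated, because the table only covers rows where all factors agree.
    """
    return OUTCOME[min(scope_clarity, goal_measurability, data_enumeration)]
```

Under this rule, a high-clarity scope with only weighted-score goals (medium measurability) lands in the "Partial" band.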
SOURCE: Key Findings
Complete detail on each gate, framework patterns, and prerequisites: