skills/planning/autonomous-agents-task-automation/SKILL.md
Execute long, multi-step tasks autonomously using planning, memory, loop architectures, parallel orchestration, and multi-agent delegation. Use when independently completing complex tasks involving research, coding, file operations, or multi-stage workflows.
npx skillsauth add bereniketech/claude_kit autonomous-agents-task-automation
Before writing a single line of code or running any command, create an explicit plan.
Rule: The work plan is the single highest-leverage intervention point — a well-structured plan with clear acceptance criteria prevents most mid-task failures.
Choose the right loop pattern before starting. From simplest to most sophisticated:
| Pattern | Complexity | Best For |
|---------|-----------|----------|
| Sequential Pipeline (claude -p) | Low | Daily dev steps, scripted workflows |
| NanoClaw REPL | Low | Interactive persistent sessions |
| Infinite Agentic Loop | Medium | Parallel content generation, spec-driven work |
| Continuous Claude PR Loop | Medium | Multi-day iterative projects with CI gates |
| dmux Parallel Workflows | Medium | Divide-and-conquer across independent files/domains |
| Ralphinho / RFC-DAG | High | Large features, multi-unit parallel work with merge queue |
Decision flow:
Single focused change?
├─ Yes → Sequential Pipeline or NanoClaw
└─ No → Written spec/RFC available?
├─ Yes → Need parallel implementation?
│ ├─ Yes → Ralphinho (DAG orchestration)
│ └─ No → Continuous Claude (iterative PR loop)
└─ No → Need many variations of the same thing?
├─ Yes → Infinite Agentic Loop
└─ No → Sequential Pipeline + de-sloppify
Rule: Match loop complexity to problem complexity. Ralphinho for single-file changes is overhead; a sequential pipeline for a multi-unit RFC will produce merge conflicts.
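The decision flow can also be encoded in a wrapper script that picks a pattern before any agent starts. This is a sketch only: the function name `choose_loop_pattern` and its yes/no argument convention are hypothetical and not part of any of the tools named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the decision flow above.
# Arguments (each "yes" or "no"):
#   $1 single focused change?
#   $2 written spec/RFC available?
#   $3 need parallel implementation?      (only consulted when $2 = yes)
#   $4 need many variations of one thing? (only consulted when $2 = no)
choose_loop_pattern() {
  local single="$1" has_spec="$2" parallel="$3" variations="$4"
  if [[ "$single" == yes ]]; then
    echo "Sequential Pipeline or NanoClaw"
  elif [[ "$has_spec" == yes ]]; then
    if [[ "$parallel" == yes ]]; then
      echo "Ralphinho (DAG orchestration)"
    else
      echo "Continuous Claude (iterative PR loop)"
    fi
  elif [[ "$variations" == yes ]]; then
    echo "Infinite Agentic Loop"
  else
    echo "Sequential Pipeline + de-sloppify"
  fi
}
```

For example, `choose_loop_pattern no yes yes no` prints `Ralphinho (DAG orchestration)`.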
Break tasks into a sequence of focused, non-interactive claude -p calls. Each call is isolated — fresh context window, clear prompt, single concern.
#!/bin/bash
set -e
# Step 1: Implement
claude -p "Read the spec in docs/auth-spec.md. Implement OAuth2 login in src/auth/. Write tests first (TDD)."
# Step 2: De-sloppify (separate cleanup pass)
claude -p "Review all changes in the working tree from the previous step. Remove unnecessary type tests, overly defensive checks, and tests of language features. Keep real business logic tests. Run the test suite after cleanup."
# Step 3: Verify
claude -p "Run the full build, lint, type check, and test suite. Fix any failures. Do not add new features."
# Step 4: Commit
claude -p "Stage all changes and create a conventional commit."
Key design principles:
- One concern per call: implementation, cleanup, verification, and commit each run as a separate claude -p call with a fresh context window.
- Use set -e so exit codes propagate and halt the pipeline on failure.
Rule: Two focused agents outperform one constrained agent. Never add negative instructions to restrict the Implementer; add a separate cleanup pass to remove what it over-produced.
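A variant of the pipeline that names each step and reports which one failed can help when the script runs unattended. A sketch, not a prescribed implementation: `run_pipeline` and the `CLAUDE_BIN` override (which lets the function be exercised without the real CLI) are hypothetical; only `claude -p` and the step prompts come from the script above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the four steps of the script above, run in order.
# CLAUDE_BIN is a hypothetical override; it defaults to the real `claude` CLI.
run_pipeline() {
  local -a names=(implement de-sloppify verify commit)
  local -a prompts=(
    "Read the spec in docs/auth-spec.md. Implement OAuth2 login in src/auth/. Write tests first (TDD)."
    "Review all changes in the working tree. Remove unnecessary type tests, overly defensive checks, and tests of language features. Keep real business logic tests. Run the test suite after cleanup."
    "Run the full build, lint, type check, and test suite. Fix any failures. Do not add new features."
    "Stage all changes and create a conventional commit."
  )
  local i
  for i in "${!names[@]}"; do
    echo "==> ${names[$i]}" >&2
    # Each step is a fresh, non-interactive invocation with its own context.
    if ! "${CLAUDE_BIN:-claude}" -p "${prompts[$i]}"; then
      echo "pipeline stopped at step: ${names[$i]}" >&2
      return 1
    fi
  done
}
```

Like `set -e`, this halts on the first non-zero exit code, but it also tells you which step to rerun.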
Use NanoClaw for interactive, session-persistent workflows where context accumulates across turns.
# Start the default session
node scripts/claw.js
# Named session with skill context
CLAW_SESSION=my-project CLAW_SKILLS=tdd-workflow,security-review node scripts/claw.js
NanoClaw loads conversation history from ~/.claude/claw/{session}.md, sends each user message to claude -p with full history as context, and appends responses to the session file (Markdown-as-database). Sessions persist across restarts.
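The Markdown-as-database idea fits in a few lines of shell. This is a hypothetical sketch, not NanoClaw's code (that lives in `scripts/claw.js`): the `## User` / `## Assistant` section format and the `CLAW_DIR` and `CLAUDE_BIN` overrides are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# One conversational turn against a Markdown session file.
# Hypothetical layout: ${CLAW_DIR:-~/.claude/claw}/${CLAW_SESSION:-default}.md
claw_turn() {
  local msg="$1"
  local dir="${CLAW_DIR:-$HOME/.claude/claw}"
  local file="$dir/${CLAW_SESSION:-default}.md"
  local history reply
  mkdir -p "$dir"
  touch "$file"
  history="$(cat "$file")"
  # Send the full history plus the new message as one non-interactive prompt.
  reply="$("${CLAUDE_BIN:-claude}" -p "${history}

## User
${msg}")" || return 1
  # Append both sides of the turn so the session survives restarts.
  printf '## User\n%s\n\n## Assistant\n%s\n\n' "$msg" "$reply" >> "$file"
  printf '%s\n' "$reply"
}
```

Because the whole file is resent on every turn, the prompt grows with each exchange; that is why compacting after milestones matters.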
Key commands: /model (switch model), /load (dynamic skill loading), /branch (branch before high-risk changes), /compact (compact after major milestones), /search (cross-session search), /export (export before archival).
| Use Case | NanoClaw | Sequential Pipeline |
|----------|----------|-------------------|
| Interactive exploration | Yes | No |
| Scripted automation | No | Yes |
| Session persistence | Built-in | Manual |
| Context accumulation | Grows per turn | Fresh each step |
| CI/CD integration | Poor | Excellent |
Rule: Keep NanoClaw sessions task-focused. Branch before high-risk changes. Compact after milestones, not during active debugging.
Add a dedicated cleanup pass after every Implementer step in any loop. This is an add-on, not a standalone pattern.
The problem: when asked to implement with TDD, an LLM often writes tests that verify TypeScript's type system, framework behavior, or impossible runtime states, none of which exercise business logic. Adding negative instructions to the Implementer prompt can cause it to skip legitimate edge-case tests as well.
The solution: let the Implementer be thorough, then run a focused cleanup agent in a separate context window:
claude -p "Review all changes in the working tree. Remove:
- Tests that verify language/framework behavior rather than business logic
- Redundant type checks the type system already enforces
- Over-defensive error handling for impossible states
- console.log statements and commented-out code
Keep all business logic tests. Run the test suite after cleanup to confirm nothing breaks."
Rule: Always run de-sloppify as a separate claude -p invocation — never in the same context as the Implementer, and never via a negative constraint in the Implementer's prompt.
Use for specification-driven parallel generation of multiple independent outputs (variations, test cases, content pieces).
Architecture: an Orchestrator reads a spec, scans existing output for the highest iteration number, then deploys N Sub-Agents in parallel — each assigned a unique creative direction and iteration number.
Create .claude/commands/infinite.md:
Parse from $ARGUMENTS: spec_file, output_dir, count (integer or "infinite").
PHASE 1: Read and deeply understand the specification.
PHASE 2: List output_dir, find highest iteration number. Start at N+1.
PHASE 3: Plan creative directions — each agent gets a DIFFERENT theme/approach.
PHASE 4: Deploy sub-agents in parallel (Task tool). Each receives:
- Full spec text
- Current directory snapshot
- Their assigned iteration number
- Their unique creative direction
PHASE 5 (infinite mode): Loop in waves of 3–5 until context is low.
Batching strategy: 1–5 items all at once; 6–20 in batches of 5; infinite in waves of 3–5 with progressive sophistication.
Rule: The Orchestrator assigns each agent a specific creative direction and iteration number — never rely on agents to self-differentiate. Assignment prevents duplicate concepts across parallel agents.
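Phase 2 and the batching strategy are plain arithmetic that the Orchestrator, or a wrapper script, can compute before deploying anyone. A sketch under two assumptions: outputs are named `iteration_<N>.<ext>` (a hypothetical convention; use whatever your spec prescribes), and counts above 20, which the strategy above does not cover, also get batches of 5.

```shell
#!/usr/bin/env bash
# PHASE 2: highest existing iteration number in a directory, plus one.
# Assumes the hypothetical naming iteration_<N>.<ext>.
next_iteration() {
  local dir="$1" max=0 f n
  for f in "$dir"/iteration_*; do
    [[ -e "$f" ]] || continue          # glob matched nothing
    n="${f##*/iteration_}"             # strip the path and prefix
    n="${n%%[!0-9]*}"                  # keep the leading digits only
    if [[ -n "$n" ]] && (( 10#$n > max )); then
      max=$((10#$n))
    fi
  done
  echo $((max + 1))
}

# Batch sizes for a finite count: 1-5 at once, otherwise batches of 5.
batch_sizes() {
  local count="$1"
  local -a sizes=()
  if (( count <= 5 )); then
    echo "$count"
    return
  fi
  while (( count > 0 )); do
    sizes+=( $(( count < 5 ? count : 5 )) )
    count=$((count - 5))
  done
  echo "${sizes[*]}"
}
```

With `iteration_1.md`, `iteration_3.md`, and `iteration_7.html` present, the next agent starts at 8; `batch_sizes 12` prints `5 5 2`. Each agent in a batch still needs its own creative direction from the Orchestrator.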
Use for multi-day iterative projects requiring CI gate enforcement and automatic merge.
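Core loop per iteration:
1. Create a branch (continuous-claude/iteration-N).
2. Run claude -p with an enhanced prompt.
3. Optionally run a reviewer pass (a separate claude -p).
4. Commit, push, and open a PR (gh pr create).
5. Wait for CI checks (gh pr checks --watch).
6. If CI fails, run a fix pass (claude -p with CI log context).
7. Merge the PR and start the next iteration.
Example invocations, each with an explicit bound:
continuous-claude --prompt "Add unit tests for all untested functions" --max-runs 10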
continuous-claude --prompt "Fix all linter errors" --max-cost 5.00
continuous-claude --prompt "Improve test coverage" --max-duration 8h
Cross-iteration context bridge — use SHARED_TASK_NOTES.md to persist progress across independent claude -p invocations:
## Progress
- [x] Added tests for auth module (iteration 1)
- [ ] Still need: rate limiting tests, error boundary tests
## Next Steps
- Focus on rate limiting module next
Claude reads this file at iteration start and updates it at iteration end.
Completion signal: when Claude outputs CONTINUOUS_CLAUDE_PROJECT_COMPLETE in three consecutive iterations, the loop stops automatically.
Rule: Always set at least one of --max-runs, --max-cost, or --max-duration. Unbounded loops are a cost and correctness failure mode.
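The stopping logic, a hard run limit plus the three-consecutive-signals rule, can be sketched as a plain shell loop. This is not continuous-claude's implementation: it leaves out branches, PRs, and CI, and `run_bounded_loop` and the `CLAUDE_BIN` override are hypothetical.

```shell
#!/usr/bin/env bash
SIGNAL="CONTINUOUS_CLAUDE_PROJECT_COMPLETE"

# Hypothetical loop: runs the prompt repeatedly and stops after three
# consecutive iterations whose output contains the signal, or at max_runs.
run_bounded_loop() {
  local max_runs="$1" prompt="$2"
  local run=0 streak=0 out
  while (( run < max_runs )); do
    run=$((run + 1))
    out="$("${CLAUDE_BIN:-claude}" -p "$prompt")" || return 1
    if [[ "$out" == *"$SIGNAL"* ]]; then
      streak=$((streak + 1))
    else
      streak=0                      # the signals must be consecutive
    fi
    if (( streak >= 3 )); then
      echo "complete after $run runs"
      return 0
    fi
  done
  echo "stopped: reached max runs ($max_runs)"
}
```

The run limit is checked on every pass, so the loop terminates even if the signal never appears.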
Use dmux (tmux pane manager) to run multiple independent agent sessions simultaneously across different files, concerns, or AI harnesses.
# Start dmux session
dmux
# Press 'n' to create a new pane with a prompt
# Press 'm' to merge pane output back to main session
Common workflow patterns:
Multi-file feature — parallelize across independent files:
Pane 1: "Create the database schema and migrations for the billing feature"
Pane 2: "Build the billing API endpoints in src/api/billing/"
Pane 3: "Create the billing dashboard UI components"
# Merge all, then do integration in main pane
Parallel review perspectives:
Pane 1: "Review src/api/ for security vulnerabilities"
Pane 2: "Review src/api/ for performance issues"
Pane 3: "Review src/api/ for test coverage gaps"
Cross-harness routing — use different AI tools for different task types:
Pane 1 (Claude Code): "Review the security of the auth module"
Pane 2 (Codex): "Refactor the utility functions for performance"
For file-conflict-prone parallel work, use git worktrees per pane:
git worktree add -b feat/auth ../feature-auth HEAD
git worktree add -b feat/billing ../feature-billing HEAD
# Run agents in separate worktrees, then merge branches
Use the ECC orchestration helper for programmatic worktree setup:
node scripts/orchestrate-worktrees.js plan.json --execute
Rule: Only parallelize independent tasks. Each pane uses API tokens — keep total panes under 5–6. Use git worktrees whenever parallel agents might touch overlapping files.
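The worktree commands can be generated per task so that each pane gets its own branch and checkout. A sketch: `setup_worktrees` and its `DRY_RUN` switch are hypothetical, and the `feat/<name>` / `../feature-<name>` naming simply follows the example above. The only git command used is `git worktree add -b <branch> <path> <commit-ish>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one branch + worktree per task name.
# With DRY_RUN=1 the commands are printed instead of executed.
setup_worktrees() {
  local name
  for name in "$@"; do
    local -a cmd=(git worktree add -b "feat/$name" "../feature-$name" HEAD)
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

Run `DRY_RUN=1 setup_worktrees auth billing` first to review the commands, then start one agent per worktree. After merging the branches, `git worktree remove ../feature-auth` cleans up a checkout.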
Use for large features too big for a single agent pass. Decomposes an RFC into a dependency DAG, runs each unit through a tiered quality pipeline, and lands them via an agent-driven merge queue.
AI reads the RFC and produces work units:
id — kebab-case identifier
depends_on — other unit IDs (real code dependencies only)
scope — files and concerns touched
acceptance — concrete, testable acceptance criteria
risk_level — Tier 1 / Tier 2 / Tier 3
rollback_plan
Decomposition rules:
The dependency DAG determines execution order:
Layer 0: [unit-a, unit-b] ← no deps, run in parallel
Layer 1: [unit-c] ← depends on unit-a
Layer 2: [unit-d, unit-e] ← depend on unit-c
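Computing the layers is a layered topological sort (Kahn's algorithm): a unit is placed in the first layer in which all of its dependencies have already been placed. A sketch for bash 4+ (associative arrays), assuming a hypothetical input format of one unit per line, `id dep1 dep2 ...`. It reports an error when a cycle or an unknown dependency leaves units unplaceable.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays).
# stdin:  one unit per line, "id dep1 dep2 ..." (deps optional)
# stdout: "Layer N: unit ..." in dependency order
dag_layers() {
  local -A deps=() placed=()
  local -a order=() ready=()
  local id rest u d ok layer=0 count=0
  while read -r id rest; do
    [[ -z "$id" ]] && continue
    deps["$id"]="$rest"
    order+=("$id")
  done
  while (( count < ${#order[@]} )); do
    ready=()
    for u in "${order[@]}"; do
      [[ -n "${placed[$u]:-}" ]] && continue
      ok=1
      for d in ${deps[$u]}; do            # intentional word splitting
        [[ -n "${placed[$d]:-}" ]] || { ok=0; break; }
      done
      (( ok )) && ready+=("$u")
    done
    if (( ${#ready[@]} == 0 )); then
      echo "error: dependency cycle or unknown dependency" >&2
      return 1
    fi
    echo "Layer $layer: ${ready[*]}"
    # Mark the layer only after it is complete, so units in the same
    # layer never depend on each other.
    for u in "${ready[@]}"; do placed["$u"]=1; done
    count=$((count + ${#ready[@]}))
    layer=$((layer + 1))
  done
}
```

Feeding it the example above (`unit-a`, `unit-b`, `unit-c unit-a`, `unit-d unit-c`, `unit-e unit-c`) reproduces layers 0 through 2.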
| Tier | Risk Level | Pipeline Stages |
|------|------------|----------------|
| Tier 1 / trivial | Isolated file edits | implement → test |
| Tier 2 / small | Multi-file behavior changes | implement → test → code-review |
| Tier 2 / medium | Moderate integration risk | research → plan → implement → test → PRD-review + code-review → review-fix |
| Tier 3 / large | Schema/auth/perf/security | research → plan → implement → test → PRD-review + code-review → review-fix → final-review |
Each pipeline stage runs in its own agent process to eliminate author bias:
| Stage | Model | Purpose |
|-------|-------|---------|
| Research | Sonnet | Read codebase + RFC, produce context doc |
| Plan | Opus | Design implementation steps |
| Implement | Sonnet/Codex | Write code following the plan |
| Test | Sonnet | Run build + test suite |
| PRD Review | Sonnet | Spec compliance check |
| Code Review | Opus | Quality + security check |
| Review Fix | Sonnet | Address review issues |
| Final Review | Opus | Quality gate (large tier only) |
The reviewer never wrote the code it reviews, which removes author bias, a common source of missed issues.
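The tier table and the per-stage model table combine into a small driver in which every stage is a separate process. A hypothetical sketch: the function names, the lowercase stage names, running PRD-review and code-review one after the other instead of side by side, using Sonnet for Implement (the table allows Sonnet or Codex), and passing the model via `--model` (check that your `claude` version accepts it) are all assumptions.

```shell
#!/usr/bin/env bash
# Pipeline stages per tier, following the tier table.
tier_stages() {
  case "$1" in
    trivial) echo "implement test" ;;
    small)   echo "implement test code-review" ;;
    medium)  echo "research plan implement test prd-review code-review review-fix" ;;
    large)   echo "research plan implement test prd-review code-review review-fix final-review" ;;
    *)       echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Model per stage, following the stage table (Implement uses Sonnet here).
stage_model() {
  case "$1" in
    plan|code-review|final-review) echo opus ;;
    *)                             echo sonnet ;;
  esac
}

# Hypothetical driver: each stage is its own invocation, so the process
# that reviews the code is never the one that wrote it.
run_unit() {
  local unit="$1" tier="$2" stages stage
  stages="$(tier_stages "$tier")" || return 1
  for stage in $stages; do
    "${CLAUDE_BIN:-claude}" -p "Unit ${unit}: perform the ${stage} stage." \
      --model "$(stage_model "$stage")" || return 1
  done
}
```

In practice each stage prompt would also carry the artifacts listed in the data-flow map, such as the research context file or the failing-test summary.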
After quality pipelines, units enter the merge queue:
Non-overlapping units land speculatively in parallel. Overlapping units land one-by-one with rebase between each.
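Whether two units may land in parallel comes down to whether their file scopes intersect. A sketch that treats each scope as a space-separated file list, a simplification of the `scope` field; `scopes_overlap` and `landing_mode` are hypothetical names.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) if the two space-separated file lists share any path.
# read -ra splits on whitespace without glob expansion.
scopes_overlap() {
  local -a xs ys
  local a b
  read -ra xs <<<"$1"
  read -ra ys <<<"$2"
  for a in "${xs[@]}"; do
    for b in "${ys[@]}"; do
      [[ "$a" == "$b" ]] && return 0
    done
  done
  return 1
}

# Landing decision for a pair of units, following the rule above.
landing_mode() {
  if scopes_overlap "$1" "$2"; then
    echo "one-by-one (rebase between landings)"
  else
    echo "parallel (speculative)"
  fi
}
```

Exact path matching is deliberately conservative only in one direction: two units that touch different files but share an implicit contract (for example, a schema and its consumers) will still be reported as independent, so scopes should list every file a unit depends on.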
When a unit is evicted from the queue because it conflicts with a unit that landed first, its full context (conflicting files, diffs, test output) feeds back into the implementer on the next pass:
## MERGE CONFLICT — RESOLVE BEFORE NEXT LANDING
Your previous implementation conflicted with a unit that landed first.
Restructure your changes to avoid the conflicting files/lines below.
{full eviction context with diffs}
research.contextFile ─────────────────→ plan
plan.implementationSteps ─────────────→ implement
implement.{filesCreated, whatWasDone} → test, reviews
test.failingSummary ──────────────────→ reviews, implement (next pass)
reviews.{feedback, issues} ──────────→ review-fix → implement (next pass)
evictionContext ──────────────────────→ implement (after merge conflict)
Outputs: RFC execution log, unit scorecards, dependency graph snapshot, integration risk summary.
Rule: The work plan (decomposed units with acceptance criteria) is the single highest-leverage human review point. Invest in it before any implementation begins.
Autonomous tasks are stateful. Treat state management as a first-class concern.
- Write a contracts.md file early: API contracts, what is mocked, what the backend must implement, and how integration works. This is the shared source of truth across all components.
- Use SHARED_TASK_NOTES.md as the cross-iteration context bridge.
Rule: Write critical state to files, not only to context. Files survive context truncation; context does not.
| Model | Use For |
|-------|---------|
| Haiku | Classification, boilerplate transforms, narrow edits, formatting |
| Sonnet | Implementation, refactors, research, test runs |
| Opus | Architecture decisions, root-cause analysis, security review, multi-file invariants |
Rule: Escalate model tier only on demonstrated reasoning failure. Haiku for narrow edits, Opus for architectural decisions — never the reverse.
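The routing table and the escalation rule reduce to two lookups: a default tier per task kind, and a one-step escalation used only after a demonstrated reasoning failure. A sketch; the task-kind keywords are hypothetical labels for the table's rows, and sending unknown kinds to Sonnet is an assumption.

```shell
#!/usr/bin/env bash
# Default model per task kind (labels are illustrative, not a fixed vocabulary).
default_model() {
  case "$1" in
    classify|boilerplate|narrow-edit|format)            echo haiku ;;
    implement|refactor|research|test)                   echo sonnet ;;
    architecture|root-cause|security-review|invariants) echo opus ;;
    *)                                                  echo sonnet ;;  # assumption
  esac
}

# One step up after a demonstrated failure; Opus is the ceiling.
escalate_model() {
  case "$1" in
    haiku)  echo sonnet ;;
    sonnet) echo opus ;;
    *)      echo opus ;;
  esac
}
```

Escalating one step at a time keeps cost proportional to the evidence: a narrow edit that fails on Haiku gets Sonnet, not Opus.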
Speed up execution by identifying what can run concurrently.
Rule: Be conservative about parallelizing writes. Parallel reads are always safe; parallel writes to the same resource require explicit isolation via worktrees or sequential landing.
Errors are expected in autonomous execution. Respond to them systematically, not reactively.
For looping workflows, when a unit stalls: evict from the active queue, snapshot all findings and error context, regenerate a narrowed unit scope, and retry with updated constraints.
Rule: Log what was tried and why it failed. This prevents re-attempting the same failed approaches and provides context for intelligent recovery.
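One way to log what was tried and why it failed: retry a command a bounded number of times and append each failed attempt, with its exit code and output, to a log file that the next attempt (or a human) can read. A minimal sketch; `retry_logged` and the log format are hypothetical.

```shell
#!/usr/bin/env bash
# Usage: retry_logged <max_attempts> <log_file> <command> [args...]
# Prints the command's output on success; logs every failed attempt.
retry_logged() {
  local max="$1" log="$2"; shift 2
  local attempt out status
  for (( attempt = 1; attempt <= max; attempt++ )); do
    out="$("$@" 2>&1)"
    status=$?
    if (( status == 0 )); then
      printf '%s\n' "$out"
      return 0
    fi
    printf '## attempt %d failed (exit %d): %s\n%s\n\n' \
      "$attempt" "$status" "$*" "$out" >> "$log"
  done
  echo "giving up after $max attempts; see $log" >&2
  return 1
}
```

Retrying the identical command only helps with transient failures; for a stalled unit, feed the log into a narrowed prompt before the next attempt instead.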
Good autonomous behavior is not always proceeding without stopping. Know when to pause.
Always pause and ask the user before:
Proceed autonomously when:
Rule: Ask once, upfront, for everything critical. Mid-task interruptions for things that could have been identified in planning are a failure of the planning step.
Choose tools strategically: use the right tool for each job.
- Check package.json, requirements.txt, or the equivalent manifest before relying on a dependency; never assume a library is available.
Rule: Communicate goal and context to sub-agents, not scripts. Over-specified delegation produces brittle agents; delegation that conveys the goal and context lets agents adapt to what they find.
Never declare a task complete without verifying it actually works.
In AI-first workflows, review generated code for: behavior regressions, security assumptions, data integrity, failure handling, and rollout safety. Minimize time spent on style issues already covered by automated formatting/lint.
Rule: Verification is not optional. A task completed but not verified is a task not completed.
Keep the user informed without creating noise.
Rule: Summaries should be under 100 words, high-signal, and always mention any mocking or approximation.
Know when to stop. Infinite loops without exit conditions are a failure mode.
A task is complete when ALL of the following are true:
Stop early and ask the user when:
Do NOT continue looping when:
Rule: Set at least one hard exit condition (max-runs, max-cost, max-duration, or completion signal) for every automated loop before starting it.
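Checking the hard limits before each iteration keeps an unattended loop bounded. A sketch, assuming the loop itself tracks runs, elapsed seconds, and accumulated cost (how you measure cost is outside this example); `should_stop` is hypothetical, and `awk` handles the cost comparison because bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Usage: should_stop <runs> <max_runs> <elapsed_s> <max_s> <cost> <max_cost>
# Prints the reason and succeeds (exit 0) when any limit is reached.
should_stop() {
  local runs="$1" max_runs="$2" elapsed="$3" max_s="$4" cost="$5" max_cost="$6"
  if (( runs >= max_runs )); then
    echo "max-runs reached"; return 0
  fi
  if (( elapsed >= max_s )); then
    echo "max-duration reached"; return 0
  fi
  # Force numeric comparison of the decimal cost values.
  if awk -v c="$cost" -v m="$max_cost" 'BEGIN { exit !((c + 0) >= (m + 0)) }'; then
    echo "max-cost reached"; return 0
  fi
  return 1
}
```

A limit such as `--max-duration 8h` corresponds to `max_s=28800` here.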
Rule: Never commit, log, or expose secrets/credentials. For external communication (email, API calls, webhooks), confirm with the user before sending.
Before each major action, ask yourself:
When executing a written implementation plan, follow this cycle:
Rule: STOP executing immediately when you hit a blocker (missing dependency, test fails, instruction unclear), the plan has critical gaps, you don't understand an instruction, or verification fails repeatedly. Ask for clarification rather than guessing. Never start implementation on main/master branch without explicit user consent.
When using subagents for plan execution, apply a two-stage review after each task:
Stage 1 — Spec Compliance Review: Dispatch a reviewer subagent to confirm the implementation matches the spec. Check: are all requirements met? Is anything extra added that wasn't requested? If issues found, the implementer fixes them and the reviewer re-reviews.
Stage 2 — Code Quality Review: Only after spec compliance passes, dispatch a quality reviewer. Check: code correctness, test quality, naming, patterns. If issues found, the implementer fixes them and the reviewer re-reviews.
Rule: Never start code quality review before spec compliance is confirmed. Never move to the next task while either review has open issues. Never skip review loops — if a reviewer found issues, the implementer fixes them, and the reviewer reviews again.
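The two-stage review can be written as two fix-and-re-review loops, with the quality loop starting only after spec compliance passes. A hypothetical sketch: reviewers and the fixer are any commands that exit 0 when satisfied, and the `max_rounds` bound is an addition. When it is hit, the loop stops and asks rather than moving on, so no review loop is ever skipped.

```shell
#!/usr/bin/env bash
# Review; on issues (non-zero exit) let the implementer fix, then review again.
review_loop() {
  local stage="$1" reviewer="$2" fixer="$3" max_rounds="$4" round
  for (( round = 1; ; round++ )); do
    if "$reviewer"; then
      echo "$stage review passed (round $round)"
      return 0
    fi
    if (( round >= max_rounds )); then
      echo "$stage review still has open issues after $round rounds; stop and ask the user" >&2
      return 1
    fi
    "$fixer" || return 1
  done
}

# Stage 2 (code quality) never starts unless stage 1 (spec compliance) passed.
two_stage_review() {
  local spec_reviewer="$1" quality_reviewer="$2" fixer="$3" max_rounds="${4:-3}"
  review_loop "spec-compliance" "$spec_reviewer" "$fixer" "$max_rounds" &&
    review_loop "code-quality" "$quality_reviewer" "$fixer" "$max_rounds"
}
```

Only after `two_stage_review` succeeds does the plan move to the next task.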
Each subagent gets: