skills/execute/SKILL.md
Use when implementing a feature plan file, choosing between sequential and team-based parallel execution with mandatory validation gates
Read plan file and execute all tasks according to the plan's specifications.
This skill enforces BLOCKING validation gates.
Execution is NOT complete until:
You CANNOT claim execution complete or generate the final Output Report until ALL validation requirements pass.
Validation steps use MANDATORY, BLOCKING, and CRITICAL language to indicate enforcement. These are not optional or advisory - they are hard requirements that gate completion.
MANDATORY — runs before anything else, including execution mode selection.
This gate ensures every execution run has explicit, user-confirmed acceptance criteria before a single line of code is written.
Work through the following sources in order, stopping as soon as criteria are found.
Source 1 - Plan file (when executing a pre-made plan):
- `## ACCEPTANCE CRITERIA`, `## Acceptance Criteria`, `## Success Criteria`, `## Completion Criteria`, `## Done When`
- `## COMPLETION CHECKLIST` - individual checklist items there may serve as criteria

Source 2 - acceptance_criteria.md (supplementary):
- `.agents/acceptance_criteria.md` and `acceptance_criteria.md` in the project root

Source 3 - The request itself:
If all three sources yield nothing → proceed to Step 0.2.
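The ordered lookup above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names `criteria_in_markdown` and `find_acceptance_criteria` are hypothetical, and treating "the request itself" as text that may contain the same headings is an assumption.

```python
import re
from pathlib import Path
from typing import Optional

# Heading names listed under Source 1, matched case-insensitively.
CRITERIA_HEADINGS = [
    "acceptance criteria",
    "success criteria",
    "completion criteria",
    "done when",
    "completion checklist",
]


def criteria_in_markdown(text: str) -> Optional[str]:
    """Return the body of the first non-empty criteria-like `##` section, or None."""
    for name in CRITERIA_HEADINGS:
        # Capture everything after the heading up to the next `## ` heading or end of text.
        pattern = rf"^##[ \t]+{re.escape(name)}[ \t]*$(.*?)(?=^##\s|\Z)"
        match = re.search(pattern, text, flags=re.I | re.M | re.S)
        if match and match.group(1).strip():
            return match.group(1).strip()
    return None


def find_acceptance_criteria(
    plan_path: Optional[Path], project_root: Path, request: str
) -> Optional[str]:
    """Check the three sources in order and stop at the first one that yields criteria."""
    # Source 1: the plan file, when one was provided.
    if plan_path is not None and plan_path.is_file():
        found = criteria_in_markdown(plan_path.read_text(encoding="utf-8"))
        if found:
            return found
    # Source 2: a supplementary acceptance_criteria.md in either known location.
    for candidate in (
        project_root / ".agents" / "acceptance_criteria.md",
        project_root / "acceptance_criteria.md",
    ):
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8").strip()
            if text:
                return text
    # Source 3 (assumption): the request may itself contain a criteria section.
    return criteria_in_markdown(request)
```

A `None` result corresponds to the "all three sources yield nothing" case.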
STOP. Do NOT proceed to execution mode selection.
acceptance-criteria-define skill (preferred)

If the acceptance-criteria-define skill is available in this system, invoke it now via the Skill tool:
skill: "acceptance-criteria-define"
context: "<plan file path, or a summary of the execution request if no plan file>"
Pass the plan file path (if one exists) or a concise description of what is being implemented as context. The skill will derive suggested criteria, confirm them with the user, and write them to the plan file or acceptance_criteria.md.
After the skill completes: the criteria are now written. Proceed to Step 0.4 (skip Step 0.3 — the skill has already handled writing).
(acceptance-criteria-define is not available)

Output this message verbatim (filling in the bracketed parts):
⚠️ No acceptance criteria found
I could not find acceptance criteria for this execution request in:
- The plan file (if provided): [plan file path or "N/A"]
- acceptance_criteria.md
- The request itself
Before I start implementing, we need to agree on what "done" looks like.
Here are my suggested acceptance criteria based on my understanding of the request/plan:
## Suggested Acceptance Criteria
### Functional
- [ ] <criterion>
- [ ] <criterion>
### Error Handling
- [ ] <criterion>
### Validation
- [ ] <criterion> — verified by: `<command>`
### Out of Scope
- <item> — not required for this task
Then use AskUserQuestion to ask:
Question: "How would you like to proceed?"
Options:
Wait for the user's response. Do NOT proceed until one of the four options is chosen.
Handling each response:
If a plan file exists (this skill was invoked with a plan file path):
- `## ACCEPTANCE CRITERIA` section if it exists, or append one before `## COMPLETION CHECKLIST` (or at the very end if that section is absent)
- `- [ ]` checkboxes
- `<plan-file-path>`."

If no plan file (ad-hoc request):
Proceed to the Execution Mode Decision Protocol below.
REQUIRED: Before executing, analyze the plan and explicitly state your execution mode decision.
Read the plan and check for these explicit parallel execution indicators:
"Parallel Execution Strategy" section - Strongest signal
"Parallel Execution:" metadata - Top-level plan metadata
`**Parallel Execution:** ✅ **Yes**` or similar
"Team Structure" section - Defined agent assignments
"Execution Order" with phases - Multi-phase execution plan
If no explicit parallel signals found, fall back to task analysis:
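A minimal sketch of the explicit-signal check described above, which runs before any fallback to task analysis. The regular expressions are illustrative guesses at how a plan might phrase each signal, and `explicit_parallel_signals` is a hypothetical helper name.

```python
import re

# Explicit signals from the list above, strongest first.
# The patterns are assumptions about typical plan formatting.
PARALLEL_SIGNALS = [
    ("Parallel Execution Strategy section", r"^#+\s*Parallel Execution Strategy"),
    ("Parallel Execution metadata", r"\*\*Parallel Execution:\*\*\s*(?:✅\s*)?\**Yes"),
    ("Team Structure section", r"^#+\s*Team Structure"),
    # Only the heading is checked here; confirming that it lists phases is left to the reader.
    ("Execution Order section", r"^#+\s*Execution Order"),
]


def explicit_parallel_signals(plan_text: str) -> list[str]:
    """Return the labels of every explicit parallel signal found in the plan text."""
    return [
        label
        for label, pattern in PARALLEL_SIGNALS
        if re.search(pattern, plan_text, flags=re.I | re.M)
    ]
```

An empty result means no explicit signal was found, which is the case where the fallback analysis applies.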
MANDATORY OUTPUT: Print your decision using this template:
```
Plan: [plan name]
Decision: [TEAM-BASED PARALLEL | SEQUENTIAL]
Reasoning:
Conclusion: [1-2 sentences explaining why you chose team-based or sequential execution]
Team structure (if parallel): [List agent roles and responsibilities from plan, or design team if not specified]
```
Based on your decision, jump to either:
Use these criteria in priority order. Higher priority criteria override lower ones.
After completing the Execution Mode Decision Protocol above, you will execute using one of these strategies:
NOTE: If the plan includes a "Parallel Execution Strategy" section, you MUST use team-based execution, even if task count is low. The plan author has explicitly designed the work for parallelism.
For simple plans (1-3 independent tasks, no parallel execution strategy):
REQUIRED BEFORE ANY CODE CHANGES:
Run the full test suite once on the unmodified codebase and record the results:
[project test command] (e.g., `uv run pytest tests/ -v` or `npm test`). On full-suite runs, exclude side-effecting tests by default (live external connections AND process-lifecycle/machine-wide-kill tests; see "Side-Effecting Test Safety" below); include them only if the plan explicitly opts in.

Document your baseline in your working notes:
Pre-implementation baseline: X/Y passing.
Pre-existing failures (not introduced by this work):
- test_foo_bar — [reason if known]
- test_baz_qux — [reason if known]
Why this runs before implementation, not after: Without a baseline, you cannot tell whether a failing test at validation time is something you broke or something that was already broken. Pre-existing failures must be excluded from the "all tests pass" acceptance criterion — but only if documented before you started. If you discover failures mid-execution, you cannot retroactively claim they are pre-existing.
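The reasoning above amounts to a set comparison between the failures recorded at baseline and the failures seen at validation time. A minimal sketch, assuming test results are tracked by test name (the function `classify_failures` is hypothetical):

```python
def classify_failures(
    baseline_failures: set[str], current_failures: set[str]
) -> dict[str, set[str]]:
    """Split validation-time failures into pre-existing ones and regressions."""
    return {
        # Failing before and after: excluded from "all tests pass" only because documented up front.
        "pre_existing": current_failures & baseline_failures,
        # Failing now but not at baseline: introduced by this work and must be fixed.
        "regressions": current_failures - baseline_failures,
        # Failing at baseline but passing now: fixed as a side effect.
        "fixed": baseline_failures - current_failures,
    }
```

Without the baseline set, every entry in `current_failures` would have to be treated as a regression.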
If pre-existing failures are found:
REQUIRED IMMEDIATELY AFTER READING THE PLAN — before touching any files:
Scan the plan for steps that cannot be automated: secrets, .env files, credentials, migrations, service setup, CLI installs.
For EACH manual prerequisite found:
Classify it:
KB_API_KEY in .env, DB migration)

For runtime-critical prerequisites:
For validation-only prerequisites:
Why this must happen before coding, not just before validation:
If you start writing tests before required secrets exist, you will inject env vars directly into test fixtures to make tests pass. This masks wiring bugs — e.g., load_dotenv() is never called because tests bypass it with os.environ["KEY"] = "...". The gap is invisible: every file looks correct in isolation and every test passes. The user discovers the bug on first real startup.
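The prerequisite scan described in this step could be approximated with a keyword pass over the plan. This is a rough heuristic sketch: the pattern table and the name `find_manual_prerequisites` are assumptions, and every hit still needs a human-style classification as runtime-critical or validation-only.

```python
import re

# Hypothetical keyword patterns for the prerequisite kinds named in this step.
PREREQ_PATTERNS = {
    "secrets/credentials": r"secret|credential|api[_ -]?key|password",
    ".env file": r"\.env\b",
    "migration": r"\bmigrations?\b",
    "service setup": r"\b(dashboard|provision|create an account)\b",
    "CLI install": r"\b(brew|apt|npm|pip) install\b",
}


def find_manual_prerequisites(plan_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, prerequisite kind, line text) for each suspicious plan line."""
    hits = []
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        for label, pattern in PREREQ_PATTERNS.items():
            if re.search(pattern, line, flags=re.I):
                hits.append((lineno, label, line.strip()))
    return hits
```

The patterns deliberately over-match; a false positive costs one question to the user, while a missed secret leads to the masked wiring bug described above.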
For EACH task in "Step by Step Tasks":
After writing or editing any Python file:
Check whether the python-performance-optimization skill is listed in your available skills. If it is, invoke it on the modified function(s):
skill: "python-performance-optimization"
args: "<function name(s)> in <file path>"
Apply any recommendations before moving to the next task. If the skill is not available, skip this step silently — do not mention it.
REQUIRED as part of every implementation task:
File-level comments — Add a short comment block at the top of every file you create or significantly modify, unless one already exists. It should state:
Example (Python):
# auth/token_validator.py
# Validates JWT tokens on incoming API requests.
# Lives here (not in middleware) so validation logic can be unit-tested independently.
Example (TypeScript):
// hooks/useSession.ts
// Custom hook that manages user session state and syncs it with localStorage.
// Centralizes session logic so all components get a consistent view of auth state.
Inline remarks — Add a concise comment directly above (not inline with) any of the following, where the reason isn't immediately obvious from the code itself:
Rules:
REQUIRED BEFORE PROCEEDING TO STEP 3:
Check if plan specifies test files or test cases
If tests ARE specified in plan:
If NO tests specified in plan:
Do not mock away required user setup:
When a test fixture injects an env var that a user is supposed to supply (e.g., os.environ["API_KEY"] = "test-key"), this is fine for unit tests — but you must also ensure at least one validation command exercises the real loading path. Specifically: if an env-loading library (python-dotenv, dotenv, etc.) is in the plan or requirements, confirm that load_dotenv() (or equivalent) is called in the app entry point — not just that the library is installed. Search for the call site; if it is missing, add it before proceeding.
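Searching for the call site can be done on the syntax tree rather than with a text search, so that an import alone does not count as a call. A minimal sketch using only the standard library; `calls_function` is a hypothetical helper, and the entry-point source is passed in as a string.

```python
import ast


def calls_function(source: str, func_name: str = "load_dotenv") -> bool:
    """Return True if the source contains a call to func_name (bare or as an attribute)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            target = node.func
            # Matches `load_dotenv()`.
            if isinstance(target, ast.Name) and target.id == func_name:
                return True
            # Matches `dotenv.load_dotenv()`.
            if isinstance(target, ast.Attribute) and target.attr == func_name:
                return True
    return False
```

For example, `calls_function(Path("main.py").read_text())` would report whether the entry point actually invokes the loader; the file name here is only an example.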
DO NOT skip to validation without addressing test requirements.
CRITICAL: Before running validation commands, check if the plan requires user actions that must be completed BEFORE tests can pass.
Check the plan for these blocking prerequisites:
Database migrations (explicitly mentioned example)
External service setup
Infrastructure changes
Manual installations
Env-loading wiring — when python-dotenv, dotenv, or any config-loading library appears in requirements or the plan:
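- Confirm a `load_dotenv()` call (or equivalent) exists
- Confirm the call is in the app entry point (`main.py`, `app.py`, etc.), not just that the library is listed in `requirements.txt` or `pyproject.toml`

If ANY blocking user actions are required: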
STOP execution - Do NOT run validation commands yet
List all required actions:
⚠️ USER ACTIONS REQUIRED BEFORE VALIDATION
The following actions must be completed before tests can pass:
1. [Action description]
- Why needed: [Brief explanation]
- How to do it: [Clear instructions or reference to plan section]
2. [Next action...]
Ask user to notify when complete:
Please complete these actions and reply "done" or "ready" when finished.
I will then continue with validation testing.
WAIT for user confirmation - Do NOT proceed until user responds
After user confirms - Continue to Step 3 (Run Validation Commands)
If you did NOT block on manual prerequisites at Step 1.5 (missed them during planning):
If NO blocking user actions found:
Example blocking action (database migration):
⚠️ USER ACTIONS REQUIRED BEFORE VALIDATION
1. Apply Database Migration
- Why needed: Tests require new RPC functions (hybrid_search_chunks, keyword_search_chunks)
- How to do it:
* Open Supabase Dashboard → SQL Editor
* Copy contents of supabase/migrations/013_hybrid_search.sql
* Paste and execute in SQL Editor
Please complete this action and reply "done" when finished.
I will then run the full test suite to verify everything works.
DO NOT claim tests "can't run" or mark execution complete without running tests after user completes required actions.
CRITICAL: Execute EVERY validation command listed in the plan.
For EACH validation command:
Verification Summary:
After all commands run, display summary:
VALIDATION SUMMARY:
- Command 1: ✅ PASS (output shown above)
- Command 2: ✅ PASS (output shown above)
- Command 3: ✅ PASS (output shown above)
...
Status: [ALL PASSED / FAILURES DETECTED]
If ANY command shows ❌ FAIL, execution is INCOMPLETE - fix and re-validate.
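One way to make the "run every command, show the output, then summarize" loop mechanical is sketched below. It assumes each validation command is given as an argument list and that a zero exit code means PASS; `run_validation` is a hypothetical name, and the status labels are plain text.

```python
import subprocess
import sys


def run_validation(commands: list[list[str]]) -> bool:
    """Run every command, echo its output, print a summary, and return overall success."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Show the actual output so the PASS/FAIL claim can be checked by the reader.
        print(f"$ {' '.join(cmd)}\n{proc.stdout}{proc.stderr}")
        results.append((cmd, proc.returncode == 0))

    print("VALIDATION SUMMARY:")
    for index, (cmd, ok) in enumerate(results, start=1):
        print(f"- Command {index}: {'PASS' if ok else 'FAIL'} ({' '.join(cmd)})")

    # An empty command list is treated as not validated rather than vacuously passing.
    all_passed = bool(results) and all(ok for _, ok in results)
    print(f"Status: {'ALL PASSED' if all_passed else 'FAILURES DETECTED'}")
    return all_passed


if __name__ == "__main__":
    # Example invocation with a trivial command; real runs would use the plan's commands.
    run_validation([[sys.executable, "-c", "print('ok')"]])
```

Every command runs even after an earlier failure, so the summary always covers the full list.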
CRITICAL: Remove ALL debug traces added during implementation, debugging, and validation.
What to remove:
- `console.log()`, `console.debug()`, `console.info()` (JavaScript/TypeScript)
- `print()`, `pprint()` statements (Python)
- `debugger;` statements

What to keep:
How to identify debug traces YOU added:
# JavaScript/TypeScript
grep -r "console.log\|console.debug\|console.info\|debugger;" --include="*.ts" --include="*.js" --include="*.tsx" --include="*.jsx" .
# Python
grep -r "print(" --include="*.py" . | grep -v "# keep" | grep -v "test_"
Verification:
# Run search again after cleanup - should find zero new debug traces
# Compare with git diff to ensure only debug traces removed
git diff --stat
DO NOT proceed to Verification Gate until debug traces are cleaned up.
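To limit the search to traces added during this execution rather than ones that were already in the codebase, the scan can be restricted to added lines in a unified diff. A minimal sketch under that assumption; `added_debug_lines` is a hypothetical helper, and the diff text would typically come from `git diff`.

```python
import re

# Matches the debug statements listed under "What to remove".
DEBUG_PATTERN = re.compile(
    r"console\.(log|debug|info)\(|\bprint\(|\bpprint\(|\bdebugger;"
)


def added_debug_lines(diff_text: str) -> list[tuple[str, str]]:
    """Return (file, line) pairs for added diff lines that look like debug traces."""
    hits = []
    current_file = ""
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of the new version, e.g. "+++ b/app.py".
            current_file = line[4:].removeprefix("b/")
        elif line.startswith("+") and DEBUG_PATTERN.search(line[1:]):
            hits.append((current_file, line[1:].strip()))
    return hits
```

Removed lines (prefixed with `-`) and unchanged context lines are ignored, so pre-existing logging is never flagged.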
CRITICAL: You CANNOT proceed to "Output Report" until ALL items below pass.
BLOCKING REQUIREMENTS:
Verification Process:
Self-Check Question: "Can I confidently claim this execution is production-ready?"
DO NOT claim execution complete with failing validations or missing tests.
For complex plans (4+ tasks, parallel opportunities, or explicit parallel strategy):
Use TeamCreate tool:
Same rule as in Sequential Execution: run the full test suite once on the unmodified codebase before spawning any agent or writing any code. Record the passing and failing counts and note any pre-existing failures so agents are not held responsible for them during validation.
Same rule as in Sequential Execution — scan the plan for all manual prerequisites before spawning any teammates or writing any code.
Runtime-critical prerequisites (secrets, .env files, credentials, migrations): ask the user to complete them now and wait for confirmation before proceeding.
Reason: spawned agents that start without real config will write tests that mock-inject env vars, masking wiring bugs that only appear at real startup.
For each task in the plan, create using TaskCreate
Common Team Structures:
3-Agent Team (Frontend/Backend split):
4-Agent Team (Full-stack + Database):
Create general-purpose agents for each role and execute tasks in dependency-based waves.
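Dependency-based waves can be derived from the task graph with a topological sort: each wave holds the tasks whose dependencies have all completed. A minimal sketch using the standard library's `graphlib`; the task names in the usage note are invented for illustration.

```python
from graphlib import TopologicalSorter


def dependency_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave depends only on earlier waves.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # Raises graphlib.CycleError if the plan has circular dependencies.
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # Tasks that can run in parallel right now.
        waves.append(ready)
        sorter.done(*ready)  # Mark the wave finished so dependents become ready.
    return waves
```

With `{"schema": set(), "api": {"schema"}, "ui": {"schema"}, "integration-tests": {"api", "ui"}}`, the result is three waves: the schema task alone, then `api` and `ui` together, then the integration tests.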
During Execution:
Code Remarks — required for all agents:
Each spawned agent MUST follow these commenting rules when implementing their tasks:
Python Performance Check — required for all agents when editing Python:
After writing or editing any Python file, each agent must check whether the python-performance-optimization skill is available in their context. If it is, invoke it on the modified function(s) and apply any recommendations before marking the task done. If the skill is not available, skip silently.
Include both requirements explicitly in the prompt given to each spawned agent.
⚠️ Resource-Safety Rule for Parallel Test Commands
When assigning validation/test commands to agents running in parallel, distinguish between two categories:
- Targeted commands: scoped by a `-k` filter or explicit file path. Safe to run in parallel: `pytest tests/test_foo.py -k "my_feature"`
- Full-suite commands: `pytest tests/` or `npm test` with no file filter. NEVER assign these to a parallel agent. Each such process loads the entire test dependency tree simultaneously; N agents = N × full Python/Node runtimes in memory, which spikes RAM and CPU proportionally and can stall or OOM the machine.

Rule: Each parallel agent runs only its own targeted test command (its feature slice). The full-suite regression command is run once, sequentially, by the orchestrator in Step 5 (Run Validation Commands), after all agents have completed and been shut down.
⚠️ Side-Effecting Test Safety (full-suite runs default-skip tests with machine-wide side effects)
Full-suite runs — both the baseline (Step 1.25) and the final validation — MUST exclude tests with live, machine-wide side effects by default. On shared or multi-worktree machines these tests reach outside the test process and can disrupt or kill a live process. Two categories:
terminate path. These can terminate a live process across the whole machine, not just their own worktree. Classic trap: a test that calls an orchestrator's run()/startup path which scans for and terminate()s every matching process; running it in any worktree kills a live orchestrator running in another (no traceback/WER; the victim's parent dies and its child is orphaned). Such a test is unsafe unless it patches/mocks the kill path.

pytest tests/ -m "not ib and not external and not lifecycle"), or --ignore / --deselect of the known paths (e.g. pytest tests/ --ignore=tests/test_ib_realtime.py --ignore=tests/test_orchestrator_main.py). If the project has no marker yet, exclude by path and note it.

REQUIRED BEFORE PROCEEDING:
After all teammates complete their assigned tasks:
Verify Integration:
Run Test Suite:
Display Test Results:
TEST EXECUTION RESULTS:
[Show full test output]
Status: ✅ All tests passing
CRITICAL: Before running validation commands, check if the plan requires user actions that must be completed BEFORE tests can pass.
Check the plan for these blocking prerequisites:
Database migrations (explicitly mentioned example)
External service setup
Infrastructure changes
Manual installations
Env-loading wiring — when python-dotenv, dotenv, or any config-loading library appears in requirements or the plan:
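- Confirm a `load_dotenv()` call (or equivalent) exists
- Confirm the call is in the app entry point (`main.py`, `app.py`, etc.), not just that the library is listed in `requirements.txt` or `pyproject.toml`

If ANY blocking user actions are required: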
STOP execution - Do NOT run validation commands yet
List all required actions:
⚠️ USER ACTIONS REQUIRED BEFORE VALIDATION
The following actions must be completed before tests can pass:
1. [Action description]
- Why needed: [Brief explanation]
- How to do it: [Clear instructions or reference to plan section]
2. [Next action...]
Ask user to notify when complete:
Please complete these actions and reply "done" or "ready" when finished.
I will then continue with validation testing.
WAIT for user confirmation - Do NOT proceed until user responds
After user confirms - Continue to Step 5 (Run Validation Commands)
If you did NOT block on manual prerequisites at Step 1.5 (missed them during planning):
If NO blocking user actions found:
Example blocking action (database migration):
⚠️ USER ACTIONS REQUIRED BEFORE VALIDATION
1. Apply Database Migration
- Why needed: Tests require new RPC functions (hybrid_search_chunks, keyword_search_chunks)
- How to do it:
* Open Supabase Dashboard → SQL Editor
* Copy contents of supabase/migrations/013_hybrid_search.sql
* Paste and execute in SQL Editor
Please complete this action and reply "done" when finished.
I will then run the full test suite to verify everything works.
DO NOT claim tests "can't run" or mark execution complete without running tests after user completes required actions.
CRITICAL: Execute EVERY validation command listed in the plan.
For EACH validation command:
Validation Summary:
VALIDATION SUMMARY:
- Command 1: ✅ PASS
- Command 2: ✅ PASS
- Command 3: ✅ PASS
...
Status: [ALL PASSED / FAILURES DETECTED]
CRITICAL: Remove ALL debug traces added during implementation, debugging, and validation.
What to remove:
- `console.log()`, `console.debug()`, `console.info()` (JavaScript/TypeScript)
- `print()`, `pprint()` statements (Python)
- `debugger;` statements

What to keep:
How to identify debug traces added by team:
# JavaScript/TypeScript
grep -r "console.log\|console.debug\|console.info\|debugger;" --include="*.ts" --include="*.js" --include="*.tsx" --include="*.jsx" .
# Python
grep -r "print(" --include="*.py" . | grep -v "# keep" | grep -v "test_"
Verification:
# Run search again after cleanup - should find zero new debug traces
# Compare with git diff to ensure only debug traces removed
git diff --stat
Coordinate cleanup across team - Ensure all teammates remove their debug traces before proceeding.
DO NOT proceed to Pre-Shutdown Verification Gate until debug traces are cleaned up.
BLOCKING REQUIREMENTS - Cannot shutdown team until ALL pass:
Only after ALL items ✅ COMPLETE:
Gracefully shut down all teammates:
Always check for explicit parallel execution sections first:
Example: Plan with Explicit Strategy
If plan contains "Parallel Execution Strategy" section with 3 agents defined, you MUST:
Use team-based parallel execution when:
pytest tests/test_a.py, pytest tests/test_b.py)

Use sequential execution when:
Never parallelise full-suite commands:
- `pytest tests/` with no file filter: always run sequentially in the orchestrator's Step 5 only
- `npm test` / `cargo test` / any command that loads the entire test tree: same rule

Plan Structure:
Decision: TEAM-BASED PARALLEL - Plan explicitly defines 3-agent team structure. Must use team-based execution as plan author designed for parallelism.
Plan Structure:
Decision: SEQUENTIAL - No parallel signals, only 2 tightly coupled tasks. Team overhead not justified.
Plan Structure:
Decision: TEAM-BASED PARALLEL - Clear frontend/backend separation justifies team execution. Design 3-agent team to parallelize work.
Run this step BEFORE generating the Output Report.
ONLY run after passing the Verification Gate (Step 4 for sequential, Step 6 for team-based).
All 3 subagents are mandatory — none can be skipped.
Spawn all 3 as foreground parallel subagents using the Agent tool (do NOT set run_in_background: true). Start all simultaneously in a single message — do NOT wait for one before starting others. The Output Report cannot be written until their results are in hand.
Required Agent tool parameters for every subagent:
- subagent_type: "general-purpose" - always use this, never a named skill type
- "Use the Skill tool to invoke ai-dev-env:<skill-name> …" - this ensures the skill is actually invoked inside the agent

Skill: execution-report
Pass this context to the subagent:
Coverage gap context — required for execution-report:
Compare planned test coverage (from the plan file) against what was actually executed:
Planned coverage — from the plan's "Test Automation Summary" / "Coverage Summary":
Actual coverage — from this execution:
Gaps — explicitly identify:
Use this to populate the "Test Results" and "Validation Results" sections of the execution report, and add a "Coverage Gaps" subsection that lists each gap with its severity (blocking / non-blocking) and a recommended follow-up action.
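The planned-versus-actual comparison can be expressed as a few set operations. The gap categories below are assumptions made for illustration, since the exact list is not reproduced here, and `coverage_gaps` is a hypothetical helper.

```python
def coverage_gaps(planned: set[str], executed: dict[str, bool]) -> dict[str, list[str]]:
    """Compare planned test names against executed ones (name -> passed?)."""
    ran = set(executed)
    return {
        # Planned in the plan file but never run during this execution.
        "planned_not_run": sorted(planned - ran),
        # Planned and run, but failing.
        "run_but_failed": sorted(t for t in planned if executed.get(t) is False),
        # Run during execution but absent from the plan's coverage summary.
        "unplanned": sorted(ran - planned),
    }
```

Each non-empty list would become an entry in the "Coverage Gaps" subsection, with severity and follow-up action assigned by the reviewer.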
Skill: acceptance-criteria-validate
Pass the plan file path and the execution output summary as context. The skill will locate acceptance criteria in the plan (or in acceptance_criteria.md) and validate whether each was met. It will produce an ACCEPTED / REJECTED / NEEDS REVIEW verdict and surface any unmet criteria.
If this skill produces a REJECTED or NEEDS REVIEW verdict: surface its output prominently to the user before declaring execution complete. Do not silently discard a failing validation result.
Skill: code-review
Pass the list of files modified during this execution and the plan file path as context. The skill will perform a technical review of all changed files for bugs, security issues, and standards compliance.
Note: These skills are installed via the marketplace. Do NOT use a bash file-existence check — just invoke each via the Agent tool as described above.
Wait for all subagents to complete, then incorporate their findings into the Output Report sections below.
ONLY generate this report after:
After execution completes and all validations pass, provide summary:
Tests Created/Updated:
Test Suite Execution:
[Show full test suite output]
Status: ✅ All tests passing
Validation Commands Executed:
| Level | Command | Status | Output |
|-------|---------|--------|--------|
| 1 | [command] | ✅ PASS | [summary] |
| 2 | [command] | ✅ PASS | [summary] |
| 3 | [command] | ✅ PASS | [summary] |
| ... | ... | ... | ... |
Validation Summary: ✅ ALL VALIDATIONS PASSED
[Paste verdict and key findings from the code-review subagent]
[Paste ACCEPTED / REJECTED / NEEDS REVIEW verdict and per-criterion results from the acceptance-criteria-validate subagent]
[Paste coverage gaps and key findings from the execution-report subagent]
✅ EXECUTION COMPLETE