.claude/skills/workflow/agentic-engineering/SKILL.md
Framework for decomposing agent-driven tasks into independently verifiable units
npx skillsauth add andrem-sec/psc-comet agentic-engineering
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Framework for decomposing agent-driven tasks into manageable, verifiable units with appropriate model routing.
Without systematic task decomposition, Claude:
Agentic engineering ensures tasks are right-sized, model-routed correctly, and verifiably complete.
Each work unit should be independently verifiable in 15 minutes or less.
What "unit" means:
Why 15 minutes:
How to verify:
If a unit takes longer than 15 minutes:
Never start implementation without defining the eval first.
1. Define Eval
2. Run Baseline
3. Implement
4. Re-Run Eval
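The four steps above can be sketched as a small harness. This is a minimal illustration, not part of the skill itself: `run_eval_first`, `registry`, and the `calculate_total` unit are hypothetical names chosen for the example.

```python
from typing import Callable


def run_eval_first(evaluate: Callable[[], bool], implement: Callable[[], None]) -> dict:
    """Run baseline -> implement -> re-run, using an eval that already exists."""
    baseline = evaluate()   # step 2: record the result before any change
    implement()             # step 3: make the change
    after = evaluate()      # step 4: re-run the exact same eval
    return {"baseline": baseline, "after": after, "done": after}


# Hypothetical unit: add a calculate_total function to a registry.
registry = {}


def eval_calculate_total() -> bool:
    # Step 1: the eval is written before the implementation.
    fn = registry.get("calculate_total")
    return fn is not None and fn([40, 2]) == 42


def implement_calculate_total() -> None:
    registry["calculate_total"] = lambda items: sum(items)


result = run_eval_first(eval_calculate_total, implement_calculate_total)
print(result)
```

A failing baseline followed by a passing re-run is the signal that the implementation, and nothing else, made the eval pass.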
| Unit Type | Eval Approach | Example |
|-----------|---------------|---------|
| New function | Unit test | test_calculate_total() asserts result == 42 |
| Bug fix | Regression test | test_notification_bug() reproduces issue, then passes after fix |
| API integration | Integration test | curl -X POST /api/endpoint returns 200 with expected JSON |
| Configuration | Smoke test | service starts without errors, logs show new config value |
| Documentation | Review checklist | Code examples run, terminology is consistent, <2-minute read |
Use Haiku (fast, cheap) for:
Use Sonnet (balanced) for:
Use Opus (powerful, slow) for:
Cost awareness:
Integration with model-router skill: This skill provides task category definitions; model-router handles the actual routing logic. Reference model-router for cost calculation and fallback strategies.
After major phase transitions:
After context compaction:
During iterative refinement:
When context is still relevant:
Compact after milestones:
Do NOT compact during:
Reference strategic-compact skill for detailed compaction decision guide.
Each unit has one dominant risk. Identify it explicitly before starting.
Risk categories:
Correctness: Will this produce the right result?
Performance: Will this be fast enough?
Security: Could this introduce a vulnerability?
Integration: Will this work with existing systems?
Maintainability: Can future developers understand this?
If you identify 2+ dominant risks, the unit is too large. Split into smaller units, each with a single dominant risk.
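The split rule can be made mechanical. The sketch below assumes a hypothetical `WorkUnit` record; only the five risk categories come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    CORRECTNESS = "correctness"
    PERFORMANCE = "performance"
    SECURITY = "security"
    INTEGRATION = "integration"
    MAINTAINABILITY = "maintainability"


@dataclass
class WorkUnit:
    # Hypothetical record type used only for this illustration.
    name: str
    dominant_risks: list[Risk] = field(default_factory=list)


def needs_split(unit: WorkUnit) -> bool:
    """A unit with two or more distinct dominant risks is too large."""
    return len(set(unit.dominant_risks)) >= 2


small = WorkUnit("add input validation", [Risk.SECURITY])
large = WorkUnit("rewrite auth and cache layer", [Risk.SECURITY, Risk.PERFORMANCE])
print(needs_split(small), needs_split(large))
```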
Every unit must have a clear, binary done condition.
Good done conditions:
Bad done conditions:
Pattern: Every done condition should be verifiable by running one command (npm test, curl, coverage check, etc.).
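One way to enforce the one-command pattern is to reduce the done condition to a process exit code. This is a sketch under assumptions: `is_done` is a hypothetical helper, and the commands are stand-ins that invoke the Python interpreter so the example runs anywhere. A real unit would pass something like `["npm", "test"]`.

```python
import subprocess
import sys


def is_done(command: list) -> bool:
    """Binary done condition: True only if the command exits with status 0."""
    completed = subprocess.run(command, capture_output=True)
    return completed.returncode == 0


# Stand-in commands that exit with a known status.
passing = is_done([sys.executable, "-c", "raise SystemExit(0)"])
failing = is_done([sys.executable, "-c", "raise SystemExit(1)"])
print(passing, failing)
```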
Claude Code's tool orchestration engine automatically parallelizes consecutive read-only tool calls (up to 10 concurrent). Write-heavy tools force a serial boundary.
To get free parallelism: group reads before writes.
Parallel (engine runs concurrently):
[Read file1] [Grep pattern] [Glob *.sh] [WebSearch topic]
Serial (write forces boundary):
[Read file1] ← concurrent batch 1
[Edit file1] ← serial
[Read file2] [Grep pattern] ← concurrent batch 2
[Edit file2] ← serial
When designing agent prompts, group all information-gathering steps before any implementation steps. This is less a discipline than a scheduling hint: the engine exploits the grouping automatically.
Implication for multi-step tasks: Batch all reads in one conceptual block, then all writes. Never interleave reads and writes unless the read depends on a prior write.
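To see why ordering matters, the behavior described above can be modeled in a few lines. This is a toy model, not Claude Code's actual scheduler: the `READ_ONLY` set is an assumption taken from the tools named in the example, and `plan_batches` is a hypothetical function.

```python
from __future__ import annotations

# Assumption: these are the read-only tools, taken from the example above.
READ_ONLY = {"Read", "Grep", "Glob", "WebSearch"}
MAX_CONCURRENT = 10  # concurrency limit stated in the text


def plan_batches(calls: list[str]) -> list[list[str]]:
    """Group consecutive read-only calls; every other call runs alone."""
    batches: list[list[str]] = []
    current: list[str] = []
    for tool in calls:
        if tool in READ_ONLY:
            if len(current) == MAX_CONCURRENT:
                batches.append(current)
                current = []
            current.append(tool)
        else:
            if current:  # a write closes the open read batch
                batches.append(current)
                current = []
            batches.append([tool])
    if current:
        batches.append(current)
    return batches


interleaved = plan_batches(["Read", "Edit", "Read", "Grep", "Edit"])
grouped = plan_batches(["Read", "Read", "Grep", "Edit", "Edit"])
print(len(interleaved), len(grouped))
```

In this model the same five calls take four sequential steps when interleaved and three when the reads are grouped first.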
When coordinating multiple parallel agents, use a sentinel string as the completion signal instead of a shared database or callback mechanism.
Each agent ends its final message with exactly one of:
PR: <url> (success)
PR: none — <reason> (could not complete)
RESULT: <json> (non-PR workflows — domain-specific sentinel)
The coordinator parses this sentinel to track completion. No shared state, no callbacks, crash-safe, restartable.
Rule: Define the sentinel format in the worker's spawn prompt. Never rely on unstructured output for coordination signals.
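A coordinator-side parser for the three sentinel forms might look like the sketch below. The function name, the returned status labels, and the example URL are assumptions made for illustration; only the sentinel formats come from the text above.

```python
import json
import re

# Patterns for the three sentinel forms listed above.
PR_NONE = re.compile(r"^PR: none — (?P<reason>.+)$")
PR_URL = re.compile(r"^PR: (?P<url>https?://\S+)$")
RESULT = re.compile(r"^RESULT: (?P<payload>.+)$")


def parse_sentinel(final_message: str) -> dict:
    """Inspect only the last non-empty line of a worker's final message."""
    lines = [ln.strip() for ln in final_message.splitlines() if ln.strip()]
    if not lines:
        return {"status": "missing"}
    last = lines[-1]

    match = PR_NONE.match(last)
    if match:
        return {"status": "failed", "reason": match.group("reason")}

    match = PR_URL.match(last)
    if match:
        return {"status": "success", "url": match.group("url")}

    match = RESULT.match(last)
    if match:
        try:
            return {"status": "result", "data": json.loads(match.group("payload"))}
        except json.JSONDecodeError:
            return {"status": "missing"}

    # Free text is never interpreted as a completion signal.
    return {"status": "missing"}


print(parse_sentinel("Opened the change.\nPR: https://example.com/pr/1"))
```

Treating anything that does not match as `missing` keeps the coordinator from guessing at task status from free text.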
Skipping eval definition: Starting without a defined eval leads to scope creep.
Using Opus for Haiku tasks: Wastes 15x cost for classification or boilerplate.
Working in 60-minute units: Units longer than 15 minutes are hard to verify, and errors inside them compound.
Compacting mid-debugging: Loses stack traces. Finish debugging first.
Interleaving reads and writes: Breaks the engine's parallelism optimization. Group reads first.
Unstructured completion signals: Expecting a coordinator to parse free-text output for task status is unreliable. Use explicit sentinels.