skills/workflow-feature/SKILL.md
Use for new projects, features, or major refactoring. Phases: Planning (Req→UX→Test Spec→Dev Plan) → Implementation → QA → Lessons Learned. Auto-retry on failure, unattended.
Use for new projects, new/fundamentally modified features, major refactoring.
Four phases, MANDATORY and SEQUENTIAL. Never skip, merge, reorder, or run phases in parallel. Within a phase, tasks and sub-phases MAY be combined or parallelized as appropriate.
Search project and global memories for relevant context before planning or dispatching agents:
1. search_memories(query="<topic>", project="<repo>"): discover what past sessions learned about this area
2. get_memories(memory_id="<id>"): read full details of relevant memories found in step 1

These are MCP tools on the MemCan server. Use them if available. Skip silently if not.
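The two-step lookup with a silent skip can be sketched as follows. This is only an illustration: the `tools` mapping, the `recall_context` helper, and the assumption that each search hit is a dict with an `"id"` key are hypothetical and not part of the MemCan server's documented interface.

```python
def recall_context(tools, topic, repo):
    """Search memories, then fetch full details for each hit.

    `tools` is a hypothetical mapping of tool name -> callable. If either
    tool is missing, return an empty list without raising (skip silently).
    """
    search = tools.get("search_memories")
    get = tools.get("get_memories")
    if search is None or get is None:
        return []  # tools unavailable: skip silently

    # Step 1: discover relevant memories (assumed to be dicts with an "id").
    hits = search(query=topic, project=repo)
    # Step 2: read the full details of each memory found in step 1.
    return [get(memory_id=hit["id"]) for hit in hits]
```

Passing the tools in as plain callables keeps the sketch independent of how the MCP client is actually wired up.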
This workflow runs without user interaction unless a decision is required. Do NOT pause for confirmation between phases. Accumulate reports and present a single Final Report when all phases complete or the workflow cannot proceed.
Four sequential sub-phases. Each produces an artifact consumed by the next. Sub-phases may be combined into fewer agent invocations when scope is small, but the concerns must still be addressed in order.
ux-designer-diziet: Personas, domain knowledge, functional/non-functional requirements, user stories, data needs & processing rules. Validate requirements against each persona.
Artifact: Requirements document — structured list with acceptance criteria.
ux-designer-diziet: User journeys, interaction patterns, UI mocks/wireframes, accessibility, DX planning. Derived from Requirements artifact.
Artifact: UX specification — journeys, mocks, interaction patterns.
qa-engineer-marvin: Write test case SPECIFICATIONS (not code) derived from Requirements + UX artifacts. Each test case: ID, description, preconditions, steps, expected outcome, requirement traceability. These define the acceptance criteria that QA validates against.
Artifact: Test case specification document.
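One possible shape for a test case specification record, with a traceability check, is sketched below. The class name `CaseSpec`, the helper `untraced_requirements`, and the `TC-`/`REQ-` identifier formats are illustrative assumptions; the workflow itself does not prescribe a schema.

```python
from dataclasses import dataclass


@dataclass
class CaseSpec:
    """One test case specification (a description, not test code)."""
    id: str
    description: str
    preconditions: list
    steps: list
    expected_outcome: str
    requirement_ids: list  # requirement traceability


def untraced_requirements(requirement_ids, specs):
    """Return requirement IDs that no test case traces back to."""
    covered = {rid for spec in specs for rid in spec.requirement_ids}
    return [rid for rid in requirement_ids if rid not in covered]


specs = [
    CaseSpec(
        id="TC-001",  # hypothetical ID format
        description="User can log in with valid credentials",
        preconditions=["Account exists"],
        steps=["Open login page", "Submit valid credentials"],
        expected_outcome="Dashboard is shown",
        requirement_ids=["REQ-1"],
    ),
]
gaps = untraced_requirements(["REQ-1", "REQ-2"], specs)
```

Here `gaps` lists the requirements that still lack a test case, which is one way to spot incomplete coverage before the Dev Plan sub-phase starts.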
architect-nagatha: System layers and responsibilities, tool/tech selection, prefer reuse, guide code placement, deployment model. Decompose work into implementation tasks. Each task references which test cases it satisfies.
Batch small tasks so each agent gets ≥100 lines of work — respect specialization boundaries (don't merge frontend with backend, security with docs, or unrelated domains).
Artifact: Development plan with task breakdown.
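The batching rule can be sketched as a greedy grouping within each specialization. The `Task` fields (`domain`, `estimated_lines`) and the choice to fold an undersized remainder into the previous batch of the same domain are assumptions made for illustration, not part of the workflow definition.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_BATCH_LINES = 100  # threshold from the batching rule


@dataclass
class Task:
    name: str
    domain: str           # hypothetical label, e.g. "frontend" or "backend"
    estimated_lines: int  # hypothetical size estimate


def batch_tasks(tasks, min_lines=MIN_BATCH_LINES):
    """Group tasks into per-domain batches of at least `min_lines` where possible."""
    by_domain = defaultdict(list)
    for task in tasks:
        by_domain[task.domain].append(task)  # never mix domains

    batches = []
    for domain_tasks in by_domain.values():
        domain_batches = []
        current, size = [], 0
        for task in domain_tasks:
            current.append(task)
            size += task.estimated_lines
            if size >= min_lines:
                domain_batches.append(current)
                current, size = [], 0
        if current:
            if domain_batches:
                # Undersized remainder joins the previous batch of this domain.
                domain_batches[-1].extend(current)
            else:
                # The whole domain is small: it still gets its own batch.
                domain_batches.append(current)
        batches.extend(domain_batches)
    return batches
```

A domain whose total work is under the threshold still gets a separate batch, since merging it with another specialization would break the boundary rule.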
developer-bilby: Execute the Development Plan task by task. For each task:
Pre-empt the QA audits before declaring impl done:
- coding-best-practices Cross-Cutting Rules: length cap (≤2 preferred, 3 mediocre), present-state only, two-tier audience (strict for internal commentary, liberal for public-API doc comments).
- Search the workspace, direct dependencies (per the project's manifest: Cargo.toml, package.json, pyproject.toml, go.mod, etc.), and any project-defined reference repos for an existing equivalent before rolling a new one. If found and publicly exported, use it. If crate-private (or language equivalent), propose promoting it. If it only partially overlaps, document the rationale for the new copy.

Multiple Bilby agents may run in parallel on independent tasks. Use teams for shared-file tasks.
Tests are written FIRST within each task, before implementation code.
Separate agent per concern — run in parallel:
| Agent | Focus |
|-------|-------|
| qa-engineer-marvin | Three parallel passes:<br>• Tests — execute test cases from spec, verify all pass, coverage gaps<br>• Docs review (read-only) — apply coding-best-practices Cross-Cutting Rules (length cap + present-state + two-tier audience) to comments and API doc comments (rustdoc, JSDoc, docstrings, godoc, etc.) introduced by the PR diff. Findings with file:line citations and proposed rewrites at /tmp/claudius-<scope>-docs-report.md.<br>• Dedup audit (read-only) — for every new publicly exported function, type, trait/interface, and module introduced by the PR, search the workspace, direct dependencies (per the project's manifest — Cargo.toml, package.json, pyproject.toml, go.mod, etc.), and project-defined reference repos for equivalent functionality. Findings (high-confidence duplicates, partial overlaps, reviewed-and-rejected) with file:line citations both sides at /tmp/claudius-<scope>-dedup-report.md. |
| security-engineer-smythe | Security audit, dependency security |
| ux-designer-diziet | UX/DX audit against UX specification |
| technical-writer-trillian | End-user, developer, deployment docs |
| project-reviewer-adams | Validate Development Plan fully executed, code quality |
Both audits (docs review and dedup audit) are READ-ONLY by mandate; emphasize this in the agent prompt template. Findings go to the lead, who decides follow-up:
To skip any audit, the lead must document the reason in the QA report.
QA validates TWO things:
No task is done until QA passes. Formatting, linting, and test passing are not optional. Fixes must deliver the intended end-user and developer experience, not just pass tests.
After QA passes, use claudius:lessons-learned skill to save:
Default to global memories unless strictly project-specific. Skip if nothing noteworthy. Report count of memories saved.
When a phase produces MEDIUM+ findings, test failures, or incomplete coverage:
| Failed Phase | Returns To | Rationale |
|-------|-------|-------|
| QA (Phase 3) | Implementation (Phase 2) | Fix code/tests to match spec |
| Implementation (Phase 2) | Dev Plan (Phase 1d) | Plan incomplete or infeasible |
| Dev Plan (Phase 1d) | Test Case Spec (Phase 1c) | Test cases missing or contradictory |
| Test Case Spec (Phase 1c) | UX Design (Phase 1b) | UX spec incomplete or ambiguous |
| UX Design (Phase 1b) | Requirements (Phase 1a) | Requirements incomplete or conflicting |
Max 3 retries per phase. After 3, escalate to the user with a full report of all attempts and unresolved issues.
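The retry routing can be sketched as a lookup plus a counter. Counting retries against the phase that failed, and escalating when a phase has no earlier phase to return to (such as Requirements), are assumptions of this sketch; the names `FALLBACK` and `next_step` are hypothetical.

```python
# Failed phase -> phase the workflow returns to.
FALLBACK = {
    "QA": "Implementation",
    "Implementation": "Dev Plan",
    "Dev Plan": "Test Case Spec",
    "Test Case Spec": "UX Design",
    "UX Design": "Requirements",
}
MAX_RETRIES = 3


def next_step(failed_phase, retries):
    """Decide what happens after `failed_phase` fails.

    `retries` maps phase name -> retries already used and is updated in place.
    Returns ("return_to", phase) or ("escalate", failed_phase).
    """
    used = retries.get(failed_phase, 0)
    if used >= MAX_RETRIES or failed_phase not in FALLBACK:
        # Out of retries, or nothing earlier to return to: escalate to the user.
        return ("escalate", failed_phase)
    retries[failed_phase] = used + 1
    return ("return_to", FALLBACK[failed_phase])
```

With an empty `retries` dict, three consecutive QA failures each route back to Implementation, and a fourth escalates.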
Presented ONLY when all phases complete (or max retries exhausted):
Agents default to model: inherit; set model per spawn (see grand-admiral Token Economy). Feature work leans opus for complex design and decisions; use sonnet for routine sub-tasks (straightforward implementation, config, docs, technical-writer-trillian).
Severity levels (via claudius:severity skill): CRITICAL > HIGH > MEDIUM > LOW > INFO.
Iterate until no issues above LOW remain.
Severity inflation guard: if a finding reappears across iterations (same meaning, possibly different agent/ID/wording), its severity must not increase. Downgrade to the previous iteration's level.
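The severity ordering, the stop condition, and the inflation guard can be sketched with an ordered enum. Deciding whether two findings have the "same meaning" is left to the caller here; the function names are illustrative assumptions, not part of the claudius:severity skill.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def guard_inflation(previous, current):
    """Cap a recurring finding's severity at its previous iteration's level."""
    return min(previous, current)


def needs_another_iteration(severities):
    """True while any finding is above LOW."""
    return any(s > Severity.LOW for s in severities)
```

For example, a finding rated MEDIUM in one iteration and HIGH in the next is recorded as MEDIUM, while a finding that drops from HIGH to LOW stays at LOW.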
Include a deduplication pass — scan for duplicated logic, extract shared helpers, eliminate copy-paste. Do this during Implementation self-review and QA code quality checks.
For phases with multiple agents on shared files, use teams (TeamCreate + SendMessage + Task tools) to prevent duplicate work and conflicts. See the Claudius agent's Spawning section for team patterns.
Agents must commit all changes before exiting — uncommitted work cannot be merged.
ALL spawned agents MUST use isolation: "worktree" — no exceptions.
Pre-flight pattern: see grand-admiral skill — Worktree Isolation. Default is Option A (local-SHA injection, no push); Option B (push first) is the explicit fallback.
Post-wave: verify worktree commits, merge into the feature branch, run tests, then clean up worktrees. Push only when the user explicitly authorizes it (e.g., via /push, /ci-dance, or direct instruction) — never push as an automatic step.