/SKILL.md
Adapts to your team's working mode — human-only, hybrid, or agent-first — with research-backed formulas, confidence bands, and the right velocity model for each.
npx skillsauth add enreign/progressive-estimation progressive-estimationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Estimate AI-assisted and hybrid human+agent work with progressive disclosure. Research-backed formulas with PERT statistics, confidence bands, and calibration feedback loops.
Use when a user wants time/effort estimates for:
Keywords: estimate, how long, effort, sizing, story points, rounds, agent time
Use the AskUserQuestion tool if available in your environment for ALL user
interactions: mode selection (Phase 0), intake questions (Phase 1), batch
confirmation ("adjust any tasks?"), and tracker selection (Phase 4). This
creates structured dropdowns instead of free-text back-and-forth. Ask one
question at a time (or group related questions, up to 4 per call) and wait
for the response before proceeding. If the tool is not available, fall back
to conversational questions in your text output. Instant mode skips Phase 0
and Phase 1 questions but still uses the tool for Phase 4 tracker selection.
Ask two questions upfront (one at a time):
This produces six paths:
When the user selects Instant mode:
Load references/questionnaire.md for the appropriate path.
Quick path asks only:
If task type is obvious from the description, auto-assign it and note it.
Detailed path adds: risk factors, integration complexity, domain familiarity, review depth, human fix ratio, confidence level, definition phase, organization context, and per-task dependency mapping.
Every question feeds a specific formula variable — see the mapping table in questionnaire.md for the complete wiring.
Auto-detect the team's cooperation mode from intake answers:
if num_agents == 0 → Human-only mode
if num_agents > 0 AND maturity in [exploratory, partial] → Hybrid mode
if num_agents > 0 AND maturity == mostly-automated → Agent-first mode
Announce the detected mode before proceeding:
| Mode | Announcement | Points Approach | |------|-------------|----------------| | Human-only | "Detected: human-only team. Using standard estimation with story points." | Points for sizing and velocity | | Hybrid | "Detected: hybrid team. Using dual-track estimation — points for sizing, hours for planning." | Dual-track: points + hours | | Agent-first | "Detected: agent-first team. Planning by human review capacity in hours." | Hours only; points optional for rough sizing |
This mode affects output format (Phase 4) and calibration recommendations (Phase 5).
Load references/frameworks.md to select the right model:
| Scenario | Framework | |----------|-----------| | Single agent task | Round-based | | Multi-agent project | Module/wave | | Backlog import or rough sizing | S-M-L conversion |
For batch input, apply the selected framework per task, then roll up with dependency sequencing.
Load references/formulas.md and compute estimates using the canonical
formulas. Claude performs the arithmetic inline — no external script needed.
The computation pipeline:
If the user requests a standalone deterministic calculator, generate one from
formulas.md in their preferred language. The generated script must:
Three subagent perspectives review the estimate before output:
Trigger rules based on complexity:
| Complexity | Council Members | Rationale | |------------|----------------|-----------| | S | None | Overhead exceeds value | | M | Skeptic only | 1 subagent | | L | Skeptic + Historian | 2 subagents | | XL | Full council (all 3) | All subagents |
Each agent gives 2-3 sentences. Output a consensus estimate that weighs all perspectives. Flag disagreements where agents differ by >20%.
Note: Use the Agent tool to spawn council members as subagents if available. If not available, simulate the perspectives inline.
Load references/output-schema.md for formatting.
Always lead with the one-line summary, then PERT block, then expand:
Expected: ~4 hrs | Committed (80%): ~5.5 hrs | 10-26 agent rounds + 3 hrs human | Risk: medium | Size: M
Then provide:
Ask which tracker and mode:
For batch output, produce a summary table first, then rollup, then warnings, then per-task details on request.
Load references/calibration.md when the user wants to log actuals or
tune their estimation ratios.
Key calibration features:
Suggest re-estimation when:
| Phase | Files Loaded | When | |-------|-------------|------| | 0-1 | questionnaire.md | Always (intake); skipped in Instant mode | | 1.5 | (no files) | Automatic after intake | | 2 | frameworks.md | After intake | | 3 | formulas.md | During computation | | 3.5 | calibration.md (reference stories) | Automatic for M+ tasks (council validation) | | 4 | output-schema.md | During output | | 5 | calibration.md | On request |
Based on METR research (24k runs, 228 tasks): AI agents excel at small tasks (~90% effectiveness) but effectiveness drops to ~30% for XL tasks. The skill automatically increases human effort allocation for larger tasks.
Every estimate produces a weighted expected value using a log-normal weighting:
(min + 4×geometric_mean + max) / 6 with standard deviation (max - min) / 6.
Deep validation (KS test, n=84k) showed log-normal fits actual software effort
distributions better than PERT-beta in all size bands.
Size-dependent multipliers derived from 84k estimate-actual pairs. Small tasks need larger buffers due to wider actual/estimate spreads:
Early-phase estimates have wider ranges. The skill widens min/max spread based on how well-defined the work is, preventing false precision.
The skill warns against common estimation mistakes: oversized tasks, false precision, estimates-as-commitments, and point-to-hour conversions.
Encoded Preference Skill — sequences a workflow Claude can already perform, following a specific estimation process with research-backed formulas, PERT statistics, and calibration feedback loops.
testing
Create, edit, improve, or audit AgentSkills. Use when creating a new skill from scratch or when asked to improve, review, audit, tidy up, or clean up an existing skill or SKILL.md file. Also use when editing or restructuring a skill directory (moving files to references/ or scripts/, removing stale content, validating against the AgentSkills spec). Triggers on phrases like "create a skill", "author a skill", "tidy up a skill", "improve this skill", "review the skill", "clean up the skill", "audit the skill".
testing
Host security hardening and risk-tolerance configuration for OpenClaw deployments. Use when a user asks for security audits, firewall/SSH/update hardening, risk posture, exposure review, OpenClaw cron scheduling for periodic checks, or version status checks on a machine running OpenClaw (laptop, workstation, Pi, VPS).
testing
Create, edit, improve, or audit AgentSkills. Use when creating a new skill from scratch or when asked to improve, review, audit, tidy up, or clean up an existing skill or SKILL.md file. Also use when editing or restructuring a skill directory (moving files to references/ or scripts/, removing stale content, validating against the AgentSkills spec). Triggers on phrases like "create a skill", "author a skill", "tidy up a skill", "improve this skill", "review the skill", "clean up the skill", "audit the skill".
testing
Host security hardening and risk-tolerance configuration for OpenClaw deployments. Use when a user asks for security audits, firewall/SSH/update hardening, risk posture, exposure review, OpenClaw cron scheduling for periodic checks, or version status checks on a machine running OpenClaw (laptop, workstation, Pi, VPS).