skills/estimate-calibrator/SKILL.md
Produces calibrated three-point PERT estimates (best/likely/worst) with confidence intervals, unknowns, and assumptions. Triggers on: "estimate this", "how long will this take", "effort estimate", "confidence interval", "story points", "t-shirt sizing". NOT for task decomposition, use task-decomposer.
Install this skill globally with one command: `npx skillsauth add mathews-tom/armory estimate-calibrator`. Works with Claude Code, Cursor, and Windsurf.
Replaces single-point guesses with structured three-point estimates: decomposes work into atomic units, estimates best/likely/worst case for each, identifies unknowns and assumptions, calculates aggregate ranges using PERT, and assigns confidence levels with explicit rationale.
| File | Contents | Load When |
| ---------------------------------- | ------------------------------------------------------------------------- | ---------------------- |
| references/estimation-methods.md | PERT formula, three-point estimation, Monte Carlo basics | Always |
| references/unknown-categories.md | Technical, scope, external, and organizational uncertainty types | Unknown identification |
| references/calibration-tips.md | Cognitive biases in estimation, historical calibration, buffer strategies | Always |
| references/sizing-heuristics.md | Common task size patterns, complexity indicators, reference class data | Quick sizing needed |
If the work item is not already decomposed into atomic units, decompose it into 3-8 tasks first (or suggest the task-decomposer skill).
For each task, estimate three scenarios:
| Scenario    | Definition                              | Mindset                                       |
| ----------- | --------------------------------------- | --------------------------------------------- |
| Best case   | Everything goes right. No surprises.    | "If I've done this exact thing before"        |
| Likely case | Normal friction. Some minor obstacles.  | "Realistic expectation with typical setbacks" |
| Worst case  | Significant problems. Not catastrophic. | "Murphy's law but not a disaster"             |
Key rule: Worst case is NOT "everything goes wrong." It's the realistic bad scenario (90th percentile), not the apocalyptic one (99th percentile).
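A minimal sketch of how a three-point estimate could be represented, assuming all values share one unit such as hours. The `ThreePoint` class and its field names are hypothetical and not part of this skill; the check only enforces the ordering best ≤ likely ≤ worst that the scenario definitions imply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Hypothetical container for one task's best/likely/worst estimate."""

    task: str
    best: float    # everything goes right
    likely: float  # normal friction
    worst: float   # realistic bad scenario (about the 90th percentile)

    def __post_init__(self) -> None:
        # Reject negative durations and out-of-order scenarios early.
        if self.best < 0:
            raise ValueError(f"{self.task}: best case cannot be negative")
        if not (self.best <= self.likely <= self.worst):
            raise ValueError(
                f"{self.task}: expected best <= likely <= worst, "
                f"got {self.best}, {self.likely}, {self.worst}"
            )


estimate = ThreePoint(task="Add login endpoint", best=2, likely=5, worst=14)
```

Validating the ordering at construction time catches a common mistake where the likely case is entered above the worst case.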
Categorize unknowns that affect estimates:
| Category       | Example                               | Impact                                       |
| -------------- | ------------------------------------- | -------------------------------------------- |
| Technical      | "Never used this library before"      | Likely case inflated, worst case much higher |
| Scope          | "Requirements may change"             | All estimates may shift                      |
| External       | "Depends on API access from partner"  | Blocking risk; could delay entirely          |
| Integration    | "Haven't tested with production data" | Hidden complexity at integration             |
| Organizational | "Need design approval"                | Calendar time, not effort time               |
For individual tasks, use the PERT formula:
Expected = (Best + 4 × Likely + Worst) / 6
Std Dev = (Worst - Best) / 6
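The two formulas above translate directly into code. The function name `pert` is an illustrative choice, and the sample values are made up for the example.

```python
def pert(best: float, likely: float, worst: float) -> tuple[float, float]:
    """Return (expected, std_dev) for one task using the PERT formulas."""
    expected = (best + 4 * likely + worst) / 6
    std_dev = (worst - best) / 6
    return expected, std_dev


# Example: best 2h, likely 5h, worst 14h.
expected, std_dev = pert(2, 5, 14)
print(expected, std_dev)  # 6.0 2.0
```

Note that the expected value (6.0) sits above the likely case (5) because the worst case is further from the likely case than the best case is.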
For aggregate (project) estimates:
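One common approach, assumed here rather than taken from this skill's reference files, is to sum the per-task expected values and combine the per-task standard deviations as the square root of the summed variances. That combination rule only holds if the tasks are treated as independent; correlated risks (for example, one unfamiliar library used in every task) would widen the real range. The helper name `aggregate` and the sample tasks are hypothetical.

```python
import math


def pert(best: float, likely: float, worst: float) -> tuple[float, float]:
    """Per-task PERT expected value and standard deviation."""
    return (best + 4 * likely + worst) / 6, (worst - best) / 6


def aggregate(tasks: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Combine (best, likely, worst) tuples into a project-level estimate.

    Assumes tasks are independent: expected values add, variances add.
    """
    total_expected = 0.0
    total_variance = 0.0
    for best, likely, worst in tasks:
        expected, std_dev = pert(best, likely, worst)
        total_expected += expected
        total_variance += std_dev ** 2
    return total_expected, math.sqrt(total_variance)


# Three made-up tasks, in hours.
tasks = [(2, 5, 14), (1, 2, 3), (4, 6, 14)]
total, spread = aggregate(tasks)
print(f"expected {total:.1f}h, std dev {spread:.2f}h")
```

Under the independence assumption the aggregate standard deviation grows more slowly than the sum of the per-task deviations, which is why simply adding the best-case and worst-case columns tends to overstate the project range.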
| Confidence | Meaning                          | When                                          |
| ---------- | -------------------------------- | --------------------------------------------- |
| High       | Likely case within ±20%          | Well-understood task, team has done it before |
| Medium     | Likely case within ±50%          | Some unknowns, moderate familiarity           |
| Low        | Likely case within ±100% or more | Significant unknowns, new technology          |
## Estimate: {Work Item}
### Summary
| Scenario | Duration |
|----------|----------|
| Best case | {time} |
| Likely case | {time} |
| Worst case | {time} |
| **PERT expected** | **{time}** |
| **Confidence** | **{High/Medium/Low}** |
### Task-Level Estimates
| # | Task | Best | Likely | Worst | PERT | Unknowns |
|---|------|------|--------|-------|------|----------|
| 1 | {task} | {time} | {time} | {time} | {time} | {key unknown or "None"} |
| 2 | {task} | {time} | {time} | {time} | {time} | {key unknown} |
| | **Total** | **{sum}** | **{sum}** | **{sum}** | **{pert}** | |
### Key Unknowns
| # | Unknown | Category | Impact on Estimate | Mitigation |
|---|---------|----------|-------------------|------------|
| 1 | {unknown} | {Technical/Scope/External} | +{time} if realized | {spike, prototype, early test} |
### Assumptions
- {Assumption 1 — what must be true for this estimate to hold}
- {Assumption 2}
### Risk Factors
- {Risk}: If realized, adds {time}. Likelihood: {High/Medium/Low}.
### Confidence Rationale
**{High/Medium/Low}** because:
- {Specific reason — e.g., "Team has built 3 similar features"}
- {Specific reason — e.g., "External API is a new integration"}
### Recommendation
{Commit to PERT expected with {X}% buffer, or spike the top unknown first.}
| Problem                               | Resolution                                                                                          |
| ------------------------------------- | --------------------------------------------------------------------------------------------------- |
| Work item not decomposed              | Decompose into 3-8 tasks first (or suggest task-decomposer skill).                                  |
| No historical reference               | Estimate relative to a known task: "This is about 2x the auth feature."                             |
| Stakeholder wants a single number     | Provide PERT expected with buffer matching confidence level (High: +20%, Medium: +50%, Low: +100%). |
| Estimate seems too large              | Check for scope creep in task list. Remove non-essential tasks. Identify what can be deferred.      |
| Team has never done this type of work | Mark confidence as Low. Recommend a spike before committing to an estimate.                         |
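For the "single number" case, the buffer rule in the table can be sketched as a small lookup. The percentages come from the table above; the function name `single_number` and the dictionary name are hypothetical.

```python
# Buffer applied on top of the PERT expected value, keyed by confidence level.
BUFFER_BY_CONFIDENCE = {"High": 0.20, "Medium": 0.50, "Low": 1.00}


def single_number(pert_expected: float, confidence: str) -> float:
    """Return the PERT expected value plus the buffer for the confidence level."""
    try:
        buffer = BUFFER_BY_CONFIDENCE[confidence]
    except KeyError:
        raise ValueError(f"confidence must be one of {sorted(BUFFER_BY_CONFIDENCE)}")
    return pert_expected * (1 + buffer)


print(single_number(15.0, "Medium"))  # 22.5
```

Keeping the buffer tied to the stated confidence level makes the padding explicit, so the stakeholder can see how much of the number is estimate and how much is uncertainty.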
Push back if: