mathews-tom/estimate-calibrator

Name: estimate-calibrator
Author: mathews-tom

skills/estimate-calibrator/SKILL.md

npx skillsauth add mathews-tom/armory estimate-calibrator

Estimate Calibrator

Replaces single-point guesses with structured three-point estimates: decomposes work into atomic units, estimates best/likely/worst case for each, identifies unknowns and assumptions, calculates aggregate ranges using PERT, and assigns confidence levels with explicit rationale.

Reference Files

| File | Contents | Load When |
| --- | --- | --- |
| references/estimation-methods.md | PERT formula, three-point estimation, Monte Carlo basics | Always |
| references/unknown-categories.md | Technical, scope, external, and organizational uncertainty types | Unknown identification |
| references/calibration-tips.md | Cognitive biases in estimation, historical calibration, buffer strategies | Always |
| references/sizing-heuristics.md | Common task size patterns, complexity indicators, reference class data | Quick sizing needed |

Prerequisites

- Work item description (feature, task, project)
- Decomposed tasks (or use task-decomposer skill first)
- Context: team familiarity, tech stack, existing codebase

Workflow

Phase 1: Decompose Work

If the work item is not already decomposed into atomic units:

1. Break into tasks: each task should be estimable independently.
2. Right granularity: tasks should be 1 hour to 3 days. Larger tasks have higher uncertainty; break them down further.
3. Identify dependencies: tasks on the critical path determine the minimum duration.
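The two checks above (granularity and critical path) can be sketched in a few lines. This is an illustration only: the `Task` class, the 8-hour working day, and the function names are assumptions made for the example, not part of the skill.

```python
from dataclasses import dataclass, field

HOURS_PER_DAY = 8  # assumption: one working day is 8 hours


@dataclass
class Task:
    name: str
    likely_hours: float
    depends_on: list[str] = field(default_factory=list)


def granularity_issue(task, min_hours=1, max_days=3):
    """Return 'too small', 'too large', or None for the 1 hour to 3 day window."""
    if task.likely_hours < min_hours:
        return "too small"
    if task.likely_hours > max_days * HOURS_PER_DAY:
        return "too large"
    return None


def critical_path_hours(tasks):
    """Longest dependency chain in hours; assumes the dependency graph has no cycles."""
    by_name = {t.name: t for t in tasks}
    finish = {}

    def finish_time(name):
        if name not in finish:
            task = by_name[name]
            start = max((finish_time(d) for d in task.depends_on), default=0.0)
            finish[name] = start + task.likely_hours
        return finish[name]

    return max((finish_time(t.name) for t in tasks), default=0.0)


tasks = [
    Task("schema", 4),
    Task("api", 16, ["schema"]),
    Task("ui", 12, ["schema"]),
    Task("integration", 40, ["api", "ui"]),
]
for t in tasks:
    print(t.name, granularity_issue(t))  # "integration" is flagged as too large
print("critical path:", critical_path_hours(tasks), "hours")  # 60.0
```

Here the 40-hour integration task exceeds three working days, so it would be split further before estimating.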

Phase 2: Three-Point Estimate

For each task, estimate three scenarios:

| Scenario | Definition | Mindset |
| --- | --- | --- |
| Best case | Everything goes right. No surprises. | "If I've done this exact thing before" |
| Likely case | Normal friction. Some minor obstacles. | "Realistic expectation with typical setbacks" |
| Worst case | Significant problems. Not catastrophic. | "Murphy's law but not a disaster" |

Key rule: Worst case is NOT "everything goes wrong." It's the realistic bad scenario (90th percentile), not the apocalyptic one (99th percentile).
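One way to keep the three scenarios honest is to store them together and reject orderings that cannot be right. The `ThreePoint` class below is a hypothetical container for this example; the skill itself does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Best, likely, and worst case for one task, all in the same unit (e.g. hours)."""
    best: float
    likely: float
    worst: float

    def __post_init__(self):
        # A valid three-point estimate is non-negative and ordered best <= likely <= worst.
        if not (0 <= self.best <= self.likely <= self.worst):
            raise ValueError(
                f"expected 0 <= best <= likely <= worst, got "
                f"{self.best}, {self.likely}, {self.worst}"
            )


estimate = ThreePoint(best=2, likely=4, worst=12)
print(estimate)
```

Constructing `ThreePoint(best=5, likely=4, worst=12)` raises `ValueError`, which catches transposed inputs before they reach the PERT calculation.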

Phase 3: Identify Unknowns

Categorize unknowns that affect estimates:

| Category | Example | Impact |
| --- | --- | --- |
| Technical | "Never used this library before" | Likely case inflated, worst case much higher |
| Scope | "Requirements may change" | All estimates may shift |
| External | "Depends on API access from partner" | Blocking risk; could delay entirely |
| Integration | "Haven't tested with production data" | Hidden complexity at integration |
| Organizational | "Need design approval" | Calendar time, not effort time |

Phase 4: Calculate Ranges

For individual tasks, use the PERT formula:

Expected = (Best + 4 × Likely + Worst) / 6
Std Dev = (Worst - Best) / 6
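As a worked example of the two formulas, the sketch below evaluates a single task with best 2 h, likely 4 h, and worst 12 h. The function names are chosen for this illustration.

```python
def pert_expected(best, likely, worst):
    """PERT expected value: the likely case is weighted four times."""
    return (best + 4 * likely + worst) / 6


def pert_std_dev(best, worst):
    """PERT standard deviation: one sixth of the best-to-worst range."""
    return (worst - best) / 6


expected = pert_expected(2, 4, 12)  # (2 + 16 + 12) / 6
std_dev = pert_std_dev(2, 12)       # (12 - 2) / 6
print(f"expected = {expected:.2f} h, std dev = {std_dev:.2f} h")
# prints: expected = 5.00 h, std dev = 1.67 h
```

The expected value (5 h) sits above the likely case (4 h) because the worst case is further from the likely case than the best case is.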

For aggregate (project) estimates:

- Sum of expected values for total expected duration
- Root sum of squares of std devs for aggregate uncertainty
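A minimal sketch of the aggregation follows. Note that combining standard deviations by root sum of squares assumes the task durations are independent; if tasks share a risk (for example, the same unfamiliar library), the true aggregate uncertainty will be larger. The function name and sample tasks are illustrative.

```python
import math


def aggregate(tasks):
    """tasks: iterable of (best, likely, worst). Returns (total expected, aggregate std dev)."""
    total_expected = 0.0
    variance_sum = 0.0
    for best, likely, worst in tasks:
        total_expected += (best + 4 * likely + worst) / 6
        variance_sum += ((worst - best) / 6) ** 2
    return total_expected, math.sqrt(variance_sum)


tasks = [(2, 4, 12), (1, 2, 3), (4, 8, 24)]  # hours
expected, std_dev = aggregate(tasks)
print(f"total expected = {expected:.1f} h, aggregate std dev = {std_dev:.2f} h")
# prints: total expected = 17.0 h, aggregate std dev = 3.74 h
```

The aggregate standard deviation (about 3.74 h) is smaller than the plain sum of the per-task values (about 5.33 h), because independent overruns and underruns partly cancel.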

Phase 5: Assign Confidence

| Confidence | Meaning | When |
| --- | --- | --- |
| High | Likely case within ±20% | Well-understood task, team has done it before |
| Medium | Likely case within ±50% | Some unknowns, moderate familiarity |
| Low | Likely case within ±100% or more | Significant unknowns, new technology |
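One reading of the table is that each confidence level implies a band around the likely case. The sketch below takes that interpretation as an assumption; the `TOLERANCE` mapping and `confidence_band` function are hypothetical names, and the lower bound is clamped at zero because a duration cannot be negative.

```python
TOLERANCE = {"High": 0.20, "Medium": 0.50, "Low": 1.00}  # from the table above


def confidence_band(likely, confidence):
    """Return (low, high) around the likely case for the given confidence level."""
    tol = TOLERANCE[confidence]
    low = max(0.0, likely - likely * tol)
    high = likely + likely * tol
    return low, high


for level in ("High", "Medium", "Low"):
    low, high = confidence_band(10, level)
    print(f"{level}: {low:.1f} to {high:.1f} h")
```

For a 10-hour likely case this gives 8 to 12 h at High, 5 to 15 h at Medium, and 0 to 20 h at Low. Since the Low row reads "±100% or more", the Low band should be treated as a minimum width.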

Output Format

## Estimate: {Work Item}

### Summary
| Scenario | Duration |
|----------|----------|
| Best case | {time} |
| Likely case | {time} |
| Worst case | {time} |
| **PERT expected** | **{time}** |
| **Confidence** | **{High/Medium/Low}** |

### Task-Level Estimates

| # | Task | Best | Likely | Worst | PERT | Unknowns |
|---|------|------|--------|-------|------|----------|
| 1 | {task} | {time} | {time} | {time} | {time} | {key unknown or "None"} |
| 2 | {task} | {time} | {time} | {time} | {time} | {key unknown} |
| | **Total** | **{sum}** | **{sum}** | **{sum}** | **{pert}** | |

### Key Unknowns

| # | Unknown | Category | Impact on Estimate | Mitigation |
|---|---------|----------|-------------------|------------|
| 1 | {unknown} | {Technical/Scope/External} | +{time} if realized | {spike, prototype, early test} |

### Assumptions
- {Assumption 1 — what must be true for this estimate to hold}
- {Assumption 2}

### Risk Factors
- {Risk}: If realized, adds {time}. Likelihood: {High/Medium/Low}.

### Confidence Rationale
**{High/Medium/Low}** because:
- {Specific reason — e.g., "Team has built 3 similar features"}
- {Specific reason — e.g., "External API is a new integration"}

### Recommendation
{Commit to PERT expected with {X}% buffer, or spike the top unknown first.}
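To show how the Task-Level Estimates table in the template might be filled from data, here is a small rendering sketch. The tuple layout, the hour unit, and the function names are assumptions for the example; the other sections of the template would be filled in the same way.

```python
def pert(best, likely, worst):
    return (best + 4 * likely + worst) / 6


def render_task_table(tasks):
    """tasks: list of (name, best, likely, worst, unknown) tuples, durations in hours."""
    lines = [
        "| # | Task | Best | Likely | Worst | PERT | Unknowns |",
        "|---|------|------|--------|-------|------|----------|",
    ]
    totals = [0.0, 0.0, 0.0, 0.0]
    for i, (name, best, likely, worst, unknown) in enumerate(tasks, start=1):
        values = [best, likely, worst, pert(best, likely, worst)]
        totals = [t + v for t, v in zip(totals, values)]
        cells = " | ".join(f"{v:.1f}h" for v in values)
        lines.append(f"| {i} | {name} | {cells} | {unknown or 'None'} |")
    total_cells = " | ".join(f"**{t:.1f}h**" for t in totals)
    lines.append(f"| | **Total** | {total_cells} | |")
    return "\n".join(lines)


print(render_task_table([
    ("Schema migration", 2, 4, 12, "Never used this ORM"),
    ("API endpoint", 1, 2, 3, None),
]))
```

The total row sums each column, so the PERT total equals the sum of the per-task PERT values.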

Calibration Rules

1. Three points, not one. Single-point estimates are always wrong. Three points communicate uncertainty, the most important part of any estimate.
2. Worst case is the 90th percentile, not the 99th. "Asteroid hits the office" is not a useful worst case. "The API documentation is wrong and we need to reverse-engineer the protocol" is a realistic worst case.
3. Unknowns inflate estimates more than known difficulty. A hard but well-understood task is more predictable than an easy but novel one.
4. Estimates are not commitments. Communicate ranges, not deadlines. If stakeholders need a single number, give the PERT expected plus a buffer for confidence level.
5. Spike unknowns early. If a single unknown dominates the estimate range, invest 1-2 days spiking it before estimating the rest.
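The last rule needs some way to tell when one unknown "dominates" the range. One possible heuristic, sketched below, is to compare each task's share of the total PERT variance. The 50% threshold and the function names are assumptions made for this example, not values from the skill.

```python
def variance_shares(tasks):
    """tasks: dict of name -> (best, worst). Returns name -> share of total PERT variance."""
    variances = {name: ((worst - best) / 6) ** 2 for name, (best, worst) in tasks.items()}
    total = sum(variances.values())
    if total == 0:
        return {name: 0.0 for name in tasks}
    return {name: v / total for name, v in variances.items()}


def spike_candidate(tasks, threshold=0.5):
    """Return the task whose variance share exceeds the threshold, or None."""
    shares = variance_shares(tasks)
    if not shares:
        return None
    name = max(shares, key=shares.get)
    return name if shares[name] > threshold else None


tasks = {
    "schema migration": (2, 12),
    "API endpoint": (1, 3),
    "partner API integration": (4, 24),
}
print(spike_candidate(tasks))  # partner API integration
```

In this sample the partner integration contributes roughly 79% of the total variance, so it would be the task to spike first.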

Error Handling

| Problem | Resolution |
| --- | --- |
| Work item not decomposed | Decompose into 3-8 tasks first (or suggest task-decomposer skill). |
| No historical reference | Estimate relative to a known task: "This is about 2x the auth feature." |
| Stakeholder wants a single number | Provide PERT expected with buffer matching confidence level (High: +20%, Medium: +50%, Low: +100%). |
| Estimate seems too large | Check for scope creep in task list. Remove non-essential tasks. Identify what can be deferred. |
| Team has never done this type of work | Mark confidence as Low. Recommend a spike before committing to an estimate. |
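The "single number" resolution can be written out as a short calculation: take the PERT expected value and add the buffer for the confidence level. The buffer percentages come from the table above; the function and constant names are illustrative.

```python
BUFFER = {"High": 0.20, "Medium": 0.50, "Low": 1.00}  # buffers from the table above


def single_number(best, likely, worst, confidence):
    """PERT expected value plus the buffer that matches the confidence level."""
    expected = (best + 4 * likely + worst) / 6
    return expected * (1 + BUFFER[confidence])


# Best 2 h, likely 4 h, worst 12 h gives a PERT expected value of 5 h.
print(single_number(2, 4, 12, "Medium"))  # 7.5
```

With Medium confidence, the 5-hour expected value becomes a 7.5-hour figure to share with stakeholders.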

When NOT to Estimate

Push back if:

- The work is exploratory (research, spikes): timebox instead of estimating
- Requirements are completely undefined: define scope first
- The user wants precision (hours) for a large project: provide ranges, not false precision
- The estimate will be used as a commitment without acknowledging uncertainty

Produces calibrated three-point PERT estimates (best/likely/worst) with confidence intervals, unknowns, and assumptions. Triggers on: "estimate this", "how long will this take", "effort estimate", "confidence interval", "story points", "t-shirt sizing". NOT for task decomposition, use task-decomposer.

221 stars

tools

Updated May 4, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add mathews-tom/armory estimate-calibrator

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 4, 2026, 7:02 AM (175.6s, 6 files scanned)

SKILL.md

name: estimate-calibrator
description: Produces calibrated three-point PERT estimates (best/likely/worst) with confidence intervals, unknowns, and assumptions. Triggers on: "estimate this", "how long will this take", "effort estimate", "confidence interval", "story points", "t-shirt sizing". NOT for task decomposition, use task-decomposer.
version: 1.1.1
category: development
tags: [estimation, pert, confidence-interval, planning]
difficulty: intermediate
phase: plan

Related Skills

mathews-tom/stacked-prs

testing | Verified, Trusted, Community | 242 | SKILL.md | Updated May 23, 2026

Manages dependent branch stacks and stacked pull requests using safe Git topology rules. Triggers on: "create stacked PRs", "publish this stack", "sync my PR stack", "rebase this stack", "merge the stack", "retarget child PRs", "split this branch into stacked PRs", "validate this stack", "cleanup stacked branches". Use when local branches or one source branch need to become a dependency-ordered PR stack with correct parent bases, validation, synchronization, merge order, and cleanup.

mathews-tom/project-context-setup

development | Verified, Trusted, Community | 230 | SKILL.md | Updated May 12, 2026

Scaffolds per-repository agent context so coding agents share the same issue tracker rules, triage label vocabulary, domain glossary, ADR layout, and handoff conventions. Triggers on: "set up project context", "configure agent docs", "create CONTEXT.md", "setup agent workflow", "agent issue tracker setup", "triage labels", "domain glossary for agents". Use when a repo needs durable context files before planning, triage, debugging, TDD, architecture review, or multi-agent implementation.

mathews-tom/task-decomposer

testing | Verified, Trusted, Community | 230 | SKILL.md | Updated Apr 6, 2026

Produces phased task boards from feature requests: dependency-mapped work items, parallelization flags, risk flags, edge cases, test matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "work breakdown structure". NOT for effort estimates, use estimate-calibrator.

mathews-tom/debug-investigator

development | Verified, Trusted, Community | 230 | SKILL.md | Updated Apr 6, 2026

Hypothesis-driven debugging with ranked hypotheses, git bisect strategy, instrumentation planning, and minimal reproduction design. Triggers on: "debug this systematically", "root cause analysis", "bisect this bug", "rank hypotheses", "isolate this issue", "minimal reproduction". NOT for general reasoning.

Install manually

# Clone the repo
git clone https://github.com/mathews-tom/armory.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r armory/skills/estimate-calibrator ~/.claude/skills/

Repository: mathews-tom/armory (221 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT