skills/interactive-estimate/SKILL.md
Produce data-driven software delivery estimates by analyzing historical JIRA tickets, git activity, and engineer track records, then matching the new work against the most similar past tickets. Use this skill whenever the user asks "how long will this take", wants to estimate a piece of work, scope an epic, plan a sprint, or estimate delivery for JIRA stories or a Figma design. Also use whenever the user wants developer-to-work assignment recommendations based on history, wants to optimize an estimate by adding or reallocating engineers, or asks "what's the fastest way to ship this" or "who should work on this". Especially trigger when the user provides JIRA ticket IDs, JIRA story links, or Figma designs together with any indication of a team that will execute the work.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add stevefeldman/agents-skills software-estimation
This skill produces evidence-based delivery estimates by mining historical team performance from JIRA and git, matching the new work against similar past tickets, and recommending who should work on what. It also evaluates optimization scenarios — adding engineers, reallocating focus — to find faster delivery paths.
Estimates are always presented as ranges with explicit assumptions, never single-point predictions. Internal estimates are planning tools, not contracts.
Trigger this skill any time the user wants an estimate for new work, particularly when they provide:
…together with any indication of a team that will execute the work.
The skill runs in four phases:
The scope should be in the initial prompt — either JIRA ticket IDs/links or a Figma design (URL or attached screenshots). If scope is missing, ask for it before doing anything else; the skill cannot produce a useful estimate without knowing what's being built.
Once scope is present, collect the remaining inputs in one structured ask. Prefer ask_user_input_v0 for the multi-choice questions (work type, testing tier); for free-text fields, ask in a single numbered list and wait for a combined reply.
Required inputs:
- **Engineers in scope:** comma-separated list of identifiers (JIRA usernames, GitHub handles, or display names). These are the people who will do the work and whose history feeds the estimate.
- **Historical window:** how far back to analyze. Default: last 6 months. Shorter windows are noisier (small samples); longer windows pull in stale velocity from before tooling/process changes.
- **Repositories:** list of repo names to include. Used both for historical analysis and for identifying which services the new work touches.
- **Engineer count:** how many of the listed engineers are concurrently allocated to this work (integer ≤ length of the engineer list).
- **Work type:** one of: new feature, refactor, bug, migration, mixed. Similar-ticket matching filters on this; velocity differs meaningfully across types and mixing them produces misleading averages.
- **Testing requirements:** one of: unit only, unit + integration, + Playwright E2E, + BrowserStack matrix, + UAT with stakeholders. Each tier adds calendar overhead the estimate must account for.
After collecting inputs, echo them back as a confirmed summary before moving on. A 10-second confirmation prevents a regeneration.
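The required inputs and the echo-back step can be sketched as a small validated record. This is an illustrative sketch only: the `EstimateInputs` class, its field names, and the 180-day approximation of "last 6 months" are assumptions, not part of the skill.

```python
from dataclasses import dataclass

# Allowed values copied from the "Required inputs" list above.
WORK_TYPES = ("new feature", "refactor", "bug", "migration", "mixed")
TESTING_TIERS = (
    "unit only",
    "unit + integration",
    "+ Playwright E2E",
    "+ BrowserStack matrix",
    "+ UAT with stakeholders",
)


@dataclass
class EstimateInputs:
    """Hypothetical container for the inputs collected in the first phase."""
    engineers: list[str]
    repositories: list[str]
    engineer_count: int
    work_type: str
    testing_tier: str
    window_days: int = 180  # assumption: "last 6 months" approximated as 180 days

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the inputs are usable."""
        problems = []
        if not 1 <= self.engineer_count <= len(self.engineers):
            problems.append("engineer count must be between 1 and the size of the engineer list")
        if not self.repositories:
            problems.append("at least one repository is required")
        if self.work_type not in WORK_TYPES:
            problems.append(f"work type must be one of: {', '.join(WORK_TYPES)}")
        if self.testing_tier not in TESTING_TIERS:
            problems.append(f"testing tier must be one of: {', '.join(TESTING_TIERS)}")
        return problems

    def summary(self) -> str:
        """Text to echo back to the user for confirmation."""
        return "\n".join([
            f"Engineers in scope: {', '.join(self.engineers)}",
            f"Concurrently allocated: {self.engineer_count}",
            f"Historical window: last {self.window_days} days",
            f"Repositories: {', '.join(self.repositories)}",
            f"Work type: {self.work_type}",
            f"Testing requirements: {self.testing_tier}",
        ])
```

Calling `validate()` before `summary()` keeps the confirmation step from echoing back inputs that cannot produce an estimate.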
Run these in parallel where possible.
The Atlassian tools are deferred — call tool_search with a query like "jira issues" to load them before use. Do not guess JQL parameter names; the tool schemas are the source of truth.
For each engineer in the input list, query their resolved tickets in the historical window:
assignee = "<user>" AND resolved >= -<N>d AND project in (<keys>) ORDER BY resolved DESC
Capture per ticket: key, type, components, story points (if any), created/resolved timestamps (for cycle time), summary, labels.
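As a rough sketch, the per-engineer query and the cycle-time calculation might look like the following. The helper names are hypothetical, and the timestamps are assumed to be already normalized to a form that `datetime.fromisoformat` accepts; the actual field names and formats come from the loaded tool schemas.

```python
from datetime import datetime


def build_resolved_jql(user: str, window_days: int, project_keys: list[str]) -> str:
    """Fill in the JQL template shown above for one engineer."""
    keys = ", ".join(project_keys)
    return (
        f'assignee = "{user}" AND resolved >= -{window_days}d '
        f"AND project in ({keys}) ORDER BY resolved DESC"
    )


def cycle_time_days(created: str, resolved: str) -> float:
    """Cycle time in days between two ISO 8601 timestamps (assumed format)."""
    delta = datetime.fromisoformat(resolved) - datetime.fromisoformat(created)
    return delta.total_seconds() / 86400


if __name__ == "__main__":
    print(build_resolved_jql("alice", 180, ["SHOP", "PAY"]))
    print(cycle_time_days("2024-03-01T09:00:00+00:00", "2024-03-03T21:00:00+00:00"))
```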
If JIRA project keys aren't obvious from the repo list, ask the user once for the list of project keys. Don't guess.
If repo access is available, capture per-engineer commit counts and PR cycle times per repo in the window. This is the strongest signal for repo familiarity, which drives developer-to-work matching.
If git access isn't available, fall back to JIRA components as a proxy for repo familiarity — in most teams components map roughly to services/repos.
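When a local clone is available, per-author commit counts can be gathered with `git shortlog`. The sketch below assumes `git` is on the PATH and that author names can be mapped back to the engineer list; the function names are illustrative, and PR cycle times would need the hosting platform's API, which is not shown.

```python
import subprocess


def parse_shortlog(output: str) -> dict[str, int]:
    """Parse `git shortlog -s` output, where each line is '<count>\\t<author>'."""
    counts: dict[str, int] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        count, _, author = line.partition("\t")
        counts[author.strip()] = int(count)
    return counts


def commit_counts(repo_path: str, since: str = "6 months ago") -> dict[str, int]:
    """Commit counts per author on HEAD within the window (requires a git clone)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "shortlog", "-s", "-n", f"--since={since}", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_shortlog(result.stdout)
```

Passing `HEAD` explicitly matters: without a revision, `git shortlog` reads from standard input when it is not attached to a terminal.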
For the new scope:
Then search JIRA for the 5–10 most similar resolved tickets in the historical window using JQL with matching components + work type + key terms. These are the strongest empirical signal for the estimate — they outweigh team-wide velocity averages because they capture the actual cost of work in this specific codebase.
Combine three signals in this order of weight:
Similar-ticket cycle time (highest weight). Take the median and p80 cycle time of the similar past tickets. If the new scope decomposes to roughly N ticket-equivalents, the base range is N × median (optimistic) to N × p80 (conservative). When decomposing, match the granularity of the matched historical tickets: if similar past tickets averaged 2–3 days each, break the new scope into similarly sized chunks.
Engineer throughput (sanity check). Compute average tickets-per-week or story-points-per-week for the assigned engineers in the window. Cross-check against the similar-ticket math: if they disagree by more than 2×, surface the disagreement in the report rather than averaging. The user needs to know which assumption is wrong.
Testing tier multiplier. Apply on top of the base:
These multipliers are rules of thumb. State them explicitly in the report so they can be adjusted.
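The arithmetic behind the three signals can be sketched as follows. The nearest-rank percentile is one reasonable choice among several, the function names are hypothetical, and `tier_multiplier` is a placeholder parameter: use the value stated in the report, not a value from this sketch.

```python
import math
from statistics import median


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def base_range_days(cycle_times: list[float], ticket_equivalents: int,
                    tier_multiplier: float = 1.0) -> tuple[float, float]:
    """(optimistic, conservative) engineer-days: N x median to N x p80, times the tier multiplier."""
    med = median(cycle_times)
    p80 = percentile_nearest_rank(cycle_times, 80)
    return (ticket_equivalents * med * tier_multiplier,
            ticket_equivalents * p80 * tier_multiplier)


def signals_disagree(similar_ticket_days: float, throughput_days: float,
                     factor: float = 2.0) -> bool:
    """True when the two estimates differ by more than `factor`, so the report should flag it."""
    low, high = sorted([similar_ticket_days, throughput_days])
    return high > factor * low


if __name__ == "__main__":
    # Illustrative cycle times (days) for seven similar past tickets.
    similar = [1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 5.0]
    print(base_range_days(similar, ticket_equivalents=6))
    print(signals_disagree(15.0, 40.0))
```

With these sample numbers the median is 2.5 days and the p80 is 3.5 days, so six ticket-equivalents give a base range of 15 to 21 engineer-days before any testing multiplier.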
Build a dependency graph:
The shape of this graph determines how the estimate scales with engineer count. Two engineers on one sequential chain are barely faster than one; two engineers on two parallel streams are roughly 2× faster, minus coordination overhead.
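One way to make the effect of graph shape concrete is to compute the critical path: the longest dependency chain, which bounds calendar time from below no matter how many engineers are added. This is a minimal sketch using the standard-library `graphlib` module (Python 3.9+); the task names and durations are invented for illustration, every dependency is assumed to have a duration, and coordination overhead is ignored.

```python
from graphlib import TopologicalSorter


def critical_path_days(durations: dict[str, float],
                       depends_on: dict[str, set[str]]) -> float:
    """Length of the longest dependency chain, in the same unit as `durations`."""
    graph = {task: depends_on.get(task, set()) for task in durations}
    finish: dict[str, float] = {}
    # static_order() yields each task only after all of its dependencies.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[dep] for dep in graph[task]), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0.0)


if __name__ == "__main__":
    # One sequential chain: extra engineers cannot shorten it.
    chain = critical_path_days(
        {"api": 4.0, "ui": 3.0, "e2e": 2.0},
        {"ui": {"api"}, "e2e": {"ui"}},
    )
    # Two independent streams: two engineers can work in parallel.
    parallel = critical_path_days({"backend": 4.0, "frontend": 5.0}, {})
    print(chain, parallel)
```

Both examples contain 9 days of total work, but the chain has a 9-day critical path while the two parallel streams have a 5-day one.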
By default, infer the "other side" of the scope and include it in the estimate:
Every inferred item must appear explicitly in the "Scope as understood" section of the report so the user can correct false inferences before treating the number as real.
For each work segment (usually one per service touched), rank the input engineers by a combined score:
Output a recommended assignment per segment with a one-line rationale grounded in the data. Good rationale looks like: "Alice → cart-service: 23 tickets resolved there in the window, median cycle time 1.8 days vs. team median 2.4 days." Bad rationale looks like: "Alice is a good fit." If you can't tie it to a number, don't include the assignment.
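The "no number, no assignment" rule can be enforced mechanically. The sketch below uses a hypothetical helper that builds the rationale line only when all three figures are present; which metrics feed the combined score is left to the skill.

```python
from typing import Optional


def assignment_rationale(engineer: str, segment: str,
                         tickets_resolved: Optional[int],
                         median_days: Optional[float],
                         team_median_days: Optional[float]) -> Optional[str]:
    """Return a data-grounded rationale, or None when any figure is missing."""
    if None in (tickets_resolved, median_days, team_median_days):
        return None  # cannot tie the assignment to a number, so omit it
    return (
        f"{engineer} → {segment}: {tickets_resolved} tickets resolved there in the window, "
        f"median cycle time {median_days:.1f} days vs. team median {team_median_days:.1f} days."
    )
```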
Starting from the base estimate with the user-specified engineer count, evaluate scenarios:
Surface only scenarios that yield ≥15% reduction off the base estimate. Drop scenarios where overhead cancels the parallelism gain. If no scenario qualifies, say so explicitly — "the current team composition is already near-optimal for this scope" — rather than padding the report with marginal options.
Cap headcount suggestions at the size of the input engineer list. If analysis suggests external expertise would unlock more parallelism (e.g., "this would benefit from someone with deep payment-service experience, but no one in the input list has it"), surface this as a separate team composition observation, not as a numeric scenario.
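The filtering rules above (a reduction of at least 15%, and headcount capped at the input list) can be sketched as a simple filter. The `Scenario` type and its fields are assumptions made for illustration; how each scenario's calendar estimate is produced is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical optimization scenario with its re-estimated calendar time."""
    description: str
    calendar_weeks: float
    engineers: int


def qualifying_scenarios(base_weeks: float, scenarios: list[Scenario],
                         max_engineers: int,
                         min_reduction: float = 0.15) -> list[tuple[Scenario, float]]:
    """Keep scenarios within the headcount cap that cut the base estimate by >= min_reduction."""
    kept = []
    for scenario in scenarios:
        if scenario.engineers > max_engineers:
            continue  # exceeds the input engineer list; report as an observation instead
        reduction = (base_weeks - scenario.calendar_weeks) / base_weeks
        if reduction >= min_reduction:
            kept.append((scenario, reduction))
    return kept


if __name__ == "__main__":
    options = [
        Scenario("Add a second engineer on the frontend stream", 8.0, 2),
        Scenario("Reallocate one engineer full-time", 9.0, 2),
        Scenario("Add three more engineers", 6.0, 5),
    ]
    for scenario, reduction in qualifying_scenarios(10.0, options, max_engineers=3):
        print(f"{scenario.description}: {reduction:.0%} faster")
```

An empty result is itself a finding: it corresponds to reporting that the current team composition is already near-optimal for the scope.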
Produce a Markdown report with this structure (lead with the summary so the answer is on the first screen):
# Estimate: <scope summary>
## Summary
- **Base estimate:** <X–Y> engineer-weeks, <A–B> calendar weeks with <N> engineers
- **Confidence:** <Low / Medium / High> — based on <similar-ticket sample size, signal agreement, scope clarity>
- **Top risk:** <single biggest uncertainty>
## Scope as understood
- Explicit from input: <bulleted list>
- Inferred (please confirm or correct):
- <inferred item> — <why this was inferred>
## How this estimate was built
- Similar past tickets (n=<count>): <list with keys, cycle times, median, p80>
- Engineer throughput sanity check: <number> — <agrees with / disagrees with> similar-ticket math
- Testing tier multiplier: <value> for <tier>
- Parallelism: <serial / N parallel streams identified from <source>>
## Recommended assignments
- <Engineer> → <service/segment>: <data-grounded rationale>
- ...
## Optimization scenarios
- **Scenario 1:** <description> → <new estimate>, savings: <delta and %>
- **Scenario 2:** ...
- (If none qualify) No optimization scenarios yield ≥15% improvement with the current team. <Team composition observation, if any.>
## Assumptions
- <every assumption made: inferred scope items, multiplier values, parallelism interpretation, any signal that was missing>