skills/estimate/codex/SKILL.md
Codex skill for running agent-estimate CLI commands (estimate, validate, calibrate).
Use this skill when a user asks to estimate AI-agent effort, compare agent time with human time, validate an estimate against observed work, or recalibrate local model factors.
The skill is command-first: execute the agent-estimate CLI and return its
output. Do not invent computed values.
| User intent | Command |
| --- | --- |
| Estimate one task | `agent-estimate estimate "<task>"` |
| Estimate tasks from a file | `agent-estimate estimate --file <path>` |
| Estimate GitHub issues | `agent-estimate estimate --issues <nums> --repo <owner/name>` |
| Validate estimate vs actuals | `agent-estimate validate <observation.yaml>` |
| Recompute calibration summary | `agent-estimate calibrate` |
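
The command table can be read as a lookup from intent to argument list. A minimal sketch, assuming a hypothetical `build_command` helper and made-up intent names (`estimate_task`, `estimate_file`, and so on); only the `agent-estimate` subcommands and flags come from the table itself:

```python
def build_command(intent: str, **params: str) -> list[str]:
    """Map a user intent to an agent-estimate argv list (hypothetical helper)."""
    base = ["agent-estimate"]
    if intent == "estimate_task":
        return base + ["estimate", params["task"]]
    if intent == "estimate_file":
        return base + ["estimate", "--file", params["path"]]
    if intent == "estimate_issues":
        return base + ["estimate", "--issues", params["nums"], "--repo", params["repo"]]
    if intent == "validate":
        return base + ["validate", params["observation"]]
    if intent == "calibrate":
        return base + ["calibrate"]
    # Intent names are illustrative; anything else is rejected rather than guessed.
    raise ValueError(f"unknown intent: {intent}")
```

Returning a list rather than a shell string keeps quoted task text intact when the result is handed to `subprocess.run`.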
Accept exactly one input source:

- a task description string (`"<task>"`)
- `--file <path>`
- `--issues <nums>` with `--repo <owner/name>`

If the input source is missing or ambiguous, ask for the missing piece.
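
The "exactly one input source" rule could be checked before the CLI is invoked. This is a sketch under assumptions: `pick_input_source` is a hypothetical helper, and it treats the quoted task string from the command table as a third source alongside `--file` and `--issues`.

```python
from typing import Optional


def pick_input_source(
    task: Optional[str] = None,
    file: Optional[str] = None,
    issues: Optional[str] = None,
    repo: Optional[str] = None,
) -> list[str]:
    """Return the CLI arguments for the single input source that was provided."""
    candidates = (("task", task), ("file", file), ("issues", issues))
    provided = [name for name, value in candidates if value]
    if len(provided) != 1:
        # Missing or ambiguous input: the caller should ask the user, not guess.
        raise ValueError(f"expected exactly one input source, got {provided}")
    if issues:
        if not repo:
            raise ValueError("--issues needs --repo <owner/name>")
        return ["--issues", issues, "--repo", repo]
    if file:
        return ["--file", file]
    return [task]
```

A `ValueError` here corresponds to the point where the skill should ask the user for the missing piece.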
- `--config <path>` - custom agent fleet config.
- `--format markdown|json` - output format.
- `--review-mode none|standard|complex|3-round` - additive review tier:
  - `none`: +0m
  - `standard`: +15m
  - `complex`: +25m
  - `3-round`: +35m
- `--type coding|brainstorm|research|config|documentation|frontend|app_dev`.
- `--spec-clarity <0.3..1.3>`.
- `--warm-context <0.3..1.15>`.
- `--agent-fit <0.9..1.2>`.
- `--title <text>`.
- `--verbose`.

When `--type` is omitted, the CLI auto-detects the category. Research-grounded
brainstorms with citation, OSS, benchmark, source, or landscape signals route to
the research band instead of the flat brainstorm band.
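
Because the review tier is additive, its effect on a total is plain arithmetic. The sketch below illustrates that, along with a range check for the three modifiers. The function names are hypothetical, the ranges are assumed to be inclusive, and whether the CLI itself rejects or clamps out-of-range values is not stated here.

```python
# Review overhead in minutes, as listed for --review-mode.
REVIEW_OVERHEAD_MINUTES = {"none": 0, "standard": 15, "complex": 25, "3-round": 35}

# Documented modifier ranges; treating the bounds as inclusive is an assumption.
MODIFIER_RANGES = {
    "spec_clarity": (0.3, 1.3),
    "warm_context": (0.3, 1.15),
    "agent_fit": (0.9, 1.2),
}


def total_with_review(work_minutes: float, review_mode: str) -> float:
    """Add the flat review overhead for the chosen tier to the work estimate."""
    return work_minutes + REVIEW_OVERHEAD_MINUTES[review_mode]


def check_modifier(name: str, value: float) -> float:
    """Return the value unchanged if it lies inside the documented range."""
    low, high = MODIFIER_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}..{high}")
    return value


print(total_with_review(60, "3-round"))  # 60 + 35 = 95
```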
- `coding`: default tiered PERT model for feature work, bug fixes, tests, and refactors.
- `brainstorm`: pure ideation and design exploration.
- `research`: audits, investigations, OSS comparisons, citation/source-grounded work.
- `config`: deploys, infra, CI/CD, runbooks, monitoring, and SRE changes.
- `documentation`: API docs, guides, README changes, changelogs.
- `frontend`: UI/page work. Content patches use 15/25/40; page builds use 40/60/90.
- `app_dev`: app shells and desktop/mobile builds. Uses a cold generic L-style prior; use modifiers for warm or highly specified work.

Current threshold keys include:
- `opus_4_x`, `opus_4_7`, `opus_4_6`
- `gpt_5_5`, `gpt_5_4`
- `gemini_3_1_pro`
- `sonnet_4_6`
- `haiku_4_5`

Legacy keys such as `opus`, `gpt_5`, `gpt_5_2`, `gpt_5_3`,
`gemini_3_pro`, and `sonnet` remain accepted.
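
As a worked illustration of the `frontend` bands (15/25/40 and 40/60/90), the sketch below applies the textbook PERT three-point mean. Two assumptions are made: that each triple is optimistic/most-likely/pessimistic minutes, and that the classic `(O + 4M + P) / 6` weighting applies. Neither the weighting nor the meaning of the triples is confirmed here, so the CLI may compute something different.

```python
def pert_expected(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Classic PERT mean: the most-likely value is weighted four times."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Assumed reading of the frontend bands as (O, M, P) minutes.
content_patch = pert_expected(15, 25, 40)  # 155 / 6
page_build = pert_expected(40, 60, 90)     # 370 / 6

print(round(content_patch, 1))  # 25.8
print(round(page_build, 1))     # 61.7
```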
- `agent-estimate` binary.
- `python -m agent_estimate.cli.app`.
- `--format json` as a normal success path.

agent-estimate estimate "Add login button with OAuth"
agent-estimate estimate "Audit dependencies for known CVEs" --type research
agent-estimate estimate "Build a landing page" --type frontend
agent-estimate estimate "Build an Electron app shell" --type app_dev --spec-clarity 0.3 --warm-context 0.3
agent-estimate estimate --file tasks.md
agent-estimate estimate --issues 1,2,3 --repo org/name
agent-estimate estimate --review-mode 3-round "Refactor auth module"
agent-estimate validate observation.yaml --db ~/.agent-estimate/calibration.db
agent-estimate calibrate --db ~/.agent-estimate/calibration.db
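
The commands above can be driven from a script. This is a sketch with several assumptions: that the `agent-estimate` binary should be tried first and `python -m agent_estimate.cli.app` used as a fallback, that the module entry point accepts the same subcommands, and that `sys.executable` is an acceptable stand-in for `python`. `resolve_cli` and `run_cli` are hypothetical names.

```python
import shutil
import subprocess
import sys
from typing import Callable, Optional, Sequence


def resolve_cli(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the command prefix: the binary if it is on PATH, else the module."""
    if which("agent-estimate"):
        return ["agent-estimate"]
    # Assumed fallback order; sys.executable stands in for `python`.
    return [sys.executable, "-m", "agent_estimate.cli.app"]


def run_cli(args: Sequence[str]) -> str:
    """Run one CLI command and return its stdout verbatim."""
    result = subprocess.run(
        resolve_cli() + list(args),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return result.stdout
```

For example, `run_cli(["estimate", "--file", "tasks.md"])` would return the CLI's own report. Passing the output through unchanged matches the rule that computed values are never invented.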
task_type: frontend
estimated_minutes: 60.0
actual_work_minutes: 52.0
actual_total_minutes: 87.0
file_count: 4
line_count: 180
test_count: 3
execution_mode: single
review_mode: 3-round
review_overhead_minutes: 35.0
modifiers:
  spec_clarity: 1.0
  warm_context: 0.9
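
The numbers in the sample observation can be sanity-checked by hand before running `validate`. A minimal sketch using the same values: the helper names are hypothetical, and the relation `actual_total = actual_work + review_overhead` is only what this sample happens to satisfy, not a documented rule. What `validate` itself computes is not shown here.

```python
# Values copied from the sample observation.
observation = {
    "estimated_minutes": 60.0,
    "actual_work_minutes": 52.0,
    "actual_total_minutes": 87.0,
    "review_overhead_minutes": 35.0,
}


def work_ratio(obs: dict) -> float:
    """Actual work time divided by the estimate (below 1.0 means faster than estimated)."""
    return obs["actual_work_minutes"] / obs["estimated_minutes"]


def overhead_is_consistent(obs: dict) -> bool:
    """Check that work time plus review overhead matches the recorded total."""
    expected_total = obs["actual_work_minutes"] + obs["review_overhead_minutes"]
    return abs(expected_total - obs["actual_total_minutes"]) < 1e-9


print(round(work_ratio(observation), 3))    # 52 / 60 -> 0.867
print(overhead_is_consistent(observation))  # 52 + 35 == 87 -> True
```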
- `agent-estimate` installed: `pip install agent-estimate` or
  `pip install -e '.[dev]'` in this repo.
- `default_agents.yaml`.
- `validate` and `calibrate` should tune them against local SQLite history over time.