
coalesce-labs/feature-metrics

Name: feature-metrics
Author: coalesce-labs

plugins/pm/skills/feature-metrics/SKILL.md

npx skillsauth add coalesce-labs/catalyst feature-metrics

| Scanner         | What it checks                                 | Status  |
| --------------- | ---------------------------------------------- | ------- |
| Trivy           | Container and dependency vulnerability scanner | Clean   |
| Semgrep         | Static code analysis for vulnerabilities       | Clean   |
| mcp-scan (Snyk) | Model Context Protocol security validation     | Clean   |
| Snyk (dep)      | Open source security scanning                  | Skipped |
| Socket.dev      | Supply chain security analysis                 | Skipped |
| VirusTotal      | Multi-engine malware detection                 | Skipped |
| CrowdStrike     | Advanced threat intelligence                   | Skipped |
| OSV-Scanner     | Open Source Vulnerability database check       | Skipped |
| OWASP Dep-Check |                                                | Skipped |

/feature-metrics - Define Success Metrics

Select trustworthy metrics using the STEDII framework.

Context Routing Logic (Internal - for Claude)

Automatic Context Checks: When this skill is invoked, immediately check:

| Source          | Files/Folders                                        | Search Terms                          | What to Extract                            |
| --------------- | ---------------------------------------------------- | ------------------------------------- | ------------------------------------------ |
| Current PRD     | thoughts/shared/pm/prds/*.md                         | feature name from chat                | Hypothesis, problem statement, user impact |
| Business Info   | thoughts/shared/pm/context/business-info-template.md | business model, growth stage, metrics | Product strategy, current North Star       |
| Metrics Context | thoughts/shared/pm/metrics/*.md                      | baseline numbers, historical data     | Current metric baselines, ranges           |
| Strategy        | thoughts/shared/pm/frameworks/*.md                   | feature related to strategic pillar   | Strategic fit and expected outcomes        |
| Meetings        | thoughts/shared/product/meeting-notes/*.md           | feature name, "success metrics"       | Stakeholder expectations, past decisions   |

Context Priority:

Current PRD and feature context FIRST
Business model and strategy SECOND
Historical metrics and baselines THIRD
Stakeholder expectations FOURTH
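The routing table and priority order can be read as a lookup plan. The sketch below is a minimal illustration that assumes the `thoughts/` tree sits under a single repository root. `CONTEXT_SOURCES` and `gather_context` are hypothetical names and are not part of the skill. The sketch only locates candidate files in priority order. Reading those files and extracting the listed fields is still up to the agent.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical encoding of the routing table: (priority, source, glob pattern).
# Lower number = consulted first, mirroring the Context Priority list.
CONTEXT_SOURCES = [
    (1, "Current PRD", "thoughts/shared/pm/prds/*.md"),
    (2, "Business Info", "thoughts/shared/pm/context/business-info-template.md"),
    (2, "Strategy", "thoughts/shared/pm/frameworks/*.md"),
    (3, "Metrics Context", "thoughts/shared/pm/metrics/*.md"),
    (4, "Meetings", "thoughts/shared/product/meeting-notes/*.md"),
]


def gather_context(repo_root: Path) -> list[tuple[str, Path]]:
    """Return (source, file) pairs that exist under repo_root, highest priority first."""
    found = []
    # sorted() is stable, so sources sharing a priority keep their table order.
    for _priority, source, pattern in sorted(CONTEXT_SOURCES, key=lambda s: s[0]):
        for path in sorted(repo_root.glob(pattern)):
            found.append((source, path))
    return found


for source, path in gather_context(Path(".")):
    print(f"{source}: {path}")
```

Missing folders simply yield no matches, so the same plan works on a fresh repository with no PM context yet.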

Cross-Skill Links:

If feature is part of larger product strategy → Link to /write-prod-strategy
If testing this feature → Link to /experiment-decision and /experiment-metrics
If metric is North Star related → Link to /define-north-star
If sizing impact → Link to /impact-sizing for usage estimates
If tracking retention → Link to /retention-analysis for cohort analysis

When to Use

Defining success criteria for a new feature
Setting up an A/B test
Creating a PRD metrics section
Validating existing metrics

Step 0: Understanding Current State

Before we define metrics, let me check what context already exists...

Checking:

thoughts/shared/pm/prds/ for any existing PRD for this feature
thoughts/shared/pm/context/business-info-template.md for your product model
thoughts/shared/pm/metrics/ for historical baseline data
thoughts/shared/pm/frameworks/ for strategic context
thoughts/shared/product/meeting-notes/ for stakeholder expectations

[If feature PRD exists]: "I found your [Feature Name] PRD from [date]. It mentions [hypothesis/goal]. Let me use that as context."

[If metrics exist]: "I found historical data: [Metric] baselines are currently [values]. I'll use this as reference."

Based on what I find, I'll show you:

What We Know About This Feature

Strategic Context:

[How this feature fits into your Q# strategy / roadmap]
[Expected user impact: # of users affected]
[Business outcome: revenue/retention/engagement impact]

Current Baselines:

[Relevant historical metrics for comparison]
[Product stage: early-stage feature / mature feature / existing metric improvement]

Success Expectations:

[From stakeholder meetings: what they're expecting]
[From user research: what users need]
[From business model: what drives your North Star]

Questions to Clarify Before Selecting Metrics

Feature Scope: Is this a small UX improvement, new capability, or major feature overhaul?
User Segment: Who is this feature for? All users, specific segment, or internal teams?
Impact Type: Are we trying to drive growth, engagement, retention, monetization, or efficiency?
Experiment Timeline: How long can we run the test? (This affects which metrics we can use)
Business Context: What's more important right now - speed or certainty?

STEDII Framework

Every good metric should pass these 6 criteria:

S - Sensitive

Can the metric detect changes from your feature?

Will it move meaningfully with expected impact?
Is the sample size sufficient?
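For a conversion-style (0/1) metric, one common way to answer "Is the sample size sufficient?" is the normal-approximation sample-size formula for a two-sided, two-proportion z-test. This sketch rests on assumptions the skill does not state. It assumes a 50/50 split and uses α = 0.05 and 80% power as defaults. `sample_size_per_arm` is a hypothetical helper name. Continuous metrics such as time on task need a variance-based formula instead.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: approximate users per arm to detect a relative lift in a rate.

    Normal approximation for a two-sided, two-proportion z-test.
    relative_mde is the minimum detectable effect as a fraction (0.10 = +10%).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


# A 10% baseline conversion rate and a hoped-for +10% relative lift.
print(sample_size_per_arm(0.10, 0.10))
```

Suppose the required sample is larger than the traffic you can expose during the experiment window. Then the metric fails the Sensitive check for this feature. You can pick a more sensitive proxy or accept a larger minimum detectable effect.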

T - Timely

How quickly does the metric respond?

Can you measure it within your experiment window?
Leading indicators > lagging indicators
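Timeliness and sample size interact, because a metric only fits the experiment window if enough users can be enrolled inside it. The minimal sketch below makes three assumptions. Each eligible user is counted once. Traffic is split evenly across arms. A hypothetical 7-day floor makes the test cover one full weekly cycle. `experiment_days` and `fits_window` are illustrative names. For lagging metrics such as D30 retention, the 30-day observation period comes on top of the enrollment time.

```python
import math


def experiment_days(users_per_arm: int, daily_eligible_users: int,
                    arms: int = 2, min_days: int = 7) -> int:
    """Hypothetical helper: enrollment days needed to fill every arm, never below min_days."""
    if daily_eligible_users <= 0:
        raise ValueError("daily_eligible_users must be positive")
    days = math.ceil(users_per_arm * arms / daily_eligible_users)
    return max(days, min_days)  # the floor covers weekday/weekend cycles


def fits_window(users_per_arm: int, daily_eligible_users: int,
                window_days: int, arms: int = 2) -> bool:
    """True if enrollment can finish inside the available experiment window."""
    return experiment_days(users_per_arm, daily_eligible_users, arms) <= window_days


print(experiment_days(15_000, 3_000))   # 10
print(fits_window(15_000, 1_000, 14))   # False
```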

E - Easy to Understand

Can stakeholders interpret it?

Avoid complex calculations
Clear cause and effect

D - Directional

Is improvement clear?

Up = good or Down = good? Be explicit
Avoid metrics where direction is ambiguous

I - Implementable

Can you actually track it?

Data exists or can be collected
Engineering effort is reasonable

I - Independent

Does it avoid external factors?

Seasonality effects?
Other experiments running?
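The six criteria can be recorded per metric. The output's STEDII Check then lists a pass/fail and a reason for every dimension. The `StediiCheck` class below is a hypothetical sketch and is not part of the skill. A metric passes only when all six criteria are recorded and none of them failed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six STEDII dimensions, in framework order.
CRITERIA = ("sensitive", "timely", "easy_to_understand",
            "directional", "implementable", "independent")


@dataclass
class StediiCheck:
    """Hypothetical per-metric record of STEDII results."""
    metric: str
    # criterion -> (passed, one-line reason shown in the STEDII Check list)
    results: dict[str, tuple[bool, str]] = field(default_factory=dict)

    def record(self, criterion: str, passed: bool, reason: str) -> None:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown STEDII criterion: {criterion!r}")
        self.results[criterion] = (passed, reason)

    def missing(self) -> list[str]:
        """Criteria not evaluated yet, in framework order."""
        return [c for c in CRITERIA if c not in self.results]

    def failed(self) -> list[str]:
        """Criteria evaluated as failing, in framework order."""
        return [c for c in CRITERIA if c in self.results and not self.results[c][0]]

    def passes(self) -> bool:
        return not self.missing() and not self.failed()


check = StediiCheck("D7 retention")
check.record("timely", False, "needs 7 days after exposure; window is 5 days")
print(check.failed(), check.passes())  # ['timely'] False
```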

Quick Start Prompt

When the PM types /feature-metrics, respond:

Let's define metrics for your feature. I'll use the STEDII framework.

Tell me:
1. What feature are we measuring?
2. What user behavior does it change?
3. What business outcome do we expect?

I'll help you select primary metrics, guardrails, and kill criteria.

Metric Types

Primary Metric

The one metric that defines success.

Directly tied to feature goal
Must pass all STEDII criteria
Single source of truth for go/no-go
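The primary metric is the single source of truth for go/no-go, so the decision rule can be written down before launch. The minimal sketch below makes three assumptions. The target is an absolute value of the metric (the template's [Y]). The direction comes from the STEDII Directional check. Statistical significance is judged separately, for example by your experimentation tool. `relative_change` and `primary_decision` are hypothetical helpers.

```python
def relative_change(baseline: float, observed: float) -> float:
    """Hypothetical helper: relative change from baseline, e.g. 0.10 for +10% (the template's Z%)."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (observed - baseline) / baseline


def primary_decision(observed: float, target: float, significant: bool,
                     higher_is_better: bool = True) -> str:
    """'go' only if the result is significant AND reaches the target in the declared direction."""
    reached = observed >= target if higher_is_better else observed <= target
    return "go" if significant and reached else "no-go"


# Activation: baseline 40%, target 43%, observed 44% with a significant result.
print(f"{relative_change(0.40, 0.44):+.0%}")             # +10%
print(primary_decision(0.44, 0.43, significant=True))    # go
# Time on task (lower is better): target 27 s, observed 29 s.
print(primary_decision(29.0, 27.0, significant=True, higher_is_better=False))  # no-go
```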

Guardrail Metrics

Metrics that must NOT get worse.

Protect against unintended harm
Set acceptable ranges (not targets)
Examples: page load time, error rate, support tickets
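Guardrails carry acceptable ranges rather than targets, so the check is a range test and not an improvement test. The sketch below is minimal. The `Guardrail` class, metric names, and limits are hypothetical examples. Treating a missing reading as a breach is a design choice made here and is not something the skill prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """Hypothetical guardrail with an inclusive acceptable range."""
    name: str
    low: float = float("-inf")   # inclusive lower bound of the acceptable range
    high: float = float("inf")   # inclusive upper bound of the acceptable range
    protects_against: str = ""

    def holds(self, value: float) -> bool:
        return self.low <= value <= self.high


def breached(guardrails: list[Guardrail], observed: dict[str, float]) -> list[str]:
    """Names of guardrails outside their range; a missing reading counts as a breach."""
    return [g.name for g in guardrails
            if g.name not in observed or not g.holds(observed[g.name])]


guardrails = [
    Guardrail("p95_page_load_ms", high=2500, protects_against="slower pages"),
    Guardrail("error_rate", high=0.01, protects_against="broken flows"),
    Guardrail("support_tickets_per_1k_users", high=5, protects_against="user confusion"),
]
print(breached(guardrails, {"p95_page_load_ms": 2310, "error_rate": 0.014}))
# ['error_rate', 'support_tickets_per_1k_users']
```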

Kill Criteria

When to stop the experiment early.

Serious negative impact threshold
Safety concerns
Automatic rollback triggers
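Kill criteria are thresholds that trigger a rollback, so they are easy to evaluate automatically at each review. This minimal sketch mirrors the template's three forms: "drops below", "increases above", and a qualitative signal. `KillCriterion` and `rollback_reasons` are hypothetical names. The skill does not specify a rollback mechanism, so the sketch does not wire the result to an actual rollback (for example, switching off a feature flag).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriterion:
    """Hypothetical threshold that forces a rollback when crossed."""
    metric: str
    direction: str      # "below" or "above"
    threshold: float

    def triggered(self, value: float) -> bool:
        if self.direction == "below":
            return value < self.threshold
        if self.direction == "above":
            return value > self.threshold
        raise ValueError(f"direction must be 'below' or 'above', got {self.direction!r}")


def rollback_reasons(criteria: list[KillCriterion], observed: dict[str, float],
                     qualitative_signals: list[str] | None = None) -> list[str]:
    """Every reason to roll back now; an empty list means keep running.

    Metrics with no reading yet are skipped, so check data freshness separately.
    """
    reasons = [f"{c.metric} {c.direction} {c.threshold}"
               for c in criteria
               if c.metric in observed and c.triggered(observed[c.metric])]
    reasons += [f"qualitative: {s}" for s in (qualitative_signals or [])]
    return reasons


criteria = [KillCriterion("checkout_conversion", "below", 0.02),
            KillCriterion("error_rate", "above", 0.05)]
print(rollback_reasons(criteria, {"checkout_conversion": 0.015, "error_rate": 0.01}))
# ['checkout_conversion below 0.02']
```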

Output Template

# Feature Metrics: [Feature Name]

## Primary Metric

**Metric:** [Name]
**Definition:** [Exactly how it's calculated]
**Current baseline:** [X]
**Target:** [Y] ([+/- Z%])
**Timeline:** [When we expect to see impact]

**STEDII Check:**

- [x] Sensitive - [why]
- [x] Timely - [why]
- [x] Easy to understand - [why]
- [x] Directional - [up/down = good]
- [x] Implementable - [data source]
- [x] Independent - [controls for]

## Guardrail Metrics

| Metric     | Acceptable Range | Why It Matters     |
| ---------- | ---------------- | ------------------ |
| [Metric 1] | [range]          | [protects against] |
| [Metric 2] | [range]          | [protects against] |

## Kill Criteria

If any of these occur, roll back immediately:

- [Metric] drops below [threshold]
- [Metric] increases above [threshold]
- [Qualitative signal] occurs

## Measurement Plan

- **Data source:** [where data comes from]
- **Tracking:** [how it's implemented]
- **Dashboard:** [where to monitor]
- **Review cadence:** [how often to check]

Common Metric Pairs

| Feature Type | Primary Metric      | Common Guardrails    |
| ------------ | ------------------- | -------------------- |
| Growth       | Signups, Activation | Retention, Quality   |
| Engagement   | DAU, Sessions       | Load time, Errors    |
| Revenue      | Conversion, ARPU    | Refunds, Churn       |
| Retention    | D7/D30 retention    | NPS, Support tickets |
| Efficiency   | Task completion     | Time on task, Errors |
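The metric pairs can double as a lookup that seeds the conversation with candidate metrics. The `starting_point` helper below is hypothetical and is not part of the skill. It returns primary candidates only, in line with the tip that multiple "primary" metrics means no primary metric. The PM still chooses exactly one.

```python
from __future__ import annotations

# Hypothetical lookup: feature type -> (primary metric candidates, common guardrails),
# copied from the Common Metric Pairs table.
COMMON_METRIC_PAIRS: dict[str, tuple[list[str], list[str]]] = {
    "growth": (["Signups", "Activation"], ["Retention", "Quality"]),
    "engagement": (["DAU", "Sessions"], ["Load time", "Errors"]),
    "revenue": (["Conversion", "ARPU"], ["Refunds", "Churn"]),
    "retention": (["D7/D30 retention"], ["NPS", "Support tickets"]),
    "efficiency": (["Task completion"], ["Time on task", "Errors"]),
}


def starting_point(feature_type: str) -> dict[str, list[str]]:
    """Suggest candidates for a feature type; the PM still picks ONE primary metric."""
    key = feature_type.strip().lower()
    if key not in COMMON_METRIC_PAIRS:
        raise ValueError(f"unknown feature type {feature_type!r}; "
                         f"expected one of {sorted(COMMON_METRIC_PAIRS)}")
    primaries, guardrails = COMMON_METRIC_PAIRS[key]
    # Return copies so callers can edit the lists without mutating the table.
    return {"primary_candidates": list(primaries), "guardrails": list(guardrails)}


print(starting_point("Revenue"))
```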

Output Integration

Where Files Go

Feature metrics definitions:

Active work: Add to PRD in Strategic Fit section
When finalized: Reference in /experiment-decision for A/B testing approach
Archive: Store final metrics in thoughts/shared/pm/metrics/[feature-name]-baseline.md for historical reference
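File names drift easily, and a small helper can pin them down. This sketch rests on two assumptions the skill does not state. Feature names become lowercase hyphenated slugs, and dates use ISO `YYYY-MM-DD`. `slugify`, `output_path`, and `baseline_path` are hypothetical names. The archive name follows this section. The working-output name follows the Output Quality Self-Check (`feature-metrics-[feature-name]-[date].md`). "Smart Reminders" is an invented example feature.

```python
import re
from datetime import date
from pathlib import PurePosixPath

METRICS_DIR = PurePosixPath("thoughts/shared/pm/metrics")


def slugify(feature_name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", feature_name.lower()).strip("-")
    if not slug:
        raise ValueError("feature name has no usable characters")
    return slug


def output_path(feature_name: str, on: date) -> PurePosixPath:
    """Working output: feature-metrics-[feature-name]-[date].md"""
    return METRICS_DIR / f"feature-metrics-{slugify(feature_name)}-{on.isoformat()}.md"


def baseline_path(feature_name: str) -> PurePosixPath:
    """Archived final metrics: [feature-name]-baseline.md"""
    return METRICS_DIR / f"{slugify(feature_name)}-baseline.md"


print(output_path("Smart Reminders", date(2026, 4, 4)))
# thoughts/shared/pm/metrics/feature-metrics-smart-reminders-2026-04-04.md
print(baseline_path("Smart Reminders"))
# thoughts/shared/pm/metrics/smart-reminders-baseline.md
```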

Link to Other Work

After defining metrics:

Reference in PRDs - "Success is defined as [primary metric] reaching [target] based on STEDII framework"
Use in experiments - Feature metrics become primary metric in /experiment-decision
Track progress - Monitor against baseline in weekly status updates
Feed retention analysis - If tracking retention, pass metric definitions to /retention-analysis

Cross-Skill Integration

Feeds into:

/experiment-decision - Primary metric determines test design and duration
/feature-results - Use these metrics to measure actual impact post-launch
/impact-sizing - Use guardrails to validate usage estimates
/metrics-framework - This metric may become a leading indicator for North Star

Pulls from:

/define-north-star - Ensure primary metric ladders up to North Star
/impact-sizing - Usage estimates inform what metrics can detect changes
[[business-info-template]] - Company metrics and baselines

Tips

One primary metric - Multiple "primary" metrics = no primary metric
Guardrails are not goals - You're not trying to improve them, just protect them
Leading > Lagging - Measure what you can act on quickly
Avoid vanity metrics - Page views don't matter if nobody converts
Baseline matters - Know your current numbers before running the experiment
Time to signal - Faster metrics (hours/days) beat slow metrics (months)

Output Quality Self-Check

Before presenting output to the PM, verify:

[ ] File saved to correct location: Output saved to thoughts/shared/pm/metrics/feature-metrics-[feature-name]-[date].md
[ ] Context routing table was checked: Reviewed thoughts/shared/pm/prds/ for feature context, thoughts/shared/pm/context/business-info-template.md for North Star metric, and thoughts/shared/pm/metrics/ for existing dashboards and baselines
[ ] Metrics pass STEDII framework: Each proposed metric is evaluated against all 6 STEDII dimensions (Sensitive, Timely, Easy to understand, Directional, Implementable, Independent) with pass/fail reasoning
[ ] Primary metric has baseline and target: The primary metric includes a current baseline number and a specific target value with timeline (not "improve" or "increase")
[ ] Guardrail metrics defined: At least 1 guardrail metric is specified with an acceptable range and explanation of what it protects against
[ ] Metrics ladder to North Star: The output explicitly shows how the primary metric connects upward to the company's North Star metric from [[business-info-template]]
[ ] Data source identified for each metric: Every metric names where the data comes from (e.g., "PostHog event: task_created" or "database query on users table")
[ ] Metric sensitivity estimated: The output addresses whether the expected feature impact is large enough for the metric to detect, given current variance and traffic
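Several checklist items are mechanical enough to check in code before the output is presented. The sketch below is hypothetical, and `MetricSpec`, `MetricsPlan`, and `self_check` are illustrative names. It covers only the structural items:

- a baseline, a numeric target, and a timeline for the primary metric
- at least one guardrail
- a North Star link
- a data source for every metric

The quality of the STEDII reasoning and of the sensitivity estimate still needs human judgment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetricSpec:
    """Hypothetical metric entry in a feature-metrics plan."""
    name: str
    data_source: str | None = None   # e.g. "PostHog event: task_created"


@dataclass
class MetricsPlan:
    primary: MetricSpec | None
    baseline: float | None
    target: float | None
    timeline: str | None
    guardrails: list[MetricSpec] = field(default_factory=list)
    north_star_link: str | None = None   # how the primary metric ladders up


def self_check(plan: MetricsPlan) -> list[str]:
    """Return the structural checklist items that fail; an empty list means all pass."""
    problems = []
    if plan.primary is None:
        problems.append("no primary metric")
    if plan.baseline is None or plan.target is None or not plan.timeline:
        problems.append("primary metric needs a baseline, a specific target, and a timeline")
    if not plan.guardrails:
        problems.append("at least 1 guardrail metric is required")
    if not plan.north_star_link:
        problems.append("primary metric is not linked to the North Star")
    for metric in ([plan.primary] if plan.primary else []) + plan.guardrails:
        if not metric.data_source:
            problems.append(f"no data source for {metric.name}")
    return problems


draft = MetricsPlan(primary=MetricSpec("Sessions"), baseline=None, target=None, timeline=None)
for problem in self_check(draft):
    print("-", problem)
```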


Define success metrics using the STEDII framework for trustworthy experiment metrics.

9 stars

development

Updated Apr 4, 2026


npx skillsauth add coalesce-labs/catalyst feature-metrics

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean (Trivy, Semgrep, and mcp-scan); the other 6 were skipped.

Last scanned: Apr 24, 2026, 7:48 AM (4.1 s, 1 file scanned)

SKILL.md

name: feature-metrics
description: Define success metrics using the STEDII framework for trustworthy experiment metrics.
disable-model-invocation: false
user-invocable: true



Install manually

# Clone the repo
git clone https://github.com/coalesce-labs/catalyst.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r catalyst/plugins/pm/skills/feature-metrics ~/.claude/skills/


Repository: coalesce-labs/catalyst

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT