engineering/skills/feature-flags-architect/SKILL.md
Use when adding, retiring, or auditing feature flags. Triggers on "add a flag", "ship behind a flag", "rollout plan", "kill switch", "stale flags", "flag debt", "LaunchDarkly", "GrowthBook", "Statsig", "Unleash", "Flipt", or any progressive-delivery question. Ships flag debt scanner, rollout planner, and kill-switch auditor (all stdlib Python), 4 references on flag taxonomy + provider trade-offs + rollout strategies + lifecycle, plus a /flag-cleanup slash command.
npx skillsauth add alirezarezvani/claude-skills feature-flags-architectInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
End-to-end discipline for feature flags: classify them, ship them, ramp them, and retire them. Most teams treat flags as throwaway if-statements; this skill treats them as a controlled lifecycle with measurable debt.
ifrequest → design → ship → ramp → cleanup → archive
Flags that skip cleanup become debt: dead branches, stale defaults, untested code paths, unbounded blast radius. The three scripts in this skill enforce the lifecycle.
# 1. Audit the repo for flag debt
python scripts/flag_debt_scanner.py --repo . --max-age-days 90
# 2. Plan a progressive rollout for a new flag
python scripts/rollout_planner.py --population 100000 --target-percent 100 --duration-days 14 --strategy ring
# 3. Verify every flag has a documented kill switch
python scripts/kill_switch_audit.py --repo . --flag-doc docs/feature-flags.md
Different flag types have different lifespans and ownership. Misclassifying creates debt.
| Type | Purpose | Typical lifespan | Owner | Cleanup trigger | |---|---|---|---|---| | Release | Hide unfinished features in production | days–weeks | Eng | 100% rollout reached | | Experiment | A/B test variants | weeks | Product/Marketing | Test concluded; winner picked | | Operational | Circuit breakers, perf toggles, kill switches | months–years | Eng/SRE | Replaced by autoscaling/feature retirement | | Permission | Entitlements per user/account/plan | years (permanent) | Product | Plan/role removed |
Only Release and Experiment flags should be on a debt-scanner watchlist. Operational and Permission flags are by design long-lived. See references/flag_taxonomy.md for decision tree.
All three are stdlib-only. Run with --help.
flag_debt_scanner.pyFinds flags older than --max-age-days with low usage, suggesting candidates for cleanup.
python scripts/flag_debt_scanner.py --repo . --max-age-days 90 --format text
python scripts/flag_debt_scanner.py --repo . --max-age-days 60 --format json > debt.json
Detection heuristic:
--repo for code references matching common flag-call patterns:
flag("..."), isFlagEnabled("..."), featureFlag("..."), getFlag("...")client.variation("...", ...), unleash.isEnabled("..."), growthbook.feature("...")git log --diff-filter=A -S <name>).--max-age-days ago AND used in ≤--min-uses places.Outputs flag name, age in days, file references, suggested action. JSON mode is CI-friendly.
rollout_planner.pyGenerates a phased rollout schedule from population size, target percent, duration, and strategy.
python scripts/rollout_planner.py --population 100000 --target-percent 100 --duration-days 14 --strategy ring
python scripts/rollout_planner.py --population 50000 --target-percent 25 --duration-days 7 --strategy linear
python scripts/rollout_planner.py --population 1000000 --target-percent 100 --duration-days 30 --strategy log
Strategies:
ring: 1% → 5% → 25% → 50% → 100%, evenly spaced. Default for risky launches.linear: constant rate per day. Default for medium-risk.log: rapid early, slow tail. Default for low-risk launches with confidence.cohort: by named cohort (internal → beta → free → paid → all).Outputs a markdown table with date, percent, expected user count, abort criteria, and verification step per phase.
kill_switch_audit.pyCross-references code-discovered flags against documentation to verify each has a kill switch path written down.
python scripts/kill_switch_audit.py --repo . --flag-doc docs/feature-flags.md
python scripts/kill_switch_audit.py --repo . --flag-doc runbooks/flags.md --format json
What it checks:
--flag-docUse as a pre-merge gate before any new flag ships.
| Provider | Best for | Pricing model | Lock-in risk | OSS option | |---|---|---|---|---| | LaunchDarkly | Enterprise, complex targeting, audit/compliance | Per-MAU, expensive | High | No | | GrowthBook | Mid-market, A/B testing focused, OSS-friendly | Per-MAU + OSS | Low | Yes (self-host) | | Statsig | Growth/product teams, advanced experimentation | Free tier + per-MAU | Medium | No | | Unleash | OSS-first, self-hosted, dev-friendly | OSS + Enterprise | Low | Yes | | Flipt | Lightweight, k8s-native, simple needs | OSS-only | None | Yes | | DIY | <100 flags, no targeting, full control | None | None | N/A |
Decision rules:
references/provider_comparison.md for detail.1. Classify: which of the 4 flag types?
→ Release (most common for engineering work)
2. Run rollout_planner.py to design the ramp
3. Add flag entry to docs/feature-flags.md BEFORE writing code:
- name, owner, type, kill-switch trigger, dashboard URL
4. Write the code with the flag
5. Run kill_switch_audit.py — must pass before merge
6. Deploy at 0%; verify kill switch works
7. Execute rollout schedule; abort if abort criteria met
8. At 100% for 7+ days: remove flag, delete dead branch, archive doc entry
1. Run flag_debt_scanner.py --repo . --max-age-days 90 > debt.md
2. For each flagged item:
a. Confirm it reached 100% (or was killed)
b. Find the issue/PR that introduced it; verify owner agrees to remove
c. Delete dead branches; remove flag config
d. Run kill_switch_audit.py — should now show one fewer flag
3. Update CHANGELOG: "Removed N stale flags"
1. Estimate flag count (current + 12-month projection)
2. Required features:
- Targeting rules (user, account, geo, %)?
- A/B testing + stats?
- Audit log / SOC2?
- Self-hosting / data residency?
3. Pricing budget (MAU * cost-per-MAU)
4. See provider_comparison.md decision tree
5. Build a 30-day proof-of-concept before signing
1. Identify the failure modes:
- Latency spike (which threshold?)
- Error rate spike (which threshold?)
- Business metric regression (which threshold?)
2. Wire each to an abort:
- Manual: dashboard link + on-call playbook
- Automated: alert threshold flips flag back to 0%
3. Test the kill switch in staging BEFORE production rollout
4. Document in flag-doc; pass kill_switch_audit.py
references/flag_taxonomy.md — 4 types, decision tree, ownership, lifespanreferences/provider_comparison.md — LaunchDarkly / GrowthBook / Statsig / Unleash / Flipt / DIY trade-offsreferences/rollout_strategies.md — ring / linear / log / cohort / geo, abort criteria, monitoringreferences/flag_lifecycle.md — request → design → ship → ramp → cleanup → archive/flag-cleanup — Run the full cleanup workflow on the current repo: scan for debt, generate a removal plan, audit kill switches.
assets/flag_request_template.md — fill-in form for new flag requests (name, owner, type, kill switch, rollout plan)if (FLAG_FOO) 50 places — should be a Permission flag with a runtime config, not a Release flagA team using this skill should achieve:
kill_switch_audit.py at merge timeflag_debt_scanner.py --max-age-days 90 returns ≤5 stale flags repo-widetools
Code review automation for TypeScript, JavaScript, Python, Go, Swift, Kotlin, C#, .NET, Java, C, C++, Rust, Ruby, PHP, and Dart/Flutter. Analyzes PRs for complexity and risk, checks code quality for SOLID violations and code smells, generates review reports. Use when reviewing pull requests, analyzing code quality, identifying issues, generating review checklists.
tools
Use when planning, funding, scoping, or synthesizing enterprise research across workstreams — clinical study design, R&D program finance, market sizing/surveys, or product/user research. Triggers on "design this clinical study", "what sample size", "R&D budget", "burn rate", "capitalize or expense", "TAM SAM SOM", "market sizing", "survey design", "segment the market", "plan user interviews", "usability test", "synthesize research insights". Forks context to route to one of four Research-Operations sub-skills (clinical-research, research-finance, market-research, product-research) and returns a digest. Distinct from ra-qm-team (regulatory submission), finance (corporate close/valuation), research/grants (funding discovery), product-team (persona/journey/live experiments), and marketing-skill (campaign analytics).
development
Use when managing the money for an internal R&D program or portfolio — building a multi-period program budget with the F&A (indirect) split, tracking burn rate and runway against value-inflection milestones, or routing R&D cost items to a capitalize-vs-expense determination. Every budget output surfaces its assumptions block; capitalize-vs-expense is decision-support only and routes to a named finance owner — it never books an entry or decides accounting treatment. Distinct from finance/financial-analysis (corporate DCF, close, valuation) and research/grants (funding discovery — this manages money already won).
development
Use when planning and synthesizing product/user research as a method-and-repository discipline — selecting the right method for the goal (generative interviews vs usability test vs concept test vs validation), computing method-based saturation/sample size with an explicit confidence level, or synthesizing coded observations into insights while flagging single-source anecdotes. Never fabricates user insight; an insight requires recurrence across independent participants. Distinct from product-team/ux-researcher-designer (persona/journey artifacts), product-discovery (discovery-sprint planning), and experiment-designer (live A/B) — this is the research-ops method + insight-repository layer.