skills/feature-flags/SKILL.md
Feature flag design, lifecycle management, and progressive rollout strategies. Use when: adding feature toggles to gate new functionality, implementing gradual rollouts, designing A/B experiments, managing flag lifecycle and cleanup, auditing an existing codebase for stale flags, or choosing between flag implementations. Covers flag types, evaluation strategies, rollout patterns, flag hygiene, and testing with flags.
npx skillsauth add michaelsvanbeek/personal-agent-skills feature-flagsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Every flag has a planned removal date. Flags that live forever become tech debt — invisible branching logic that no one understands. Ship the feature, verify it works, remove the flag.
| Type | Lifespan | Example | Owner | |------|----------|---------|-------| | Release | Days to weeks | Gate new checkout flow for gradual rollout | Engineering | | Experiment | Weeks to months | A/B test pricing page layout | Product | | Ops | Permanent (rare) | Kill switch for expensive feature under load | Engineering | | Permission | Permanent | Feature only for enterprise tier customers | Product |
interface FeatureFlag {
key: string; // Unique identifier: "checkout-v2"
description: string; // What this flag controls
type: "release" | "experiment" | "ops" | "permission";
owner: string; // Team or person responsible
createdAt: string; // ISO date
plannedRemovalDate: string; // When to clean up (required for release/experiment)
defaultValue: boolean; // Value when flag system is unavailable
rules: FlagRule[]; // Evaluation rules (ordered, first match wins)
}
interface FlagRule {
condition: FlagCondition;
value: boolean | string; // Boolean for toggles, string for variants
}
type FlagCondition =
| { type: "percentage"; value: number } // 0-100 rollout percentage
| { type: "user_ids"; value: string[] } // Specific users
| { type: "attribute"; key: string; value: string } // User attribute match
| { type: "environment"; value: string } // "production", "staging"
| { type: "default" };
{
"checkout-v2": {
"description": "New checkout flow with saved payment methods",
"type": "release",
"owner": "payments-team",
"createdAt": "2025-01-15",
"plannedRemovalDate": "2025-02-15",
"defaultValue": false,
"rules": [
{ "condition": { "type": "environment", "value": "staging" }, "value": true },
{ "condition": { "type": "user_ids", "value": ["internal-tester-1", "internal-tester-2"] }, "value": true },
{ "condition": { "type": "percentage", "value": 10 }, "value": true },
{ "condition": { "type": "default" }, "value": false }
]
}
}
import hashlib
from typing import Any
def evaluate_flag(flag_key: str, user_id: str, attributes: dict[str, Any]) -> bool:
"""Evaluate a feature flag for a specific user."""
flag = get_flag_config(flag_key)
if flag is None:
return False # Flag not found — default off
for rule in flag["rules"]:
condition = rule["condition"]
if condition["type"] == "environment":
if get_environment() == condition["value"]:
return rule["value"]
elif condition["type"] == "user_ids":
if user_id in condition["value"]:
return rule["value"]
elif condition["type"] == "attribute":
if attributes.get(condition["key"]) == condition["value"]:
return rule["value"]
elif condition["type"] == "percentage":
# Deterministic: same user always gets same result for same flag
hash_input = f"{flag_key}:{user_id}"
hash_value = int(hashlib.sha256(hash_input.encode()).hexdigest()[:8], 16)
bucket = hash_value % 100
if bucket < condition["value"]:
return rule["value"]
elif condition["type"] == "default":
return rule["value"]
return flag["defaultValue"]
import { createContext, useContext } from "react";
import type { ReactNode } from "react";
interface FlagContextValue {
flags: Record<string, boolean>;
}
const FlagContext = createContext<FlagContextValue>({ flags: {} });
function FlagProvider({ children, flags }: { children: ReactNode; flags: Record<string, boolean> }) {
return <FlagContext.Provider value={{ flags }}>{children}</FlagContext.Provider>;
}
function useFlag(key: string): boolean {
const { flags } = useContext(FlagContext);
return flags[key] ?? false;
}
// Usage in component
function CheckoutPage() {
const useNewCheckout = useFlag("checkout-v2");
if (useNewCheckout) {
return <NewCheckoutFlow />;
}
return <LegacyCheckoutFlow />;
}
Day 1: 0% → 5% (internal team + canary users)
Day 2: 5% → 10% (monitor error rates, latency)
Day 3: 10% → 25% (monitor business metrics)
Day 5: 25% → 50% (compare A/B if experiment)
Day 7: 50% → 100% (full rollout)
Day 14: Remove flag (cleanup)
At each stage, verify:
If metrics degrade:
# Instant rollback: set percentage to 0
update_flag("checkout-v2", rules=[
{"condition": {"type": "default"}, "value": False},
])
Rollback should take seconds, not minutes. This is the primary advantage of feature flags over code rollback.
// flags.json — committed, deployed with application
{
"checkout-v2": true,
"new-search": false
}
Pros: Simple, version-controlled, no external dependency. Cons: Requires deploy to change. No gradual rollout.
import boto3
import json
ssm = boto3.client("ssm")
def get_flags() -> dict[str, Any]:
"""Load flags from SSM Parameter Store."""
response = ssm.get_parameter(
Name="/myapp/prod/feature-flags",
WithDecryption=False,
)
return json.loads(response["Parameter"]["Value"])
Pros: Change flags without deploying. Cheap. Simple. Cons: No built-in UI, percentage rollout requires custom logic.
Use when:
import pytest
@pytest.mark.parametrize("flag_value", [True, False])
def test_checkout_flow(flag_value: bool, monkeypatch: pytest.MonkeyPatch) -> None:
"""Test both checkout flows regardless of current flag state."""
monkeypatch.setattr("features.evaluate_flag", lambda key, **kw: flag_value)
result = process_checkout(user_id="test-user", cart=sample_cart)
if flag_value:
assert result.flow == "v2"
else:
assert result.flow == "legacy"
@pytest.fixture
def flags_all_on() -> dict[str, bool]:
"""Enable all feature flags for integration testing."""
return {"checkout-v2": True, "new-search": True}
@pytest.fixture
def flags_all_off() -> dict[str, bool]:
"""Disable all feature flags for integration testing."""
return {"checkout-v2": False, "new-search": False}
E2E tests should test the current production state of flags. Don't override flags in E2E — if a flag is on in production, test with it on.
Remove a flag when:
if flag("checkout-v2") with the winning branch.# Find flag references in code
grep -r "checkout-v2" --include="*.py" --include="*.ts" --include="*.tsx" src/
# List flags with past planned removal dates
jq 'to_entries[] | select(.value.plannedRemovalDate < "2025-01-15") | .key' flags.json
<domain>-<feature>[-<variant>]
| Example | Type | Description |
|---------|------|-------------|
| checkout-v2 | Release | New checkout flow |
| search-fuzzy-matching | Release | Enable fuzzy search |
| pricing-annual-discount | Experiment | Test annual plan discount |
| api-rate-limit-strict | Ops | Tighter rate limits under load |
| plan-enterprise-analytics | Permission | Analytics for enterprise tier |
Rules:
flag-1, test, new-feature).| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| Flag with no planned removal date | Lives forever, becomes invisible tech debt | Require removal date on creation for release/experiment flags |
| Nested flags (if flag_a and flag_b) | Combinatorial explosion of states to test | Keep flags independent; one flag per feature |
| Flag evaluated differently in two places | Inconsistent behavior for the same user | Centralize evaluation; pass result as parameter |
| Client-side flag evaluation with rules | Exposes targeting logic and user segments | Server evaluates; client receives boolean result |
| Using flags instead of authorization | Flag system becomes a permissions system | Use proper RBAC/ABAC for entitlements |
| No monitoring during rollout | Degradation goes unnoticed until 100% | Define monitoring criteria before enabling |
| Random-based percentage rollout | Same user gets different experience on each request | Use hash-based deterministic bucketing |
| Flag config not version-controlled | No history of who changed what, when | Store in git or use a flag service with audit log |
| Monster flag guarding 500 lines | Too much code behind one toggle | Break into smaller flags or defer merge until feature is complete |
When auditing an existing codebase for feature flag practices:
development
TypeScript coding standards and type safety conventions. Use when: creating TypeScript files, defining interfaces and types, writing type-safe code, reviewing TypeScript for type correctness, auditing a codebase for type safety gaps, eliminating any or ts-ignore usage, or improving strict-mode compliance. Covers strict typing, avoiding any and ts-ignore, discriminated unions, Zod runtime validation, immutability patterns, and proper type definitions.
testing
Writing clear, actionable tickets in any issue tracker (Jira, Linear, GitHub Issues, ServiceNow, etc.). Use when: creating epics, stories, tasks, bugs, or spikes; writing acceptance criteria; decomposing work for a sprint; linking dependencies between tickets; auditing backlog items for clarity; or coaching a team on ticket quality. Covers title conventions, description templates, acceptance criteria, decomposition rules, dependency linking, and org-specific pluggable configuration.
development
Testing strategy, patterns, and evaluation for software and LLM/AI systems. Use when: writing tests, choosing test boundaries, designing test data, structuring test suites, evaluating LLM outputs, building evaluation pipelines, setting coverage thresholds, auditing test coverage gaps in existing projects, or improving test quality and structure.
development
Writing effective status updates for different audiences and cadences. Use when: writing a weekly status update, preparing a monthly summary, drafting a quarterly review, sending updates to leadership, sharing progress with stakeholders, or improving the clarity and impact of team communications. Covers weekly, monthly, and quarterly formats tailored for upward, lateral, and downward communication.