engineering/kubernetes-operator/skills/kubernetes-operator/SKILL.md
Use when building a Kubernetes Operator — custom controllers that reconcile CRD state. Triggers on "build an operator", "CRD design", "reconcile loop", "controller-runtime", "kubebuilder", "operator-sdk", "metacontroller", "KOPF", "operator capability levels", or "custom resource". Ships CRD validator, reconcile-loop linter, and OperatorHub capability auditor (all stdlib Python), 4 references on the operator pattern + CRD design + reconcile patterns + tooling landscape, and a /operator-audit slash command. NOT a generic k8s skill — specifically the Operator pattern.
npx skillsauth add alirezarezvani/claude-skills kubernetes-operator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Build operators that reconcile correctly. Most operator bugs are not Kubernetes bugs — they are reconcile-loop bugs: missing finalizers, blocking calls, no requeue on transient errors, status drift, RBAC over-grants. This skill catches them deterministically before they reach a cluster.
helm-chart-builder, senior-devops, cloud-security

observe(actual) → desired = read(spec) → diff(actual, desired) → act → update(status)
↓
requeue / done
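The loop above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: `FakeResource`, `TransientError`, `set_field`, and the 30-second requeue delay are hypothetical names and values chosen for the example, not part of this skill or of controller-runtime.

```python
from dataclasses import dataclass, field
from typing import Optional


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. an API timeout."""


@dataclass
class Result:
    requeue_after: Optional[float] = None  # seconds; None means done for now


@dataclass
class FakeResource:
    """In-memory stand-in: spec is desired state, actual is observed state."""
    spec: dict
    actual: dict = field(default_factory=dict)
    status: dict = field(default_factory=dict)


def set_field(resource, key, value):
    """Default 'act' step: write one field into the observed state."""
    resource.actual[key] = value


def reconcile(resource, apply=set_field):
    observed = dict(resource.actual)                 # observe(actual)
    desired = dict(resource.spec)                    # desired = read(spec)
    # diff(actual, desired): fields whose observed value differs from the spec
    drift = {k: v for k, v in desired.items() if observed.get(k) != v}
    try:
        for key, value in drift.items():             # act
            apply(resource, key, value)
    except TransientError as err:
        # Transient failure: record it and ask to be called again later
        # instead of sleeping inside the reconcile call.
        resource.status = {"ready": False, "reason": "Transient", "message": str(err)}
        return Result(requeue_after=30.0)
    # update(status): report what is true now, not what was requested
    converged = all(resource.actual.get(k) == v for k, v in desired.items())
    resource.status = {"ready": converged, "reason": "Reconciled", "message": ""}
    return Result()
```

Calling `reconcile` twice on the same resource leaves it unchanged the second time, which is the idempotency property a real reconcile function needs.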
Operators that fail are the ones that:
The 3 tools below catch each of these.
SKILL=engineering/kubernetes-operator/skills/kubernetes-operator
# Validate a CRD design
python "$SKILL/scripts/crd_validator.py" --crd config/crd/myapp.yaml
# Lint a Go reconcile function
python "$SKILL/scripts/reconcile_lint.py" --controller controllers/myapp_controller.go
# Score against OperatorHub Capability Levels (1-5)
python "$SKILL/scripts/operator_capability_audit.py" --operator-dir .
All stdlib-only. Run with --help.
crd_validator.py
Validates a CRD YAML against operator-pattern best practices.
python scripts/crd_validator.py --crd config/crd/myapp.yaml
python scripts/crd_validator.py --crd config/crd/ --format json
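A minimal sketch of the kind of structural checks a validator like this can run, assuming the CRD has already been parsed into a Python dict. This is not the shipped `crd_validator.py`; the function name `check_crd` and the FAIL/WARN assignment per check are assumptions made for illustration.

```python
def check_crd(crd):
    """Return a list of finding strings for a parsed CRD manifest (a dict)."""
    findings = []
    spec = crd.get("spec", {})

    # Namespaced scope is the safer default; cluster scope needs a justification.
    if spec.get("scope") != "Namespaced":
        findings.append("WARN: spec.scope is not Namespaced")

    versions = spec.get("versions", [])
    for version in versions:
        name = version.get("name", "?")
        # The status subresource separates status writes from spec writes.
        if "status" not in (version.get("subresources") or {}):
            findings.append(f"FAIL: {name}: status subresource is not enabled")
        schema = (version.get("schema") or {}).get("openAPIV3Schema") or {}
        # Preserving unknown fields at the schema root turns validation off.
        if schema.get("x-kubernetes-preserve-unknown-fields") is True:
            findings.append(f"FAIL: {name}: preserve-unknown-fields at schema root")

    # Kubernetes requires exactly one version to be the storage version.
    storage = [v for v in versions if v.get("storage") is True]
    if len(storage) != 1:
        findings.append("FAIL: exactly one version must set storage: true")
    return findings
```

An empty list means the manifest passed every check in this sketch; it says nothing about the checks the sketch leaves out.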
Checks:
- spec.versions[*].subresources.status is set (status subresource)
- spec.scope is Namespaced (not Cluster) unless explicitly justified
- spec.versions[*].schema.openAPIV3Schema has type definitions (no x-kubernetes-preserve-unknown-fields: true at top level)
- … served: true AND storage: true
- … metav1.Conditions)
- … Age and Status/Phase

reconcile_lint.py
Lints a Go controller reconcile function for anti-patterns.
python scripts/reconcile_lint.py --controller controllers/myapp_controller.go
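As a rough illustration of what a regex-based heuristic over Go source can look like (this is not the shipped `reconcile_lint.py`; the rule set and the name `lint_go` are assumptions), the sketch below flags two patterns line by line. It cannot tell whether an `Update` call targets spec or status, so it only marks the call for review.

```python
import re

# (message, pattern) pairs; both rules are illustrative heuristics.
RULES = [
    ("time.Sleep in controller code; prefer ctrl.Result{RequeueAfter: ...}",
     re.compile(r"\btime\.Sleep\(")),
    ("Update on the full object; review, and use r.Status().Update for status",
     re.compile(r"\br\.(?:Client\.)?Update\(")),
]


def lint_go(source):
    """Return (line_number, message) pairs for lines matching any rule."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.strip().startswith("//"):
            continue  # skip whole-line comments; inline comments are not handled
        for message, pattern in RULES:
            if pattern.search(line):
                findings.append((number, message))
    return findings
```

Heuristics like these produce false positives (adding a finalizer legitimately calls `Update` on the object), so findings are prompts for review rather than verdicts.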
Checks (regex-based heuristics):
- … (ctrl.Result, error) shape
- … return ctrl.Result{Requeue: true}, err)
- client.Update() on the spec object is flagged (controllers should update only status)
- time.Sleep inside reconcile is flagged (use RequeueAfter)
- … defer after a finalizer add
- … IsConditionTrue / SetCondition calls when conditions present in CRD

operator_capability_audit.py
Scores an operator against OperatorHub's 5 Capability Levels.
python scripts/operator_capability_audit.py --operator-dir .
Levels:
Reports current level + concrete next steps to advance one level.
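One way to picture "current level plus next step" is the sketch below. The level names follow the Operator Framework capability model; treating levels as cumulative and the helper names `current_level` and `next_step` are assumptions of this example, not a description of how `operator_capability_audit.py` scores a repository.

```python
# Level names from the Operator Framework capability model.
LEVELS = [
    (1, "Basic Install"),
    (2, "Seamless Upgrades"),
    (3, "Full Lifecycle"),
    (4, "Deep Insights"),
    (5, "Auto Pilot"),
]


def current_level(passed):
    """Highest level reached, assuming levels are cumulative.

    `passed` is the set of level numbers whose checks all succeeded.
    A gap (level 3 failing while level 4 passes) caps the result at 2.
    """
    level = 0
    for number, _name in LEVELS:
        if number not in passed:
            break
        level = number
    return level


def next_step(passed):
    """Describe the single level to work toward next."""
    level = current_level(passed)
    if level == len(LEVELS):
        return "Already at Level 5 (Auto Pilot)"
    number, name = LEVELS[level]  # LEVELS is 0-indexed, so this is level + 1
    return f"Advance to Level {number} ({name})"
```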
Pick a framework based on language and complexity. See references/tooling_landscape.md.
| Framework | Language | Best for | Maintenance |
|---|---|---|---|
| controller-runtime | Go | Production-grade, low-level control | Active (sig-api-machinery) |
| kubebuilder | Go | Standard scaffolding, opinionated | Active (Kubernetes SIGs) |
| operator-sdk | Go / Helm / Ansible | OpenShift / mixed-paradigm teams | Active (Red Hat) |
| metacontroller | Any (webhook-based) | Polyglot teams, avoiding Go | Less active |
| KOPF | Python | Python shops, async-first | Active (community) |
| java-operator-sdk | Java | JVM shops | Active (Red Hat / Java SIG) |
Decision rules:
See references/crd_design.md for full detail. Quick rules:
- … Ready, Reconciling, Degraded. Each carries a reason and message.
- … v1alpha1 → v1beta1 → v1. Plan a conversion webhook.
- … additionalPrinterColumns for kubectl get. Show Age, Phase, Ready at minimum.

See references/reconcile_loop.md for full detail. Quick rules:
- … ctrl.Result{RequeueAfter: ...} for known transient cases.
- … time.Sleep. No long HTTP calls without context.

1. Pick a Group/Version/Kind: e.g., apps.example.com/v1alpha1, kind=MyApp
2. kubebuilder init --domain example.com --repo github.com/org/myapp-operator
3. kubebuilder create api --group apps --version v1alpha1 --kind MyApp
4. Run crd_validator.py on config/crd/bases/apps.example.com_myapps.yaml
→ Fix every WARN before writing controller code
5. Implement the reconcile function (Karpathy principle 2: simplest correct version first)
6. Run reconcile_lint.py on controllers/myapp_controller.go
7. Run operator_capability_audit.py --operator-dir . — confirm L1
8. Test in a kind cluster: kubectl apply -f config/samples/
9. Add status conditions; aim for L2 in the same PR
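For step 9, the bookkeeping behind a status condition can be sketched as follows. It mirrors the usual convention that `lastTransitionTime` changes only when the condition's status flips, while reason and message may change freely. The function name `set_condition` and the `now` parameter are assumptions for illustration; a Go controller would normally use the helpers in apimachinery instead.

```python
from datetime import datetime, timezone


def set_condition(conditions, type_, status, reason, message, now=None):
    """Insert or update one condition in a list of condition dicts.

    status is "True", "False", or "Unknown". `now` is injectable for tests.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    for cond in conditions:
        if cond["type"] == type_:
            # Only a real status flip counts as a transition.
            if cond["status"] != status:
                cond["lastTransitionTime"] = now
            cond.update(status=status, reason=reason, message=message)
            return conditions
    conditions.append({
        "type": type_,
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": now,
    })
    return conditions
```

Keeping one entry per condition type, rather than appending a new entry on every reconcile, is what keeps status from growing without bound.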
1. Run operator_capability_audit.py --operator-dir <path>
2. Run crd_validator.py --crd config/crd/
3. Run reconcile_lint.py --controller controllers/
4. Triage findings:
- FAIL → block release; fix before next deploy
- WARN → file an issue; fix in next 30 days
5. Document current capability level in README; commit
6. Plan one capability level advancement per quarter
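The triage rule in step 4 can be written down as a small function. The finding format (a dict with a `severity` key) is hypothetical; the actual JSON emitted by the three tools may be shaped differently.

```python
from datetime import date, timedelta


def triage(findings, today):
    """Split findings by severity following the FAIL / WARN policy above."""
    fails = [f for f in findings if f["severity"] == "FAIL"]
    warns = [f for f in findings if f["severity"] == "WARN"]
    due = today + timedelta(days=30)  # WARN: fix within the next 30 days
    return {
        "block_release": bool(fails),      # any FAIL blocks the release
        "fix_before_deploy": fails,
        "issues_to_file": [{"finding": f, "due": due} for f in warns],
    }
```

Wiring `block_release` to a non-zero exit code in CI would be one way to enforce the policy.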
1. Identify primary language constraint (team skill)
2. Identify deployment target (vanilla k8s vs OpenShift)
3. Identify operator complexity (single CRD vs multi-CRD vs cluster-wide)
4. Cross-reference with references/tooling_landscape.md
5. Build a 1-week proof-of-concept before committing
- references/operator_pattern.md: what an operator IS, when to use vs alternatives
- references/crd_design.md: CRD design principles, versioning, conversion webhooks
- references/reconcile_loop.md: reconcile patterns, error handling, idempotency
- references/tooling_landscape.md: framework comparison + decision tree

/operator-audit: Run all 3 tools on an operator repo and produce a markdown report.
- assets/crd_template.yaml: CRD with status subresource, conditions, finalizer hint, printer columns
- assets/reconcile_skeleton.go: Go controller reconcile function with idempotency, conditions, finalizers, requeue patterns

- time.Sleep(30 * time.Second) inside reconcile: blocks other reconciles. Use RequeueAfter.
- r.Client.Update(ctx, obj) to set status: use r.Status().Update(ctx, obj) instead.
- x-kubernetes-preserve-unknown-fields: true on spec root; this defeats validation.

A team using this skill should achieve:
- … crd_validator.py before merge
- … reconcile_lint.py strict mode