alirezarezvani/kubernetes-operator

Name: kubernetes-operator
Author: alirezarezvani

engineering/kubernetes-operator/skills/kubernetes-operator/SKILL.md

npx skillsauth add alirezarezvani/claude-skills kubernetes-operator

Kubernetes Operator

Build operators that reconcile correctly. Most operator bugs are not Kubernetes bugs — they are reconcile-loop bugs: missing finalizers, blocking calls, no requeue on transient errors, status drift, RBAC over-grants. This skill catches them deterministically before they reach a cluster.

When to use

Building a new Kubernetes Operator (controller for a CRD)
Reviewing an existing operator for capability-level gaps
Auditing a CRD spec for status/conditions/finalizer correctness
Choosing a framework (controller-runtime / kubebuilder / operator-sdk / metacontroller / KOPF)
Designing the API surface of a Custom Resource
Hardening RBAC, leader election, or webhook validation

When NOT to use

Plain Helm chart packaging → use helm-chart-builder
Standard kubectl operations / blue-green deploys → use senior-devops
General k8s security posture → use cloud-security
"I want to run a workload" — that's a Deployment / Job, not an operator

Core principle: an operator is a reconcile loop, not a script

observe(actual) → desired = read(spec) → diff(actual, desired) → act → update(status)
                                                                          ↓
                                                                   requeue / done

Operators that fail are the ones that:

Treat reconcile as imperative (do this, then this, then this) instead of declarative (make actual=desired, idempotently)
Don't requeue transient failures
Don't use finalizers, leaving orphan resources
Mutate spec instead of status
Don't use the status subresource (status updates trigger spec reconciles → loop)
Block in reconcile (long HTTP calls, locks)
Forget leader election → split-brain on multi-replica deploys

The three tools below target these failure modes.
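As a language-neutral illustration of the loop above, here is a minimal Python sketch that reconciles plain dictionaries rather than cluster objects. The names `Result`, `reconcile`, and the `ready` / `changes` status keys are hypothetical and chosen only for illustration; a real controller would use its framework's own types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    # Seconds until the next attempt; None means "done until the next event".
    requeue_after: Optional[float] = None


def reconcile(spec: dict, actual: dict, status: dict) -> Result:
    """Drive `actual` toward `spec` and record the outcome in `status`."""
    desired = dict(spec)  # desired = read(spec); spec itself is never mutated

    # diff(actual, desired)
    to_set = {k: v for k, v in desired.items()
              if k not in actual or actual[k] != v}
    to_remove = [k for k in actual if k not in desired]

    # act: only touch what differs, so a second run is a no-op
    actual.update(to_set)
    for key in to_remove:
        del actual[key]

    # update(status): observations go to status, never to spec
    status["changes"] = len(to_set) + len(to_remove)
    status["ready"] = actual == desired
    return Result() if status["ready"] else Result(requeue_after=5.0)
```

Calling `reconcile` twice with the same inputs makes no changes the second time, which is the idempotency property the list above asks for.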

Quick start

SKILL=engineering/kubernetes-operator/skills/kubernetes-operator

# Validate a CRD design
python "$SKILL/scripts/crd_validator.py" --crd config/crd/myapp.yaml

# Lint a Go reconcile function
python "$SKILL/scripts/reconcile_lint.py" --controller controllers/myapp_controller.go

# Score against OperatorHub Capability Levels (1-5)
python "$SKILL/scripts/operator_capability_audit.py" --operator-dir .

The 3 Python tools

All three use only the Python standard library. Run any of them with --help to see its options.

`crd_validator.py`

Validates a CRD YAML against operator-pattern best practices.

python scripts/crd_validator.py --crd config/crd/myapp.yaml
python scripts/crd_validator.py --crd config/crd/ --format json

Checks:

spec.versions[*].subresources.status is set (status subresource)
spec.scope is Namespaced (not Cluster) unless explicitly justified
spec.names.singular and spec.names.listKind are defined
spec.versions[*].schema.openAPIV3Schema has type definitions (no x-kubernetes-preserve-unknown-fields: true at top level)
One version is marked both served: true and storage: true
A conditions array is in the status schema (compatible with []metav1.Condition)
Printer columns include Age and Status/Phase
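To make a few of the checks above concrete, here is a minimal sketch of how they could be expressed. It is not the code of `crd_validator.py`: the function name `check_crd`, the WARN/FAIL severities, and the message wording are assumptions, and the sketch expects the CRD to be already parsed into a Python dict (for example from JSON), since the standard library has no YAML parser.

```python
def check_crd(crd: dict) -> list:
    """Return human-readable findings; an empty list means every check passed."""
    findings = []
    spec = crd.get("spec") or {}
    names = spec.get("names") or {}
    versions = spec.get("versions") or []

    if spec.get("scope") != "Namespaced":
        findings.append("WARN: spec.scope is not Namespaced")

    for field in ("singular", "listKind"):
        if not names.get(field):
            findings.append(f"WARN: spec.names.{field} is missing")

    if not any(v.get("served") and v.get("storage") for v in versions):
        findings.append("FAIL: no version is both served and storage")

    for v in versions:
        name = v.get("name", "?")
        subresources = v.get("subresources") or {}
        if "status" not in subresources:
            findings.append(f"FAIL: {name}: status subresource not enabled")
        schema = (v.get("schema") or {}).get("openAPIV3Schema") or {}
        if schema.get("x-kubernetes-preserve-unknown-fields") is True:
            findings.append(f"FAIL: {name}: schema root preserves unknown fields")

    return findings
```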

`reconcile_lint.py`

Lints a Go controller reconcile function for anti-patterns.

python scripts/reconcile_lint.py --controller controllers/myapp_controller.go

Checks (regex-based heuristics):

Returns are (ctrl.Result, error) shape
Errors are returned so the request is requeued (return ctrl.Result{}, err; controller-runtime ignores the Result and retries with backoff whenever err is non-nil)
client.Update() on the spec object is flagged (controllers should update only status)
time.Sleep inside reconcile is flagged (use RequeueAfter)
HTTP calls without context cancellation are flagged
Missing defer after a finalizer add
No IsConditionTrue / SetCondition calls when conditions present in CRD
Reconcile function exceeds 80 lines (extract subroutines)
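A regex-based heuristic of the kind listed above might look like the following sketch. The rule set, messages, and the function name `lint_go_source` are illustrative assumptions, not the contents of `reconcile_lint.py`. Being line-based, it cannot see block comments or calls split across lines.

```python
import re

# Each rule pairs a pattern with the message reported when it matches.
# http.Get/Post/Head are Go's package-level helpers, which take no context.
RULES = [
    (re.compile(r"\btime\.Sleep\s*\("),
     "time.Sleep in reconcile; use RequeueAfter"),
    (re.compile(r"\bhttp\.(Get|Post|Head)\s*\("),
     "HTTP call without a context"),
]


def lint_go_source(source: str) -> list:
    """Return (line_number, message) pairs for every rule that matches."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # crude: drop trailing line comments
        for pattern, message in RULES:
            if pattern.search(code):
                findings.append((lineno, message))
    return findings
```

Heuristics like this produce false positives and negatives, so findings are best treated as prompts for review rather than proof of a bug.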

`operator_capability_audit.py`

Scores an operator against OperatorHub's 5 Capability Levels.

python scripts/operator_capability_audit.py --operator-dir .

Levels:

L1 — Basic Install: CRD defined, controller deploys it
L2 — Seamless Upgrades: PDBs, conversion webhooks, version skew strategy
L3 — Full Lifecycle: backups, restores, failure recovery
L4 — Deep Insights: metrics endpoint, Prometheus rules, alerts
L5 — Auto Pilot: auto-scaling, auto-tuning, anomaly detection

Reports current level + concrete next steps to advance one level.
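One way to turn per-level checks into a single score is sketched below. It assumes levels are cumulative (an operator holds level N only if levels 1 through N all pass); that rule, and the names `capability_level` and `next_step`, are assumptions for illustration rather than the audit script's actual logic.

```python
LEVELS = [
    "Basic Install",
    "Seamless Upgrades",
    "Full Lifecycle",
    "Deep Insights",
    "Auto Pilot",
]


def capability_level(passed: dict) -> int:
    """`passed` maps a level number (1-5) to True when its checks all pass."""
    level = 0
    for n in range(1, len(LEVELS) + 1):
        if not passed.get(n, False):
            break  # a gap stops the climb, even if later levels pass
        level = n
    return level


def next_step(passed: dict) -> str:
    """Name the next level to work toward."""
    level = capability_level(passed)
    if level == len(LEVELS):
        return "Already at L5"
    return f"Next: L{level + 1} ({LEVELS[level]})"
```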

Tooling landscape

Pick a framework based on language and complexity. See references/tooling_landscape.md.

| Framework | Language | Best for | Maintenance |
|---|---|---|---|
| controller-runtime | Go | Production-grade, low-level control | Active (sig-api-machinery) |
| kubebuilder | Go | Standard scaffolding, opinionated | Active (Kubernetes SIGs) |
| operator-sdk | Go / Helm / Ansible | OpenShift / mixed-paradigm teams | Active (Red Hat) |
| metacontroller | Any (webhook-based) | Polyglot teams, avoiding Go | Less active |
| KOPF | Python | Python shops, async-first | Active (community) |
| java-operator-sdk | Java | JVM shops | Active (Red Hat / Java SIG) |

Decision rules:

New operator + Go shop → kubebuilder
New operator + Python shop → KOPF
New operator + can't pick a language → metacontroller
OpenShift target → operator-sdk
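The decision rules above can be written as a small function. The rules do not state a precedence, so this sketch assumes the OpenShift rule wins over the language rules and that any language other than Go or Python falls back to metacontroller; both choices, and the name `choose_framework`, are assumptions.

```python
from typing import Optional


def choose_framework(language: Optional[str], openshift: bool = False) -> str:
    """Map a team's constraints to a framework using the four rules above."""
    if openshift:
        return "operator-sdk"  # assumed to take precedence over language
    lang = (language or "").strip().lower()
    if lang == "go":
        return "kubebuilder"
    if lang == "python":
        return "KOPF"
    # No language chosen, or one without a dedicated rule above.
    return "metacontroller"
```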

CRD design principles

See references/crd_design.md for full detail. Quick rules:

status is the source of truth for the controller's view of the world. Spec is what the user wants; status is what the controller observed.
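Use the status subresource. Without it, status writes go through the main resource endpoint and bump metadata.generation like any spec change, so generation-based filtering cannot ignore them and the controller can loop on its own updates.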
Use Conditions. Ready, Reconciling, Degraded. Each carries a reason and message.
Add finalizers. Without finalizers, deletion races the controller and orphans external resources.
Version your CRD from day 1. v1alpha1 → v1beta1 → v1. Plan a conversion webhook.
Validate via OpenAPI v3 schema. Don't rely on the controller for validation that should fail at admission.
Use additionalPrinterColumns for kubectl get. Show Age, Phase, Ready at minimum.
Namespace your CRDs unless they manage cluster-scoped resources.
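The Conditions rule above is easier to apply with a single helper that owns the update logic. The sketch below is loosely modeled on the usual convention that lastTransitionTime moves only when the condition's status flips; the function name `set_condition`, the dict layout, and the injectable `now` parameter are assumptions for illustration.

```python
from datetime import datetime, timezone


def set_condition(conditions: list, ctype: str, status: str,
                  reason: str, message: str, now: str = None) -> bool:
    """Insert or update a condition in place; return True if anything changed."""
    now = now or datetime.now(timezone.utc).isoformat()
    new = {"type": ctype, "status": status, "reason": reason, "message": message}

    for cond in conditions:
        if cond["type"] != ctype:
            continue
        changed = any(cond.get(k) != v for k, v in new.items())
        if cond.get("status") != status:
            cond["lastTransitionTime"] = now  # only moves on a status flip
        cond.update(new)
        return changed

    # First time this condition type is reported.
    conditions.append({**new, "lastTransitionTime": now})
    return True
```

Returning whether anything changed lets the caller skip a status write when the observed state is unchanged.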

Reconcile loop principles

See references/reconcile_loop.md for full detail. Quick rules:

Idempotent. Reconciling the same state twice → same result, no additional side effects.
Read once, decide, act. Don't observe the world repeatedly during reconcile.
Update status, not spec. Spec belongs to the user.
Return errors that requeue. Use ctrl.Result{RequeueAfter: ...} for known transient cases.
Never block. No time.Sleep. No long HTTP calls without context.
Use the cache. Read via the controller's cached client; only escape the cache for a specific reason.
Leader-elect when running >1 replica. Otherwise run exactly one replica.
Set OwnerReferences. Cascading deletion is the operator pattern's free gift.
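The "return errors that requeue" and "never block" rules above combine into one pattern: compute a delay and hand it back instead of sleeping. The sketch below assumes a hypothetical `TransientError` marker class and a capped exponential backoff; the base and cap values are illustrative, not recommendations from the source.

```python
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def requeue_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Capped exponential backoff: base * 2^attempt, never above cap."""
    return min(cap, base * (2 ** attempt))


def handle(step: Callable[[], None], attempt: int) -> Optional[float]:
    """Run one step; return seconds to wait before retrying, or None when done.

    Non-transient exceptions propagate so the caller can surface them.
    """
    try:
        step()
    except TransientError:
        return requeue_delay(attempt)  # caller requeues; nothing sleeps here
    return None
```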

Workflows

Workflow 1: Bootstrap a new operator (Go + kubebuilder)

1. Pick a Group/Version/Kind: e.g., apps.example.com/v1alpha1, kind=MyApp
2. kubebuilder init --domain example.com --repo github.com/org/myapp-operator
3. kubebuilder create api --group apps --version v1alpha1 --kind MyApp
4. Run crd_validator.py on config/crd/bases/apps.example.com_myapps.yaml
   → Fix every WARN before writing controller code
5. Implement the reconcile function (Karpathy principle 2: simplest correct version first)
6. Run reconcile_lint.py on controllers/myapp_controller.go (internal/controller/myapp_controller.go in kubebuilder's go/v4 layout)
7. Run operator_capability_audit.py --operator-dir . — confirm L1
8. Test in a kind cluster: kubectl apply -f config/samples/
9. Add status conditions; aim for L2 in the same PR

Workflow 2: Audit an existing operator

1. Run operator_capability_audit.py --operator-dir <path>
2. Run crd_validator.py --crd config/crd/
3. Run reconcile_lint.py --controller controllers/
4. Triage findings:
   - FAIL → block release; fix before next deploy
   - WARN → file an issue; fix in next 30 days
5. Document current capability level in README; commit
6. Plan one capability level advancement per quarter
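The triage rule in step 4 can be sketched as follows. The finding format (a dict with `severity` and `message` keys) and the function name `triage` are assumptions; the source does not describe the tools' output schema.

```python
from datetime import date, timedelta


def triage(findings: list, today: date) -> dict:
    """Split findings into release blockers and issues due within 30 days."""
    blocking = [f for f in findings if f["severity"] == "FAIL"]
    issues = [
        {**f, "due": today + timedelta(days=30)}
        for f in findings
        if f["severity"] == "WARN"
    ]
    return {
        "release_blocked": bool(blocking),
        "blocking": blocking,
        "issues": issues,
    }
```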

Workflow 3: Choose a framework

1. Identify primary language constraint (team skill)
2. Identify deployment target (vanilla k8s vs OpenShift)
3. Identify operator complexity (single CRD vs multi-CRD vs cluster-wide)
4. Cross-reference with references/tooling_landscape.md
5. Build a 1-week proof-of-concept before committing

References

references/operator_pattern.md — what an operator IS, when to use vs alternatives
references/crd_design.md — CRD design principles, versioning, conversion webhooks
references/reconcile_loop.md — reconcile patterns, error handling, idempotency
references/tooling_landscape.md — framework comparison + decision tree

Slash command

/operator-audit — Run all 3 tools on an operator repo and produce a markdown report.

Asset templates

assets/crd_template.yaml — CRD with status subresource, conditions, finalizer hint, printer columns
assets/reconcile_skeleton.go — Go controller reconcile function with idempotency, conditions, finalizers, requeue patterns

Anti-patterns

time.Sleep(30 * time.Second) inside reconcile blocks the worker goroutine and stalls other reconciles. Use RequeueAfter.
r.Client.Update(ctx, obj) to set status — use r.Status().Update(ctx, obj) instead.
No leader election + 2+ replicas — split-brain.
No finalizer — external resources orphan on deletion.
CRD without status subresource — status updates trigger spec reconciles (infinite loop).
Reconcile function > 200 lines — extract reconcileXxx subroutines per condition.
x-kubernetes-preserve-unknown-fields: true on spec root — defeats validation.
Imperative reconcile — "if creating, do A; if updating, do B; if deleting, do C". Wrong shape. Reconcile = make actual=desired, regardless of how we got here.
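The finalizer anti-pattern above has a standard remedy: add a finalizer while the object is live, and run cleanup before removing it once deletion has been requested. The sketch below works on a plain dict shaped like Kubernetes object metadata; the finalizer name, the function name `reconcile_deletion`, and the returned labels are hypothetical.

```python
from typing import Callable

FINALIZER = "example.com/cleanup"  # hypothetical finalizer name


def reconcile_deletion(obj: dict, cleanup: Callable[[dict], None]) -> str:
    """Add the finalizer on live objects; clean up and release on deletion."""
    meta = obj.setdefault("metadata", {})
    finalizers = meta.setdefault("finalizers", [])

    if meta.get("deletionTimestamp"):
        if FINALIZER in finalizers:
            cleanup(obj)  # must be idempotent; it may run more than once
            finalizers.remove(FINALIZER)  # lets the API server finish the delete
            return "cleaned-up"
        return "nothing-to-do"

    if FINALIZER not in finalizers:
        finalizers.append(FINALIZER)
        return "finalizer-added"
    return "steady"
```

Note that the function never branches on "created" versus "updated"; it only looks at the current state, which is the shape the last anti-pattern asks for.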

Verifiable success

A team using this skill should achieve:

100% of new CRDs pass crd_validator.py before merge
All reconcile functions pass reconcile_lint.py strict mode
Operators reach OperatorHub Capability Level 3 (Full Lifecycle) before public release
Mean time to fix a reconcile bug: <1 day (no infinite loops in production)

16,359 stars

tools

Updated May 28, 2026

npx skillsauth add alirezarezvani/claude-skills kubernetes-operator

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status | Displayed value |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 28, 2026, 3:48 AM (17.9s, 10 files scanned)

SKILL.md

name: kubernetes-operator
description: Use when building a Kubernetes Operator (custom controllers that reconcile CRD state). Triggers on "build an operator", "CRD design", "reconcile loop", "controller-runtime", "kubebuilder", "operator-sdk", "metacontroller", "KOPF", "operator capability levels", or "custom resource". Ships CRD validator, reconcile-loop linter, and OperatorHub capability auditor (all stdlib Python), 4 references on the operator pattern + CRD design + reconcile patterns + tooling landscape, and a /operator-audit slash command. NOT a generic k8s skill; specifically the Operator pattern.
context: fork
version: 2.9.0
author: claude-code-skills
license: MIT
tags: [kubernetes, operator, crd, controller-runtime, kubebuilder, operator-sdk, metacontroller, kopf, reconcile, devops]
compatible_tools: [claude-code, codex-cli, cursor, antigravity, opencode, gemini-cli]

Install manually

# Clone the repo
git clone https://github.com/alirezarezvani/claude-skills.git

# Copy into Claude Code skills folder (global)
cp -r claude-skills/engineering/kubernetes-operator/skills/kubernetes-operator ~/.claude/skills/

Repository

alirezarezvani/claude-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT