[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-18f393c8-f65b-4edc-bf00-8a8a79ab4634":3,"$fiGT7_TOuaanemnlaVFR6224kz43xQHSYCGEWY9tBRjs":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"18f393c8-f65b-4edc-bf00-8a8a79ab4634","kubernetes-operator","用于构建Kubernetes Operator时——自定义控制器，用于协调CRD状态。在“构建Operator”、“CRD设计”、“协调循环”、“controller-runtime”、“kubebuilder”、“operator-sdk”、“metacontroller”、“KOPF”、“Operator能力级别”或“自定义资源”时触发。包含CRD验证器、协调循环检查器和OperatorHub能力审计器（所有stdlib Python），4个关于Operator模式+CRD设计+协调模式+工具景观的参考，以及一个\u002Foperator-audit...","cat_coding_devops","mod_coding","alirezarezvani,coding","---\nname: kubernetes-operator\ndescription: Use when building a Kubernetes Operator — custom controllers that reconcile CRD state. Triggers on \"build an operator\", \"CRD design\", \"reconcile loop\", \"controller-runtime\", \"kubebuilder\", \"operator-sdk\", \"metacontroller\", \"KOPF\", \"operator capability levels\", or \"custom resource\". Ships CRD validator, reconcile-loop linter, and OperatorHub capability auditor (all stdlib Python), 4 references on the operator pattern + CRD design + reconcile patterns + tooling landscape, and a \u002Foperator-audit slash command. NOT a generic k8s skill — specifically the Operator pattern.\ncontext: fork\nversion: 2.4.0\nauthor: claude-code-skills\nlicense: MIT\ntags: [kubernetes, operator, crd, controller-runtime, kubebuilder, operator-sdk, metacontroller, kopf, reconcile, devops]\ncompatible_tools: [claude-code, codex-cli, cursor, antigravity, opencode, gemini-cli]\n---\n\n# Kubernetes Operator\n\nBuild operators that reconcile correctly. Most operator bugs are not Kubernetes bugs — they are reconcile-loop bugs: missing finalizers, blocking calls, no requeue on transient errors, status drift, RBAC over-grants. 
This skill catches them deterministically before they reach a cluster.\n\n## When to use\n\n- Building a new Kubernetes Operator (controller for a CRD)\n- Reviewing an existing operator for capability-level gaps\n- Auditing a CRD spec for status\u002Fconditions\u002Ffinalizer correctness\n- Choosing a framework (controller-runtime \u002F kubebuilder \u002F operator-sdk \u002F metacontroller \u002F KOPF)\n- Designing the API surface of a Custom Resource\n- Hardening RBAC, leader election, or webhook validation\n\n## When NOT to use\n\n- Plain Helm chart packaging → use `helm-chart-builder`\n- Standard kubectl operations \u002F blue-green deploys → use `senior-devops`\n- General k8s security posture → use `cloud-security`\n- \"I want to run a workload\" — that's a Deployment \u002F Job, not an operator\n\n## Core principle: an operator is a reconcile loop, not a script\n\n```\nobserve(actual) → desired = read(spec) → diff(actual, desired) → act → update(status)\n                                                                          ↓\n                                                                   requeue \u002F done\n```\n\nOperators that fail are the ones that:\n1. Treat reconcile as imperative (do this, then this, then this) instead of declarative (make actual=desired, idempotently)\n2. Don't requeue transient failures\n3. Don't use finalizers, leaving orphan resources\n4. Mutate spec instead of status\n5. Don't use the status subresource (status updates trigger spec reconciles → loop)\n6. Block in reconcile (long HTTP calls, locks)\n7. 
Forget leader election → split-brain on multi-replica deploys\n\nThe 3 tools below catch each of these.\n\n## Quick start\n\n```bash\nSKILL=engineering\u002Fkubernetes-operator\u002Fskills\u002Fkubernetes-operator\n\n# Validate a CRD design\npython \"$SKILL\u002Fscripts\u002Fcrd_validator.py\" --crd config\u002Fcrd\u002Fmyapp.yaml\n\n# Lint a Go reconcile function\npython \"$SKILL\u002Fscripts\u002Freconcile_lint.py\" --controller controllers\u002Fmyapp_controller.go\n\n# Score against OperatorHub Capability Levels (1-5)\npython \"$SKILL\u002Fscripts\u002Foperator_capability_audit.py\" --operator-dir .\n```\n\n## The 3 Python tools\n\nAll stdlib-only. Run with `--help`.\n\n### `crd_validator.py`\n\nValidates a CRD YAML against operator-pattern best practices.\n\n```bash\npython scripts\u002Fcrd_validator.py --crd config\u002Fcrd\u002Fmyapp.yaml\npython scripts\u002Fcrd_validator.py --crd config\u002Fcrd\u002F --format json\n```\n\n**Checks:**\n- `spec.versions[*].subresources.status` is set (status subresource)\n- `spec.scope` is `Namespaced` (not `Cluster`) unless explicitly justified\n- Singular and listKind defined\n- `spec.versions[*].schema.openAPIV3Schema` has type definitions (no `x-kubernetes-preserve-unknown-fields: true` at top level)\n- A version is marked `served: true` AND `storage: true`\n- Conditions array is in the schema (allows `metav1.Conditions`)\n- Printer columns include `Age` and `Status`\u002F`Phase`\n\n### `reconcile_lint.py`\n\nLints a Go controller reconcile function for anti-patterns.\n\n```bash\npython scripts\u002Freconcile_lint.py --controller controllers\u002Fmyapp_controller.go\n```\n\n**Checks (regex-based heuristics):**\n- Returns are `(ctrl.Result, error)` shape\n- Errors trigger a non-zero requeue (`return ctrl.Result{Requeue: true}, err`)\n- `client.Update()` on the spec object is flagged (controllers should update only status)\n- `time.Sleep` inside reconcile is flagged (use `RequeueAfter`)\n- HTTP calls without context 
cancellation are flagged\n- Missing `defer` after a finalizer add\n- No `IsConditionTrue` \u002F `SetCondition` calls when conditions present in CRD\n- Reconcile function exceeds 80 lines (extract subroutines)\n\n### `operator_capability_audit.py`\n\nScores an operator against OperatorHub's 5 Capability Levels.\n\n```bash\npython scripts\u002Foperator_capability_audit.py --operator-dir .\n```\n\n**Levels:**\n- **L1 — Basic Install:** CRD defined, controller deploys it\n- **L2 — Seamless Upgrades:** PDBs, conversion webhooks, version skew strategy\n- **L3 — Full Lifecycle:** backups, restores, failure recovery\n- **L4 — Deep Insights:** metrics endpoint, Prometheus rules, alerts\n- **L5 — Auto Pilot:** auto-scaling, auto-tuning, anomaly detection\n\nReports current level + concrete next steps to advance one level.\n\n## Tooling landscape\n\nPick a framework based on language and complexity. See `references\u002Ftooling_landscape.md`.\n\n| Framework | Language | Best for | Maintenance |\n|---|---|---|---|\n| **controller-runtime** | Go | Production-grade, low-level control | Active (sig-api-machinery) |\n| **kubebuilder** | Go | Standard scaffolding, opinionated | Active (Kubernetes SIGs) |\n| **operator-sdk** | Go \u002F Helm \u002F Ansible | OpenShift \u002F mixed-paradigm teams | Active (Red Hat) |\n| **metacontroller** | Any (webhook-based) | Polyglot teams, avoiding Go | Less active |\n| **KOPF** | Python | Python shops, async-first | Active (community) |\n| **java-operator-sdk** | Java | JVM shops | Active (Red Hat \u002F Java SIG) |\n\nDecision rules:\n- New operator + Go shop → kubebuilder\n- New operator + Python shop → KOPF\n- New operator + can't pick a language → metacontroller\n- OpenShift target → operator-sdk\n\n## CRD design principles\n\nSee `references\u002Fcrd_design.md` for full detail. Quick rules:\n\n1. 
**status is the source of truth for the controller's view of the world.** Spec is what the user wants; status is what the controller observed.\n2. **Use the status subresource.** Without it, status updates re-trigger reconcile (loop).\n3. **Use Conditions.** `Ready`, `Reconciling`, `Degraded`. Each carries a reason and message.\n4. **Add finalizers.** Without finalizers, deletion races the controller and orphans external resources.\n5. **Version your CRD from day 1.** `v1alpha1` → `v1beta1` → `v1`. Plan a conversion webhook.\n6. **Validate via OpenAPI v3 schema.** Don't rely on the controller for validation that should fail at admission.\n7. **Use `additionalPrinterColumns` for `kubectl get`.** Show `Age`, `Phase`, `Ready` at minimum.\n8. **Namespace your CRDs unless they manage cluster-scoped resources.**\n\n## Reconcile loop principles\n\nSee `references\u002Freconcile_loop.md` for full detail. Quick rules:\n\n1. **Idempotent.** Reconciling the same state twice → same result, zero side effects.\n2. **Read once, decide, act.** Don't observe the world repeatedly during reconcile.\n3. **Update status, not spec.** Spec belongs to the user.\n4. **Return errors that requeue.** Use `ctrl.Result{RequeueAfter: ...}` for known transient cases.\n5. **Never block.** No `time.Sleep`. No long HTTP calls without context.\n6. **Use the cache.** Read via the controller's cached client; only escape the cache for a specific reason.\n7. **Leader-elect when running >1 replica.** Otherwise enable single-replica mode.\n8. **Set OwnerReferences.** Cascading deletion is the operator pattern's free gift.\n\n## Workflows\n\n### Workflow 1: Bootstrap a new operator (Go + kubebuilder)\n\n```\n1. Pick a Group\u002FVersion\u002FKind: e.g., apps.example.com\u002Fv1alpha1, kind=MyApp\n2. kubebuilder init --domain example.com --repo github.com\u002Forg\u002Fmyapp-operator\n3. kubebuilder create api --group apps --version v1alpha1 --kind MyApp\n4. 
Run crd_validator.py on config\u002Fcrd\u002Fbases\u002Fapps.example.com_myapps.yaml\n   → Fix every WARN before writing controller code\n5. Implement the reconcile function (Karpathy principle 2: simplest correct version first)\n6. Run reconcile_lint.py on controllers\u002Fmyapp_controller.go\n7. Run operator_capability_audit.py --operator-dir . — confirm L1\n8. Test in a kind cluster: kubectl apply -f config\u002Fsamples\u002F\n9. Add status conditions; aim for L2 in the same PR\n```\n\n### Workflow 2: Audit an existing operator\n\n```\n1. Run operator_capability_audit.py --operator-dir \u003Cpath>\n2. Run crd_validator.py --crd config\u002Fcrd\u002F\n3. Run reconcile_lint.py --controller controllers\u002F\n4. Triage findings:\n   - FAIL → block release; fix before next deploy\n   - WARN → file an issue; fix in next 30 days\n5. Document current capability level in README; commit\n6. Plan one capability level advancement per quarter\n```\n\n### Workflow 3: Choose a framework\n\n```\n1. Identify primary language constraint (team skill)\n2. Identify deployment target (vanilla k8s vs OpenShift)\n3. Identify operator complexity (single CRD vs multi-CRD vs cluster-wide)\n4. Cross-reference with references\u002Ftooling_landscape.md\n5. 
Build a 1-week proof-of-concept before committing\n```\n\n## References\n\n- `references\u002Foperator_pattern.md` — what an operator IS, when to use vs alternatives\n- `references\u002Fcrd_design.md` — CRD design principles, versioning, conversion webhooks\n- `references\u002Freconcile_loop.md` — reconcile patterns, error handling, idempotency\n- `references\u002Ftooling_landscape.md` — framework comparison + decision tree\n\n## Slash command\n\n`\u002Foperator-audit` — Run all 3 tools on an operator repo and produce a markdown report.\n\n## Asset templates\n\n- `assets\u002Fcrd_template.yaml` — CRD with status subresource, conditions, finalizer hint, printer columns\n- `assets\u002Freconcile_skeleton.go` — Go controller reconcile function with idempotency, conditions, finalizers, requeue patterns\n\n## Anti-patterns\n\n- **`time.Sleep(30 * time.Second)` inside reconcile** — blocks other reconciles. Use `RequeueAfter`.\n- **`r.Client.Update(ctx, obj)` to set status** — use `r.Status().Update(ctx, obj)` instead.\n- **No leader election + 2+ replicas** — split-brain.\n- **No finalizer** — external resources are orphaned on deletion.\n- **CRD without status subresource** — status updates trigger spec reconciles (infinite loop).\n- **Reconcile function > 200 lines** — extract reconcileXxx subroutines per condition.\n- **`x-kubernetes-preserve-unknown-fields: true` on spec root** — defeats validation.\n- **Imperative reconcile** — \"if creating, do A; if updating, do B; if deleting, do C\". Wrong shape. 
Reconcile = make actual=desired, regardless of how we got here.\n\n## Verifiable success\n\nA team using this skill should achieve:\n\n- 100% of new CRDs pass `crd_validator.py` before merge\n- All reconcile functions pass `reconcile_lint.py` strict mode\n- Operators reach OperatorHub Capability Level 3 (Full Lifecycle) before public release\n- Mean time to fix a reconcile bug: \u003C1 day (no infinite loops in production)\n","","imported","https:\u002F\u002Fgithub.com\u002Falirezarezvani\u002Fclaude-skills","user_system_seed","SkillOPIC",true,155,485,"2026-05-16 13:54:21",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"编程开发","coding","mdi-code-braces","代码生成、调试、审查，提升开发效率",2,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"DevOps","devops","mdi-cog-outline","CI\u002FCD、容器化、部署运维",3,162,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"43987200-dd4a-4bee-804d-82766cb7166b","1.0.0","kubernetes-operator.zip",26327,"uploads\u002Fskills\u002F18f393c8-f65b-4edc-bf00-8a8a79ab4634\u002Fkubernetes-operator.zip","3a2ba8b285c14dca6cc871dfbb7a26741169b69e6b5110dbeb3bbb9b4b13d0ac","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":11238},{\"path\":\"assets\u002Fcrd_template.yaml\",\"isDirectory\":false,\"size\":2620},{\"path\":\"assets\u002Freconcile_skeleton.go\",\"isDirectory\":false,\"size\":4052},{\"path\":\"references\u002Fcrd_design.md\",\"isDirectory\":false,\"size\":6998},{\"path\":\"references\u002Foperator_pattern.md\",\"isDirectory\":false,\"size\":7157},{\"path\":\"references\u002Freconcile_loop.md\",\"isDirectory\":false,\"size\":7275},{\"path\":\"references\u002Ftooling_landscape.md\",\"isDirectory\":false,\"size\":7642},{\"path\":\"scripts\u002Fcrd_validator.py\",\"isDirectory\":false,\"size\":5939},{\"path\":\"scripts\u002Foperator_capability_audit.py\",\"isDirectory\":
false,\"size\":5916},{\"path\":\"scripts\u002Freconcile_lint.py\",\"isDirectory\":false,\"size\":6495}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]