[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-b56f2948-5426-4697-8637-0e27a491113e":3,"$fghdM7-kdw4JopMj7LAg_8culPzRIsqg1eE_MYuhuxiw":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"b56f2948-5426-4697-8637-0e27a491113e","slo-architect","Use when defining, reviewing, or operating SLOs\u002FSLIs\u002Ferror budgets. Triggers on \"define an SLO\", \"what should our SLO be\", \"error budget\", \"burn rate\", \"SLI\", \"service level objective\", \"Google SRE workbook\", \"multi-window burn-rate alert\", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references...","cat_life_career","mod_other","alirezarezvani,other","---\nname: slo-architect\ndescription: Use when defining, reviewing, or operating SLOs\u002FSLIs\u002Ferror budgets. Triggers on \"define an SLO\", \"what should our SLO be\", \"error budget\", \"burn rate\", \"SLI\", \"service level objective\", \"Google SRE workbook\", \"multi-window burn-rate alert\", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect\u002Fchaos-engineering\u002Fkubernetes-operator. NOT a generic observability skill — specifically the SLO discipline.\ncontext: fork\nversion: 2.4.4\nauthor: claude-code-skills\nlicense: MIT\ntags: [slo, sli, sla, error-budget, burn-rate, sre, reliability, google-sre-workbook, observability]\ncompatible_tools: [claude-code, codex-cli, cursor, antigravity, opencode, gemini-cli]\n---\n\n# SLO Architect\n\nDefine SLOs that mean something. Most \"SLOs\" in the wild are arbitrary numbers no one believes — 99.9% on every endpoint, no SLI definition, no error budget, no policy for what happens when budget burns. 
This skill enforces the discipline from Google's SRE Workbook: pick the right SLI, set a target users actually care about, calculate the error budget, wire multi-window burn-rate alerts, and have a written policy for when budget runs out.\n\n## When to use\n\n- Defining a new SLO for a service or feature\n- Reviewing existing SLOs for common bugs\n- Picking the right SLI (event-based vs time-window based vs request-based)\n- Computing error budgets and burn-rate alert thresholds\n- Tying SLOs to existing controls — feature flags abort, chaos blast radius, operator capability levels\n\n## When NOT to use\n\n- General observability strategy (metrics + logs + traces) → use `observability-designer`\n- Customer-facing SLAs with legal teeth → that's contract drafting, not engineering\n- Performance load testing (capacity, not reliability) → use `performance-profiler`\n- Active incident response → use `incident-response`\n\n## Core principle: an SLO is a promise about user experience\n\n```\nSLI  ⟶  measurable signal of user-perceived health (e.g., HTTP 2xx rate)\nSLO  ⟶  target for the SLI over a window (e.g., 99.9% over 30 days)\nSLA  ⟶  customer-facing commitment with consequences (separate concern)\nEB   ⟶  error budget: 100% − SLO target = how much \"bad\" you can spend\nBR   ⟶  burn rate: how fast you're consuming the error budget\n```\n\nThe four cardinal mistakes:\n\n1. **Target too high** (99.99%+ on services that can't support it) — every minor blip violates SLO; alerts become noise.\n2. **Wrong SLI** (CPU usage as proxy for user experience) — system can be \"green\" while users suffer.\n3. **No error budget policy** — burning budget means nothing if there's no agreed action.\n4. **Single-window burn-rate alert** — either too noisy (page on a 5-min spike) or too slow (notice budget exhausted after the fact).\n\nThe 3 tools below catch each of these.\n\n## Quick start\n\n```bash\nSKILL=engineering\u002Fslo-architect\u002Fskills\u002Fslo-architect\n\n# 1. 
Design an SLO\npython \"$SKILL\u002Fscripts\u002Fslo_designer.py\" \\\n  --service checkout-svc \\\n  --sli-type request-success-rate \\\n  --target 99.9 \\\n  --window-days 30\n\n# 2. Compute error budget + multi-window burn-rate alerts\npython \"$SKILL\u002Fscripts\u002Ferror_budget_calculator.py\" \\\n  --target 99.9 --window-days 30\n\n# 3. Review existing SLO definitions for common bugs\npython \"$SKILL\u002Fscripts\u002Fslo_review.py\" --slo-doc docs\u002Fslos\u002F\n```\n\n## The 3 Python tools\n\nAll stdlib-only.\n\n### `slo_designer.py`\n\nGenerates a structured SLO definition with required fields. Refuses to render if any required field is missing (`exit 1`).\n\n```bash\npython scripts\u002Fslo_designer.py \\\n  --service checkout-svc \\\n  --sli-type request-success-rate \\\n  --target 99.9 \\\n  --window-days 30 \\\n  --owner team-checkout\n```\n\n**SLI types supported:**\n- `request-success-rate` — `(total_requests - bad_requests) \u002F total_requests`\n- `request-latency` — `count(requests \u003C threshold) \u002F total_requests`\n- `availability-time` — `(window - downtime) \u002F window`\n- `data-freshness` — `count(data_age \u003C threshold) \u002F total_data_points`\n- `correctness` — `count(correct_outputs) \u002F total_outputs`\n\nOutput is markdown by default with all required fields filled or marked `\u003Cmust define>`. 
JSON output (`--format json`) is consumed by `slo_review.py`.\n\n### `error_budget_calculator.py`\n\nGiven target availability + window, computes:\n- Allowed downtime in the window\n- Multi-window burn-rate thresholds per Google SRE Workbook (Chapter 5):\n  - **Fast burn** — page if 2% of monthly budget consumed in 1 hour\n  - **Slow burn** — page if 10% consumed in 6 hours, ticket if 10% in 3 days\n- Recommended alerting rules (PromQL-shaped output)\n\n```bash\npython scripts\u002Ferror_budget_calculator.py --target 99.9 --window-days 30\npython scripts\u002Ferror_budget_calculator.py --target 99.95 --window-days 7 --format json\n```\n\n### `slo_review.py`\n\nAudits a directory of SLO definitions (markdown or JSON) for the common bugs.\n\n```bash\npython scripts\u002Fslo_review.py --slo-doc docs\u002Fslos\u002F\n```\n\n**Checks:**\n- `target_too_high`: target ≥ 99.99% (sustainable only with massive engineering investment)\n- `target_too_low`: target ≤ 99.0% (probably wrong SLI; users will notice)\n- `window_too_short`: window \u003C 7 days (statistical noise dominates)\n- `window_too_long`: window > 90 days (slow feedback)\n- `no_sli_definition`: SLI section missing or vague (\"everything OK\")\n- `no_error_budget_policy`: no documented action when budget burns\n- `cpu_as_sli`: CPU\u002Fmemory used as user-experience proxy (wrong signal)\n\n## SLI selection cheatsheet\n\n| User experience | SLI type | What you measure |\n|---|---|---|\n| \"Did the request succeed?\" | request-success-rate | `2xx \u002F total` |\n| \"Was the response fast?\" | request-latency | `count(requests \u003C threshold) \u002F total` |\n| \"Was the service up?\" | availability-time | `(window - downtime) \u002F window` |\n| \"Is the data current?\" | data-freshness | `count(data_age \u003C threshold) \u002F total` |\n| \"Was the answer correct?\" | correctness | `count(correct) \u002F total` |\n\nSee `references\u002Fsli_design.md` for examples and anti-patterns.\n\n## Error budget math (the 
basics)\n\nFor a 99.9% SLO over 30 days (720 hours):\n- Allowed unavailability: `0.1% × 30 × 24 × 60 = 43.2 minutes`\n- 1-hour fast-burn threshold (2% of the budget burned in 1h): `2% × 720 \u002F 1 = 14.4× burn rate`\n- 6-hour slow-burn threshold (10% in 6h): `10% × 720 \u002F 6 = 12× burn rate`\n\nBurn rate is the fraction of budget consumed multiplied by the ratio of the SLO window to the alert window; a burn rate of 1 spends exactly the whole budget over the window.\n\n`error_budget_calculator.py` does this math for you and emits ready-to-paste alert rules.\n\n## Composition with the rest of the portfolio\n\nThis skill explicitly composes with three others:\n\n| Skill | Composition |\n|---|---|\n| `feature-flags-architect` | Rollout abort criteria reference SLO burn-rate thresholds |\n| `chaos-engineering` | Blast-radius calculator already takes monthly error budget as input — define it here |\n| `kubernetes-operator` | Operator capability L4 (Deep Insights) requires SLOs + Prometheus rules |\n\nThe `error_budget_calculator.py` output is in the same shape `chaos-engineering\u002Fscripts\u002Fblast_radius_calculator.py` expects on stdin.\n\n## Workflows\n\n### Workflow 1: Define a new SLO\n\n```\n1. Pick the user journey to protect (e.g., \"checkout completion\").\n2. Choose SLI type (request-success-rate, latency, availability, freshness, correctness).\n3. Define the SLI precisely: numerator\u002Fdenominator with concrete labels.\n4. Pick a target by measuring 30 days of historical SLI value:\n     target = floor(p50 of last 30 days × 100) \u002F 100\n   This avoids targets the system has never sustained.\n5. Pick a window (28 days = 4 calendar weeks, recommended).\n6. Run slo_designer.py to render the SLO definition.\n7. Run error_budget_calculator.py to get burn-rate alerts.\n8. Write the error budget policy (what happens when budget burns).\n9. Run slo_review.py — must pass before the SLO is \"live\".\n```\n\n### Workflow 2: Quarterly SLO review\n\n```\n1. For every active SLO, run slo_review.py — fix any FAIL findings.\n2. Look at last quarter's data:\n   - Was the SLO too easy (never burned budget)? 
Tighten target.\n   - Was it too hard (frequently burned)? Loosen target OR fix the system.\n   - Did burn-rate alerts fire usefully (not too noisy, not too late)? Adjust thresholds.\n3. Audit error budget policies — were they actually followed when budget burned?\n4. Commit revised SLOs; archive old versions with date stamps.\n```\n\n### Workflow 3: SLO-driven rollback\n\n```\n1. New deploy starts burning error budget faster than baseline.\n2. Burn-rate alert fires (from error_budget_calculator.py thresholds).\n3. Auto-rollback via feature flag (kill switch from feature-flags-architect).\n4. Postmortem feeds into next SLO revision.\n```\n\n## References\n\n- `references\u002Fslo_principles.md` — SLI vs SLO vs SLA, Google SRE Workbook canon\n- `references\u002Fsli_design.md` — picking the right SLI; 5 types with examples\n- `references\u002Ferror_budget.md` — error budget math, burn-rate alerts, budget policy\n- `references\u002Fcomposition.md` — how SLOs feed feature flags, chaos, operators\n\n## Slash command\n\n`\u002Fslo-design` — interactive SLO design wizard that runs all 3 tools.\n\n## Asset templates\n\n- `assets\u002Fslo_template.yaml` — fillable SLO YAML\n- `assets\u002Ferror_budget_policy.md` — fillable policy template\n\n## Anti-patterns\n\n- **99.99% on every endpoint** — copy-paste SLOs that nobody verified the system can sustain\n- **CPU usage as SLI** — system metrics aren't user experience\n- **Single-window burn-rate alert** — too noisy if 5-min, too slow if 30-day\n- **No error budget policy** — burning budget means nothing without an action\n- **SLOs without owners** — no one is responsible; they bit-rot\n- **SLOs reviewed once a year** — system characteristics change faster than that\n- **SLAs in the SLO doc** — different audience, different stakes; keep them separate\n- **SLO target = SLA target** — SLO must be tighter (you should beat your contract before customers notice)\n\n## Verifiable success\n\nA team using this skill should 
achieve:\n\n- 100% of SLOs pass `slo_review.py` with 0 FAIL findings\n- Every SLO has a documented owner, error budget, burn-rate alerts, and policy\n- Burn-rate alerts fire ≤2 times\u002Fmonth per SLO that's hit (signal, not noise)\n- Mean time to detect SLO violation: \u003C30 min (multi-window burn-rate alerts working)\n- Quarterly SLO review happens every quarter (not annually)\n","","imported","https:\u002F\u002Fgithub.com\u002Falirezarezvani\u002Fclaude-skills","user_system_seed","SkillOPIC",true,67,1929,"2026-05-16 13:55:32",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of skills",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"08cf99d1-bbb1-41a1-9b43-c51a2f617f3c","1.0.0","slo-architect.zip",23380,"uploads\u002Fskills\u002Fb56f2948-5426-4697-8637-0e27a491113e\u002Fslo-architect.zip","5718d58362aa64b395fe4a54da1f4530e952915573d635c941fa2d53967ea4b8","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10514},{\"path\":\"assets\u002Ferror_budget_policy.md\",\"isDirectory\":false,\"size\":2641},{\"path\":\"assets\u002Fslo_template.yaml\",\"isDirectory\":false,\"size\":2293},{\"path\":\"references\u002Fcomposition.md\",\"isDirectory\":false,\"size\":5266},{\"path\":\"references\u002Ferror_budget.md\",\"isDirectory\":false,\"size\":4588},{\"path\":\"references\u002Fsli_design.md\",\"isDirectory\":false,\"size\":5451},{\"path\":\"references\u002Fslo_principles.md\",\"isDirectory\":false,\"size\":5333},{\"path\":\"scripts\u002Ferror_budget_calculator.py\",\"isDirectory\":false,\"size\":5303},{\"path\":\"scripts\u002Fslo_designer.py\",\"isDirectory\":false,\"size\":6680},{\"path\":\"scripts\u002Fslo_review.py\",\"isDirect
ory\":false,\"size\":5063}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]