[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-bdad19f0-369e-427c-9e8f-29e6fa6ef9dd":3,"$fNN0Bw6jEqurO7stEsGP7rU3AR9B-9n0QZ1enEtanLII":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"bdad19f0-369e-427c-9e8f-29e6fa6ef9dd","grafana-dashboards","Create and manage production-ready Grafana dashboards for comprehensive system observability.","cat_prod_data","mod_productivity","sickn33,productivity","---\nname: grafana-dashboards\ndescription: \"Create and manage production-ready Grafana dashboards for comprehensive system observability.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Grafana Dashboards\n\nCreate and manage production-ready Grafana dashboards for comprehensive system observability.\n\n## Do not use this skill when\n\n- The task is unrelated to Grafana dashboards\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources\u002Fimplementation-playbook.md`.\n\n## Purpose\n\nDesign effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.\n\n## Use this skill when\n\n- Visualizing Prometheus metrics\n- Creating custom dashboards\n- Implementing SLO dashboards\n- Monitoring infrastructure\n- Tracking business KPIs\n\n## Dashboard Design Principles\n\n### 1. Hierarchy of Information\n```\n┌─────────────────────────────────────┐\n│  Critical Metrics (Big Numbers)     │\n├─────────────────────────────────────┤\n│  Key Trends (Time Series)           │\n├─────────────────────────────────────┤\n│  Detailed Metrics (Tables\u002FHeatmaps) │\n└─────────────────────────────────────┘\n```\n\n### 2. 
RED Method (Services)\n- **Rate** - Requests per second\n- **Errors** - Error rate\n- **Duration** - Latency\u002Fresponse time\n\n### 3. USE Method (Resources)\n- **Utilization** - % time resource is busy\n- **Saturation** - Queue length\u002Fwait time\n- **Errors** - Error count\n\n## Dashboard Structure\n\n### API Monitoring Dashboard\n\n```json\n{\n  \"dashboard\": {\n    \"title\": \"API Monitoring\",\n    \"tags\": [\"api\", \"production\"],\n    \"timezone\": \"browser\",\n    \"refresh\": \"30s\",\n    \"panels\": [\n      {\n        \"title\": \"Request Rate\",\n        \"type\": \"graph\",\n        \"targets\": [\n          {\n            \"expr\": \"sum(rate(http_requests_total[5m])) by (service)\",\n            \"legendFormat\": \"{{service}}\"\n          }\n        ],\n        \"gridPos\": {\"x\": 0, \"y\": 0, \"w\": 12, \"h\": 8}\n      },\n      {\n        \"title\": \"Error Rate %\",\n        \"type\": \"graph\",\n        \"targets\": [\n          {\n            \"expr\": \"(sum(rate(http_requests_total{status=~\\\"5..\\\"}[5m])) \u002F sum(rate(http_requests_total[5m]))) * 100\",\n            \"legendFormat\": \"Error Rate\"\n          }\n        ],\n        \"alert\": {\n          \"conditions\": [\n            {\n              \"evaluator\": {\"params\": [5], \"type\": \"gt\"},\n              \"operator\": {\"type\": \"and\"},\n              \"query\": {\"params\": [\"A\", \"5m\", \"now\"]},\n              \"type\": \"query\"\n            }\n          ]\n        },\n        \"gridPos\": {\"x\": 12, \"y\": 0, \"w\": 12, \"h\": 8}\n      },\n      {\n        \"title\": \"P95 Latency\",\n        \"type\": \"graph\",\n        \"targets\": [\n          {\n            \"expr\": \"histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))\",\n            \"legendFormat\": \"{{service}}\"\n          }\n        ],\n        \"gridPos\": {\"x\": 0, \"y\": 8, \"w\": 24, \"h\": 8}\n      }\n    ]\n  
}\n}\n```\n\n**Reference:** See `assets\u002Fapi-dashboard.json`\n\n## Panel Types\n\n### 1. Stat Panel (Single Value)\n```json\n{\n  \"type\": \"stat\",\n  \"title\": \"Total Requests\",\n  \"targets\": [{\n    \"expr\": \"sum(http_requests_total)\"\n  }],\n  \"options\": {\n    \"reduceOptions\": {\n      \"values\": false,\n      \"calcs\": [\"lastNotNull\"]\n    },\n    \"orientation\": \"auto\",\n    \"textMode\": \"auto\",\n    \"colorMode\": \"value\"\n  },\n  \"fieldConfig\": {\n    \"defaults\": {\n      \"thresholds\": {\n        \"mode\": \"absolute\",\n        \"steps\": [\n          {\"value\": 0, \"color\": \"green\"},\n          {\"value\": 80, \"color\": \"yellow\"},\n          {\"value\": 90, \"color\": \"red\"}\n        ]\n      }\n    }\n  }\n}\n```\n\n### 2. Time Series Graph\n```json\n{\n  \"type\": \"graph\",\n  \"title\": \"CPU Usage\",\n  \"targets\": [{\n    \"expr\": \"100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\\\"idle\\\"}[5m])) * 100)\"\n  }],\n  \"yaxes\": [\n    {\"format\": \"percent\", \"max\": 100, \"min\": 0},\n    {\"format\": \"short\"}\n  ]\n}\n```\n\n### 3. Table Panel\n```json\n{\n  \"type\": \"table\",\n  \"title\": \"Service Status\",\n  \"targets\": [{\n    \"expr\": \"up\",\n    \"format\": \"table\",\n    \"instant\": true\n  }],\n  \"transformations\": [\n    {\n      \"id\": \"organize\",\n      \"options\": {\n        \"excludeByName\": {\"Time\": true},\n        \"indexByName\": {},\n        \"renameByName\": {\n          \"instance\": \"Instance\",\n          \"job\": \"Service\",\n          \"Value\": \"Status\"\n        }\n      }\n    }\n  ]\n}\n```\n\n### 4. 
Heatmap\n```json\n{\n  \"type\": \"heatmap\",\n  \"title\": \"Latency Heatmap\",\n  \"targets\": [{\n    \"expr\": \"sum(rate(http_request_duration_seconds_bucket[5m])) by (le)\",\n    \"format\": \"heatmap\"\n  }],\n  \"dataFormat\": \"tsbuckets\",\n  \"yAxis\": {\n    \"format\": \"s\"\n  }\n}\n```\n\n## Variables\n\n### Query Variables\n```json\n{\n  \"templating\": {\n    \"list\": [\n      {\n        \"name\": \"namespace\",\n        \"type\": \"query\",\n        \"datasource\": \"Prometheus\",\n        \"query\": \"label_values(kube_pod_info, namespace)\",\n        \"refresh\": 1,\n        \"multi\": false\n      },\n      {\n        \"name\": \"service\",\n        \"type\": \"query\",\n        \"datasource\": \"Prometheus\",\n        \"query\": \"label_values(kube_service_info{namespace=\\\"$namespace\\\"}, service)\",\n        \"refresh\": 1,\n        \"multi\": true\n      }\n    ]\n  }\n}\n```\n\n### Use Variables in Queries\n```\nsum(rate(http_requests_total{namespace=\"$namespace\", service=~\"$service\"}[5m]))\n```\n\n## Alerts in Dashboards\n\n```json\n{\n  \"alert\": {\n    \"name\": \"High Error Rate\",\n    \"conditions\": [\n      {\n        \"evaluator\": {\n          \"params\": [5],\n          \"type\": \"gt\"\n        },\n        \"operator\": {\"type\": \"and\"},\n        \"query\": {\n          \"params\": [\"A\", \"5m\", \"now\"]\n        },\n        \"reducer\": {\"type\": \"avg\"},\n        \"type\": \"query\"\n      }\n    ],\n    \"executionErrorState\": \"alerting\",\n    \"for\": \"5m\",\n    \"frequency\": \"1m\",\n    \"message\": \"Error rate is above 5%\",\n    \"noDataState\": \"no_data\",\n    \"notifications\": [\n      {\"uid\": \"slack-channel\"}\n    ]\n  }\n}\n```\n\n## Dashboard Provisioning\n\n**dashboards.yml:**\n```yaml\napiVersion: 1\n\nproviders:\n  - name: 'default'\n    orgId: 1\n    folder: 'General'\n    type: file\n    disableDeletion: false\n    updateIntervalSeconds: 10\n    allowUiUpdates: true\n    options:\n 
     path: \u002Fetc\u002Fgrafana\u002Fdashboards\n```\n\n## Common Dashboard Patterns\n\n### Infrastructure Dashboard\n\n**Key Panels:**\n- CPU utilization per node\n- Memory usage per node\n- Disk I\u002FO\n- Network traffic\n- Pod count by namespace\n- Node status\n\n**Reference:** See `assets\u002Finfrastructure-dashboard.json`\n\n### Database Dashboard\n\n**Key Panels:**\n- Queries per second\n- Connection pool usage\n- Query latency (P50, P95, P99)\n- Active connections\n- Database size\n- Replication lag\n- Slow queries\n\n**Reference:** See `assets\u002Fdatabase-dashboard.json`\n\n### Application Dashboard\n\n**Key Panels:**\n- Request rate\n- Error rate\n- Response time (percentiles)\n- Active users\u002Fsessions\n- Cache hit rate\n- Queue length\n\n## Best Practices\n\n1. **Start with templates** (Grafana community dashboards)\n2. **Use consistent naming** for panels and variables\n3. **Group related metrics** in rows\n4. **Set appropriate time ranges** (default: Last 6 hours)\n5. **Use variables** for flexibility\n6. **Add panel descriptions** for context\n7. **Configure units** correctly\n8. **Set meaningful thresholds** for colors\n9. **Use consistent colors** across dashboards\n10. 
**Test with different time ranges**\n\n## Dashboard as Code\n\n### Terraform Provisioning\n\n```hcl\nresource \"grafana_dashboard\" \"api_monitoring\" {\n  config_json = file(\"${path.module}\u002Fdashboards\u002Fapi-monitoring.json\")\n  folder      = grafana_folder.monitoring.id\n}\n\nresource \"grafana_folder\" \"monitoring\" {\n  title = \"Production Monitoring\"\n}\n```\n\n### Ansible Provisioning\n\n```yaml\n- name: Deploy Grafana dashboards\n  copy:\n    src: \"{{ item }}\"\n    dest: \u002Fetc\u002Fgrafana\u002Fdashboards\u002F\n  with_fileglob:\n    - \"dashboards\u002F*.json\"\n  notify: restart grafana\n```\n\n## Reference Files\n\n- `assets\u002Fapi-dashboard.json` - API monitoring dashboard\n- `assets\u002Finfrastructure-dashboard.json` - Infrastructure dashboard\n- `assets\u002Fdatabase-dashboard.json` - Database monitoring dashboard\n- `references\u002Fdashboard-design.md` - Dashboard design guide\n\n## Related Skills\n\n- `prometheus-configuration` - For metric collection\n- `slo-implementation` - For SLO dashboards\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,241,1579,"2026-05-16 13:21:21",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Productivity Tools","productivity","mdi-lightning-bolt-outline","Document processing, data analysis, workflow automation",4,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Data Analysis","data-analysis","mdi-chart-bar","Data visualization, statistical analysis",2,30,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"bed639b8-f74d-4ee4-98af-164065da495b","1.0.0","grafana-dashboards.zip",3242,"uploads\u002Fskills\u002Fbdad19f0-369e-427c-9e8f-29e6fa6ef9dd\u002Fgrafana-dashboards.zip","85fba7a299de87cc785ca812e84334006cf018a2ad960219c93fe040388fdb3e","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":8976}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]