[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-ac278dd8-4150-49d9-9ecc-a2401e0da58a":3,"$f1ALvT76HLoPvUgTbAbZIqOuqesN1Cy-nv1MvmFjYx0s":44},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"ac278dd8-4150-49d9-9ecc-a2401e0da58a","prometheus-configuration","Prometheus配置、指标收集、抓取配置和记录规则的完整指南。","cat_life_career","mod_other","sickn33,other","---\nname: prometheus-configuration\ndescription: \"Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Prometheus Configuration\n\nComplete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.\n\n## Do not use this skill when\n\n- The task is unrelated to prometheus configuration\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources\u002Fimplementation-playbook.md`.\n\n## Purpose\n\nConfigure Prometheus for comprehensive metric collection, alerting, and monitoring of infrastructure and applications.\n\n## Use this skill when\n\n- Set up Prometheus monitoring\n- Configure metric scraping\n- Create recording rules\n- Design alert rules\n- Implement service discovery\n\n## Prometheus Architecture\n\n```\n┌──────────────┐\n│ Applications │ ← Instrumented with client libraries\n└──────┬───────┘\n       │ \u002Fmetrics endpoint\n       ↓\n┌──────────────┐\n│  Prometheus  │ ← Scrapes metrics periodically\n│    Server    │\n└──────┬───────┘\n       │\n       ├─→ AlertManager (alerts)\n       ├─→ Grafana (visualization)\n       └─→ Long-term 
storage (Thanos\u002FCortex)\n```\n\n## Installation\n\n### Kubernetes with Helm\n\n```bash\nhelm repo add prometheus-community https:\u002F\u002Fprometheus-community.github.io\u002Fhelm-charts\nhelm repo update\n\nhelm install prometheus prometheus-community\u002Fkube-prometheus-stack \\\n  --namespace monitoring \\\n  --create-namespace \\\n  --set prometheus.prometheusSpec.retention=30d \\\n  --set prometheus.prometheusSpec.storageVolumeSize=50Gi\n```\n\n### Docker Compose\n\n```yaml\nversion: '3.8'\nservices:\n  prometheus:\n    image: prom\u002Fprometheus:latest\n    ports:\n      - \"9090:9090\"\n    volumes:\n      - .\u002Fprometheus.yml:\u002Fetc\u002Fprometheus\u002Fprometheus.yml\n      - prometheus-data:\u002Fprometheus\n    command:\n      - '--config.file=\u002Fetc\u002Fprometheus\u002Fprometheus.yml'\n      - '--storage.tsdb.path=\u002Fprometheus'\n      - '--storage.tsdb.retention.time=30d'\n\nvolumes:\n  prometheus-data:\n```\n\n## Configuration File\n\n**prometheus.yml:**\n```yaml\nglobal:\n  scrape_interval: 15s\n  evaluation_interval: 15s\n  external_labels:\n    cluster: 'production'\n    region: 'us-west-2'\n\n# Alertmanager configuration\nalerting:\n  alertmanagers:\n    - static_configs:\n        - targets:\n          - alertmanager:9093\n\n# Load rules files\nrule_files:\n  - \u002Fetc\u002Fprometheus\u002Frules\u002F*.yml\n\n# Scrape configurations\nscrape_configs:\n  # Prometheus itself\n  - job_name: 'prometheus'\n    static_configs:\n      - targets: ['localhost:9090']\n\n  # Node exporters\n  - job_name: 'node-exporter'\n    static_configs:\n      - targets:\n        - 'node1:9100'\n        - 'node2:9100'\n        - 'node3:9100'\n    relabel_configs:\n      - source_labels: [__address__]\n        target_label: instance\n        regex: '([^:]+)(:[0-9]+)?'\n        replacement: '${1}'\n\n  # Kubernetes pods with annotations\n  - job_name: 'kubernetes-pods'\n    kubernetes_sd_configs:\n      - role: pod\n    relabel_configs:\n      - 
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]\n        action: keep\n        regex: true\n      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]\n        action: replace\n        target_label: __metrics_path__\n        regex: (.+)\n      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]\n        action: replace\n        regex: ([^:]+)(?::\\d+)?;(\\d+)\n        replacement: $1:$2\n        target_label: __address__\n      - source_labels: [__meta_kubernetes_namespace]\n        action: replace\n        target_label: namespace\n      - source_labels: [__meta_kubernetes_pod_name]\n        action: replace\n        target_label: pod\n\n  # Application metrics\n  - job_name: 'my-app'\n    static_configs:\n      - targets:\n        - 'app1.example.com:9090'\n        - 'app2.example.com:9090'\n    metrics_path: '\u002Fmetrics'\n    scheme: 'https'\n    tls_config:\n      ca_file: \u002Fetc\u002Fprometheus\u002Fca.crt\n      cert_file: \u002Fetc\u002Fprometheus\u002Fclient.crt\n      key_file: \u002Fetc\u002Fprometheus\u002Fclient.key\n```\n\n**Reference:** See `assets\u002Fprometheus.yml.template`\n\n## Scrape Configurations\n\n### Static Targets\n\n```yaml\nscrape_configs:\n  - job_name: 'static-targets'\n    static_configs:\n      - targets: ['host1:9100', 'host2:9100']\n        labels:\n          env: 'production'\n          region: 'us-west-2'\n```\n\n### File-based Service Discovery\n\n```yaml\nscrape_configs:\n  - job_name: 'file-sd'\n    file_sd_configs:\n      - files:\n        - \u002Fetc\u002Fprometheus\u002Ftargets\u002F*.json\n        - \u002Fetc\u002Fprometheus\u002Ftargets\u002F*.yml\n        refresh_interval: 5m\n```\n\n**targets\u002Fproduction.json:**\n```json\n[\n  {\n    \"targets\": [\"app1:9090\", \"app2:9090\"],\n    \"labels\": {\n      \"env\": \"production\",\n      \"service\": \"api\"\n    }\n  }\n]\n```\n\n### Kubernetes Service Discovery\n\n```yaml\nscrape_configs:\n 
 - job_name: 'kubernetes-services'\n    kubernetes_sd_configs:\n      - role: service\n    relabel_configs:\n      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]\n        action: keep\n        regex: true\n      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]\n        action: replace\n        target_label: __scheme__\n        regex: (https?)\n      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]\n        action: replace\n        target_label: __metrics_path__\n        regex: (.+)\n```\n\n**Reference:** See `references\u002Fscrape-configs.md`\n\n## Recording Rules\n\nCreate pre-computed metrics for frequently queried expressions:\n\n```yaml\n# \u002Fetc\u002Fprometheus\u002Frules\u002Frecording_rules.yml\ngroups:\n  - name: api_metrics\n    interval: 15s\n    rules:\n      # HTTP request rate per service\n      - record: job:http_requests:rate5m\n        expr: sum by (job) (rate(http_requests_total[5m]))\n\n      # Error rate percentage\n      - record: job:http_requests_errors:rate5m\n        expr: sum by (job) (rate(http_requests_total{status=~\"5..\"}[5m]))\n\n      - record: job:http_requests_error_rate:percentage\n        expr: |\n          (job:http_requests_errors:rate5m \u002F job:http_requests:rate5m) * 100\n\n      # P95 latency\n      - record: job:http_request_duration:p95\n        expr: |\n          histogram_quantile(0.95,\n            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))\n          )\n\n  - name: resource_metrics\n    interval: 30s\n    rules:\n      # CPU utilization percentage\n      - record: instance:node_cpu:utilization\n        expr: |\n          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)\n\n      # Memory utilization percentage\n      - record: instance:node_memory:utilization\n        expr: |\n          100 - ((node_memory_MemAvailable_bytes \u002F node_memory_MemTotal_bytes) * 100)\n\n      
# Disk usage percentage\n      - record: instance:node_disk:utilization\n        expr: |\n          100 - ((node_filesystem_avail_bytes \u002F node_filesystem_size_bytes) * 100)\n```\n\n**Reference:** See `references\u002Frecording-rules.md`\n\n## Alert Rules\n\n```yaml\n# \u002Fetc\u002Fprometheus\u002Frules\u002Falert_rules.yml\ngroups:\n  - name: availability\n    interval: 30s\n    rules:\n      - alert: ServiceDown\n        expr: up{job=\"my-app\"} == 0\n        for: 1m\n        labels:\n          severity: critical\n        annotations:\n          summary: \"Service {{ $labels.instance }} is down\"\n          description: \"{{ $labels.job }} has been down for more than 1 minute\"\n\n      - alert: HighErrorRate\n        expr: job:http_requests_error_rate:percentage > 5\n        for: 5m\n        labels:\n          severity: warning\n        annotations:\n          summary: \"High error rate for {{ $labels.job }}\"\n          description: \"Error rate is {{ $value }}% (threshold: 5%)\"\n\n      - alert: HighLatency\n        expr: job:http_request_duration:p95 > 1\n        for: 5m\n        labels:\n          severity: warning\n        annotations:\n          summary: \"High latency for {{ $labels.job }}\"\n          description: \"P95 latency is {{ $value }}s (threshold: 1s)\"\n\n  - name: resources\n    interval: 1m\n    rules:\n      - alert: HighCPUUsage\n        expr: instance:node_cpu:utilization > 80\n        for: 5m\n        labels:\n          severity: warning\n        annotations:\n          summary: \"High CPU usage on {{ $labels.instance }}\"\n          description: \"CPU usage is {{ $value }}%\"\n\n      - alert: HighMemoryUsage\n        expr: instance:node_memory:utilization > 85\n        for: 5m\n        labels:\n          severity: warning\n        annotations:\n          summary: \"High memory usage on {{ $labels.instance }}\"\n          description: \"Memory usage is {{ $value }}%\"\n\n      - alert: DiskSpaceLow\n        expr: 
instance:node_disk:utilization > 90\n        for: 5m\n        labels:\n          severity: critical\n        annotations:\n          summary: \"Low disk space on {{ $labels.instance }}\"\n          description: \"Disk usage is {{ $value }}%\"\n```\n\n## Validation\n\n```bash\n# Validate configuration\npromtool check config prometheus.yml\n\n# Validate rules\npromtool check rules \u002Fetc\u002Fprometheus\u002Frules\u002F*.yml\n\n# Test query\npromtool query instant http:\u002F\u002Flocalhost:9090 'up'\n```\n\n**Reference:** See `scripts\u002Fvalidate-prometheus.sh`\n\n## Best Practices\n\n1. **Use consistent naming** for metrics (prefix_name_unit)\n2. **Set appropriate scrape intervals** (15-60s typical)\n3. **Use recording rules** for expensive queries\n4. **Implement high availability** (multiple Prometheus instances)\n5. **Configure retention** based on storage capacity\n6. **Use relabeling** for metric cleanup\n7. **Monitor Prometheus itself**\n8. **Implement federation** for large deployments\n9. **Use Thanos\u002FCortex** for long-term storage\n10. 
**Document custom metrics**\n\n## Troubleshooting\n\n**Check scrape targets:**\n```bash\ncurl http:\u002F\u002Flocalhost:9090\u002Fapi\u002Fv1\u002Ftargets\n```\n\n**Check configuration:**\n```bash\ncurl http:\u002F\u002Flocalhost:9090\u002Fapi\u002Fv1\u002Fstatus\u002Fconfig\n```\n\n**Test query:**\n```bash\ncurl 'http:\u002F\u002Flocalhost:9090\u002Fapi\u002Fv1\u002Fquery?query=up'\n```\n\n## Reference Files\n\n- `assets\u002Fprometheus.yml.template` - Complete configuration template\n- `references\u002Fscrape-configs.md` - Scrape configuration patterns\n- `references\u002Frecording-rules.md` - Recording rule examples\n- `scripts\u002Fvalidate-prometheus.sh` - Validation script\n\n## Related Skills\n\n- `grafana-dashboards` - For visualization\n- `slo-implementation` - For SLO monitoring\n- `distributed-tracing` - For request tracing\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,175,772,"2026-05-16 13:35:27",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"其他","other","mdi-page-next-outline","其他类型Skill",5,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":43},"7bcaf146-26b0-4c3c-836e-082af70117bf","1.0.0","prometheus-configuration.zip",3424,"uploads\u002Fskills\u002Fac278dd8-4150-49d9-9ecc-a2401e0da58a\u002Fprometheus-configuration.zip","874c5ad30e060429141fdc60110a268bea1bb5f7cec1e504cbf70c170794556a","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10947}]","2026-05-16 13:35:28",{"code":45,"message":46,"data":47},200,"success",{"items":48,"stats":49,"page":52},[],{"averageRating":50,"totalRatings":50,"ratingCounts":51},0,[50,50,50,50,50],{"limit":53,"offset":50,"hasMore":54,"nextOffset":53,"ratedOnly":16},15,false]