[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-20eb0cb9-2d6f-404b-a532-fc11dda0a66a":3,"$fMlPJeZyyWd3UumkOiJg_WEpFWIZ6d3h_1pmjXpWMgcY":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"20eb0cb9-2d6f-404b-a532-fc11dda0a66a","service-mesh-observability","Complete guide to observability patterns for Istio, Linkerd, and service mesh deployments.","cat_life_career","mod_other","sickn33,other","---\nname: service-mesh-observability\ndescription: \"Complete guide to observability patterns for Istio, Linkerd, and service mesh deployments.\"\nrisk: critical\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Service Mesh Observability\n\nComplete guide to observability patterns for Istio, Linkerd, and service mesh deployments.\n\n## Do not use this skill when\n\n- The task is unrelated to service mesh observability\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources\u002Fimplementation-playbook.md`.\n\n## Use this skill when\n\n- Setting up distributed tracing across services\n- Implementing service mesh metrics and dashboards\n- Debugging latency and error issues\n- Defining SLOs for service communication\n- Visualizing service dependencies\n- Troubleshooting mesh connectivity\n\n## Core Concepts\n\n### 1. Three Pillars of Observability\n\n```\n┌─────────────────────────────────────────────────────┐\n│                    Observability                    │\n├─────────────────┬─────────────────┬─────────────────┤\n│     Metrics     │     Traces      │      Logs       │\n│                 │                 │                 │\n│ • Request rate  │ • Span context  │ • Access logs   │\n│ • Error rate    │ • Latency       │ • Error details │\n│ • Latency P50   │ • Dependencies  │ • Debug info    │\n│ • Saturation    │ • Bottlenecks   │ • Audit trail   │\n└─────────────────┴─────────────────┴─────────────────┘\n```\n\n### 2. Golden Signals for Mesh\n\n| Signal | Description | Alert Threshold |\n|--------|-------------|-----------------|\n| **Latency** | Request duration P50, P99 | P99 > 500ms |\n| **Traffic** | Requests per second | Anomaly detection |\n| **Errors** | 5xx error rate | > 1% |\n| **Saturation** | Resource utilization | > 80% |\n\n## Templates\n\n### Template 1: Istio with Prometheus & Grafana\n\n```yaml\n# Prometheus scrape configuration for Istio (consumed by a Prometheus server deployed separately)\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: prometheus\n  namespace: istio-system\ndata:\n  prometheus.yml: |\n    global:\n      scrape_interval: 15s\n    scrape_configs:\n      # Control plane: istiod metrics on its http-monitoring port (15014)\n      - job_name: 'istiod'\n        kubernetes_sd_configs:\n          - role: endpoints\n            namespaces:\n              names:\n                - istio-system\n        relabel_configs:\n          - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]\n            action: keep\n            regex: istiod;http-monitoring\n      # Data plane: Envoy sidecar metrics (istio_requests_total, etc.)\n      - job_name: 'envoy-stats'\n        metrics_path: \u002Fstats\u002Fprometheus\n        kubernetes_sd_configs:\n          - role: pod\n        relabel_configs:\n          - source_labels: [__meta_kubernetes_pod_container_port_name]\n            action: keep\n            regex: '.*-envoy-prom'\n---\n# ServiceMonitor for Prometheus Operator (covers istiod only; sidecars need a PodMonitor)\napiVersion: monitoring.coreos.com\u002Fv1\nkind: ServiceMonitor\nmetadata:\n  name: istio-mesh\n  namespace: istio-system\nspec:\n  selector:\n    matchLabels:\n      app: istiod\n  endpoints:\n    - port: http-monitoring\n      interval: 15s\n```\n\n### Template 2: Key Istio Metrics Queries\n\n```promql\n# Request rate by service\nsum(rate(istio_requests_total{reporter=\"destination\"}[5m])) by (destination_service_name)\n\n# Mesh-wide error rate (% of requests returning 5xx)\nsum(rate(istio_requests_total{reporter=\"destination\", response_code=~\"5..\"}[5m]))\n  \u002F sum(rate(istio_requests_total{reporter=\"destination\"}[5m])) * 100\n\n# P99 latency (ms) by service\nhistogram_quantile(0.99,\n  sum(rate(istio_request_duration_milliseconds_bucket{reporter=\"destination\"}[5m]))\n  by (le, destination_service_name))\n\n# TCP connections opened per second by service\nsum(rate(istio_tcp_connections_opened_total{reporter=\"destination\"}[5m])) by (destination_service_name)\n\n# P99 request size (bytes) by service\nhistogram_quantile(0.99,\n  sum(rate(istio_request_bytes_bucket{reporter=\"destination\"}[5m]))\n  by (le, destination_service_name))\n```\n\n### Template 3: Jaeger Distributed Tracing\n\n```yaml\n# Enable tracing in Istio and send spans to Jaeger's Zipkin-compatible endpoint\n# (legacy meshConfig tracing; newer Istio releases favor the Telemetry API shown in Template 7)\napiVersion: install.istio.io\u002Fv1alpha1\nkind: IstioOperator\nspec:\n  meshConfig:\n    enableTracing: true\n    defaultConfig:\n      tracing:\n        sampling: 100.0  # 100% in dev, lower in prod\n        zipkin:\n          address: jaeger-collector.istio-system:9411\n---\n# Jaeger all-in-one deployment (agent, collector, and query UI with in-memory storage)\napiVersion: apps\u002Fv1\nkind: Deployment\nmetadata:\n  name: jaeger\n  namespace: istio-system\nspec:\n  selector:\n    matchLabels:\n      app: jaeger\n  template:\n    metadata:\n      labels:\n        app: jaeger\n    spec:\n      containers:\n        - name: jaeger\n          image: jaegertracing\u002Fall-in-one:1.50\n          ports:\n            - containerPort: 5775   # Agent: Zipkin compact Thrift (legacy)\n              protocol: UDP\n            - containerPort: 6831   # Agent: Jaeger compact Thrift\n              protocol: UDP\n            - containerPort: 6832   # Agent: Jaeger binary Thrift\n              protocol: UDP\n            - containerPort: 5778   # Agent: sampling config\n            - containerPort: 16686  # Query UI\n            - containerPort: 14268  # Collector: Jaeger Thrift over HTTP\n            - containerPort: 14250  # Collector: gRPC\n            - containerPort: 4317   # Collector: OTLP gRPC\n            - containerPort: 9411   # Collector: Zipkin\n          env:\n            - name: COLLECTOR_ZIPKIN_HOST_PORT\n              value: \":9411\"\n            - name: COLLECTOR_OTLP_ENABLED\n              value: \"true\"\n---\n# Service behind the jaeger-collector address used above\napiVersion: v1\nkind: Service\nmetadata:\n  name: jaeger-collector\n  namespace: istio-system\nspec:\n  selector:\n    app: jaeger\n  ports:\n    - name: http-zipkin\n      port: 9411\n    - name: grpc-otlp\n      port: 4317\n```\n\n### Template 4: Linkerd Viz Dashboard\n\n```bash\n# Install Linkerd viz extension\nlinkerd viz install | kubectl apply -f -\n\n# Access dashboard\nlinkerd viz dashboard\n\n# CLI commands for observability\n# Live view of the most frequent requests\nlinkerd viz top deploy\u002Fmy-app\n\n# Per-route metrics\nlinkerd viz routes deploy\u002Fmy-app --to deploy\u002Fbackend\n\n# Live traffic inspection\nlinkerd viz tap deploy\u002Fmy-app --to deploy\u002Fbackend\n\n# Service edges (dependencies)\nlinkerd viz edges deployment -n my-namespace\n```\n\n### Template 5: Grafana Dashboard JSON\n\n```json\n{\n  \"dashboard\": {\n    \"title\": \"Service Mesh Overview\",\n    \"panels\": [\n      {\n        \"title\": \"Request Rate\",\n        \"type\": \"timeseries\",\n        \"targets\": [\n          {\n            \"expr\": \"sum(rate(istio_requests_total{reporter=\\\"destination\\\"}[5m])) by (destination_service_name)\",\n            \"legendFormat\": \"{{destination_service_name}}\"\n          }\n        ]\n      },\n      {\n        \"title\": \"Error Rate\",\n        \"type\": \"gauge\",\n        \"targets\": [\n          {\n            \"expr\": \"sum(rate(istio_requests_total{reporter=\\\"destination\\\", response_code=~\\\"5..\\\"}[5m])) \u002F sum(rate(istio_requests_total{reporter=\\\"destination\\\"}[5m])) * 100\"\n          }\n        ],\n        \"fieldConfig\": {\n          \"defaults\": {\n            \"unit\": \"percent\",\n            \"thresholds\": {\n              \"steps\": [\n                {\"value\": 0, \"color\": \"green\"},\n                {\"value\": 1, \"color\": \"yellow\"},\n                {\"value\": 5, \"color\": \"red\"}\n              ]\n            }\n          }\n        }\n      },\n      {\n        \"title\": \"P99 Latency\",\n        \"type\": \"timeseries\",\n        \"targets\": [\n          {\n            \"expr\": \"histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{reporter=\\\"destination\\\"}[5m])) by (le, destination_service_name))\",\n            \"legendFormat\": \"{{destination_service_name}}\"\n          }\n        ]\n      },\n      {\n        \"title\": \"Service Topology\",\n        \"type\": \"nodeGraph\",\n        \"targets\": [\n          {\n            \"expr\": \"sum(rate(istio_requests_total{reporter=\\\"destination\\\"}[5m])) by (source_workload, destination_service_name)\"\n          }\n        ]\n      }\n    ]\n  }\n}\n```\n\n### Template 6: Kiali Service Mesh Visualization\n\n```yaml\n# Kiali custom resource (requires the Kiali Operator)\napiVersion: kiali.io\u002Fv1alpha1\nkind: Kiali\nmetadata:\n  name: kiali\n  namespace: istio-system\nspec:\n  auth:\n    strategy: anonymous  # or openid, token\n  deployment:\n    accessible_namespaces:\n      - \"**\"\n  external_services:\n    prometheus:\n      url: http:\u002F\u002Fprometheus.istio-system:9090\n    tracing:\n      url: http:\u002F\u002Fjaeger-query.istio-system:16686\n    grafana:\n      url: http:\u002F\u002Fgrafana.istio-system:3000\n```\n\n### Template 7: OpenTelemetry Integration\n\n```yaml\n# OpenTelemetry Collector for mesh\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: otel-collector-config\ndata:\n  config.yaml: |\n    receivers:\n      otlp:\n        protocols:\n          grpc:\n            endpoint: 0.0.0.0:4317\n          http:\n            endpoint: 0.0.0.0:4318\n      zipkin:\n        endpoint: 0.0.0.0:9411\n\n    processors:\n      batch:\n        timeout: 10s\n\n    exporters:\n      # Jaeger ingests OTLP natively; the legacy jaeger exporter was removed from the Collector\n      otlp\u002Fjaeger:\n        endpoint: jaeger-collector.istio-system:4317\n        tls:\n          insecure: true\n      prometheus:\n        endpoint: 0.0.0.0:8889\n\n    service:\n      pipelines:\n        traces:\n          receivers: [otlp, zipkin]\n          processors: [batch]\n          exporters: [otlp\u002Fjaeger]\n        metrics:\n          receivers: [otlp]\n          processors: [batch]\n          exporters: [prometheus]\n---\n# Istio Telemetry API: sample 10% of requests into the \"otel\" tracing provider\n# (the provider must be declared under meshConfig.extensionProviders)\napiVersion: telemetry.istio.io\u002Fv1alpha1\nkind: Telemetry\nmetadata:\n  name: mesh-default\n  namespace: istio-system\nspec:\n  tracing:\n    - providers:\n        - name: otel\n      randomSamplingPercentage: 10\n```\n\n## Alerting Rules\n\n```yaml\napiVersion: monitoring.coreos.com\u002Fv1\nkind: PrometheusRule\nmetadata:\n  name: mesh-alerts\n  namespace: istio-system\nspec:\n  groups:\n    - name: mesh.rules\n      rules:\n        - alert: HighErrorRate\n          expr: |\n            sum(rate(istio_requests_total{reporter=\"destination\", response_code=~\"5..\"}[5m])) by (destination_service_name)\n            \u002F sum(rate(istio_requests_total{reporter=\"destination\"}[5m])) by (destination_service_name) > 0.05\n          for: 5m\n          labels:\n            severity: critical\n          annotations:\n            summary: \"5xx error rate above 5% for {{ $labels.destination_service_name }}\"\n\n        - alert: HighLatency\n          expr: |\n            histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{reporter=\"destination\"}[5m]))\n            by (le, destination_service_name)) > 1000\n          for: 5m\n          labels:\n            severity: warning\n          annotations:\n            summary: \"P99 latency above 1s for {{ $labels.destination_service_name }}\"\n\n        # Applies only when mesh certificates are issued by cert-manager (e.g. via istio-csr)\n        - alert: MeshCertExpiring\n          expr: |\n            (certmanager_certificate_expiration_timestamp_seconds - time()) \u002F 86400 \u003C 7\n          labels:\n            severity: warning\n          annotations:\n            summary: \"Mesh certificate expiring in less than 7 days\"\n```\n\n## Best Practices\n\n### Do's\n- **Sample appropriately** - 100% in dev, 1-10% in prod\n- **Propagate trace context** - Applications must forward trace headers (e.g. `traceparent`, `x-b3-*`) on outbound calls; the sidecar cannot link spans on its own\n- **Set up alerts** - For golden signals\n- **Correlate metrics\u002Ftraces** - Use exemplars\n- **Retain strategically** - Hot\u002Fcold storage tiers\n\n### Don'ts\n- **Don't over-sample** - Storage costs add up\n- **Don't ignore cardinality** - Limit label values\n- **Don't skip dashboards** - Visualize dependencies\n- **Don't forget costs** - Monitor observability costs\n\n## Resources\n\n- [Istio Observability](https:\u002F\u002Fistio.io\u002Flatest\u002Fdocs\u002Ftasks\u002Fobservability\u002F)\n- [Linkerd Observability](https:\u002F\u002Flinkerd.io\u002F2.14\u002Ffeatures\u002Fdashboard\u002F)\n- [OpenTelemetry](https:\u002F\u002Fopentelemetry.io\u002F)\n- [Kiali](https:\u002F\u002Fkiali.io\u002F)\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,64,1115,"2026-05-16 13:40:09",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of skills",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, and career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"1cf24491-0e5e-41ae-8441-0594a085b76f","1.0.0","service-mesh-observability.zip",3587,"uploads\u002Fskills\u002F20eb0cb9-2d6f-404b-a532-fc11dda0a66a\u002Fservice-mesh-observability.zip","255cf85532cdf54806c5d81355986bff8f7615503ff5706778cc2716030bc6be","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":11069}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]