[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-50525e5d-b316-4cf8-aba2-bd9904162a19":3,"$frk9Q2mqYtvoDzTrSy6n2wbZn7CwT7ndl9gzC-nu6KUE":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"50525e5d-b316-4cf8-aba2-bd9904162a19","distributed-tracing","Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.","cat_life_career","mod_other","sickn33,other","---\nname: distributed-tracing\ndescription: \"Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.\"\nrisk: critical\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Distributed Tracing\n\nImplement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.\n\n## Do not use this skill when\n\n- The task is unrelated to distributed tracing\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources\u002Fimplementation-playbook.md`.\n\n## Purpose\n\nTrack requests across distributed systems to understand latency, dependencies, and failure points.\n\n## Use this skill when\n\n- Debugging latency issues\n- Understanding service dependencies\n- Identifying bottlenecks\n- Tracing error propagation\n- Analyzing request paths\n\n## Distributed Tracing Concepts\n\n### Trace Structure\n```\nTrace (Request ID: abc123)\n  ↓\nSpan (frontend) [100ms]\n  ↓\nSpan (api-gateway) [80ms]\n  ├→ Span (auth-service) [10ms]\n  └→ Span (user-service) [60ms]\n      └→ Span (database) [40ms]\n```\n\n### Key Components\n- **Trace** - End-to-end request journey\n- **Span** - Single operation within a trace\n- **Context** - Metadata propagated 
between services\n- **Tags** - Key-value pairs for filtering\n- **Logs** - Timestamped events within a span\n\n## Jaeger Setup\n\n### Kubernetes Deployment\n\n```bash\n# Deploy Jaeger Operator\nkubectl create namespace observability\nkubectl create -f https:\u002F\u002Fgithub.com\u002Fjaegertracing\u002Fjaeger-operator\u002Freleases\u002Fdownload\u002Fv1.51.0\u002Fjaeger-operator.yaml -n observability\n\n# Deploy Jaeger instance\nkubectl apply -f - \u003C\u003CEOF\napiVersion: jaegertracing.io\u002Fv1\nkind: Jaeger\nmetadata:\n  name: jaeger\n  namespace: observability\nspec:\n  strategy: production\n  storage:\n    type: elasticsearch\n    options:\n      es:\n        server-urls: http:\u002F\u002Felasticsearch:9200\n  ingress:\n    enabled: true\nEOF\n```\n\n### Docker Compose\n\n```yaml\nversion: '3.8'\nservices:\n  jaeger:\n    image: jaegertracing\u002Fall-in-one:latest\n    ports:\n      - \"5775:5775\u002Fudp\"\n      - \"6831:6831\u002Fudp\"\n      - \"6832:6832\u002Fudp\"\n      - \"5778:5778\"\n      - \"16686:16686\"  # UI\n      - \"14268:14268\"  # Collector\n      - \"14250:14250\"  # gRPC\n      - \"9411:9411\"    # Zipkin\n    environment:\n      - COLLECTOR_ZIPKIN_HOST_PORT=:9411\n```\n\n**Reference:** See `references\u002Fjaeger-setup.md`\n\n## Application Instrumentation\n\n### OpenTelemetry (Recommended)\n\n#### Python (Flask)\n```python\nfrom opentelemetry import trace\nfrom opentelemetry.exporter.jaeger.thrift import JaegerExporter\nfrom opentelemetry.sdk.resources import SERVICE_NAME, Resource\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor\nfrom opentelemetry.instrumentation.flask import FlaskInstrumentor\nfrom flask import Flask\n\n# Initialize tracer\nresource = Resource(attributes={SERVICE_NAME: \"my-service\"})\nprovider = TracerProvider(resource=resource)\nprocessor = BatchSpanProcessor(JaegerExporter(\n    agent_host_name=\"jaeger\",\n    
agent_port=6831,\n))\nprovider.add_span_processor(processor)\ntrace.set_tracer_provider(provider)\n\n# Instrument Flask\napp = Flask(__name__)\nFlaskInstrumentor().instrument_app(app)\n\n@app.route('\u002Fapi\u002Fusers')\ndef get_users():\n    tracer = trace.get_tracer(__name__)\n\n    with tracer.start_as_current_span(\"get_users\") as span:\n        # Business logic\n        users = fetch_users_from_db()\n        # Record the actual result size rather than a hard-coded value\n        span.set_attribute(\"user.count\", len(users))\n        return {\"users\": users}\n\ndef fetch_users_from_db():\n    tracer = trace.get_tracer(__name__)\n\n    with tracer.start_as_current_span(\"database_query\") as span:\n        span.set_attribute(\"db.system\", \"postgresql\")\n        span.set_attribute(\"db.statement\", \"SELECT * FROM users\")\n        # Database query\n        return query_database()\n```\n\n#### Node.js (Express)\n```javascript\nconst { trace } = require('@opentelemetry\u002Fapi');\nconst { Resource } = require('@opentelemetry\u002Fresources');\nconst { NodeTracerProvider } = require('@opentelemetry\u002Fsdk-trace-node');\nconst { JaegerExporter } = require('@opentelemetry\u002Fexporter-jaeger');\nconst { BatchSpanProcessor } = require('@opentelemetry\u002Fsdk-trace-base');\nconst { registerInstrumentations } = require('@opentelemetry\u002Finstrumentation');\nconst { HttpInstrumentation } = require('@opentelemetry\u002Finstrumentation-http');\nconst { ExpressInstrumentation } = require('@opentelemetry\u002Finstrumentation-express');\n\n\u002F\u002F Initialize tracer (Resource class as exposed by the 1.x SDK packages)\nconst provider = new NodeTracerProvider({\n  resource: new Resource({ 'service.name': 'my-service' })\n});\n\nconst exporter = new JaegerExporter({\n  endpoint: 'http:\u002F\u002Fjaeger:14268\u002Fapi\u002Ftraces'\n});\n\nprovider.addSpanProcessor(new BatchSpanProcessor(exporter));\nprovider.register();\n\n\u002F\u002F Instrument libraries\nregisterInstrumentations({\n  instrumentations: [\n    new HttpInstrumentation(),\n    new ExpressInstrumentation(),\n  ],\n});\n\nconst express = require('express');\nconst app = express();\n\napp.get('\u002Fapi\u002Fusers', async (req, res) => {\n  
const tracer = trace.getTracer('my-service');\n  const span = tracer.startSpan('get_users');\n\n  try {\n    const users = await fetchUsers();\n    span.setAttributes({ 'user.count': users.length });\n    res.json({ users });\n  } finally {\n    span.end();\n  }\n});\n```\n\n#### Go\n```go\npackage main\n\nimport (\n    \"context\"\n    \"go.opentelemetry.io\u002Fotel\"\n    \"go.opentelemetry.io\u002Fotel\u002Fattribute\"\n    \"go.opentelemetry.io\u002Fotel\u002Fexporters\u002Fjaeger\"\n    \"go.opentelemetry.io\u002Fotel\u002Fsdk\u002Fresource\"\n    sdktrace \"go.opentelemetry.io\u002Fotel\u002Fsdk\u002Ftrace\"\n    semconv \"go.opentelemetry.io\u002Fotel\u002Fsemconv\u002Fv1.4.0\"\n)\n\nfunc initTracer() (*sdktrace.TracerProvider, error) {\n    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(\n        jaeger.WithEndpoint(\"http:\u002F\u002Fjaeger:14268\u002Fapi\u002Ftraces\"),\n    ))\n    if err != nil {\n        return nil, err\n    }\n\n    tp := sdktrace.NewTracerProvider(\n        sdktrace.WithBatcher(exporter),\n        sdktrace.WithResource(resource.NewWithAttributes(\n            semconv.SchemaURL,\n            semconv.ServiceNameKey.String(\"my-service\"),\n        )),\n    )\n\n    otel.SetTracerProvider(tp)\n    return tp, nil\n}\n\nfunc getUsers(ctx context.Context) ([]User, error) {\n    tracer := otel.Tracer(\"my-service\")\n    ctx, span := tracer.Start(ctx, \"get_users\")\n    defer span.End()\n\n    span.SetAttributes(attribute.String(\"user.filter\", \"active\"))\n\n    users, err := fetchUsersFromDB(ctx)\n    if err != nil {\n        span.RecordError(err)\n        return nil, err\n    }\n\n    span.SetAttributes(attribute.Int(\"user.count\", len(users)))\n    return users, nil\n}\n```\n\n**Reference:** See `references\u002Finstrumentation.md`\n\n## Context Propagation\n\n### HTTP Headers\n```\ntraceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01\ntracestate: congo=t61rcWkgMzE\n```\n\n### Propagation in HTTP Requests\n\n#### Python\n```python\nfrom 
opentelemetry.propagate import inject\nimport requests\n\nheaders = {}\ninject(headers)  # Injects the current trace context into the outgoing headers\n\nresponse = requests.get('http:\u002F\u002Fdownstream-service\u002Fapi', headers=headers)\n```\n\n#### Node.js\n```javascript\nconst { context, propagation } = require('@opentelemetry\u002Fapi');\nconst axios = require('axios');\n\nconst headers = {};\npropagation.inject(context.active(), headers);\n\naxios.get('http:\u002F\u002Fdownstream-service\u002Fapi', { headers });\n```\n\n## Tempo Setup (Grafana)\n\n### Kubernetes Deployment\n\n```yaml\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: tempo-config\ndata:\n  tempo.yaml: |\n    server:\n      http_listen_port: 3200\n\n    distributor:\n      receivers:\n        jaeger:\n          protocols:\n            thrift_http:\n            grpc:\n        otlp:\n          protocols:\n            http:\n            grpc:\n\n    storage:\n      trace:\n        backend: s3\n        s3:\n          bucket: tempo-traces\n          endpoint: s3.amazonaws.com\n\n    querier:\n      frontend_worker:\n        frontend_address: tempo-query-frontend:9095\n---\napiVersion: apps\u002Fv1\nkind: Deployment\nmetadata:\n  name: tempo\nspec:\n  replicas: 1\n  # apps\u002Fv1 requires a selector that matches the pod template labels\n  selector:\n    matchLabels:\n      app: tempo\n  template:\n    metadata:\n      labels:\n        app: tempo\n    spec:\n      containers:\n      - name: tempo\n        image: grafana\u002Ftempo:latest\n        args:\n          - -config.file=\u002Fetc\u002Ftempo\u002Ftempo.yaml\n        volumeMounts:\n        - name: config\n          mountPath: \u002Fetc\u002Ftempo\n      volumes:\n      - name: config\n        configMap:\n          name: tempo-config\n```\n\n**Reference:** See `assets\u002Fjaeger-config.yaml.template`\n\n## Sampling Strategies\n\n### Probabilistic Sampling\n```yaml\n# Sample 1% of traces\nsampler:\n  type: probabilistic\n  param: 0.01\n```\n\n### Rate Limiting Sampling\n```yaml\n# Sample max 100 traces per second\nsampler:\n  type: ratelimiting\n  param: 100\n```\n\n### Parent-Based Ratio Sampling\n```python\nfrom opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased\n\n# Sample based on trace ID 
(deterministic)\nsampler = ParentBased(root=TraceIdRatioBased(0.01))\n```\n\n## Trace Analysis\n\n### Finding Slow Requests\n\n**Jaeger Query:**\n```\nservice=my-service\nduration > 1s\n```\n\n### Finding Errors\n\n**Jaeger Query:**\n```\nservice=my-service\nerror=true\ntags.http.status_code >= 500\n```\n\n### Service Dependency Graph\n\nJaeger automatically generates service dependency graphs showing:\n- Service relationships\n- Request rates\n- Error rates\n- Average latencies\n\n## Best Practices\n\n1. **Sample appropriately** (1-10% in production)\n2. **Add meaningful tags** (user_id, request_id)\n3. **Propagate context** across all service boundaries\n4. **Log exceptions** in spans\n5. **Use consistent naming** for operations\n6. **Monitor tracing overhead** (\u003C1% CPU impact)\n7. **Set up alerts** for trace errors\n8. **Implement distributed context** (baggage)\n9. **Use span events** for important milestones\n10. **Document instrumentation** standards\n\n## Integration with Logging\n\n### Correlated Logs\n```python\nimport logging\nfrom opentelemetry import trace\n\nlogger = logging.getLogger(__name__)\n\ndef process_request():\n    span = trace.get_current_span()\n    trace_id = span.get_span_context().trace_id\n\n    logger.info(\n        \"Processing request\",\n        extra={\"trace_id\": format(trace_id, '032x')}\n    )\n```\n\n## Troubleshooting\n\n**No traces appearing:**\n- Check collector endpoint\n- Verify network connectivity\n- Check sampling configuration\n- Review application logs\n\n**High latency overhead:**\n- Reduce sampling rate\n- Use batch span processor\n- Check exporter configuration\n\n## Reference Files\n\n- `references\u002Fjaeger-setup.md` - Jaeger installation\n- `references\u002Finstrumentation.md` - Instrumentation patterns\n- `assets\u002Fjaeger-config.yaml.template` - Jaeger configuration\n\n## Related Skills\n\n- `prometheus-configuration` - For metrics\n- `grafana-dashboards` - For visualization\n- `slo-implementation` - 
For latency SLOs\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,222,1257,"2026-05-16 13:15:31",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other skill types",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, résumé optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"fa97166a-8e4d-4eec-80ae-e7561f3d8b2f","1.0.0","distributed-tracing.zip",4333,"uploads\u002Fskills\u002F50525e5d-b316-4cf8-aba2-bd9904162a19\u002Fdistributed-tracing.zip","3e2047ad85093f905980fc513232930634f4f14a399186c4377f4247a3e0bfca","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10840}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]