[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-bf2e9c57-3ef4-4d1e-acda-95c147aabaa5":3,"$fkDk7IQvSd4mHWqP9i4hpZl2MUxNdqKJAvprBwELDtO0":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"bf2e9c57-3ef4-4d1e-acda-95c147aabaa5","machine-learning-ops-ml-pipeline","Design and implement a complete machine learning pipeline: $ARGUMENTS","cat_life_career","mod_other","sickn33,other","---\nname: machine-learning-ops-ml-pipeline\ndescription: \"Design and implement a complete ML pipeline for: $ARGUMENTS\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Machine Learning Pipeline - Multi-Agent MLOps Orchestration\n\nDesign and implement a complete ML pipeline for: $ARGUMENTS\n\n## Use this skill when\n\n- Working on tasks or workflows that orchestrate multiple specialized agents to build and operate an ML pipeline (MLOps)\n- Needing guidance, best practices, or checklists for multi-agent MLOps orchestration of an ML pipeline\n\n## Do not use this skill when\n\n- The task is unrelated to building or operating ML pipelines through multi-agent MLOps orchestration\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources\u002Fimplementation-playbook.md`.\n\n## Thinking\n\nThis workflow orchestrates multiple specialized agents to build a production-ready ML pipeline following modern MLOps best practices. 
The approach emphasizes:\n\n- **Phase-based coordination**: Each phase builds upon previous outputs, with clear handoffs between agents\n- **Modern tooling integration**: MLflow\u002FW&B for experiments, Feast\u002FTecton for features, KServe\u002FSeldon for serving\n- **Production-first mindset**: Every component designed for scale, monitoring, and reliability\n- **Reproducibility**: Version control for data, models, and infrastructure\n- **Continuous improvement**: Automated retraining, A\u002FB testing, and drift detection\n\nThe multi-agent approach ensures each aspect is handled by domain experts:\n\n- Data engineers handle ingestion and quality\n- Data scientists design features and experiments\n- ML engineers implement training pipelines\n- MLOps engineers handle production deployment\n- Observability engineers ensure monitoring\n\n## Phase 1: Data & Requirements Analysis\n\n\u003CTask>\nsubagent_type: data-engineer\nprompt: |\n  Analyze and design the data pipeline for an ML system with requirements: $ARGUMENTS\n\n  Deliverables:\n  1. Data source audit and ingestion strategy:\n     - Source systems and connection patterns\n     - Schema validation using Pydantic\u002FGreat Expectations\n     - Data versioning with DVC or lakeFS\n     - Incremental loading and CDC strategies\n\n  2. Data quality framework:\n     - Profiling and statistics generation\n     - Anomaly detection rules\n     - Data lineage tracking\n     - Quality gates and SLAs\n\n  3. Storage architecture:\n     - Raw\u002Fprocessed\u002Ffeature layers\n     - Partitioning strategy\n     - Retention policies\n     - Cost optimization\n\n  Provide implementation code for critical components and integration patterns.\n\u003C\u002FTask>\n\n\u003CTask>\nsubagent_type: data-scientist\nprompt: |\n  Design feature engineering and model requirements for: $ARGUMENTS\n  Using data architecture from: {phase1.data-engineer.output}\n\n  Deliverables:\n  1. 
Feature engineering pipeline:\n     - Transformation specifications\n     - Feature store schema (Feast\u002FTecton)\n     - Statistical validation rules\n     - Handling strategies for missing data\u002Foutliers\n\n  2. Model requirements:\n     - Algorithm selection rationale\n     - Performance metrics and baselines\n     - Training data requirements\n     - Evaluation criteria and thresholds\n\n  3. Experiment design:\n     - Hypothesis and success metrics\n     - A\u002FB testing methodology\n     - Sample size calculations\n     - Bias detection approach\n\n  Include feature transformation code and statistical validation logic.\n\u003C\u002FTask>\n\n## Phase 2: Model Development & Training\n\n\u003CTask>\nsubagent_type: ml-engineer\nprompt: |\n  Implement the training pipeline based on requirements: {phase1.data-scientist.output}\n  Using data pipeline: {phase1.data-engineer.output}\n\n  Build a comprehensive training system:\n  1. Training pipeline implementation:\n     - Modular training code with clear interfaces\n     - Hyperparameter optimization (Optuna\u002FRay Tune)\n     - Distributed training support (Horovod\u002FPyTorch DDP)\n     - Cross-validation and ensemble strategies\n\n  2. Experiment tracking setup:\n     - MLflow\u002FWeights & Biases integration\n     - Metric logging and visualization\n     - Artifact management (models, plots, data samples)\n     - Experiment comparison and analysis tools\n\n  3. Model registry integration:\n     - Version control and tagging strategy\n     - Model metadata and lineage\n     - Promotion workflows (dev -> staging -> prod)\n     - Rollback procedures\n\n  Provide complete training code with configuration management.\n\u003C\u002FTask>\n\n\u003CTask>\nsubagent_type: python-pro\nprompt: |\n  Optimize and productionize ML code from: {phase2.ml-engineer.output}\n\n  Focus areas:\n  1. 
Code quality and structure:\n     - Refactor for production standards\n     - Add comprehensive error handling\n     - Implement proper logging with structured formats\n     - Create reusable components and utilities\n\n  2. Performance optimization:\n     - Profile and optimize bottlenecks\n     - Implement caching strategies\n     - Optimize data loading and preprocessing\n     - Memory management for large-scale training\n\n  3. Testing framework:\n     - Unit tests for data transformations\n     - Integration tests for pipeline components\n     - Model quality tests (invariance, directional)\n     - Performance regression tests\n\n  Deliver production-ready, maintainable code with full test coverage.\n\u003C\u002FTask>\n\n## Phase 3: Production Deployment & Serving\n\n\u003CTask>\nsubagent_type: mlops-engineer\nprompt: |\n  Design production deployment for models from: {phase2.ml-engineer.output}\n  With optimized code from: {phase2.python-pro.output}\n\n  Implementation requirements:\n  1. Model serving infrastructure:\n     - REST\u002FgRPC APIs with FastAPI\u002FTorchServe\n     - Batch prediction pipelines (Airflow\u002FKubeflow)\n     - Stream processing (Kafka\u002FKinesis integration)\n     - Model serving platforms (KServe\u002FSeldon Core)\n\n  2. Deployment strategies:\n     - Blue-green deployments for zero downtime\n     - Canary releases with traffic splitting\n     - Shadow deployments for validation\n     - A\u002FB testing infrastructure\n\n  3. CI\u002FCD pipeline:\n     - GitHub Actions\u002FGitLab CI workflows\n     - Automated testing gates\n     - Model validation before deployment\n     - ArgoCD for GitOps deployment\n\n  4. 
Infrastructure as Code:\n     - Terraform modules for cloud resources\n     - Helm charts for Kubernetes deployments\n     - Docker multi-stage builds for optimization\n     - Secret management with Vault\u002FSecrets Manager\n\n  Provide complete deployment configuration and automation scripts.\n\u003C\u002FTask>\n\n\u003CTask>\nsubagent_type: kubernetes-architect\nprompt: |\n  Design Kubernetes infrastructure for ML workloads from: {phase3.mlops-engineer.output}\n\n  Kubernetes-specific requirements:\n  1. Workload orchestration:\n     - Training job scheduling with Kubeflow\n     - GPU resource allocation and sharing\n     - Spot\u002Fpreemptible instance integration\n     - Priority classes and resource quotas\n\n  2. Serving infrastructure:\n     - HPA\u002FVPA for autoscaling\n     - KEDA for event-driven scaling\n     - Istio service mesh for traffic management\n     - Model caching and warm-up strategies\n\n  3. Storage and data access:\n     - PVC strategies for training data\n     - Model artifact storage with CSI drivers\n     - Distributed storage for feature stores\n     - Cache layers for inference optimization\n\n  Provide Kubernetes manifests and Helm charts for the entire ML platform.\n\u003C\u002FTask>\n\n## Phase 4: Monitoring & Continuous Improvement\n\n\u003CTask>\nsubagent_type: observability-engineer\nprompt: |\n  Implement comprehensive monitoring for the ML system deployed in: {phase3.mlops-engineer.output}\n  Using Kubernetes infrastructure: {phase3.kubernetes-architect.output}\n\n  Monitoring framework:\n  1. Model performance monitoring:\n     - Prediction accuracy tracking\n     - Latency and throughput metrics\n     - Feature importance shifts\n     - Business KPI correlation\n\n  2. Data and model drift detection:\n     - Statistical drift detection (KS test, PSI)\n     - Concept drift monitoring\n     - Feature distribution tracking\n     - Automated drift alerts and reports\n\n  3. 
System observability:\n     - Prometheus metrics for all components\n     - Grafana dashboards for visualization\n     - Distributed tracing with Jaeger\u002FZipkin\n     - Log aggregation with ELK\u002FLoki\n\n  4. Alerting and automation:\n     - PagerDuty\u002FOpsgenie integration\n     - Automated retraining triggers\n     - Performance degradation workflows\n     - Incident response runbooks\n\n  5. Cost tracking:\n     - Resource utilization metrics\n     - Cost allocation by model\u002Fexperiment\n     - Optimization recommendations\n     - Budget alerts and controls\n\n  Deliver monitoring configuration, dashboards, and alert rules.\n\u003C\u002FTask>\n\n## Configuration Options\n\n- **experiment_tracking**: mlflow | wandb | neptune | clearml\n- **feature_store**: feast | tecton | databricks | custom\n- **serving_platform**: kserve | seldon | torchserve | triton\n- **orchestration**: kubeflow | airflow | prefect | dagster\n- **cloud_provider**: aws | azure | gcp | multi-cloud\n- **deployment_mode**: realtime | batch | streaming | hybrid\n- **monitoring_stack**: prometheus | datadog | newrelic | custom\n\n## Success Criteria\n\n1. **Data Pipeline Success**:\n   - \u003C 0.1% data quality issues in production\n   - Automated data validation passing 99.9% of the time\n   - Complete data lineage tracking\n   - Sub-second feature serving latency\n\n2. **Model Performance**:\n   - Meeting or exceeding baseline metrics\n   - \u003C 5% performance degradation before retraining\n   - Successful A\u002FB tests with statistical significance\n   - No undetected model drift > 24 hours\n\n3. **Operational Excellence**:\n   - 99.9% uptime for model serving\n   - \u003C 200ms p99 inference latency\n   - Automated rollback within 5 minutes\n   - Complete observability with \u003C 1 minute alert time\n\n4. 
**Development Velocity**:\n   - \u003C 1 hour from commit to production\n   - Parallel experiment execution\n   - Reproducible training runs\n   - Self-service model deployment\n\n5. **Cost Efficiency**:\n   - \u003C 20% infrastructure waste\n   - Optimized resource allocation\n   - Automatic scaling based on load\n   - Spot instance utilization > 60%\n\n## Final Deliverables\n\nUpon completion, the orchestrated pipeline will provide:\n\n- End-to-end ML pipeline with full automation\n- Comprehensive documentation and runbooks\n- Production-ready infrastructure as code\n- Complete monitoring and alerting system\n- CI\u002FCD pipelines for continuous improvement\n- Cost optimization and scaling strategies\n- Disaster recovery and rollback procedures\n\n## Limitations\n\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,138,746,"2026-05-16 13:27:21",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of skills",5,"2026-05-16 12:53:40",
{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"2587f1e6-a125-49b3-808b-e0c8539243ac","1.0.0","machine-learning-ops-ml-pipeline.zip",4164,"uploads\u002Fskills\u002Fbf2e9c57-3ef4-4d1e-acda-95c147aabaa5\u002Fmachine-learning-ops-ml-pipeline.zip","11c8e47218422b0e70060da5da99562a5aa8933fe4c6b12c877aeba4b14da17d","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10865}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]