[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-72447f56-93f1-4247-8db7-4583d84a1e2c":3,"$fnfC2VpC8CgqyUGD7OX4b9JfXZcMiQrX-wg2GiD3kUB4":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"72447f56-93f1-4247-8db7-4583d84a1e2c","ml-pipeline-workflow","从数据准备到模型部署的完整端到端MLOps管道编排。","cat_prod_automation","mod_productivity","sickn33,productivity","---\nname: ml-pipeline-workflow\ndescription: \"Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# ML Pipeline Workflow\n\nComplete end-to-end MLOps pipeline orchestration from data preparation through model deployment.\n\n## Do not use this skill when\n\n- The task is unrelated to ml pipeline workflow\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources\u002Fimplementation-playbook.md`.\n\n## Overview\n\nThis skill provides comprehensive guidance for building production ML pipelines that handle the full lifecycle: data ingestion → preparation → training → validation → deployment → monitoring.\n\n## Use this skill when\n\n- Building new ML pipelines from scratch\n- Designing workflow orchestration for ML systems\n- Implementing data → model → deployment automation\n- Setting up reproducible training workflows\n- Creating DAG-based ML orchestration\n- Integrating ML components into production systems\n\n## What This Skill Provides\n\n### Core Capabilities\n\n1. **Pipeline Architecture**\n   - End-to-end workflow design\n   - DAG orchestration patterns (Airflow, Dagster, Kubeflow)\n   - Component dependencies and data flow\n   - Error handling and retry strategies\n\n2. **Data Preparation**\n   - Data validation and quality checks\n   - Feature engineering pipelines\n   - Data versioning and lineage\n   - Train\u002Fvalidation\u002Ftest splitting strategies\n\n3. **Model Training**\n   - Training job orchestration\n   - Hyperparameter management\n   - Experiment tracking integration\n   - Distributed training patterns\n\n4. **Model Validation**\n   - Validation frameworks and metrics\n   - A\u002FB testing infrastructure\n   - Performance regression detection\n   - Model comparison workflows\n\n5. **Deployment Automation**\n   - Model serving patterns\n   - Canary deployments\n   - Blue-green deployment strategies\n   - Rollback mechanisms\n\n### Reference Documentation\n\nSee the `references\u002F` directory for detailed guides:\n- **data-preparation.md** - Data cleaning, validation, and feature engineering\n- **model-training.md** - Training workflows and best practices\n- **model-validation.md** - Validation strategies and metrics\n- **model-deployment.md** - Deployment patterns and serving architectures\n\n### Assets and Templates\n\nThe `assets\u002F` directory contains:\n- **pipeline-dag.yaml.template** - DAG template for workflow orchestration\n- **training-config.yaml** - Training configuration template\n- **validation-checklist.md** - Pre-deployment validation checklist\n\n## Usage Patterns\n\n### Basic Pipeline Setup\n\n```python\n# 1. Define pipeline stages\nstages = [\n    \"data_ingestion\",\n    \"data_validation\",\n    \"feature_engineering\",\n    \"model_training\",\n    \"model_validation\",\n    \"model_deployment\"\n]\n\n# 2. Configure dependencies\n# See assets\u002Fpipeline-dag.yaml.template for full example\n```\n\n### Production Workflow\n\n1. **Data Preparation Phase**\n   - Ingest raw data from sources\n   - Run data quality checks\n   - Apply feature transformations\n   - Version processed datasets\n\n2. **Training Phase**\n   - Load versioned training data\n   - Execute training jobs\n   - Track experiments and metrics\n   - Save trained models\n\n3. **Validation Phase**\n   - Run validation test suite\n   - Compare against baseline\n   - Generate performance reports\n   - Approve for deployment\n\n4. **Deployment Phase**\n   - Package model artifacts\n   - Deploy to serving infrastructure\n   - Configure monitoring\n   - Validate production traffic\n\n## Best Practices\n\n### Pipeline Design\n\n- **Modularity**: Each stage should be independently testable\n- **Idempotency**: Re-running stages should be safe\n- **Observability**: Log metrics at every stage\n- **Versioning**: Track data, code, and model versions\n- **Failure Handling**: Implement retry logic and alerting\n\n### Data Management\n\n- Use data validation libraries (Great Expectations, TFX)\n- Version datasets with DVC or similar tools\n- Document feature engineering transformations\n- Maintain data lineage tracking\n\n### Model Operations\n\n- Separate training and serving infrastructure\n- Use model registries (MLflow, Weights & Biases)\n- Implement gradual rollouts for new models\n- Monitor model performance drift\n- Maintain rollback capabilities\n\n### Deployment Strategies\n\n- Start with shadow deployments\n- Use canary releases for validation\n- Implement A\u002FB testing infrastructure\n- Set up automated rollback triggers\n- Monitor latency and throughput\n\n## Integration Points\n\n### Orchestration Tools\n\n- **Apache Airflow**: DAG-based workflow orchestration\n- **Dagster**: Asset-based pipeline orchestration\n- **Kubeflow Pipelines**: Kubernetes-native ML workflows\n- **Prefect**: Modern dataflow automation\n\n### Experiment Tracking\n\n- MLflow for experiment tracking and model registry\n- Weights & Biases for visualization and collaboration\n- TensorBoard for training metrics\n\n### Deployment Platforms\n\n- AWS SageMaker for managed ML infrastructure\n- Google Vertex AI for GCP deployments\n- Azure ML for Azure cloud\n- Kubernetes + KServe for cloud-agnostic serving\n\n## Progressive Disclosure\n\nStart with the basics and gradually add complexity:\n\n1. **Level 1**: Simple linear pipeline (data → train → deploy)\n2. **Level 2**: Add validation and monitoring stages\n3. **Level 3**: Implement hyperparameter tuning\n4. **Level 4**: Add A\u002FB testing and gradual rollouts\n5. **Level 5**: Multi-model pipelines with ensemble strategies\n\n## Common Patterns\n\n### Batch Training Pipeline\n\n```yaml\n# See assets\u002Fpipeline-dag.yaml.template\nstages:\n  - name: data_preparation\n    dependencies: []\n  - name: model_training\n    dependencies: [data_preparation]\n  - name: model_evaluation\n    dependencies: [model_training]\n  - name: model_deployment\n    dependencies: [model_evaluation]\n```\n\n### Real-time Feature Pipeline\n\n```python\n# Stream processing for real-time features\n# Combined with batch training\n# See references\u002Fdata-preparation.md\n```\n\n### Continuous Training\n\n```python\n# Automated retraining on schedule\n# Triggered by data drift detection\n# See references\u002Fmodel-training.md\n```\n\n## Troubleshooting\n\n### Common Issues\n\n- **Pipeline failures**: Check dependencies and data availability\n- **Training instability**: Review hyperparameters and data quality\n- **Deployment issues**: Validate model artifacts and serving config\n- **Performance degradation**: Monitor data drift and model metrics\n\n### Debugging Steps\n\n1. Check pipeline logs for each stage\n2. Validate input\u002Foutput data at boundaries\n3. Test components in isolation\n4. Review experiment tracking metrics\n5. Inspect model artifacts and metadata\n\n## Next Steps\n\nAfter setting up your pipeline:\n\n1. Explore **hyperparameter-tuning** skill for optimization\n2. Learn **experiment-tracking-setup** for MLflow\u002FW&B\n3. Review **model-deployment-patterns** for serving strategies\n4. Implement monitoring with observability tools\n\n## Related Skills\n\n- **experiment-tracking-setup**: MLflow and Weights & Biases integration\n- **hyperparameter-tuning**: Automated hyperparameter optimization\n- **model-deployment-patterns**: Advanced deployment strategies\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,186,1379,"2026-05-16 13:28:58",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"效率工具","productivity","mdi-lightning-bolt-outline","文档处理、数据分析、自动化工作流",4,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"自动化","automation","mdi-robot-outline","工作流自动化、批处理",3,101,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"d8ae51ee-d9cc-46b6-a046-e2316b96647e","1.0.0","ml-pipeline-workflow.zip",3045,"uploads\u002Fskills\u002F72447f56-93f1-4247-8db7-4583d84a1e2c\u002Fml-pipeline-workflow.zip","7bfa3075be004a7a8ddcf32034c5ff688e980acdf4a10730f92e726fe3ea345a","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":7655}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]