[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-7f2ac674-8502-4910-8563-9aa7bcbce56d":3,"$fxqd7RjmmJbzLiawjw1o-bJhmmLZrjrwljn-E6pQ0Yhw":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"7f2ac674-8502-4910-8563-9aa7bcbce56d","agent-orchestration-improve-agent","Systematically improve existing agents through performance analysis, prompt engineering, and continuous iteration.","cat_life_career","mod_other","sickn33,other","---\nname: agent-orchestration-improve-agent\ndescription: \"Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Agent Performance Optimization Workflow\n\nSystematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.\n\n[Extended thinking: Agent optimization requires a data-driven approach combining performance metrics, user feedback analysis, and advanced prompt engineering techniques. Success depends on systematic evaluation, targeted improvements, and rigorous testing with rollback capabilities for production safety.]\n\n## Use this skill when\n\n- Improving an existing agent's performance or reliability\n- Analyzing failure modes, prompt quality, or tool usage\n- Running structured A\u002FB tests or evaluation suites\n- Designing iterative optimization workflows for agents\n\n## Do not use this skill when\n\n- You are building a brand-new agent from scratch\n- There are no metrics, feedback, or test cases available\n- The task is unrelated to agent performance or prompt quality\n\n## Instructions\n\n1. Establish baseline metrics and collect representative examples.\n2. Identify failure modes and prioritize high-impact fixes.\n3. Apply prompt and workflow improvements with measurable goals.\n4. 
Validate with tests and roll out changes in controlled stages.\n\n## Safety\n\n- Avoid deploying prompt changes without regression testing.\n- Roll back quickly if quality or safety metrics regress.\n\n## Phase 1: Performance Analysis and Baseline Metrics\n\nComprehensive analysis of agent performance using context-manager for historical data collection.\n\n### 1.1 Gather Performance Data\n\n```\nUse: context-manager\nCommand: analyze-agent-performance $ARGUMENTS --days 30\n```\n\nCollect metrics including:\n\n- Task completion rate (successful vs failed tasks)\n- Response accuracy and factual correctness\n- Tool usage efficiency (correct tools, call frequency)\n- Average response time and token consumption\n- User satisfaction indicators (corrections, retries)\n- Hallucination incidents and error patterns\n\n### 1.2 User Feedback Pattern Analysis\n\nIdentify recurring patterns in user interactions:\n\n- **Correction patterns**: Where users consistently modify outputs\n- **Clarification requests**: Common areas of ambiguity\n- **Task abandonment**: Points where users give up\n- **Follow-up questions**: Indicators of incomplete responses\n- **Positive feedback**: Successful patterns to preserve\n\n### 1.3 Failure Mode Classification\n\nCategorize failures by root cause:\n\n- **Instruction misunderstanding**: Role or task confusion\n- **Output format errors**: Structure or formatting issues\n- **Context loss**: Long conversation degradation\n- **Tool misuse**: Incorrect or inefficient tool selection\n- **Constraint violations**: Safety or business rule breaches\n- **Edge case handling**: Unusual input scenarios\n\n### 1.4 Baseline Performance Report\n\nGenerate quantitative baseline metrics:\n\n```\nPerformance Baseline:\n- Task Success Rate: [X%]\n- Average Corrections per Task: [Y]\n- Tool Call Efficiency: [Z%]\n- User Satisfaction Score: [1-10]\n- Average Response Latency: [Xms]\n- Token Efficiency Ratio: [X:Y]\n```\n\n## Phase 2: Prompt Engineering 
Improvements\n\nApply advanced prompt optimization techniques using prompt-engineer agent.\n\n### 2.1 Chain-of-Thought Enhancement\n\nImplement structured reasoning patterns:\n\n```\nUse: prompt-engineer\nTechnique: chain-of-thought-optimization\n```\n\n- Add explicit reasoning steps: \"Let's approach this step-by-step...\"\n- Include self-verification checkpoints: \"Before proceeding, verify that...\"\n- Implement recursive decomposition for complex tasks\n- Add reasoning trace visibility for debugging\n\n### 2.2 Few-Shot Example Optimization\n\nCurate high-quality examples from successful interactions:\n\n- **Select diverse examples** covering common use cases\n- **Include edge cases** that previously failed\n- **Show both positive and negative examples** with explanations\n- **Order examples** from simple to complex\n- **Annotate examples** with key decision points\n\nExample structure:\n\n```\nGood Example:\nInput: [User request]\nReasoning: [Step-by-step thought process]\nOutput: [Successful response]\nWhy this works: [Key success factors]\n\nBad Example:\nInput: [Similar request]\nOutput: [Failed response]\nWhy this fails: [Specific issues]\nCorrect approach: [Fixed version]\n```\n\n### 2.3 Role Definition Refinement\n\nStrengthen agent identity and capabilities:\n\n- **Core purpose**: Clear, single-sentence mission\n- **Expertise domains**: Specific knowledge areas\n- **Behavioral traits**: Personality and interaction style\n- **Tool proficiency**: Available tools and when to use them\n- **Constraints**: What the agent should NOT do\n- **Success criteria**: How to measure task completion\n\n### 2.4 Constitutional AI Integration\n\nImplement self-correction mechanisms:\n\n```\nConstitutional Principles:\n1. Verify factual accuracy before responding\n2. Self-check for potential biases or harmful content\n3. Validate output format matches requirements\n4. Ensure response completeness\n5. 
Maintain consistency with previous responses\n```\n\nAdd critique-and-revise loops:\n\n- Initial response generation\n- Self-critique against principles\n- Automatic revision if issues detected\n- Final validation before output\n\n### 2.5 Output Format Tuning\n\nOptimize response structure:\n\n- **Structured templates** for common tasks\n- **Dynamic formatting** based on complexity\n- **Progressive disclosure** for detailed information\n- **Markdown optimization** for readability\n- **Code block formatting** with syntax highlighting\n- **Table and list generation** for data presentation\n\n## Phase 3: Testing and Validation\n\nComprehensive testing framework with A\u002FB comparison.\n\n### 3.1 Test Suite Development\n\nCreate representative test scenarios:\n\n```\nTest Categories:\n1. Golden path scenarios (common successful cases)\n2. Previously failed tasks (regression testing)\n3. Edge cases and corner scenarios\n4. Stress tests (complex, multi-step tasks)\n5. Adversarial inputs (potential breaking points)\n6. 
Cross-domain tasks (combining capabilities)\n```\n\n### 3.2 A\u002FB Testing Framework\n\nCompare original vs improved agent:\n\n```\nUse: parallel-test-runner\nConfig:\n  - Agent A: Original version\n  - Agent B: Improved version\n  - Test set: 100 representative tasks\n  - Metrics: Success rate, speed, token usage\n  - Evaluation: Blind human review + automated scoring\n```\n\nStatistical significance testing:\n\n- Minimum sample size: 100 tasks per variant\n- Confidence level: 95% (p \u003C 0.05)\n- Effect size calculation (Cohen's d)\n- Power analysis for future tests\n\n### 3.3 Evaluation Metrics\n\nComprehensive scoring framework:\n\n**Task-Level Metrics:**\n\n- Completion rate (binary success\u002Ffailure)\n- Correctness score (0-100% accuracy)\n- Efficiency score (steps taken vs optimal)\n- Tool usage appropriateness\n- Response relevance and completeness\n\n**Quality Metrics:**\n\n- Hallucination rate (factual errors per response)\n- Consistency score (alignment with previous responses)\n- Format compliance (matches specified structure)\n- Safety score (constraint adherence)\n- User satisfaction prediction\n\n**Performance Metrics:**\n\n- Response latency (time to first token)\n- Total generation time\n- Token consumption (input + output)\n- Cost per task (API usage fees)\n- Memory\u002Fcontext efficiency\n\n### 3.4 Human Evaluation Protocol\n\nStructured human review process:\n\n- Blind evaluation (evaluators don't know version)\n- Standardized rubric with clear criteria\n- Multiple evaluators per sample (inter-rater reliability)\n- Qualitative feedback collection\n- Preference ranking (A vs B comparison)\n\n## Phase 4: Version Control and Deployment\n\nSafe rollout with monitoring and rollback capabilities.\n\n### 4.1 Version Management\n\nSystematic versioning strategy:\n\n```\nVersion Format: agent-name-v[MAJOR].[MINOR].[PATCH]\nExample: customer-support-v2.3.1\n\nMAJOR: Significant capability changes\nMINOR: Prompt improvements, new examples\nPATCH: 
Bug fixes, minor adjustments\n```\n\nMaintain version history:\n\n- Git-based prompt storage\n- Changelog with improvement details\n- Performance metrics per version\n- Rollback procedures documented\n\n### 4.2 Staged Rollout\n\nProgressive deployment strategy:\n\n1. **Alpha testing**: Internal team validation (5% traffic)\n2. **Beta testing**: Selected users (20% traffic)\n3. **Canary release**: Gradual increase (20% → 50% → 100%)\n4. **Full deployment**: After success criteria met\n5. **Monitoring period**: 7-day observation window\n\n### 4.3 Rollback Procedures\n\nQuick recovery mechanism:\n\n```\nRollback Triggers:\n- Success rate drops >10% from baseline\n- Critical errors increase >5%\n- User complaints spike\n- Cost per task increases >20%\n- Safety violations detected\n\nRollback Process:\n1. Detect issue via monitoring\n2. Alert team immediately\n3. Switch to previous stable version\n4. Analyze root cause\n5. Fix and re-test before retry\n```\n\n### 4.4 Continuous Monitoring\n\nReal-time performance tracking:\n\n- Dashboard with key metrics\n- Anomaly detection alerts\n- User feedback collection\n- Automated regression testing\n- Weekly performance reports\n\n## Success Criteria\n\nAgent improvement is successful when:\n\n- Task success rate improves by ≥15%\n- User corrections decrease by ≥25%\n- No increase in safety violations\n- Response time remains within 10% of baseline\n- Cost per task doesn't increase >5%\n- Positive user feedback increases\n\n## Post-Deployment Review\n\nAfter 30 days of production use:\n\n1. Analyze accumulated performance data\n2. Compare against baseline and targets\n3. Identify new improvement opportunities\n4. Document lessons learned\n5. 
Plan next optimization cycle\n\n## Continuous Improvement Cycle\n\nEstablish regular improvement cadence:\n\n- **Weekly**: Monitor metrics and collect feedback\n- **Monthly**: Analyze patterns and plan improvements\n- **Quarterly**: Major version updates with new capabilities\n- **Annually**: Strategic review and architecture updates\n\nRemember: Agent optimization is an iterative process. Each cycle builds upon previous learnings, gradually improving performance while maintaining stability and safety.\n\n## Limitations\n\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,121,668,"2026-05-16 13:01:19",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of skills",5,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"04a9f7a7-80de-4a92-aa7d-4718332153d9","1.0.0","agent-orchestration-improve-agent.zip",4555,"uploads\u002Fskills\u002F7f2ac674-8502-4910-8563-9aa7bcbce56d\u002Fagent-orchestration-improve-agent.zip","c283fbf148fc3adeaf6767742994dba26bb5acd557da2ad7d09dc1f8f2ccf44e","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10506}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]