[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-9f0cbfec-98ad-4954-b564-cb4ee28811f8":3,"$fbBKMdi3TEtUTLPHYW1gF7JAAVGZNa9k_pe82Q4ZHG4M":42},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":33},"9f0cbfec-98ad-4954-b564-cb4ee28811f8","skill-optimizer","Diagnose and optimize Agent Skills (SKILL.md) using real session data and research-backed static analysis. Works with Claude Code, Codex, and any Agent Skills-compatible agent.","cat_coding_backend","mod_coding","sickn33,coding","---\nname: skill-optimizer\ndescription: \"Diagnose and optimize Agent Skills (SKILL.md) with real session data and research-backed static analysis. Works with Claude Code, Codex, and any Agent Skills-compatible agent.\"\nrisk: safe\nsource: hqhq1025\u002Fskill-optimizer (MIT)\ndate_added: \"2026-04-11\"\n---\n\n## When to Use This Skill\n\n- Use when skills are not triggering as expected or seem broken\n- Use when you want to audit and improve your skill library's quality\n- Use when you want to understand which skills are underperforming or wasting context tokens\n\n## Rules\n\n- **Read-only**: never modify skill files. Only output the report. 
If data is insufficient, report \"N\u002FA — insufficient session data\" rather than omitting.\n- **Quantify**: \"you had 12 research tasks last week but the skill never triggered\" beats \"you often do research\".\n- **Suggest, don't prescribe**: give specific wording suggestions for description improvements, but frame as suggestions.\n- **Show evidence**: for undertrigger claims, quote the actual user message that should have triggered the skill.\n- **Evidence-based suggestions**: when suggesting description rewrites, cite the specific research finding that motivates the change (e.g., \"front-load trigger keywords — MCP study shows 3.6x selection rate improvement\").\n\n## Overview\n\nAnalyze skills using **historical session data + static quality checks**, output a diagnostic report with P0\u002FP1\u002FP2 prioritized fixes. Scores each skill on a 5-point composite scale across 8 dimensions.\n\nCSO (Claude\u002FAgent Search Optimization) = writing skill descriptions so agents select the right skill at the right time. This skill checks for CSO violations.\n\n## Usage\n\n- `\u002Foptimize-skill` → scan all skills\n- `\u002Foptimize-skill my-skill` → single skill\n- `\u002Foptimize-skill skill-a skill-b` → multiple specified skills\n\n## Data Sources\n\nAuto-detect the current agent platform and scan the corresponding paths:\n\n| Source | Claude Code | Codex | Shared |\n|--------|------------|-------|--------|\n| Session transcripts | `~\u002F.claude\u002Fprojects\u002F**\u002F*.jsonl` | `~\u002F.codex\u002Fsessions\u002F**\u002F*.jsonl` | — |\n| Skill files | `~\u002F.claude\u002Fskills\u002F*\u002FSKILL.md` | `~\u002F.codex\u002Fskills\u002F*\u002FSKILL.md` | `~\u002F.agents\u002Fskills\u002F*\u002FSKILL.md` |\n\n**Platform detection:** Check which directories exist. 
Scan all available sources — a user may have both Claude Code and Codex installed.\n\n## Workflow\n\n```\nIdentify target skills\n        ↓\nCollect session data (python3 scripts scan JSONL transcripts)\n        ↓\nRun 8 analysis dimensions\n        ↓\nCompute composite scores\n        ↓\nOutput report with P0\u002FP1\u002FP2\n```\n\n### Step 1: Identify Target Skills\n\nScan skill directories in order: `~\u002F.claude\u002Fskills\u002F`, `~\u002F.codex\u002Fskills\u002F`, `~\u002F.agents\u002Fskills\u002F`. Deduplicate by skill name (same name in multiple locations = same skill). For each, read `SKILL.md` and extract:\n- name, description (from YAML frontmatter)\n- trigger keywords (from description field)\n- defined workflow steps (Step 1\u002F2\u002F3... or ### sections under Workflow)\n- word count\n\nIf user specified skill names, filter to only those.\n\n### Step 2: Collect Session Data\n\nUse python3 scripts via Bash to scan session JSONL files. Extract:\n\n**Claude Code sessions** (`~\u002F.claude\u002Fprojects\u002F**\u002F*.jsonl`):\n- `Skill` tool_use calls (which skills were invoked)\n- User messages (full text)\n- Assistant messages after skill invocation (for workflow tracking)\n- User messages after skill invocation (for reaction analysis)\n\n**Codex sessions** (`~\u002F.codex\u002Fsessions\u002F**\u002F*.jsonl`):\n- `session_meta` events → extract `base_instructions` for skill loading evidence\n- `response_item` events → assistant outputs (workflow tracking)\n- `event_msg` events → tool execution and skill-related events\n- User messages from `turn_context` events (for reaction analysis)\n\n**Note:** Codex injects skills via context rather than explicit `Skill` tool calls. Skill loading (present in `base_instructions`) does NOT equal active invocation. To detect actual use, search for skill-specific workflow markers (step headers, output formats) in `response_item` content within that session. 
A skill is \"invoked\" only if the agent produced output following the skill's defined workflow.\n\n**Aggregated:**\n- Per-skill: invocation count, trigger keyword match count\n- Per-skill: user reaction sentiment after invocation\n- Per-skill: workflow step completion markers\n\n### Step 3: Run 8 Analysis Dimensions\n\n**You MUST run ALL 8 dimensions.** The baseline behavior without this skill is to skip dimensions 4.2, 4.3, 4.5b, and 4.8. These are the most valuable dimensions — do not skip them.\n\n#### 4.1 Trigger Rate\n\nCount how many times each skill was actually invoked vs how many times its trigger keywords appeared in user messages.\n\n**Claude Code:** count `Skill` tool_use calls in transcripts.\n**Codex:** count sessions where the agent produced output following the skill's workflow markers (not merely loaded in context).\n\n**Diagnose:**\n- Never triggered → skill may be useless or trigger words wrong\n- Keywords match >> actual invocations → undertrigger problem, description needs work\n- High frequency → core skill, worth optimizing\n\n#### 4.2 Post-Invocation User Reaction\n\n**This dimension is critical and easy to skip. Do not skip it.**\n\nAfter a skill is invoked in a session, read the user's next 3 messages. Classify:\n- **Negative**: \"no\", \"wrong\", \"never mind\", \"not what I wanted\", user interrupts\n- **Correction**: user re-describes their intent, manually overrides skill output\n- **Positive**: \"good\", \"ok\", \"continue\", \"nice\", user follows the workflow\n- **Silent switch**: user changes topic entirely (likely false positive trigger)\n\nReport per-skill satisfaction rate.\n\n#### 4.3 Workflow Completion Rate\n\n**This dimension is critical and easy to skip. Do not skip it.**\n\nFor each skill invocation found in session data:\n1. Extract the skill's defined steps from SKILL.md\n2. Search the assistant messages in that session for step markers (Step N, specific output formats defined in the skill)\n3. 
Calculate: how far did execution get?\n\nReport: `{skill-name} (N steps): avg completed Step X\u002FN (Y%)`\n\nIf a specific step is frequently where execution stops, flag it.\n\n#### 4.4 Static Quality Analysis\n\nCheck each SKILL.md against these 14 rules:\n\n| Check | Pass Criteria |\n|-------|--------------|\n| Frontmatter format | Only `name` + `description`, total \u003C 1024 chars |\n| Name format | Letters, numbers, hyphens only |\n| Description trigger | Starts with \"Use when...\" or has explicit trigger conditions |\n| Description workflow leak | Description does NOT summarize the skill's workflow steps (CSO violation) |\n| Description pushiness | Description actively claims scenarios where it should be used, not just passive |\n| Overview section | Present |\n| Rules section | Present |\n| MUST\u002FNEVER density | Count ALL-CAPS directive words; >5 per 100 words = flag |\n| Word count | \u003C 500 words (flag if over) |\n| Narrative anti-pattern | No \"In session X, we found...\" storytelling |\n| YAML quoting safety | description containing `: ` must be wrapped in double quotes |\n| Critical info position | Core trigger conditions and primary actions must be in the first 20% of SKILL.md |\n| Description 250-char check | Primary trigger keywords must appear within the first 250 characters of description |\n| Trigger condition count | ≤ 2 trigger conditions in description is ideal |\n\n#### 4.5a False Positive Rate (Overtrigger)\n\nSkill was invoked but user immediately rejected or ignored it.\n\n#### 4.5b Undertrigger Detection\n\n**This is the highest-value dimension.** For each skill, extract its **capability keywords** (not just trigger keywords — what the skill CAN do). 
Then scan user messages for tasks that match those capabilities but where the skill was NOT invoked.\n\nReport: which user messages SHOULD have triggered the skill but didn't, and suggest description improvements.\n\n**Compounding Risk Assessment:**\nFor skills with chronic undertriggering (0 triggers across 5+ sessions where relevant tasks appeared), flag as \"compounding risk\" — undertriggered skills cannot self-improve through usage feedback, causing the gap to widen over time. Recommend immediate description rewrite as P0.\n\n#### 4.6 Cross-Skill Conflicts\n\nCompare all skill pairs:\n- Trigger keyword overlap (same keywords in two descriptions)\n- Workflow overlap (two skills teach similar processes)\n- Contradictory guidance\n\n#### 4.7 Environment Consistency\n\nFor each skill, extract referenced:\n- File paths → check if they exist (`test -e`)\n- CLI tools → check if installed (`which`)\n- Directories → check if they exist\n\nFlag any broken references.\n\n#### 4.8 Token Economics\n\n**This dimension is critical and easy to skip. Do not skip it.**\n\nFor each skill:\n- Word count (from Step 1)\n- Trigger frequency (from 4.1)\n- Cost-effectiveness = trigger count \u002F word count\n- Flag: large + never-triggered skills as candidates for removal or compression\n\n**Progressive Disclosure Tier Check:**\nEvaluate each skill against the 3-tier loading model:\n- Tier 1 (frontmatter): ~100 tokens. Check: is description ≤ 1024 chars?\n- Tier 2 (SKILL.md body): \u003C500 lines recommended. Check: word count.\n- Tier 3 (reference files): loaded on demand. 
Check: does skill use reference files for detailed content, or cram everything into SKILL.md?\n\nFlag skills that put 500+ words in SKILL.md without using reference files as \"poor progressive disclosure\".\n\n### Step 4: Composite Score\n\nRate each skill on a 5-point scale:\n\n| Score | Meaning |\n|-------|---------|\n| 5 | Healthy: high trigger rate, positive reactions, complete workflows, clean static |\n| 4 | Good: minor issues in 1-2 dimensions |\n| 3 | Needs attention: significant gap in 1 dimension or minor gaps in 3+ |\n| 2 | Problematic: never triggered, or negative user reactions, or major static issues |\n| 1 | Broken: doesn't work, references missing, or fundamentally misaligned |\n\n**Scored dimensions** (weighted average):\n- Trigger rate: 25%\n- User reaction: 20%\n- Workflow completion: 15%\n- Static quality: 15%\n- Undertrigger: 15%\n- Token economics: 10%\n\n**Qualitative dimensions** (reported but not scored):\n- 4.5a Overtrigger: reported as count + examples\n- 4.6 Cross-Skill Conflicts: reported as conflict pairs\n- 4.7 Environment Consistency: reported as pass\u002Ffail per reference\n\n## Report Format\n\n```markdown\n# Skill Optimization Report\n**Date**: {date}\n**Scope**: {all \u002F specified skills}\n**Session data**: {N} sessions, {date range}\n\n## Overview\n| Skill | Triggers | Reaction | Completion | Static | Undertrigger | Token | Score |\n|-------|----------|----------|------------|--------|--------------|-------|-------|\n| example-skill | 2 | 100% | 86% | B+ | 1 miss | 486w | 4\u002F5 |\n\n## P0 Fixes (blocking usage)\n1. ...\n\n## P1 Improvements (better experience)\n1. ...\n\n## P2 Optional Optimizations\n1. 
...\n\n## Per-Skill Diagnostics\n### {skill-name}\n#### 4.1 Trigger Rate\n...\n#### 4.2 User Reaction\n...\n(all 8 dimensions)\n```\n\n## Research Background\n\nThe analysis dimensions in this report are grounded in the following research:\n- **Undertrigger detection**: Memento-Skills (arXiv:2603.18743) — skills as structured files require accurate routing; unrouted skills cannot self-improve via the read-write learning loop\n- **Description quality**: MCP Description Quality (arXiv:2602.18914) — well-written descriptions achieve 72% tool selection rate vs. 20% random baseline (3.6x improvement)\n- **Information position**: Lost in the Middle (Liu et al., TACL 2024) — U-shaped LLM attention curve\n- **Format impact**: He et al. (arXiv:2411.10541) — format changes alone can cause 9-40% performance variance\n- **Instruction compliance**: IFEval (arXiv:2311.07911) — LLMs struggle with multi-constraint prompts\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,140,436,"2026-05-16 13:40:53",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Programming & Development","coding","mdi-code-braces","Code generation, debugging, and review to improve development efficiency",2,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":25,"skillCount":32,"createdAt":26},"Backend Development","backend","mdi-server","APIs, databases, and server-side architecture",296,[34],{"id":35,"skillId":4,"version":36,"fileName":37,"fileSize":38,"filePath":39,"fileHash":40,"manifest":41,"createdAt":19},"9ad73c1c-7631-47bf-b317-a2a3714e8ca5","1.0.0","skill-optimizer.zip",5098,"uploads\u002Fskills\u002F9f0cbfec-98ad-4954-b564-cb4ee28811f8\u002Fskill-optimizer.zip","19edf7977ee6a9d1eaefc97f3e298876399acf0ea6bff7d91fc289ca015503b5","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":12088}]",{"code":43,"message":44,"data":45},200,"success",{"items":46,"stats":47,"page":50},[],{"averageRating":48,"totalRatings":48,"ratingCounts":49},0,[48,48,48,48,48],{"limit":51,"offset":48,"hasMore":52,"nextOffset":51,"ratedOnly":16},15,false]