[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-a170078b-ec1a-440e-85bd-6cc76b27e54e":3,"$f1F3aM3xgjcgA-fU3nEp4cmvhUEYiVaouJj_aPJYFRPo":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"a170078b-ec1a-440e-85bd-6cc76b27e54e","ab-test-setup","A\u002FB测试设置结构化指南，包含假设、指标和执行准备性的强制性关卡。","cat_coding_review","mod_coding","sickn33,coding","---\nname: ab-test-setup\ndescription: \"Structured guide for setting up A\u002FB tests with mandatory gates for hypothesis, metrics, and execution readiness.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# A\u002FB Test Setup\n\n## 1️⃣ Purpose & Scope\n\nEnsure every A\u002FB test is **valid, rigorous, and safe** before a single line of code is written.\n\n- Prevents \"peeking\"\n- Enforces statistical power\n- Blocks invalid hypotheses\n\n---\n\n## 2️⃣ Pre-Requisites\n\nYou must have:\n\n- A clear user problem\n- Access to an analytics source\n- Roughly estimated traffic volume\n\n### Hypothesis Quality Checklist\n\nA valid hypothesis includes:\n\n- Observation or evidence\n- Single, specific change\n- Directional expectation\n- Defined audience\n- Measurable success criteria\n\n---\n\n### 3️⃣ Hypothesis Lock (Hard Gate)\n\nBefore designing variants or metrics, you MUST:\n\n- Present the **final hypothesis**\n- Specify:\n  - Target audience\n  - Primary metric\n  - Expected direction of effect\n  - Minimum Detectable Effect (MDE)\n\nAsk explicitly:\n\n> “Is this the final hypothesis we are committing to for this test?”\n\n**Do NOT proceed until confirmed.**\n\n---\n\n### 4️⃣ Assumptions & Validity Check (Mandatory)\n\nExplicitly list assumptions about:\n\n- Traffic stability\n- User independence\n- Metric reliability\n- Randomization quality\n- External factors (seasonality, campaigns, releases)\n\nIf assumptions are weak or violated:\n\n- Warn the user\n- Recommend delaying or redesigning the test\n\n---\n\n### 5️⃣ Test Type Selection\n\nChoose the simplest valid test:\n\n- **A\u002FB Test** – single change, two variants\n- **A\u002FB\u002Fn Test** – multiple variants, higher traffic required\n- **Multivariate Test (MVT)** – interaction effects, very high traffic\n- **Split URL Test** – major structural changes\n\nDefault to **A\u002FB** unless there is a clear reason otherwise.\n\n---\n\n### 6️⃣ Metrics Definition\n\n#### Primary Metric (Mandatory)\n\n- Single metric used to evaluate success\n- Directly tied to the hypothesis\n- Pre-defined and frozen before launch\n\n#### Secondary Metrics\n\n- Provide context\n- Explain _why_ results occurred\n- Must not override the primary metric\n\n#### Guardrail Metrics\n\n- Metrics that must not degrade\n- Used to prevent harmful wins\n- Trigger test stop if significantly negative\n\n---\n\n### 7️⃣ Sample Size & Duration\n\nDefine upfront:\n\n- Baseline rate\n- MDE\n- Significance level (typically 95%)\n- Statistical power (typically 80%)\n\nEstimate:\n\n- Required sample size per variant\n- Expected test duration\n\n**Do NOT proceed without a realistic sample size estimate.**\n\n---\n\n### 8️⃣ Execution Readiness Gate (Hard Stop)\n\nYou may proceed to implementation **only if all are true**:\n\n- Hypothesis is locked\n- Primary metric is frozen\n- Sample size is calculated\n- Test duration is defined\n- Guardrails are set\n- Tracking is verified\n\nIf any item is missing, stop and resolve it.\n\n---\n\n## Running the Test\n\n### During the Test\n\n**DO:**\n\n- Monitor technical health\n- Document external factors\n\n**DO NOT:**\n\n- Stop early due to “good-looking” results\n- Change variants mid-test\n- Add new traffic sources\n- Redefine success criteria\n\n---\n\n## Analyzing Results\n\n### Analysis Discipline\n\nWhen interpreting results:\n\n- Do NOT generalize beyond the tested population\n- Do NOT claim causality beyond the tested change\n- Do NOT override guardrail failures\n- Separate statistical significance from business judgment\n\n### Interpretation Outcomes\n\n| Result               | Action                                 |\n| -------------------- | -------------------------------------- |\n| Significant positive | Consider rollout                       |\n| Significant negative | Reject variant, document learning      |\n| Inconclusive         | Consider more traffic or bolder change |\n| Guardrail failure    | Do not ship, even if primary wins      |\n\n---\n\n## Documentation & Learning\n\n### Test Record (Mandatory)\n\nDocument:\n\n- Hypothesis\n- Variants\n- Metrics\n- Sample size vs achieved\n- Results\n- Decision\n- Learnings\n- Follow-up ideas\n\nStore records in a shared, searchable location to avoid repeated failures.\n\n---\n\n## Refusal Conditions (Safety)\n\nRefuse to proceed if:\n\n- Baseline rate is unknown and cannot be estimated\n- Traffic is insufficient to detect the MDE\n- Primary metric is undefined\n- Multiple variables are changed without proper design\n- Hypothesis cannot be clearly stated\n\nExplain why and recommend next steps.\n\n---\n\n## Key Principles (Non-Negotiable)\n\n- One hypothesis per test\n- One primary metric\n- Commit before launch\n- No peeking\n- Learning over winning\n- Statistical rigor first\n\n---\n\n## Final Reminder\n\nA\u002FB testing is not about proving ideas right.\nIt is about **learning the truth with confidence**.\n\nIf you feel tempted to rush, simplify, or “just try it” —\nthat is the signal to **slow down and re-check the design**.\n\n## When to Use\nThis skill is applicable to execute the workflow or actions described in the overview.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,243,764,"2026-05-16 13:00:39",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"编程开发","coding","mdi-code-braces","代码生成、调试、审查，提升开发效率",2,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"代码审查","review","mdi-magnify-scan","代码质量分析、安全审查",4,145,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"fe1a3274-7370-454e-bee9-f935e892a33f","1.0.0","ab-test-setup.zip",2549,"uploads\u002Fskills\u002Fa170078b-ec1a-440e-85bd-6cc76b27e54e\u002Fab-test-setup.zip","3c023fba8e5206b24e87b4b14a646056c5e216627e5cacfe760d35eb412679ae","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":5263}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]