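Overview
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B test results with practical statistical rigor.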
---
name: experiment-designer
description: Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.
---

# Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

## When To Use

Use this skill for:

- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions

## Core Workflow

1. Write the hypothesis in If/Then/Because format
   - If we change `[intervention]`
   - Then `[metric]` will change by `[expected direction/magnitude]`
   - Because `[behavioral mechanism]`
2. Define metrics before running the test
   - Primary metric: the single metric the decision rests on
   - Guardrail metrics: protect quality and limit risk
   - Secondary metrics: diagnostics only
3. Estimate sample size from:
   - Baseline conversion rate or baseline mean
   - Minimum detectable effect (MDE)
   - Significance level (alpha) and power

   Use:

   ```bash
   python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
   ```

4. Prioritize experiments with ICE
   - Impact: potential upside
   - Confidence: quality of the supporting evidence
   - Ease: cost, speed, and complexity

   `ICE Score = (Impact * Confidence * Ease) / 10`

5. Launch with stopping rules
   - Fix the sample size or the duration in advance
   - Avoid repeatedly peeking at results unless you use a sequential testing method designed for interim looks
   - Monitor guardrails continuously
6. Interpret results
   - Statistical significance is not business significance
   - Compare the point estimate and confidence interval to the decision threshold
   - Investigate novelty effects and differences across segments (heterogeneity)

## Hypothesis Quality Checklist

- [ ] Contains an explicit intervention and audience
- [ ] Specifies a measurable metric change
- [ ] States a plausible causal reason
- [ ] Includes the expected minimum effect
- [ ] Defines a failure condition

## Common Experiment Pitfalls

- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolating their effects
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from a p-value without effect-size context

## Statistical Interpretation Guardrails

- A p-value below alpha is evidence against the null hypothesis, not proof that the effect is real.
- A confidence interval that includes zero (no effect) means the direction of the effect is uncertain.
- Wide intervals imply low precision, even when the result is significant.
- Use practical significance thresholds tied to business impact.

See:

- `references/experiment-playbook.md`
- `references/statistics-reference.md`

## Tooling

### `scripts/sample_size_calculator.py`

Computes the required sample size (per variant and total) from:

- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power

Example:

```bash
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
```
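The If/Then/Because template from step 1 and the Hypothesis Quality Checklist can be captured in a small data structure so that incomplete hypotheses are flagged before launch. This is a hypothetical helper, not part of this skill's scripts: the `Hypothesis` class, its field names, and the example hypothesis are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """Hypothetical container for an If/Then/Because experiment hypothesis."""

    intervention: str                       # If we change ...
    audience: str                           # ... for whom
    metric: str                             # Then <metric> ...
    expected_change: str                    # ... will change by <direction/magnitude>
    mechanism: str                          # Because <behavioral mechanism>
    minimum_effect: Optional[float] = None  # smallest effect worth acting on
    failure_condition: str = ""             # result that would count as a failure

    def render(self) -> str:
        """Format the hypothesis as one If/Then/Because sentence."""
        return (
            f"If we change {self.intervention} for {self.audience}, "
            f"then {self.metric} will change by {self.expected_change}, "
            f"because {self.mechanism}."
        )

    def checklist_gaps(self) -> List[str]:
        """Return the checklist items that are still missing."""
        gaps = []
        if not (self.intervention and self.audience):
            gaps.append("explicit intervention and audience")
        if not (self.metric and self.expected_change):
            gaps.append("measurable metric change")
        if not self.mechanism:
            gaps.append("plausible causal reason")
        if self.minimum_effect is None:
            gaps.append("expected minimum effect")
        if not self.failure_condition:
            gaps.append("failure condition")
        return gaps


draft = Hypothesis(
    intervention="the checkout button label",
    audience="new mobile users",
    metric="checkout conversion",
    expected_change="+1 percentage point",
    mechanism="a clearer label reduces hesitation at the last step",
)
print(draft.render())
print(draft.checklist_gaps())  # minimum effect and failure condition are still missing
```

Keeping the checklist as code means a draft can be rejected automatically until every item is filled in.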
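For intuition about what a calculator like `scripts/sample_size_calculator.py` computes, here is a minimal sketch of the standard normal-approximation formula for a two-sided test comparing two proportions: `n_per_variant = (z_(1-alpha/2) + z_(power))^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2`, where `p1` is the baseline rate and `p2` is the baseline shifted by the MDE. This is an assumed reimplementation, not the script's code; the real script may use pooled variance or other corrections, and the function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def sample_size(
    baseline_rate: float,
    mde: float,
    mde_type: str = "absolute",
    alpha: float = 0.05,
    power: float = 0.8,
    variants: int = 2,
) -> Tuple[int, int]:
    """Per-variant and total sample size for a two-sided two-proportion test.

    Normal approximation with unpooled variance (an assumption of this sketch).
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be in (0, 1)")
    if mde_type == "absolute":
        target_rate = baseline_rate + mde
    elif mde_type == "relative":
        target_rate = baseline_rate * (1 + mde)
    else:
        raise ValueError("mde_type must be 'absolute' or 'relative'")
    if not 0 < target_rate < 1 or target_rate == baseline_rate:
        raise ValueError("the MDE must move the rate to a different value inside (0, 1)")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    delta_sq = (target_rate - baseline_rate) ** 2
    per_variant = math.ceil((z_alpha + z_power) ** 2 * variance / delta_sq)
    return per_variant, per_variant * variants


per_variant, total = sample_size(0.10, 0.015, "absolute", alpha=0.05, power=0.8)
print(per_variant, total)
```

With the inputs from the Tooling example (baseline 0.10, absolute MDE 0.015, alpha 0.05, power 0.8), this sketch gives about 6,690 users per variant. Halving the MDE roughly quadruples the required sample, which is why the MDE should be the smallest effect that would actually change a decision.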
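The ICE formula from step 4 of the Core Workflow can be applied to a backlog as below. The source does not specify a scale for the inputs, so this sketch assumes each is scored from 1 to 10; `ExperimentIdea`, `prioritize`, and the backlog entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ExperimentIdea:
    name: str
    impact: int      # potential upside, assumed 1-10 scale
    confidence: int  # quality of the evidence, assumed 1-10 scale
    ease: int        # cost/speed/complexity, assumed 1-10 scale

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1-10 scale.
        for field_name in ("impact", "confidence", "ease"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def ice_score(self) -> float:
        # ICE Score = (Impact * Confidence * Ease) / 10
        return self.impact * self.confidence * self.ease / 10


def prioritize(ideas: Iterable[ExperimentIdea]) -> List[ExperimentIdea]:
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice_score, reverse=True)


backlog = [
    ExperimentIdea("Shorter signup form", impact=7, confidence=6, ease=8),
    ExperimentIdea("New pricing page", impact=9, confidence=4, ease=3),
    ExperimentIdea("Onboarding checklist", impact=6, confidence=7, ease=6),
]
for idea in prioritize(backlog):
    print(f"{idea.ice_score:6.1f}  {idea.name}")
```

On a 1 to 10 scale, dividing by 10 keeps scores between 0.1 and 100. Because the factors multiply, one very low score (for example, low confidence) pulls an otherwise attractive idea down the list.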
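To show why step 5 warns against repeated peeking, the following sketch simulates A/A tests. Both arms share the same true conversion rate, so every "significant" result is a false positive. It compares running a pooled two-proportion z-test after every batch of users with running it once at the end. The parameters (10 looks, 100 users per arm per look, a 10% rate, seed 7) are arbitrary assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist
from typing import Tuple


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rates(
    n_sims: int = 1000,
    looks: int = 10,
    users_per_look: int = 100,
    rate: float = 0.10,
    alpha: float = 0.05,
    seed: int = 7,
) -> Tuple[float, float]:
    """Share of A/A tests declared significant (1) if checked after every look
    and (2) if checked only once at the final look."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Add one batch of Bernoulli(rate) users to each arm.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += two_proportion_p_value(conv_a, n, conv_b, n) < alpha
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = aa_false_positive_rates()
print(f"significant when checked after every look: {peeking_rate:.3f}")
print(f"significant when checked once at the end:  {fixed_rate:.3f}")
```

The peeking rate should come out well above alpha, while the single final check stays near alpha. Sequential designs, such as alpha-spending boundaries, exist precisely to allow interim looks without this inflation.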
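Step 6 and the interpretation guardrails can be made concrete by computing a confidence interval for the difference between two conversion rates and comparing it to a practical-significance threshold. This sketch assumes a simple Wald interval (normal approximation, unpooled variance) and a positive threshold. The function names, decision labels, and counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def diff_confidence_interval(
    conv_control: int,
    n_control: int,
    conv_treatment: int,
    n_treatment: int,
    alpha: float = 0.05,
) -> Tuple[float, float, float]:
    """Point estimate and Wald interval for (treatment rate - control rate)."""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    diff = p_t - p_c
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


def decide(interval: Tuple[float, float, float], threshold: float) -> str:
    """Compare the interval to a practical-significance threshold (assumed > 0)."""
    _, low, high = interval
    if low > threshold:
        return "ship"                      # the whole interval clears the business threshold
    if high < 0:
        return "negative"                  # the whole interval indicates harm
    if low > 0:
        return "positive_below_threshold"  # positive, but not clearly large enough to matter
    return "inconclusive"                  # the interval includes zero


result = diff_confidence_interval(1000, 10000, 1150, 10000)
print(result, decide(result, threshold=0.005), decide(result, threshold=0.01))
```

For 1,000/10,000 control conversions against 1,150/10,000 treatment conversions, the interval runs from roughly 0.6 to 2.4 percentage points. It clears a 0.5-point threshold but not a 1-point threshold, even though the result is statistically significant in both cases.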
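The pitfalls list mentions sample ratio mismatch (SRM). A common check is a chi-square goodness-of-fit test of the observed assignment counts against the intended split. With two variants the test has one degree of freedom, so the p-value has a closed form via `math.erfc`. The 0.001 alert threshold and the example counts below are assumptions for illustration.

```python
import math

SRM_ALERT_THRESHOLD = 0.001  # assumed alert level; strict cutoffs are typical for SRM checks


def srm_p_value(n_control: int, n_treatment: int, expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-variant traffic split (1 degree of freedom)."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(50_100, 49_900))  # small imbalance, consistent with a 50/50 split
print(srm_p_value(50_000, 48_500))  # far below the alert threshold for an intended 50/50 split
```

A very small p-value here usually points to an assignment or logging problem rather than a real treatment effect, so the experiment's results should not be trusted until the cause is found.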
Published: 5/16/2026
Provider: SkillOPIC
Source type: Imported
alirezarezvani
other
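Data security
When you use a Skill, your conversation content is sent to an AI model for processing. We strictly protect your private data and do not use your conversations for model training or share them with third parties. The data handling details for this Skill are listed below.
This Skill processes your conversation input
Your messages are sent to the AI model as prompt context
All communication is transmitted over encrypted channels
Conversation records are stored only locally
You can clear your local conversation history at any time; cleared data cannot be recovered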
Skill information
Other
Career development
File structure
references
scripts
SKILL.md (3.1 KB)
Version history
- Public
- Imported by a user