[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-f77e3b13-80e7-4ad1-bcd3-b54e67d58784":3,"$fRCLNI6uPIhdzsaCBhhOeiqO7MP0qZCEQ5-X3DLVhefI":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"f77e3b13-80e7-4ad1-bcd3-b54e67d58784","chief-data-officer-advisor","Chief Data Officer advisory for startups: AI training data rights and consent provenance, data product strategy (warehouse vs lakehouse vs mesh, build vs buy), B2B customer data asset valuation and M&A readiness, and data team org evolution. Use when deciding whether to train models on customer data, choosing a data architecture, valuing data for fundraising or M&A, sequencing data hires, or when the user mentions CDO, chief data officer, data strategy, data mesh, lakehouse, training data, or data product.","cat_coding_review","mod_coding","alirezarezvani,coding","---\nname: \"chief-data-officer-advisor\"\ndescription: \"Chief Data Officer advisory for startups: AI training data rights and consent provenance, data product strategy (warehouse vs lakehouse vs mesh, build-vs-buy), B2B customer-data-as-asset valuation and M&A readiness, data team org evolution. Use when deciding whether to train models on customer data, choosing data architecture, valuing data for fundraising or M&A, sequencing data hires, or when user mentions CDO, chief data officer, data strategy, data mesh, lakehouse, training data, data product, data monetization, or customer data asset. NOT a tactical data engineering skill — strategic decisions only.\"\nlicense: MIT\nmetadata:\n  version: 1.0.0\n  author: Alireza Rezvani\n  category: c-level\n  domain: chief-data-officer-leadership\n  updated: 2026-05-12\n  python-tools: ai_training_data_audit.py, data_product_strategy_picker.py, data_asset_valuator.py\n  frameworks: training-data-rights-matrix, data-product-strategy, customer-data-as-asset, data-team-org-evolution\n---\n\n# Chief Data Officer Advisor\n\nStrategic data leadership for startup CDOs and founders without one. **Four decisions, no surveys:**\n\n1. **Can we train our model on this data?** — origin × consent × use-case matrix\n2. 
**Warehouse, lakehouse, or mesh — and what do we build vs buy?** — stage-driven architecture\n3. **What is our customer data worth?** — strategic value + M&A multiplier + productization paths\n4. **What data role do we hire next?** — stage-to-role map, centralize-vs-embed trigger\n\nThis skill does **not** cover tactical data engineering. For schema design, observability, query optimization, RAG, or ML platform implementation, see `engineering\u002Fdatabase-designer\u002F`, `engineering\u002Fobservability-designer\u002F`, `engineering\u002Fdata-quality-auditor\u002F`, `engineering\u002Fsql-database-assistant\u002F`, `engineering\u002Frag-architect\u002F`, `engineering\u002Fllm-cost-optimizer\u002F`.\n\n## Keywords\n\nCDO, chief data officer, AI training data, consent provenance, training rights, GDPR Article 6 lawful basis, GDPR Article 22, EU AI Act high-risk, ePrivacy, copyright fair use, hiQ v. LinkedIn, scraped data, synthetic data, data product, data mesh, lakehouse, medallion architecture, dbt, Snowflake, BigQuery, Databricks, Fivetran, Airbyte, reverse ETL, feature store, customer data as asset, data monetization, data productization, anonymization, k-anonymity, differential privacy, M&A data diligence, data org, analytics engineer, data engineer, data scientist, data product manager, centralize vs embed, hub and spoke\n\n## Quick Start\n\n```bash\n# Audit data sources for AI training eligibility\npython scripts\u002Fai_training_data_audit.py                              # uses embedded sample\npython scripts\u002Fai_training_data_audit.py path\u002Fto\u002Fsources.json\n\n# Pick data architecture + build-vs-buy + sequencing\npython scripts\u002Fdata_product_strategy_picker.py                        # uses embedded Series A SaaS\npython scripts\u002Fdata_product_strategy_picker.py path\u002Fto\u002Fprofile.json\n\n# Value the customer data corpus + productization viability\npython scripts\u002Fdata_asset_valuator.py                                 # uses 
embedded B2B sample\npython scripts\u002Fdata_asset_valuator.py path\u002Fto\u002Fcorpus.json\n```\n\n## Key Questions (ask these first)\n\n- **What decision does this data drive?** (If none, why are we collecting it?)\n- **What's the consent provenance of every source we want to train on?** (TOS-only is not the same as explicit opt-in.)\n- **Who are the internal data consumers, and how many distinct domains do they span?** (Drives centralize-vs-embed and warehouse-vs-mesh.)\n- **In an M&A scenario, is our data a moat or a liability?** (Customer carve-outs in MSAs can flip the answer.)\n- **Are we hiring an analytics engineer or a data scientist next?** (They solve different problems; founders confuse them.)\n- **Have we run an anonymization audit before any external sharing?** (k-anonymity ≥ 5 is the floor, not the ceiling.)\n\n## Core Responsibilities\n\n### 1. AI Training Data Rights\n\nThe 2026 question every startup is facing: **can we use customer data to train our model?**\n\nThe answer is rarely binary. It depends on three independent dimensions:\n\n| Dimension | Values |\n|---|---|\n| **Origin** | 1st-party-explicit-opt-in \u002F 1st-party-TOS-only \u002F partner-licensed \u002F scraped \u002F synthetic |\n| **Data class** | Anonymous aggregate \u002F behavioral \u002F PII \u002F 3rd-party content \u002F regulated (PHI, PCI, kids) |\n| **Use case** | In-product personalization \u002F fine-tune our model \u002F train foundation model \u002F external sharing |\n\nEach combination produces GO \u002F MITIGATE \u002F NO-GO. **Run** `ai_training_data_audit.py` on a JSON inventory of sources.\n\nSee `references\u002Fai_training_data_rights.md` for the full matrix + GDPR Art. 6 lawful basis decision tree + EU AI Act high-risk triggers.\n\n### 2. 
Data Product Strategy\n\n**Architecture choice (warehouse vs lakehouse vs mesh) is stage-driven, not preference-driven:**\n\n- **Warehouse only** (Snowflake \u002F BigQuery \u002F Postgres): ≤5 data consumers, \u003C2TB, no ML use cases\n- **Lakehouse** (warehouse + object storage, often Databricks or Snowflake-with-Iceberg): 5–25 data consumers, 2TB–1PB, 1–3 ML use cases\n- **Data mesh**: 25+ data consumers across 4+ domains, federated ownership culture in place\n\n**Build vs buy is decided per layer:**\n\n| Layer | Default | Exception or rationale |\n|---|---|---|\n| Storage \u002F warehouse | Never build | (You’re a data infra company) |\n| ELT \u002F ingest | Never build | Source isn’t supported by Fivetran\u002FAirbyte |\n| Modeling (dbt) | Always build | This is your IP |\n| BI \u002F dashboards | Buy at \u003C100 consumers | Embedded analytics for customers |\n| Feature store | Defer until 3+ prod models | Then build OR buy Tecton\u002FHopsworks |\n| ML platform | Defer until 5+ prod models | Then buy SageMaker\u002FVertex\u002FDatabricks |\n\n**Run** `data_product_strategy_picker.py` for a stage-specific recommendation. See `references\u002Fdata_product_strategy.md` for kill criteria per architecture and the build-vs-buy decision tree.\n\n### 3. B2B Customer-Data-as-Asset\n\n**The shift:** at Series B+, customer data is no longer just operational — it’s an asset that can be:\n- A defensibility moat (replicating it requires years of customer cohorts)\n- An M&A multiplier (1.2x–2x ARR uplift for strategic buyers)\n- A direct revenue stream (anonymized industry benchmarks, embedding endpoints, licensing)\n\nBut it can also be a **liability**:\n- MSA carve-outs (e.g., 47 of 380 customers) can make productization legally infeasible\n- Anonymization audits often reveal re-identification risk above tolerable thresholds\n- Regulatory exposure rises with each productization step (GDPR Art. 28 processor vs Art. 
26 joint-controller obligations)\n\n**Run** `data_asset_valuator.py` with corpus characteristics to get strategic value score + productization paths + risk-adjusted value.\n\nSee `references\u002Fcustomer_data_as_asset.md` for the valuation framework, M&A diligence prep checklist, and contractual constraint audit pattern.\n\n### 4. Data Team Org Evolution\n\n**The wrong question:** \"Should we hire a data scientist?\"\n**The right question:** \"What’s the next decision we can’t make because we lack data, and what role unblocks that?\"\n\nStage-to-role map (B2B SaaS baseline):\n\n| Stage | First hire | Second hire | Third hire |\n|---|---|---|---|\n| Pre-seed \u002F seed | Founder-as-analyst (SQL + spreadsheets) | — | — |\n| Series A | Analyst | Analytics engineer (dbt) | — |\n| Series B | Data engineer | Senior analyst (embedded in GTM) | Data PM (if 3+ teams need data) |\n| Growth | Manager of analytics | ML engineer (if model is core) | Head of Data |\n| Late-stage | Head of Data → CDO | Specialized: BI, MLE, DPO | Federated owners per domain (mesh) |\n\n**Centralize-vs-embed trigger:** when 3+ functional areas (sales, marketing, product, ops, CS) need bespoke data weekly, the central team becomes the bottleneck. Move to hub-and-spoke (central platform + embedded analysts) before that becomes a hiring crisis.\n\nSee `references\u002Fdata_team_org_evolution.md`.\n\n## Workflows\n\n### Workflow 1: AI Training Decision (1 hour)\n**Goal:** Decide whether a specific data source can be used to train a model for a specific use case.\n\n```bash\n# 1. Build sources.json with one entry per data source\n# 2. Run the audit\npython scripts\u002Fai_training_data_audit.py sources.json\n# 3. For each MITIGATE: assign owner + remediation\n# 4. For each NO-GO: document the kill reason for the legal log\n# 5. Cross-check with cs-general-counsel-advisor on top-3 mitigation items\n# 6. 
Log via \u002Fcs:decide\n```\n\n### Workflow 2: Architecture Decision (1 day)\n**Goal:** Pick warehouse \u002F lakehouse \u002F mesh and the build-vs-buy split for the next 12 months.\n\n```bash\npython scripts\u002Fdata_product_strategy_picker.py profile.json\n# Cross-check with cs-cto-advisor on engineering capacity\n# Cross-check with cs-cfo-advisor on 3-year TCO\n# Log via \u002Fcs:decide; consider \u002Fcs:freeze 90 if signing a multi-year SaaS contract\n```\n\n### Workflow 3: Data Asset Valuation for M&A Prep (3 days)\n**Goal:** Value the data corpus and prepare for due diligence.\n\n1. Inventory the corpus: size, freshness, exclusivity, customer overlap, contractual restrictions\n2. Run `data_asset_valuator.py`\n3. Run the M&A diligence prep checklist in `customer_data_as_asset.md`\n4. Surface contractual carve-outs to cs-general-counsel-advisor for re-papering plan\n5. Decide productization path (benchmark report \u002F embedding endpoint \u002F direct license)\n6. Log via \u002Fcs:decide\n\n### Workflow 4: Data Team Roadmap (1 week)\n**Goal:** Build the next 18 months of data hires aligned to business decisions.\n\n1. List the top 5 decisions the business can’t make today due to missing data or analysis\n2. Map each decision to the role that unblocks it\n3. Sequence hires (one role at a time, ramp before next)\n4. Cross-check with cs-chro-advisor on comp bands and leveling\n5. 
Identify the centralize-vs-embed trigger date\n\n## Output Standards (when invoked via cs-cdo-advisor)\n\n```\n**Bottom Line:** [one sentence — decision and rationale]\n**The Decision:** [one of the 4 framings]\n**The Evidence:** [numbers, not adjectives]\n**How to Act:** [3 concrete next steps]\n**Your Decision:** [the call only the founder can make]\n```\n\n## Adjacent Skills\n\n- `..\u002Fcto-advisor\u002F` — architecture capacity, scaling cliffs\n- `..\u002Fciso-advisor\u002F` — data security, threat modeling for productized data\n- `..\u002Fgeneral-counsel-advisor\u002F` — contractual constraints, DPA, training-data rights\n- `..\u002Fcfo-advisor\u002F` — build-vs-buy TCO, M&A valuation math\n- `..\u002Fchro-advisor\u002F` — data team hiring, leveling, comp\n- `..\u002F..\u002F..\u002Fengineering\u002Fdatabase-designer\u002F` — tactical schema design\n- `..\u002F..\u002F..\u002Fengineering\u002Frag-architect\u002F` — tactical AI\u002FRAG implementation\n- `..\u002F..\u002F..\u002Fengineering\u002Fllm-cost-optimizer\u002F` — model cost management\n\n## References\n\n- [ai_training_data_rights.md](references\u002Fai_training_data_rights.md) — The training-rights matrix + GDPR Art. 6 \u002F EU AI Act decision tree\n- [data_product_strategy.md](references\u002Fdata_product_strategy.md) — Warehouse \u002F lakehouse \u002F mesh kill criteria + build-vs-buy decision tree\n- [customer_data_as_asset.md](references\u002Fcustomer_data_as_asset.md) — Valuation framework + M&A diligence prep + productization paths\n- [data_team_org_evolution.md](references\u002Fdata_team_org_evolution.md) — Stage-to-role map + centralize-vs-embed trigger\n\n---\n\n**Version:** 1.0.0\n**Status:** Production Ready\n**Disclaimer:** Decisions touching training data rights, data productization, or M&A data diligence should involve qualified counsel. 
This skill surfaces decisions and tradeoffs — it does not replace legal review.\n","","imported","https:\u002F\u002Fgithub.com\u002Falirezarezvani\u002Fclaude-skills","user_system_seed","SkillOPIC",true,84,1595,"2026-05-16 13:50:05",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Coding & Development","coding","mdi-code-braces","Code generation, debugging, and review to improve development efficiency",2,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Code Review","review","mdi-magnify-scan","Code quality analysis and security review",4,145,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"4484f56a-dfb3-49e3-b79d-7054bd40bdfb","1.0.0","chief-data-officer-advisor.zip",37285,"uploads\u002Fskills\u002Ff77e3b13-80e7-4ad1-bcd3-b54e67d58784\u002Fchief-data-officer-advisor.zip","427c0d2255c44c50a67a710b8561bcc88344abc2fdf0d1315725a98de2b0586c","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":11560},{\"path\":\"references\u002Fai_training_data_rights.md\",\"isDirectory\":false,\"size\":7365},{\"path\":\"references\u002Fcustomer_data_as_asset.md\",\"isDirectory\":false,\"size\":10285},{\"path\":\"references\u002Fdata_product_strategy.md\",\"isDirectory\":false,\"size\":8612},{\"path\":\"references\u002Fdata_team_org_evolution.md\",\"isDirectory\":false,\"size\":10446},{\"path\":\"scripts\u002Fai_training_data_audit.py\",\"isDirectory\":false,\"size\":19434},{\"path\":\"scripts\u002Fdata_asset_valuator.py\",\"isDirectory\":false,\"size\":14664},{\"path\":\"scripts\u002Fdata_product_strategy_picker.py\",\"isDirectory\":false,\"size\":14828}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]