[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-d17e8581-b0fd-482d-8d4c-99f11246db6b":3,"$fFkXO8zrilDc8R39Aabsld3TVawp_AMrVHWDntIYWdMc":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"d17e8581-b0fd-482d-8d4c-99f11246db6b","ingest-youtube","Pull YouTube video transcripts into a queryable Markdown vault, using yt-dlp subtitle discovery, VTT cleanup, metadata frontmatter, and capture-seed stubs.","cat_design_graphic","mod_design","sickn33,design","---\nname: ingest-youtube\ndescription: \"Pull a YouTube video transcript into a queryable markdown vault with yt-dlp subtitle discovery, VTT cleanup, metadata frontmatter, and capture-seed stubs.\"\nrisk: safe\nsource: community\nsource_repo: adelaidasofia\u002Fai-brain-starter\nsource_type: community\ndate_added: \"2026-05-09\"\nlicense: MIT\nlicense_source: \"https:\u002F\u002Fgithub.com\u002Fadelaidasofia\u002Fai-brain-starter\u002Fblob\u002Fmain\u002FLICENSE\"\nupstream: \"https:\u002F\u002Fgithub.com\u002Fadelaidasofia\u002Fai-brain-starter\u002Ftree\u002Fmain\u002Fskills\u002Fingest-youtube\"\n---\n\n# ingest-youtube — YouTube-to-vault connector\n\nPulls YouTube transcripts into a markdown vault as queryable typed-memory entries that downstream skills (knowledge graph extraction, voice-fingerprint training, content repurposing, action-item extraction) can act on.\n\nSame pattern as ingest-slack, ingest-whatsapp, ingest-notion, ingest-linear, ingest-github, ingest-gmail. 
Adding YouTube means a new normalizer, not a new architecture.\n\n## When to use\n\n- User pastes a YouTube URL and asks for a transcript or summary\n- User says `\u002Fingest-youtube \u003Curl>` for a single video\n- User asks to capture, sync, ingest, transcribe, or pull a talk\u002Fpodcast\u002Fkeynote into the vault\n\nDo NOT use for:\n- Downloading the actual video file (use `yt-dlp` directly with `-f best`)\n- Channel-wide ingestion or `--days` windows; this script ingests one video URL at a time\n- Live streams (transcripts are not stable)\n- Non-YouTube sources (Vimeo, Twitch, Twitter Spaces have their own connectors)\n- One-off transcript reads where the user does not want a vault file (run `yt-dlp --write-auto-sub` directly and pipe to stdout)\n\n## How it works\n\n1. Parse the input as one YouTube video URL.\n2. Verify `yt-dlp` is installed. If not, the script exits with install instructions: `brew install yt-dlp` (macOS) or `pip3 install --user yt-dlp`.\n3. Call `yt-dlp --list-subs \u003Curl>` to enumerate available subtitles.\n4. Subtitle priority: manual subs > auto-generated captions. Manual subs preserve creator-provided punctuation and speaker labels; auto-gen is uppercase + no punctuation.\n5. Download the highest-priority subtitle as VTT via `yt-dlp --write-sub --sub-lang \u003Clang> --skip-download`. Default language preference: `en,es` (English first, Spanish second).\n6. Strip VTT timing markers and merge into clean prose paragraphs. Deduplicate repeated lines (auto-generated VTTs are line-doubled). Preserve speaker labels if the source had them.\n7. Pull video metadata (title, channel, upload date, duration, video_id, URL) via `yt-dlp --print-json --skip-download`.\n8. Slugify the channel name and video title. Write to `External Inputs\u002FYouTube\u002F\u003Cchannel-slug>\u002F\u003CYYYY-MM-DD>-\u003Cvideo-slug>.md`.\n9. 
Scan transcript for trigger keywords (decision, framework, model, principle, \"the lesson is\", playbook, anti-pattern, case study). For each match, create a writing-seed stub at `Meta\u002FCaptures\u002F\u003CYYYY-MM-DD>-youtube-\u003Cchannel-slug>-\u003Cvideo-id>.md` so the seed lands in the captures aggregator.\n10. Print summary: file path, transcript word count, language, seeds detected.\n\n## Invocation\n\n```bash\npython3 ingest.py \u003Cyoutube-url> [--vault \u003Cpath>] [--lang \u003Ccode>]\n```\n\nDefaults:\n- `--vault`: `$VAULT_ROOT` env var or current directory\n- `--lang`: `en,es` (English first, Spanish second; matches a common bilingual default)\n- `--whisper`: accepted as a future fallback flag, but this version writes a stub when no subtitles are available\n\n## Output contract\n\nThe vault file at `External Inputs\u002FYouTube\u002F\u003Cchannel-slug>\u002F\u003CYYYY-MM-DD>-\u003Cvideo-slug>.md` has frontmatter:\n\n```yaml\n---\ntype: external-input\nsource: youtube\nvideo_id: \u003C11-char ID>\nurl: https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=\u003Cid>\nchannel: \u003Cchannel-name>\nchannel_url: https:\u002F\u002Fwww.youtube.com\u002F\u003Chandle>\ntitle: \u003Cvideo title>\nupload_date: \u003CYYYY-MM-DD>\nduration_seconds: \u003Cint>\nlanguage: \u003CISO code>\nsubtitle_source: manual | auto | whisper\nword_count: \u003Cint>\ningested_at: \u003CISO 8601 timestamp>\n---\n```\n\nBody is the cleaned transcript as paragraph prose. If the source had speaker labels, format as `**\u003Cspeaker>:** \u003Ctext>` per turn.\n\n## Idempotency\n\nRe-ingesting the same video URL overwrites the same vault file. The seed stub filenames hash the video_id, so the same source video produces the same stub filename across re-runs. Re-runs refresh, never duplicate.\n\n## Missing subtitles\n\nIf `yt-dlp --list-subs` returns no manual or auto subtitles, the script writes a stub vault note with the video metadata and source URL instead of failing silently. 
The `--whisper` flag is reserved for a future local transcription fallback and currently reports that the fallback is not implemented.\n\nFor a manual fallback today, download audio with `yt-dlp`, transcribe it with your local Whisper workflow, and add captions or transcript text before rerunning the ingest.\n\n## Limitations\n\n- Ingests one YouTube video URL per run; channel handles, playlists, and `--days` windows are out of scope.\n- Depends on subtitles returned by `yt-dlp`; videos without subtitles produce a metadata stub, not a transcript.\n- Does not download video files or perform built-in Whisper transcription in this version.\n- Network availability, YouTube subtitle access, and local `yt-dlp` behavior determine whether ingest succeeds.\n\n## Acceptance test\n\nRun against the first YouTube video ever uploaded:\n\n```bash\npython3 ingest.py \"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=jNQXAC9IVRw\" --vault \u002Ftmp\u002Ftest\n```\n\nExpected output:\n```\nWrote 39 words to \u002Ftmp\u002Ftest\u002FExternal Inputs\u002FYouTube\u002Fjawed\u002F2005-04-24-me-at-the-zoo.md. Language: en. Subtitle source: manual.\n```\n\nThe output file contains valid frontmatter and a clean prose body.\n\n## Dependencies\n\n- `yt-dlp` (required): install via `brew install yt-dlp` or `pip3 install --user yt-dlp`\n- `whisper-cpp` (optional for a manual fallback outside this script)\n\n## Source\n\nBundled in [adelaidasofia\u002Fai-brain-starter](https:\u002F\u002Fgithub.com\u002Fadelaidasofia\u002Fai-brain-starter), a verification harness around an AI agent so memory compounds instead of corrupts. 
The skill is part of the ingest-* family of vault connectors.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,101,625,"2026-05-16 13:23:47",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Design and Creativity","design","mdi-palette-outline","Creative skills for UI design, generative art, brand visuals, and more",3,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Visual Creativity","graphic","mdi-brush","Visual work such as posters, logos, and illustrations",2,48,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"a218fd4e-eb52-48e5-98e5-1753a52782a2","1.0.0","ingest-youtube.zip",6815,"uploads\u002Fskills\u002Fd17e8581-b0fd-482d-8d4c-99f11246db6b\u002Fingest-youtube.zip","c1e4ca9679a9a1f0aa9891f36f8ef4d546ac3e79205662c3d030eb56fc9696da","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":6029},{\"path\":\"ingest.py\",\"isDirectory\":false,\"size\":10733}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]