[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-dfaff466-08f5-4e54-a8e5-227ba04cee70":3,"$fx-aXgfP1NFEdJ1BfiFjBkDC-NC6D0x2WEbqkaRIUYK0":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"dfaff466-08f5-4e54-a8e5-227ba04cee70","daily-news-report","Scrapes content from a preset URL list, filters high-quality technical information, and generates a daily Markdown report.","cat_life_career","mod_other","sickn33,other","---\nname: daily-news-report\ndescription: \"Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Daily News Report v3.0\n\n> **Architecture Upgrade**: Main Agent Orchestration + SubAgent Execution + Browser Scraping + Smart Caching\n\n## Core Architecture\n\n```\n┌─────────────────────────────────────────────────────────────────────┐\n│                        Main Agent (Orchestrator)                    │\n│  Role: Scheduling, Monitoring, Evaluation, Decision, Aggregation    │\n├─────────────────────────────────────────────────────────────────────┤\n│                                                                      │\n│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │\n│   │ 1. Init     │ → │ 2. Dispatch │ → │ 3. Monitor  │ → │ 4. Evaluate │     │\n│   │ Read Config │    │ Assign Tasks│    │ Collect Res │    │ Filter\u002FSort │     │\n│   └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘     │\n│         │                  │                  │                  │           │\n│         ▼                  ▼                  ▼                  ▼           │\n│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │\n│   │ 5. Decision │ ← │ Enough 20?  │    │ 6. Generate │ → │ 7. 
Update   │     │\n│   │ Cont\u002FStop   │    │ Y\u002FN         │    │ Report File │    │ Cache Stats │     │\n│   └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘     │\n│                                                                      │\n└──────────────────────────────────────────────────────────────────────┘\n         ↓ Dispatch                          ↑ Return Results\n┌─────────────────────────────────────────────────────────────────────┐\n│                        SubAgent Execution Layer                      │\n├─────────────────────────────────────────────────────────────────────┤\n│                                                                      │\n│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐              │\n│   │ Worker A    │   │ Worker B    │   │ Browser     │              │\n│   │ (WebFetch)  │   │ (WebFetch)  │   │ (Headless)  │              │\n│   │ Tier1 Batch │   │ Tier2 Batch │   │ JS Render   │              │\n│   └─────────────┘   └─────────────┘   └─────────────┘              │\n│         ↓                 ↓                 ↓                        │\n│   ┌─────────────────────────────────────────────────────────────┐   │\n│   │                    Structured Result Return                 │   │\n│   │  { status, data: [...], errors: [...], metadata: {...} }    │   │\n│   └─────────────────────────────────────────────────────────────┘   │\n│                                                                      │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\n## Configuration Files\n\nThis skill uses the following configuration files:\n\n| File | Purpose |\n|------|---------|\n| `sources.json` | Source configuration, priorities, scrape methods |\n| `cache.json` | Cached data, historical stats, deduplication fingerprints |\n\n## Execution Process Details\n\n### Phase 1: Initialization\n\n```yaml\nSteps:\n  1. Determine date (user argument or current date)\n  2. 
Read sources.json for source configurations\n  3. Read cache.json for historical data\n  4. Create output directory NewsReport\u002F\n  5. Check if a partial report exists for today (append mode)\n```\n\n### Phase 2: Dispatch SubAgents\n\n**Strategy**: Parallel dispatch, batch execution, early stopping mechanism\n\n```yaml\nWave 1 (Parallel):\n  - Worker A: Tier1 Batch A (HN, HuggingFace Papers)\n  - Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)\n\nWait for results → Evaluate count\n\nIf \u003C 15 high-quality items:\n  Wave 2 (Parallel):\n    - Worker C: Tier2 Batch A (James Clear, FS Blog)\n    - Worker D: Tier2 Batch B (HackerNoon, Scott Young)\n\nIf still \u003C 20 items:\n  Wave 3 (Browser):\n    - Browser Worker: ProductHunt, Latent Space (Require JS rendering)\n```\n\n### Phase 3: SubAgent Task Format\n\nTask format received by each SubAgent:\n\n```yaml\ntask: fetch_and_extract\nsources:\n  - id: hn\n    url: https:\u002F\u002Fnews.ycombinator.com\n    extract: top_10\n  - id: hf_papers\n    url: https:\u002F\u002Fhuggingface.co\u002Fpapers\n    extract: top_voted\n\noutput_schema:\n  items:\n    - source_id: string      # Source Identifier\n      title: string          # Title\n      summary: string        # 2-4 sentence summary\n      key_points: string[]   # Max 3 key points\n      url: string            # Original URL\n      keywords: string[]     # Keywords\n      quality_score: 1-5     # Quality Score\n\nconstraints:\n  filter: \"Cutting-edge Tech\u002FDeep Tech\u002FProductivity\u002FPractical Info\"\n  exclude: \"General Science\u002FMarketing Puff\u002FOverly Academic\u002FJob Posts\"\n  max_items_per_source: 10\n  skip_on_error: true\n\nreturn_format: JSON\n```\n\n### Phase 4: Main Agent Monitoring & Feedback\n\nMain Agent Responsibilities:\n\n```yaml\nMonitoring:\n  - Check SubAgent return status (success\u002Fpartial\u002Ffailed)\n  - Count collected items\n  - Record success rate per source\n\nFeedback Loop:\n  - If a SubAgent fails, 
decide whether to retry or skip\n  - If a source fails persistently, mark as disabled\n  - Dynamically adjust source selection for subsequent batches\n\nDecision:\n  - Items >= 25 AND HighQuality >= 20 → Stop scraping\n  - Items \u003C 15 → Continue to next batch\n  - All batches done but \u003C 20 → Generate with available content (Quality over Quantity)\n```\n\n### Phase 5: Evaluation & Filtering\n\n```yaml\nDeduplication:\n  - Exact URL match\n  - Title similarity (>80% considered duplicate)\n  - Check cache.json to avoid history duplicates\n\nScore Calibration:\n  - Unify scoring standards across SubAgents\n  - Adjust weights based on source credibility\n  - Bonus points for manually curated high-quality sources\n\nSorting:\n  - Descending order by quality_score\n  - Sort by source priority if scores are equal\n  - Take Top 20\n```\n\n### Phase 6: Browser Scraping (MCP Chrome DevTools)\n\nFor pages requiring JS rendering, use a headless browser:\n\n```yaml\nProcess:\n  1. Call mcp__chrome-devtools__new_page to open page\n  2. Call mcp__chrome-devtools__wait_for to wait for content load\n  3. Call mcp__chrome-devtools__take_snapshot to get page structure\n  4. Parse snapshot to extract required content\n  5. 
Call mcp__chrome-devtools__close_page to close page\n\nApplicable Scenarios:\n  - ProductHunt (403 on WebFetch)\n  - Latent Space (Substack JS rendering)\n  - Other SPA applications\n```\n\n### Phase 7: Generate Report\n\n```yaml\nOutput:\n  - Directory: NewsReport\u002F\n  - Filename: YYYY-MM-DD-news-report.md\n  - Format: Standard Markdown\n\nContent Structure:\n  - Title + Date\n  - Statistical Summary (Source count, items collected)\n  - 20 High-Quality Items (Template based)\n  - Generation Info (Version, Timestamps)\n```\n\n### Phase 8: Update Cache\n\n```yaml\nUpdate cache.json:\n  - last_run: Record this run info\n  - source_stats: Update stats per source\n  - url_cache: Add processed URLs\n  - content_hashes: Add content fingerprints\n  - article_history: Record included articles\n```\n\n## SubAgent Call Examples\n\n### Using general-purpose Agent\n\nSince custom agents require session restart to be discovered, use general-purpose and inject worker prompts:\n\n```\nTask Call:\n  subagent_type: general-purpose\n  model: haiku\n  prompt: |\n    You are a stateless execution unit. 
Only do the assigned task and return structured JSON.\n\n    Task: Scrape the following URLs and extract content\n\n    URLs:\n    - https:\u002F\u002Fnews.ycombinator.com (Extract Top 10)\n    - https:\u002F\u002Fhuggingface.co\u002Fpapers (Extract top voted papers)\n\n    Output Format:\n    {\n      \"status\": \"success\" | \"partial\" | \"failed\",\n      \"data\": [\n        {\n          \"source_id\": \"hn\",\n          \"title\": \"...\",\n          \"summary\": \"...\",\n          \"key_points\": [\"...\", \"...\", \"...\"],\n          \"url\": \"...\",\n          \"keywords\": [\"...\", \"...\"],\n          \"quality_score\": 4\n        }\n      ],\n      \"errors\": [],\n      \"metadata\": { \"processed\": 2, \"failed\": 0 }\n    }\n\n    Filter Criteria:\n    - Keep: Cutting-edge Tech\u002FDeep Tech\u002FProductivity\u002FPractical Info\n    - Exclude: General Science\u002FMarketing Puff\u002FOverly Academic\u002FJob Posts\n\n    Return JSON directly, no explanation.\n```\n\n### Using worker Agent (Requires session restart)\n\n```\nTask Call:\n  subagent_type: worker\n  prompt: |\n    task: fetch_and_extract\n    input:\n      urls:\n        - https:\u002F\u002Fnews.ycombinator.com\n        - https:\u002F\u002Fhuggingface.co\u002Fpapers\n    output_schema:\n      - source_id: string\n      - title: string\n      - summary: string\n      - key_points: string[]\n      - url: string\n      - keywords: string[]\n      - quality_score: 1-5\n    constraints:\n      filter: Cutting-edge Tech\u002FDeep Tech\u002FProductivity\u002FPractical Info\n      exclude: General Science\u002FMarketing Puff\u002FOverly Academic\n```\n\n## Output Template\n\n```markdown\n# Daily News Report (YYYY-MM-DD)\n\n> Curated from N sources today, containing 20 high-quality items\n> Generation Time: X min | Version: v3.0\n>\n> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.\n\n---\n\n## 1. 
Title\n\n- **Summary**: 2-4 sentence overview\n- **Key Points**:\n  1. Point one\n  2. Point two\n  3. Point three\n- **Source**: Link\n- **Keywords**: `keyword1` `keyword2` `keyword3`\n- **Score**: ⭐⭐⭐⭐⭐ (5\u002F5)\n\n---\n\n## 2. Title\n...\n\n---\n\n*Generated by Daily News Report v3.0*\n*Sources: HN, HuggingFace, OneUsefulThing, ...*\n```\n\n## Constraints & Principles\n\n1.  **Quality over Quantity**: Low-quality content does not enter the report.\n2.  **Early Stop**: Stop scraping once 20 high-quality items are reached.\n3.  **Parallel First**: SubAgents in the same batch execute in parallel.\n4.  **Fault Tolerance**: Failure of a single source does not affect the whole process.\n5.  **Cache Reuse**: Avoid re-scraping the same content.\n6.  **Main Agent Control**: All decisions are made by the Main Agent.\n7.  **Fallback Awareness**: Detect sub-agent availability and degrade gracefully if unavailable.\n\n## Expected Performance\n\n| Scenario | Expected Time | Note |\n|---|---|---|\n| Optimal | ~2 mins | Tier1 sufficient, no browser needed |\n| Normal | ~3-4 mins | Requires Tier2 supplement |\n| Browser Needed | ~5-6 mins | Includes JS-rendered pages |\n\n## Error Handling\n\n| Error Type | Handling |\n|---|---|\n| SubAgent Timeout | Log error, continue to next |\n| Source 403\u002F404 | Mark disabled, update sources.json |\n| Extraction Failed | Return raw content, Main Agent decides |\n| Browser Crash | Skip source, log entry |\n\n## Compatibility & Fallback\n\nTo ensure usability across different Agent environments, the following checks must be performed:\n\n1.  **Environment Check**:\n    -   In Phase 1 initialization, attempt to detect whether the `worker` sub-agent exists.\n    -   If it does not exist (or the plugin is not installed), automatically switch to **Serial Execution Mode**.\n\n2.  
**Serial Execution Mode**:\n    -   Do not use parallel dispatch.\n    -   The Main Agent executes scraping tasks for each source sequentially.\n    -   Slower, but guarantees basic functionality.\n\n3.  **User Alert**:\n    -   MUST include a clear warning in the generated report header indicating the current degraded mode.\n\n## When to Use\nUse this skill to run the workflow described in the overview.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,95,1093,"2026-05-16 13:13:53",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of Skill",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"ab109265-4c7d-49bc-9866-894bb1ed7115","1.0.0","daily-news-report.zip",6411,"uploads\u002Fskills\u002Fdfaff466-08f5-4e54-a8e5-227ba04cee70\u002Fdaily-news-report.zip","14ea55fd4dfa49456f7aa84d6650ee731ce56e4a5e154a17ae96d6b6d2392ed2","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":13355},{\"path\":\"cache.json\",\"isDirectory\":false,\"size\":976},{\"path\":\"sources.json\",\"isDirectory\":false,\"size\":4950}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]