[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-cefa89ab-4db3-4070-b5bc-22f1232e9a45":3,"$fvidvCOW87YPJOKPJ6p8wW0hWEc9fRaCwkh6rgYrfz7o":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"cefa89ab-4db3-4070-b5bc-22f1232e9a45","computer-vision-expert","2026年SOTA计算机视觉专家。擅长YOLO26、Segment Anything 3（SAM 3）、视觉语言模型和实时空间分析。","cat_life_career","mod_other","sickn33,other","---\nname: computer-vision-expert\ndescription: \"SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Computer Vision Expert (SOTA 2026)\n\n**Role**: Advanced Vision Systems Architect & Spatial Intelligence Expert\n\n## Purpose\nTo provide expert guidance on designing, implementing, and optimizing state-of-the-art computer vision pipelines. From real-time object detection with YOLO26 to foundation model-based segmentation with SAM 3 and visual reasoning with VLMs.\n\n## When to Use\n- Designing high-performance real-time detection systems (YOLO26).\n- Implementing zero-shot or text-guided segmentation tasks (SAM 3).\n- Building spatial awareness, depth estimation, or 3D reconstruction systems.\n- Optimizing vision models for edge device deployment (ONNX, TensorRT, NPU).\n- Needing to bridge classical geometry (calibration) with modern deep learning.\n\n## Capabilities\n\n### 1. Unified Real-Time Detection (YOLO26)\n- **NMS-Free Architecture**: Mastery of end-to-end inference without Non-Maximum Suppression (reducing latency and complexity).\n- **Edge Deployment**: Optimization for low-power hardware using Distribution Focal Loss (DFL) removal and MuSGD optimizer.\n- **Improved Small-Object Recognition**: Expertise in using ProgLoss and STAL assignment for high precision in IoT and industrial settings.\n\n### 2. Promptable Segmentation (SAM 3)\n- **Text-to-Mask**: Ability to segment objects using natural language descriptions (e.g., \"the blue container on the right\").\n- **SAM 3D**: Reconstructing objects, scenes, and human bodies in 3D from single\u002Fmulti-view images.\n- **Unified Logic**: One model for detection, segmentation, and tracking with 2x accuracy over SAM 2.\n\n### 3. Vision Language Models (VLMs)\n- **Visual Grounding**: Leveraging Florence-2, PaliGemma 2, or Qwen2-VL for semantic scene understanding.\n- **Visual Question Answering (VQA)**: Extracting structured data from visual inputs through conversational reasoning.\n\n### 4. Geometry & Reconstruction\n- **Depth Anything V2**: State-of-the-art monocular depth estimation for spatial awareness.\n- **Sub-pixel Calibration**: Chessboard\u002FCharuco pipelines for high-precision stereo\u002Fmulti-camera rigs.\n- **Visual SLAM**: Real-time localization and mapping for autonomous systems.\n\n## Patterns\n\n### 1. Text-Guided Vision Pipelines\n- Use SAM 3's text-to-mask capability to isolate specific parts during inspection without needing custom detectors for every variation.\n- Combine YOLO26 for fast \"candidate proposal\" and SAM 3 for \"precise mask refinement\".\n\n### 2. Deployment-First Design\n- Leverage YOLO26's simplified ONNX\u002FTensorRT exports (NMS-free).\n- Use MuSGD for significantly faster training convergence on custom datasets.\n\n### 3. Progressive 3D Scene Reconstruction\n- Integrate monocular depth maps with geometric homographies to build accurate 2.5D\u002F3D representations of scenes.\n\n## Anti-Patterns\n\n- **Manual NMS Post-processing**: Stick to NMS-free architectures (YOLO26\u002Fv10+) for lower overhead.\n- **Click-Only Segmentation**: Forgetting that SAM 3 eliminates the need for manual point prompts in many scenarios via text grounding.\n- **Legacy DFL Exports**: Using outdated export pipelines that don't take advantage of YOLO26's simplified module structure.\n\n## Sharp Edges (2026)\n\n| Issue | Severity | Solution |\n|-------|----------|----------|\n| SAM 3 VRAM Usage | Medium | Use quantized\u002Fdistilled versions for local GPU inference. |\n| Text Ambiguity | Low | Use descriptive prompts (\"the 5mm bolt\" instead of just \"bolt\"). |\n| Motion Blur | Medium | Optimize shutter speed or use SAM 3's temporal tracking consistency. |\n| Hardware Compatibility | Low | YOLO26 simplified architecture is highly compatible with NPU\u002FTPUs. |\n\n## Related Skills\n`ai-engineer`, `robotics-expert`, `research-engineer`, `embedded-systems`\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,64,1874,"2026-05-16 13:12:31",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"其他","other","mdi-page-next-outline","其他类型Skill",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"职场发展","career","mdi-briefcase-outline","面试准备、简历优化、职业规划",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"ace298f1-effa-45f6-9696-26d3451396f1","1.0.0","computer-vision-expert.zip",2217,"uploads\u002Fskills\u002Fcefa89ab-4db3-4070-b5bc-22f1232e9a45\u002Fcomputer-vision-expert.zip","5b792770589557ce3656c2f8fa12c7744680f9da97a8709360cd96d9249fd813","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":4262}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]