应用简介
使用Pipecat、Gemini和OpenAI构建一个低延迟、灵感来自钢铁侠的战术语音助手(F.R.I.D.A.Y.)。
--- name: pipecat-friday-agent description: "Build a low-latency, Iron Man-inspired tactical voice assistant (F.R.I.D.A.Y.) using Pipecat, Gemini, and OpenAI." category: voice-agents risk: safe source: community date_added: "2026-03-10" tags: [pipecat, voice, gemini, openai, python] tools: [pipecat] --- # Pipecat Friday Agent ## Overview This skill provides a blueprint for building **F.R.I.D.A.Y.** (Replacement Integrated Digital Assistant Youth), a local voice assistant inspired by the tactical AI from the Iron Man films. It uses the **Pipecat** framework to orchestrate a low-latency pipeline: - **STT**: OpenAI Whisper (`whisper-1`) or `gpt-4o-transcribe` - **LLM**: Google Gemini 2.5 Flash (via a compatibility shim) - **TTS**: OpenAI TTS (`nova` voice) - **Transport**: Local Audio (Hardware Mic/Speakers) ## When to Use This Skill - Use when you want to build a real-time, conversational voice agent. - Use when working with the Pipecat framework for pipeline-based AI. - Use when you need to integrate multiple providers (Google and OpenAI) into a single voice loop. - Use when building Iron Man-themed or tactical-themed voice applications. ## How It Works ### Step 1: Install Dependencies You will need the Pipecat framework and its service providers installed: ```bash pip install pipecat-ai[openai,google,silero] python-dotenv ``` ### Step 2: Configure Environment Create a `.env` file with your API keys: ```env OPENAI_API_KEY=your_openai_key GOOGLE_API_KEY=your_google_key ``` ### Step 3: Run the Agent Execute the provided Python script to start the interface: ```bash python scripts/friday_agent.py ``` ## Core Concepts ### Pipeline Architecture The agent follows a linear pipeline: `Mic -> VAD -> STT -> LLM -> TTS -> Speaker`. This allows for granular control over each stage, unlike end-to-end speech-to-speech models. ### Google Compatibility Shim Since Google's Gemini API has a different message format than OpenAI's standard (which Pipecat aggregators expect), the script includes a `GoogleSafeContext` and `GoogleSafeMessage` class to bridge the gap. ## Best Practices - ✅ **Use Silero VAD**: It is robust for local hardware and prevents background noise from triggering the LLM. - ✅ **Concise Prompts**: Tactical agents should give short, data-dense responses to minimize latency. - ✅ **Sample Rate Match**: OpenAI TTS outputs at 24kHz; ensure your `audio_out_sample_rate` matches to avoid high-pitched or slowed audio. - ❌ **No Polite Fillers**: Avoid "Hello, how can I help you today?" Instead, use "Systems nominal. Ready for commands." ## Troubleshooting - **Problem:** Audio is choppy or delayed. - **Solution:** Check your `OUTPUT_DEVICE` index. Run a script like `test_audio_output.py` to find the correct hardware index for your OS. - **Problem:** "Validation error" for message format. - **Solution:** Ensure the `GoogleSafeContext` shim is correctly translating OpenAI-style dicts to Gemini-style schema. ## Related Skills - `@voice-agents` - General principles of voice AI. - `@agent-tool-builder` - Add tools (Search, Lights, etc.) to your Friday agent. - `@llm-architect` - Optimizing the LLM layer. ## Limitations - Use this skill only when the task clearly matches the scope described above. - Do not treat the output as a substitute for environment-specific validation, testing, or expert review. - Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
发布日期
5/16/2026
提供方
SkillOPIC
来源类型
导入
sickn33
coding
数据安全
使用 Skill 时,您的对话内容将被发送至 AI 模型进行处理。我们会严格保护您的隐私数据,不会将您的对话内容用于模型训练或分享给第三方。 以下为此 Skill 的数据处理说明。
此 Skill 将处理您的对话输入
您的消息将作为 Prompt 上下文发送至 AI 模型
所有通信均通过加密通道传输
对话记录仅保存在本地
您可以随时清除本地对话历史,清除后数据不可恢复
评分和评价
已验证评分
Skill 信息
了解此 Skill 的详细信息和功能特性
编程开发
后端开发
文件结构
scripts
SKILL.md3.4 KB
版本历史
- 公开
- 来源于用户导入
如需详细了解相关要求,请访问帮助中心,或给我们提交反馈信息