应用简介
使用Azure OpenAI的实时API从文本内容生成真实音频叙述。
---
name: podcast-generation
description: "Generate real audio narratives from text content using Azure OpenAI's Realtime API."
risk: unknown
source: community
date_added: "2026-02-27"
---
# Podcast Generation with GPT Realtime Mini
Generate real audio narratives from text content using Azure OpenAI's Realtime API.
## Quick Start
1. Configure environment variables for Realtime API
2. Connect via WebSocket to Azure OpenAI Realtime endpoint
3. Send text prompt, collect PCM audio chunks + transcript
4. Convert PCM to WAV format
5. Return base64-encoded audio to frontend for playback
## Environment Configuration
```env
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```
**Note**: Endpoint should NOT include `/openai/v1/` - just the base URL.
## Core Workflow
### Backend Audio Generation
```python
from openai import AsyncOpenAI
import base64
# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"
client = AsyncOpenAI(
websocket_base_url=ws_url,
api_key=api_key
)
audio_chunks = []
transcript_parts = []
async with client.realtime.connect(model="gpt-realtime-mini") as conn:
# Configure for audio-only output
await conn.session.update(session={
"output_modalities": ["audio"],
"instructions": "You are a narrator. Speak naturally."
})
# Send text to narrate
await conn.conversation.item.create(item={
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": prompt}]
})
await conn.response.create()
# Collect streaming events
async for event in conn:
if event.type == "response.output_audio.delta":
audio_chunks.append(base64.b64decode(event.delta))
elif event.type == "response.output_audio_transcript.delta":
transcript_parts.append(event.delta)
elif event.type == "response.done":
break
# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```
### Frontend Audio Playback
```javascript
// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
const bytes = atob(base64);
const arr = new Uint8Array(bytes.length);
for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
return new Blob([arr], { type: mimeType });
};
const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```
## Voice Options
| Voice | Character |
|-------|-----------|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |
## Realtime API Events
- `response.output_audio.delta` - Base64 audio chunk
- `response.output_audio_transcript.delta` - Transcript text
- `response.done` - Generation complete
- `error` - Handle with `event.error.message`
## Audio Format
- **Input**: Text prompt
- **Output**: PCM audio (24kHz, 16-bit, mono)
- **Storage**: Base64-encoded WAV
## References
- **Full architecture**: See references/architecture.md for complete stack design
- **Code examples**: See references/code-examples.md for production patterns
- **PCM conversion**: Use scripts/pcm_to_wav.py for audio format conversion
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.
## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
发布日期
5/16/2026
提供方
SkillOPIC
来源类型
导入
sickn33
coding
数据安全
使用 Skill 时,您的对话内容将被发送至 AI 模型进行处理。我们会严格保护您的隐私数据,不会将您的对话内容用于模型训练或分享给第三方。 以下为此 Skill 的数据处理说明。
此 Skill 将处理您的对话输入
您的消息将作为 Prompt 上下文发送至 AI 模型
所有通信均通过加密通道传输
对话记录仅保存在本地
您可以随时清除本地对话历史,清除后数据不可恢复
评分和评价
已验证评分
Skill 信息
了解此 Skill 的详细信息和功能特性
编程开发
后端开发
文件结构
SKILL.md3.8 KB
版本历史
- 公开
- 来源于用户导入
如需详细了解相关要求,请访问帮助中心,或给我们提交反馈信息