[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-2aa69c1b-285c-4b5e-9c08-28cd58a06e7b":3,"$fEEjXpDAKVLxa3tLTYkQaAEmH01iEPk06ipwRQMaYNB8":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"2aa69c1b-285c-4b5e-9c08-28cd58a06e7b","azure-ai-voicelive-py","Build real-time voice AI applications with bidirectional WebSocket communication.","cat_coding_devops","mod_coding","sickn33,coding","---\nname: azure-ai-voicelive-py\ndescription: \"Build real-time voice AI applications with bidirectional WebSocket communication.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# Azure AI Voice Live SDK\n\nBuild real-time voice AI applications with bidirectional WebSocket communication.\n\n## Installation\n\n```bash\npip install azure-ai-voicelive aiohttp azure-identity\n```\n\n## Environment Variables\n\n```bash\nAZURE_COGNITIVE_SERVICES_ENDPOINT=https:\u002F\u002F\u003Cregion>.api.cognitive.microsoft.com\n# For API key auth (not recommended for production)\nAZURE_COGNITIVE_SERVICES_KEY=\u003Capi-key>\n```\n\n## Authentication\n\n**DefaultAzureCredential (preferred)**:\n```python\nimport os\n\nfrom azure.ai.voicelive.aio import connect\nfrom azure.identity.aio import DefaultAzureCredential\n\nasync with connect(\n    endpoint=os.environ[\"AZURE_COGNITIVE_SERVICES_ENDPOINT\"],\n    credential=DefaultAzureCredential(),\n    model=\"gpt-4o-realtime-preview\",\n    credential_scopes=[\"https:\u002F\u002Fcognitiveservices.azure.com\u002F.default\"]\n) as conn:\n    ...\n```\n\n**API Key**:\n```python\nimport os\n\nfrom azure.ai.voicelive.aio import connect\nfrom azure.core.credentials import AzureKeyCredential\n\nasync with connect(\n    endpoint=os.environ[\"AZURE_COGNITIVE_SERVICES_ENDPOINT\"],\n    credential=AzureKeyCredential(os.environ[\"AZURE_COGNITIVE_SERVICES_KEY\"]),\n    model=\"gpt-4o-realtime-preview\"\n) as 
conn:\n    ...\n```\n\n## Quick Start\n\n```python\nimport asyncio\nimport os\nfrom azure.ai.voicelive.aio import connect\nfrom azure.identity.aio import DefaultAzureCredential\n\nasync def main():\n    async with connect(\n        endpoint=os.environ[\"AZURE_COGNITIVE_SERVICES_ENDPOINT\"],\n        credential=DefaultAzureCredential(),\n        model=\"gpt-4o-realtime-preview\",\n        credential_scopes=[\"https:\u002F\u002Fcognitiveservices.azure.com\u002F.default\"]\n    ) as conn:\n        # Update session with instructions\n        await conn.session.update(session={\n            \"instructions\": \"You are a helpful assistant.\",\n            \"modalities\": [\"text\", \"audio\"],\n            \"voice\": \"alloy\"\n        })\n        \n        # Listen for events\n        async for event in conn:\n            print(f\"Event: {event.type}\")\n            if event.type == \"response.audio_transcript.done\":\n                print(f\"Transcript: {event.transcript}\")\n            elif event.type == \"response.done\":\n                break\n\nasyncio.run(main())\n```\n\n## Core Architecture\n\n### Connection Resources\n\nThe `VoiceLiveConnection` exposes these resources:\n\n| Resource | Purpose | Key Methods |\n|----------|---------|-------------|\n| `conn.session` | Session configuration | `update(session=...)` |\n| `conn.response` | Model responses | `create()`, `cancel()` |\n| `conn.input_audio_buffer` | Audio input | `append()`, `commit()`, `clear()` |\n| `conn.output_audio_buffer` | Audio output | `clear()` |\n| `conn.conversation` | Conversation state | `item.create()`, `item.delete()`, `item.truncate()` |\n| `conn.transcription_session` | Transcription config | `update(session=...)` |\n\n## Session Configuration\n\n```python\nfrom azure.ai.voicelive.models import RequestSession, FunctionTool\n\nawait conn.session.update(session=RequestSession(\n    instructions=\"You are a helpful voice assistant.\",\n    modalities=[\"text\", \"audio\"],\n    
voice=\"alloy\",  # or \"echo\", \"shimmer\", \"sage\", etc.\n    input_audio_format=\"pcm16\",\n    output_audio_format=\"pcm16\",\n    turn_detection={\n        \"type\": \"server_vad\",\n        \"threshold\": 0.5,\n        \"prefix_padding_ms\": 300,\n        \"silence_duration_ms\": 500\n    },\n    tools=[\n        FunctionTool(\n            type=\"function\",\n            name=\"get_weather\",\n            description=\"Get current weather\",\n            parameters={\n                \"type\": \"object\",\n                \"properties\": {\n                    \"location\": {\"type\": \"string\"}\n                },\n                \"required\": [\"location\"]\n            }\n        )\n    ]\n))\n```\n\n## Audio Streaming\n\n### Send Audio (Base64 PCM16)\n\n```python\nimport base64\n\n# Read audio chunk (16-bit PCM, 24kHz mono)\naudio_chunk = await read_audio_from_microphone()\nb64_audio = base64.b64encode(audio_chunk).decode()\n\nawait conn.input_audio_buffer.append(audio=b64_audio)\n```\n\n### Receive Audio\n\n```python\nimport base64\n\nasync for event in conn:\n    if event.type == \"response.audio.delta\":\n        audio_bytes = base64.b64decode(event.delta)\n        await play_audio(audio_bytes)\n    elif event.type == \"response.audio.done\":\n        print(\"Audio complete\")\n```\n\n## Event Handling\n\n```python\nimport base64\nimport json\n\nasync for event in conn:\n    match event.type:\n        # Session events\n        case \"session.created\":\n            print(f\"Session: {event.session}\")\n        case \"session.updated\":\n            print(\"Session updated\")\n        \n        # Audio input events\n        case \"input_audio_buffer.speech_started\":\n            print(f\"Speech started at {event.audio_start_ms}ms\")\n        case \"input_audio_buffer.speech_stopped\":\n            print(f\"Speech stopped at {event.audio_end_ms}ms\")\n        \n        # Transcription events\n        case \"conversation.item.input_audio_transcription.completed\":\n            print(f\"User said: 
{event.transcript}\")\n        case \"conversation.item.input_audio_transcription.delta\":\n            print(f\"Partial: {event.delta}\")\n        \n        # Response events\n        case \"response.created\":\n            print(f\"Response started: {event.response.id}\")\n        case \"response.audio_transcript.delta\":\n            print(event.delta, end=\"\", flush=True)\n        case \"response.audio.delta\":\n            audio = base64.b64decode(event.delta)\n        case \"response.done\":\n            print(f\"Response complete: {event.response.status}\")\n        \n        # Function calls\n        case \"response.function_call_arguments.done\":\n            result = handle_function(event.name, event.arguments)\n            await conn.conversation.item.create(item={\n                \"type\": \"function_call_output\",\n                \"call_id\": event.call_id,\n                \"output\": json.dumps(result)\n            })\n            await conn.response.create()\n        \n        # Errors\n        case \"error\":\n            print(f\"Error: {event.error.message}\")\n```\n\n## Common Patterns\n\n### Manual Turn Mode (No VAD)\n\n```python\nawait conn.session.update(session={\"turn_detection\": None})\n\n# Manually control turns\nawait conn.input_audio_buffer.append(audio=b64_audio)\nawait conn.input_audio_buffer.commit()  # End of user turn\nawait conn.response.create()  # Trigger response\n```\n\n### Interrupt Handling\n\n```python\nasync for event in conn:\n    if event.type == \"input_audio_buffer.speech_started\":\n        # User interrupted - cancel current response\n        await conn.response.cancel()\n        await conn.output_audio_buffer.clear()\n```\n\n### Conversation History\n\n```python\n# Add system message\nawait conn.conversation.item.create(item={\n    \"type\": \"message\",\n    \"role\": \"system\",\n    \"content\": [{\"type\": \"input_text\", \"text\": \"Be concise.\"}]\n})\n\n# Add user message\nawait 
conn.conversation.item.create(item={\n    \"type\": \"message\",\n    \"role\": \"user\", \n    \"content\": [{\"type\": \"input_text\", \"text\": \"Hello!\"}]\n})\n\nawait conn.response.create()\n```\n\n## Voice Options\n\n| Voice | Description |\n|-------|-------------|\n| `alloy` | Neutral, balanced |\n| `echo` | Warm, conversational |\n| `shimmer` | Clear, professional |\n| `sage` | Calm, authoritative |\n| `coral` | Friendly, upbeat |\n| `ash` | Deep, measured |\n| `ballad` | Expressive |\n| `verse` | Storytelling |\n\nAzure voices: Use `AzureStandardVoice`, `AzureCustomVoice`, or `AzurePersonalVoice` models.\n\n## Audio Formats\n\n| Format | Sample Rate | Use Case |\n|--------|-------------|----------|\n| `pcm16` | 24kHz | Default, high quality |\n| `pcm16-8000hz` | 8kHz | Telephony |\n| `pcm16-16000hz` | 16kHz | Voice assistants |\n| `g711_ulaw` | 8kHz | Telephony (US) |\n| `g711_alaw` | 8kHz | Telephony (EU) |\n\n## Turn Detection Options\n\n```python\n# Server VAD (default)\n{\"type\": \"server_vad\", \"threshold\": 0.5, \"silence_duration_ms\": 500}\n\n# Azure Semantic VAD (smarter detection)\n{\"type\": \"azure_semantic_vad\"}\n{\"type\": \"azure_semantic_vad_en\"}  # English optimized\n{\"type\": \"azure_semantic_vad_multilingual\"}\n```\n\n## Error Handling\n\n```python\nfrom azure.ai.voicelive.aio import ConnectionError, ConnectionClosed\n\ntry:\n    async with connect(...) 
as conn:\n        async for event in conn:\n            if event.type == \"error\":\n                print(f\"API Error: {event.error.code} - {event.error.message}\")\nexcept ConnectionClosed as e:\n    print(f\"Connection closed: {e.code} - {e.reason}\")\nexcept ConnectionError as e:\n    print(f\"Connection error: {e}\")\n```\n\n## References\n\n- **Detailed API Reference**: See references\u002Fapi-reference.md\n- **Complete Examples**: See references\u002Fexamples.md\n- **All Models & Types**: See references\u002Fmodels.md\n\n## When to Use\nUse this skill when building real-time voice AI applications in Python with the Azure AI Voice Live SDK over a bidirectional WebSocket connection.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,220,1967,"2026-05-16 13:05:40",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Programming and Development","coding","mdi-code-braces","Code generation, debugging, and review to improve development efficiency",2,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"DevOps","devops","mdi-cog-outline","CI\u002FCD, containerization, deployment, and operations",3,162,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"4672f7b0-b9fa-4d97-895f-3a74b5d84943","1.0.0","azure-ai-voicelive-py.zip",3061,"uploads\u002Fskills\u002F2aa69c1b-285c-4b5e-9c08-28cd58a06e7b\u002Fazure-ai-voicelive-py.zip","68b43c95d747a79f1a14dd2028b7c605139b8d7ef23ae14e8d31f7548c300e04","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":9174}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]