[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-5e02ecea-7188-4f84-b078-a93665d1dcdf":3,"$fv_-b-vQucfGtymFumYmLo0O_BH5qa1NLCvq2B1NRr78":42},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":33},"5e02ecea-7188-4f84-b078-a93665d1dcdf","azure-speech-to-text-rest-py","Azure Speech to Text REST API for short audio (Python). Use for simple speech recognition of audio files up to 60 seconds without the Speech SDK.","cat_coding_backend","mod_coding","sickn33,coding","---\nname: azure-speech-to-text-rest-py\ndescription: Azure Speech to Text REST API for short audio (Python). Use for simple speech recognition of audio files up to 60 seconds without the Speech SDK.\nrisk: unknown\nsource: community\ndate_added: '2026-02-27'\n---\n\n# Azure Speech to Text REST API for Short Audio\n\nSimple REST API for speech-to-text transcription of short audio files (up to 60 seconds). No SDK required - just HTTP requests.\n\n## Prerequisites\n\n1. **Azure subscription** - [Create one free](https:\u002F\u002Fazure.microsoft.com\u002Ffree\u002F)\n2. **Speech resource** - Create in [Azure Portal](https:\u002F\u002Fportal.azure.com\u002F#create\u002FMicrosoft.CognitiveServicesSpeechServices)\n3. 
**Get credentials** - After deployment, go to resource > Keys and Endpoint\n\n## Environment Variables\n\n```bash\n# Required\nAZURE_SPEECH_KEY=\u003Cyour-speech-resource-key>\nAZURE_SPEECH_REGION=\u003Cregion>  # e.g., eastus, westus2, westeurope\n\n# Alternative: Use endpoint directly\nAZURE_SPEECH_ENDPOINT=https:\u002F\u002F\u003Cregion>.stt.speech.microsoft.com\n```\n\n## Installation\n\n```bash\npip install requests\n```\n\n## Quick Start\n\n```python\nimport os\nimport requests\n\ndef transcribe_audio(audio_file_path: str, language: str = \"en-US\") -> dict:\n    \"\"\"Transcribe short audio file (max 60 seconds) using REST API.\"\"\"\n    region = os.environ[\"AZURE_SPEECH_REGION\"]\n    api_key = os.environ[\"AZURE_SPEECH_KEY\"]\n    \n    url = f\"https:\u002F\u002F{region}.stt.speech.microsoft.com\u002Fspeech\u002Frecognition\u002Fconversation\u002Fcognitiveservices\u002Fv1\"\n    \n    headers = {\n        \"Ocp-Apim-Subscription-Key\": api_key,\n        \"Content-Type\": \"audio\u002Fwav; codecs=audio\u002Fpcm; samplerate=16000\",\n        \"Accept\": \"application\u002Fjson\"\n    }\n    \n    params = {\n        \"language\": language,\n        \"format\": \"detailed\"  # or \"simple\"\n    }\n    \n    with open(audio_file_path, \"rb\") as audio_file:\n        response = requests.post(url, headers=headers, params=params, data=audio_file)\n    \n    response.raise_for_status()\n    return response.json()\n\n# Usage\nresult = transcribe_audio(\"audio.wav\", \"en-US\")\nprint(result[\"DisplayText\"])\n```\n\n## Audio Requirements\n\n| Format | Codec | Sample Rate | Notes |\n|--------|-------|-------------|-------|\n| WAV | PCM | 16 kHz, mono | **Recommended** |\n| OGG | OPUS | 16 kHz, mono | Smaller file size |\n\n**Limitations:**\n- Maximum 60 seconds of audio\n- For pronunciation assessment: maximum 30 seconds\n- No partial\u002Finterim results (final only)\n\n## Content-Type Headers\n\n```python\n# WAV PCM 16kHz\n\"Content-Type\": \"audio\u002Fwav; 
codecs=audio\u002Fpcm; samplerate=16000\"\n\n# OGG OPUS\n\"Content-Type\": \"audio\u002Fogg; codecs=opus\"\n```\n\n## Response Formats\n\n### Simple Format (default)\n\n```python\nparams = {\"language\": \"en-US\", \"format\": \"simple\"}\n```\n\n```json\n{\n  \"RecognitionStatus\": \"Success\",\n  \"DisplayText\": \"Remind me to buy 5 pencils.\",\n  \"Offset\": \"1236645672289\",\n  \"Duration\": \"1236645672289\"\n}\n```\n\n### Detailed Format\n\n```python\nparams = {\"language\": \"en-US\", \"format\": \"detailed\"}\n```\n\n```json\n{\n  \"RecognitionStatus\": \"Success\",\n  \"Offset\": \"1236645672289\",\n  \"Duration\": \"1236645672289\",\n  \"NBest\": [\n    {\n      \"Confidence\": 0.9052885,\n      \"Display\": \"What's the weather like?\",\n      \"ITN\": \"what's the weather like\",\n      \"Lexical\": \"what's the weather like\",\n      \"MaskedITN\": \"what's the weather like\"\n    }\n  ]\n}\n```\n\n## Chunked Transfer (Recommended)\n\nFor lower latency, stream audio in chunks:\n\n```python\nimport os\nimport requests\n\ndef transcribe_chunked(audio_file_path: str, language: str = \"en-US\") -> dict:\n    \"\"\"Stream audio in chunks for lower latency.\"\"\"\n    region = os.environ[\"AZURE_SPEECH_REGION\"]\n    api_key = os.environ[\"AZURE_SPEECH_KEY\"]\n    \n    url = f\"https:\u002F\u002F{region}.stt.speech.microsoft.com\u002Fspeech\u002Frecognition\u002Fconversation\u002Fcognitiveservices\u002Fv1\"\n    \n    headers = {\n        \"Ocp-Apim-Subscription-Key\": api_key,\n        \"Content-Type\": \"audio\u002Fwav; codecs=audio\u002Fpcm; samplerate=16000\",\n        \"Accept\": \"application\u002Fjson\",\n        \"Transfer-Encoding\": \"chunked\",\n        \"Expect\": \"100-continue\"\n    }\n    \n    params = {\"language\": language, \"format\": \"detailed\"}\n    \n    def generate_chunks(file_path: str, chunk_size: int = 1024):\n        with open(file_path, \"rb\") as f:\n            while chunk := f.read(chunk_size):\n                yield 
chunk\n    \n    response = requests.post(\n        url, \n        headers=headers, \n        params=params, \n        data=generate_chunks(audio_file_path)\n    )\n    \n    response.raise_for_status()\n    return response.json()\n```\n\n## Authentication Options\n\n### Option 1: Subscription Key (Simple)\n\n```python\nheaders = {\n    \"Ocp-Apim-Subscription-Key\": os.environ[\"AZURE_SPEECH_KEY\"]\n}\n```\n\n### Option 2: Bearer Token\n\n```python\nimport requests\nimport os\n\ndef get_access_token() -> str:\n    \"\"\"Get access token from the token endpoint.\"\"\"\n    region = os.environ[\"AZURE_SPEECH_REGION\"]\n    api_key = os.environ[\"AZURE_SPEECH_KEY\"]\n    \n    token_url = f\"https:\u002F\u002F{region}.api.cognitive.microsoft.com\u002Fsts\u002Fv1.0\u002FissueToken\"\n    \n    response = requests.post(\n        token_url,\n        headers={\n            \"Ocp-Apim-Subscription-Key\": api_key,\n            \"Content-Type\": \"application\u002Fx-www-form-urlencoded\",\n            \"Content-Length\": \"0\"\n        }\n    )\n    response.raise_for_status()\n    return response.text\n\n# Use token in requests (valid for 10 minutes)\ntoken = get_access_token()\nheaders = {\n    \"Authorization\": f\"Bearer {token}\",\n    \"Content-Type\": \"audio\u002Fwav; codecs=audio\u002Fpcm; samplerate=16000\",\n    \"Accept\": \"application\u002Fjson\"\n}\n```\n\n## Query Parameters\n\n| Parameter | Required | Values | Description |\n|-----------|----------|--------|-------------|\n| `language` | **Yes** | `en-US`, `de-DE`, etc. 
| Language of speech |\n| `format` | No | `simple`, `detailed` | Result format (default: simple) |\n| `profanity` | No | `masked`, `removed`, `raw` | Profanity handling (default: masked) |\n\n## Recognition Status Values\n\n| Status | Description |\n|--------|-------------|\n| `Success` | Recognition succeeded |\n| `NoMatch` | Speech detected but no words matched |\n| `InitialSilenceTimeout` | Only silence detected |\n| `BabbleTimeout` | Only noise detected |\n| `Error` | Internal service error |\n\n## Profanity Handling\n\n```python\n# Mask profanity with asterisks (default)\nparams = {\"language\": \"en-US\", \"profanity\": \"masked\"}\n\n# Remove profanity entirely\nparams = {\"language\": \"en-US\", \"profanity\": \"removed\"}\n\n# Include profanity as-is\nparams = {\"language\": \"en-US\", \"profanity\": \"raw\"}\n```\n\n## Error Handling\n\n```python\nimport os\nimport requests\n\ndef transcribe_with_error_handling(audio_path: str, language: str = \"en-US\") -> dict | None:\n    \"\"\"Transcribe with proper error handling.\"\"\"\n    region = os.environ[\"AZURE_SPEECH_REGION\"]\n    api_key = os.environ[\"AZURE_SPEECH_KEY\"]\n    \n    url = f\"https:\u002F\u002F{region}.stt.speech.microsoft.com\u002Fspeech\u002Frecognition\u002Fconversation\u002Fcognitiveservices\u002Fv1\"\n    \n    try:\n        with open(audio_path, \"rb\") as audio_file:\n            response = requests.post(\n                url,\n                headers={\n                    \"Ocp-Apim-Subscription-Key\": api_key,\n                    \"Content-Type\": \"audio\u002Fwav; codecs=audio\u002Fpcm; samplerate=16000\",\n                    \"Accept\": \"application\u002Fjson\"\n                },\n                params={\"language\": language, \"format\": \"detailed\"},\n                data=audio_file\n            )\n        \n        if response.status_code == 200:\n            result = response.json()\n            if result.get(\"RecognitionStatus\") == \"Success\":\n                return result\n 
           else:\n                print(f\"Recognition failed: {result.get('RecognitionStatus')}\")\n                return None\n        elif response.status_code == 400:\n            print(\"Bad request: Check language code or audio format\")\n        elif response.status_code == 401:\n            print(\"Unauthorized: Check API key or token\")\n        elif response.status_code == 403:\n            print(\"Forbidden: Missing authorization header\")\n        else:\n            print(f\"Error {response.status_code}: {response.text}\")\n        \n        return None\n        \n    except requests.exceptions.RequestException as e:\n        print(f\"Request failed: {e}\")\n        return None\n```\n\n## Async Version\n\n```python\nimport os\nimport aiohttp\nimport asyncio\n\nasync def transcribe_async(audio_file_path: str, language: str = \"en-US\") -> dict:\n    \"\"\"Async version using aiohttp.\"\"\"\n    region = os.environ[\"AZURE_SPEECH_REGION\"]\n    api_key = os.environ[\"AZURE_SPEECH_KEY\"]\n    \n    url = f\"https:\u002F\u002F{region}.stt.speech.microsoft.com\u002Fspeech\u002Frecognition\u002Fconversation\u002Fcognitiveservices\u002Fv1\"\n    \n    headers = {\n        \"Ocp-Apim-Subscription-Key\": api_key,\n        \"Content-Type\": \"audio\u002Fwav; codecs=audio\u002Fpcm; samplerate=16000\",\n        \"Accept\": \"application\u002Fjson\"\n    }\n    \n    params = {\"language\": language, \"format\": \"detailed\"}\n    \n    async with aiohttp.ClientSession() as session:\n        with open(audio_file_path, \"rb\") as f:\n            audio_data = f.read()\n        \n        async with session.post(url, headers=headers, params=params, data=audio_data) as response:\n            response.raise_for_status()\n            return await response.json()\n\n# Usage\nresult = asyncio.run(transcribe_async(\"audio.wav\", \"en-US\"))\nprint(result[\"DisplayText\"])\n```\n\n## Supported Languages\n\nCommon language codes (see [full 
list](https:\u002F\u002Flearn.microsoft.com\u002Fazure\u002Fai-services\u002Fspeech-service\u002Flanguage-support)):\n\n| Code | Language |\n|------|----------|\n| `en-US` | English (US) |\n| `en-GB` | English (UK) |\n| `de-DE` | German |\n| `fr-FR` | French |\n| `es-ES` | Spanish (Spain) |\n| `es-MX` | Spanish (Mexico) |\n| `zh-CN` | Chinese (Mandarin) |\n| `ja-JP` | Japanese |\n| `ko-KR` | Korean |\n| `pt-BR` | Portuguese (Brazil) |\n\n## Best Practices\n\n1. **Use WAV PCM 16kHz mono** for best compatibility\n2. **Enable chunked transfer** for lower latency\n3. **Cache access tokens** for 9 minutes (valid for 10)\n4. **Specify the correct language** for accurate recognition\n5. **Use detailed format** when you need confidence scores\n6. **Handle all RecognitionStatus values** in production code\n\n## When NOT to Use This API\n\nUse the Speech SDK or Batch Transcription API instead when you need:\n\n- Audio longer than 60 seconds\n- Real-time streaming transcription\n- Partial\u002Finterim results\n- Speech translation\n- Custom speech models\n- Batch transcription of many files\n\n## Reference Files\n\n| File | Contents |\n|------|----------|\n| references\u002Fpronunciation-assessment.md | Pronunciation assessment parameters and scoring |\n\n## When to Use\nUse this skill to carry out the workflow and actions described in the overview.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,101,1541,"2026-05-16 
13:07:45",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Programming & Development","coding","mdi-code-braces","Code generation, debugging, and review to improve development efficiency",2,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":25,"skillCount":32,"createdAt":26},"Backend Development","backend","mdi-server","APIs, databases, and server-side architecture",296,[34],{"id":35,"skillId":4,"version":36,"fileName":37,"fileSize":38,"filePath":39,"fileHash":40,"manifest":41,"createdAt":19},"00d26718-4309-4139-aaa1-3400aec78611","1.0.0","azure-speech-to-text-rest-py.zip",3537,"uploads\u002Fskills\u002F5e02ecea-7188-4f84-b078-a93665d1dcdf\u002Fazure-speech-to-text-rest-py.zip","c989f8db0edcaa12370724ec67149cb266f923fe43000f9669287e9536d1cd55","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10879}]",{"code":43,"message":44,"data":45},200,"success",{"items":46,"stats":47,"page":50},[],{"averageRating":48,"totalRatings":48,"ratingCounts":49},0,[48,48,48,48,48],{"limit":51,"offset":48,"hasMore":52,"nextOffset":51,"ratedOnly":16},15,false]