skills/baidu-speech-to-text/SKILL.md
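Baidu Speech Recognition: converts voice messages to text. Supports Mandarin Chinese, English, Cantonese, and Sichuanese. Optimized for server environments in mainland China; automatically bypasses the proxy when accessing the Baidu API.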
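This skill converts voice messages sent by users (ogg/opus format) into text. It uses the Baidu speech recognition API and is optimized for mainland-China servers that run behind a proxy.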
Overseas services (Discord, WhatsApp, etc.) are reached through the proxychains4 proxy. Running unset LD_PRELOAD bypasses the proxychains injection.

| Script | Path | Purpose |
|------|------|------|
| ogg_to_text.sh | /root/.openclaw/workspace/scripts/ogg_to_text.sh | Recommended - simple and easy to use |
| speech_to_text.sh | /root/.openclaw/workspace/scripts/speech_to_text.sh | Full-featured version |
| baidu_speech_to_text.py | /root/.openclaw/workspace/scripts/baidu_speech_to_text.py | Main Python script |
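The proxy bypass described above can also be reproduced from Python by starting a child process whose environment has no LD_PRELOAD entry. This is a minimal sketch, not the skill's actual wrapper: the helper names `env_without_preload` and `run_direct` are hypothetical, and it assumes proxychains is injected only through `LD_PRELOAD`.

```python
import os
import subprocess


def env_without_preload(base_env=None):
    """Return a copy of the environment with LD_PRELOAD removed."""
    env = dict(os.environ if base_env is None else base_env)
    # proxychains hooks network calls through a preloaded shared library;
    # dropping the variable means the child process connects directly.
    env.pop("LD_PRELOAD", None)
    return env


def run_direct(cmd, base_env=None):
    """Run cmd (a list of arguments) without the proxychains preload."""
    return subprocess.run(
        cmd,
        env=env_without_preload(base_env),
        capture_output=True,
        text=True,
        check=False,
    )
```

The parent process keeps its own environment untouched, so other tools on the same server can still go through the proxy.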
/root/.openclaw/workspace/scripts/ogg_to_text.sh <audio-file-path>
# Convert a user's voice message
/root/.openclaw/workspace/scripts/ogg_to_text.sh /root/.openclaw/media/inbound/xxxxx.ogg
| Parameter | Language |
|------|------|
| --lang zh | Mandarin Chinese (default) |
| --lang en | English |
| --lang cantonese | Cantonese |
| --lang sichuan | Sichuanese |
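On the Baidu side, the recognition language is selected with a numeric `dev_pid` request field. The sketch below maps the `--lang` values from the table above to the `dev_pid` values listed in Baidu's short-speech recognition documentation. Whether the skill's Python script uses exactly this mapping is an assumption, and `dev_pid_for` is a hypothetical helper name.

```python
# dev_pid values as documented for Baidu's standard short-speech API.
# Assumption: the skill's scripts map --lang to these same values.
DEV_PID = {
    "zh": 1537,         # Mandarin Chinese
    "en": 1737,         # English
    "cantonese": 1637,  # Cantonese
    "sichuan": 1837,    # Sichuanese
}


def dev_pid_for(lang="zh"):
    """Translate a --lang value into the dev_pid sent to the API."""
    try:
        return DEV_PID[lang]
    except KeyError:
        raise ValueError(
            f"unsupported --lang value: {lang!r}; expected one of {sorted(DEV_PID)}"
        ) from None
```

Failing early on an unknown language avoids spending an API call on a request that cannot succeed.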
/root/.openclaw/workspace/scripts/speech_to_text.sh <audio-file> --pro
Voice messages that users send through WhatsApp/Discord are saved in:
/root/.openclaw/media/inbound/
The files are usually .ogg (Opus-encoded); the scripts convert them to PCM automatically.
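The conversion step can be sketched as an ffmpeg call that decodes the Opus audio to raw 16-bit mono PCM. The 16 kHz sample rate is an assumption based on what Baidu's short-speech API commonly expects, and the exact flags used by the skill's scripts may differ. `ffmpeg_pcm_command` and `ogg_to_pcm` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def ffmpeg_pcm_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list for ogg/opus -> raw PCM."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(src),          # input .ogg (Opus) file
        "-f", "s16le",           # raw container, signed 16-bit little-endian
        "-acodec", "pcm_s16le",  # PCM codec matching the container
        "-ar", str(sample_rate), # resample (16 kHz assumed here)
        "-ac", "1",              # downmix to mono
        str(dst),
    ]


def ogg_to_pcm(src, dst=None):
    """Convert src to PCM next to it (or at dst) and return the output path."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_suffix(".pcm")
    subprocess.run(ffmpeg_pcm_command(src, dst), check=True, capture_output=True)
    return dst
```

Keeping the command builder separate from the call that runs it makes the flags easy to inspect without ffmpeg installed.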
When a user sends a voice message and asks for it to be converted to text:
1. Locate the voice file (/root/.openclaw/media/inbound/*.ogg).
2. Run /root/.openclaw/workspace/scripts/ogg_to_text.sh /root/.openclaw/media/inbound/<filename>.ogg
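The workflow above could be automated roughly as follows. This sketch makes two assumptions that the source does not state: that the message to transcribe is the most recently modified .ogg file, and that `ogg_to_text.sh` prints the recognized text on standard output. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

INBOUND_DIR = Path("/root/.openclaw/media/inbound")
OGG_TO_TEXT = "/root/.openclaw/workspace/scripts/ogg_to_text.sh"


def newest_ogg(directory=INBOUND_DIR):
    """Return the most recently modified .ogg file, or None if there is none."""
    files = list(Path(directory).glob("*.ogg"))
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


def transcribe(path, script=OGG_TO_TEXT):
    """Run the wrapper script and return what it prints (assumed: the text)."""
    proc = subprocess.run(
        [script, str(path)], capture_output=True, text=True, check=True
    )
    return proc.stdout.strip()
```

Calling the .sh wrapper rather than the Python script keeps the proxy bypass in effect.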
| Stage | Time | Notes |
|------|------|------|
| Get token | ~100ms | Can be optimized by caching |
| ogg to pcm | ~150-220ms | ffmpeg conversion |
| API call | ~400-600ms | Main cost |
| Total | ~700-1000ms | |
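The table notes that the token step can be cached. One way to do that is a small in-memory cache that refetches only when the token is about to expire. This is an illustrative sketch, not the skill's implementation: `TokenCache` is a hypothetical class, and `fetch` stands for whatever function requests a token and returns `(token, expires_in_seconds)`.

```python
import time


class TokenCache:
    """Cache an access token until shortly before it expires."""

    def __init__(self, fetch, clock=time.time, margin=60):
        self._fetch = fetch    # callable returning (token, expires_in_seconds)
        self._clock = clock    # injectable clock, useful for testing
        self._margin = margin  # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + max(expires_in - self._margin, 0)
        return self._token
```

With a cache like this, only the first request in a token's lifetime pays the roughly 100 ms token cost. A cache that survives across script invocations would need to be persisted, for example in a file, which this sketch does not cover.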
Provide the Baidu API account credentials through environment variables (do not commit them to the repository):
export BAIDU_APP_ID="your_app_id"
export BAIDU_API_KEY="your_api_key"
export BAIDU_SECRET_KEY="your_secret_key"
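A script can read these variables at startup and stop with a clear message when one is missing. The sketch below shows one way to do so; `load_credentials` is a hypothetical helper, not a function from the skill.

```python
import os

REQUIRED_VARS = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_credentials(env=None):
    """Return the three Baidu credentials, or raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing variable at once saves a round of trial and error when setting up a new server.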
Endpoints:
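| Error | Cause | Solution |
|------|------|---------|
| SSL error | Proxy interference | Use the wrapper scripts (.sh); do not call the Python script directly |
| Empty recognition result | Silence or no speech | Tell the user the audio may contain no speech |
| Error 3301 | Poor audio quality | Ask the user to record again |
| Error 3303 | Speech too long | Split the audio into segments |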
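The table above can be turned into a small response handler. Baidu's recognition API returns JSON with `err_no`, `err_msg`, and, on success, a `result` list. The hints below are copied from the table rather than from Baidu's error reference, so it is worth checking them against the official code list. `interpret_response` is a hypothetical helper.

```python
# Hints taken from the troubleshooting table above (not from Baidu's docs).
HINTS = {
    3301: "poor audio quality; ask the user to record again",
    3303: "speech too long; split the audio into segments",
}


def interpret_response(resp):
    """Return (text, None) on success or (None, message) on failure."""
    err_no = resp.get("err_no", 0)
    if err_no != 0:
        hint = HINTS.get(err_no, resp.get("err_msg", "unknown error"))
        return None, f"Baidu error {err_no}: {hint}"
    text = "".join(resp.get("result") or []).strip()
    if not text:
        # A successful call with no text usually means silence or no speech.
        return None, "empty result: the audio may contain no speech"
    return text, None
```

Returning a message instead of raising lets the caller pass the explanation straight back to the user.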