skills/qwen-asr/SKILL.md
Speech-to-text using Qwen3-ASR-0.6B-4bit MLX model via a local FastAPI service. Transcribes audio files and URLs. Optimized for Apple Silicon. Use when user sends voice messages or audio that needs transcription.
npx skillsauth add stvlynn/skills qwen-asr
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Speech-to-text using Qwen3-ASR model, running locally on Apple Silicon via a FastAPI service.
Model: mlx-community/Qwen3-ASR-0.6B-4bit (4-bit quantized, ~400MB)
Service URL: http://localhost:8100
cd /path/to/skills/skills/qwen-asr
python3 -m venv venv
source venv/bin/activate
The model is downloaded from a ModelScope/HuggingFace mirror (faster from mainland China):
pip install -r service/requirements.txt
bash service/start.sh
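On first start, the model (about 400MB) is downloaded automatically from hf-mirror.com; later starts use the local cache.
# Check service health
curl http://localhost:8100/health
# Show model information
curl http://localhost:8100/info
# Test transcription (using an online audio file)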
curl -X POST "http://localhost:8100/transcribe_url?audio_url=https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
On macOS, launchd can start the service automatically at login:
# Create the plist (adjust the path to match your setup)
cat > ~/Library/LaunchAgents/com.qwen.asr.plist << 'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.qwen.asr</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>-c</string>
<string>cd /path/to/skills/skills/qwen-asr && bash service/start.sh</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/qwen-asr.log</string>
<key>StandardErrorPath</key>
<string>/tmp/qwen-asr.err</string>
</dict>
</plist>
PLIST
launchctl load ~/Library/LaunchAgents/com.qwen.asr.plist
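As an alternative to writing the XML by hand, the same launch agent could be generated with Python's standard plistlib module. This is only a sketch: `build_launch_agent` is a hypothetical helper that is not part of the skill, and the directory you pass in is a placeholder for your own checkout path.

```python
import plistlib
import shlex


def build_launch_agent(skill_dir):
    """Return launchd plist bytes that run service/start.sh from skill_dir.

    Hypothetical helper; keys and values mirror the hand-written plist above.
    """
    command = f"cd {shlex.quote(skill_dir)} && bash service/start.sh"
    agent = {
        "Label": "com.qwen.asr",
        "ProgramArguments": ["/bin/bash", "-c", command],
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": "/tmp/qwen-asr.log",
        "StandardErrorPath": "/tmp/qwen-asr.err",
    }
    # plistlib.dumps emits XML by default, including the DOCTYPE header.
    return plistlib.dumps(agent)
```

The returned bytes can be written to ~/Library/LaunchAgents/com.qwen.asr.plist and loaded with the launchctl command shown above. Quoting the directory with shlex.quote keeps the command intact if the path contains spaces.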
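| Problem | Solution |
|------|----------|
| Port 8100 is already in use | Run lsof -i :8100 to find the process using it, or change the port number in start.sh |
| Model download is slow | The script already sets HF_ENDPOINT=https://hf-mirror.com automatically |
| 503 Model not loaded | The model is still loading; the first load takes about 10-30 seconds |
| ModuleNotFoundError: mlx | Make sure you are running on an Apple Silicon Mac |
| Transcription result is empty | Check the audio format (wav/mp3/ogg/flac are supported); the audio may also be too short |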
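Because a 503 only means the model has not finished loading, a client can wait and retry rather than fail. The sketch below shows one way to do that. `retry_while_loading` is a hypothetical helper, and the attempt count and delay are assumptions chosen to cover the 10-30 second load time mentioned in the table.

```python
import time


def retry_while_loading(send, attempts=10, delay=3.0, sleep=time.sleep):
    """Call send() until it stops returning HTTP 503, then return its result.

    send is any zero-argument callable returning (status_code, body).
    Gives up after `attempts` calls and returns the last response.
    """
    status, body = send()
    for _ in range(attempts - 1):
        if status != 503:
            break
        sleep(delay)  # model still loading; wait before the next try
        status, body = send()
    return status, body
```

Here `send` would typically wrap one of the HTTP calls used elsewhere in this document and return its status code and body. The `sleep` parameter exists only so the wait can be replaced in tests.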
| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check |
| /info | GET | Model information |
| /transcribe | POST | Transcribe uploaded audio file |
| /transcribe_url | POST | Transcribe audio from URL |
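The /transcribe_url endpoint takes the audio location as a query parameter, so an audio URL that itself contains `?` or `&` should be percent-encoded before it is appended. A minimal sketch using only the standard library follows. `transcribe_url_endpoint` is a hypothetical helper, while the parameter names `audio_url` and `language` are the ones used in this document's curl and Python examples.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8100"  # service address from this document


def transcribe_url_endpoint(audio_url, language=None):
    """Build the full request URL for POST /transcribe_url."""
    params = {"audio_url": audio_url}
    if language:
        params["language"] = language
    # urlencode percent-encodes reserved characters in each value.
    return f"{BASE_URL}/transcribe_url?{urlencode(params)}"
```

Libraries such as requests do this encoding themselves when given a `params` dictionary, so the helper is mainly useful when building URLs for curl or logs.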
# Transcribe local file
python3 scripts/asr.py audio.wav
# Transcribe from URL
python3 scripts/asr.py --url "https://example.com/audio.wav"
# Specify language
python3 scripts/asr.py audio.wav --lang zh
# Check service status
python3 scripts/asr.py --check
# Upload file
curl -X POST "http://localhost:8100/transcribe" \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.wav"
# From URL with language
curl -X POST "http://localhost:8100/transcribe_url?audio_url=<URL>&language=zh"
import requests
# From URL
response = requests.post(
    "http://localhost:8100/transcribe_url",
    params={"audio_url": "https://example.com/audio.wav", "language": "zh"},
)
text = response.json()["text"]
# From file
with open("audio.wav", "rb") as f:
    response = requests.post("http://localhost:8100/transcribe", files={"file": f})
text = response.json()["text"]
{
  "text": "transcribed text here",
  "chunks": [],
  "processing_time": 0.123
}
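A response of this shape can be parsed into a small typed object so callers do not index the raw dictionary directly. The sketch below assumes only the three fields shown above. `Transcription` and `parse_transcription` are hypothetical names, and treating `chunks` and `processing_time` as optional is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Transcription:
    text: str
    chunks: list = field(default_factory=list)
    processing_time: float = 0.0


def parse_transcription(payload):
    """Parse the JSON body returned by /transcribe or /transcribe_url."""
    data = json.loads(payload)
    return Transcription(
        text=data["text"],  # required; raises KeyError if missing
        chunks=data.get("chunks", []),
        processing_time=float(data.get("processing_time", 0.0)),
    )
```

With requests, the payload would be `response.text`. Failing on a missing `text` field makes error responses visible instead of silently producing an empty transcript.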
# Start
bash service/start.sh
# Check status
python3 scripts/asr.py --check
# Stop (find and kill process)
lsof -ti :8100 | xargs kill
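To confirm from Python that the service has started or stopped, it is enough to test whether anything accepts TCP connections on the port. This is a rough sketch: `port_is_open` is a hypothetical helper that is independent of scripts/asr.py, and it only shows that some process is listening, not that the process is the ASR service.

```python
import socket


def port_is_open(host="localhost", port=8100, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling `port_is_open()` after the kill command should return False once the process has exited.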