/SKILL.md
Use Qwen (DashScope/百炼) for speech tasks: (1) ASR speech-to-text transcription of user audio/voice messages (Telegram .ogg opus, wav, mp3) using qwen3-asr-flash, optionally with coarse timestamps via chunking; (2) TTS text-to-speech voice reply using qwen3-tts-flash with selectable voice (default Cherry) and output as .ogg voice note for Telegram.
npx skillsauth add ada20204/qwen-voice qwen-voice
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use the bundled scripts. Configure DASHSCOPE_API_KEY in one of:
- ~/.config/qwen-voice/.env (recommended)
- <repo>/.qwen-voice/.env (dev/testing)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg
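
To make the two `.env` locations listed above concrete, here is a minimal Python sketch of how a script might resolve `DASHSCOPE_API_KEY`. The function names `parse_env_file` and `find_api_key` are hypothetical, and the lookup order (process environment first, then the user-level file, then the repo-level file) is an assumption; the bundled scripts may resolve the key differently.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_api_key(
    repo_root: Path,
    environ: Mapping[str, str] | None = None,
    home: Path | None = None,
) -> str | None:
    """Return the first DASHSCOPE_API_KEY found, or None.

    Assumed order (not confirmed by the skill): process environment,
    ~/.config/qwen-voice/.env, then <repo>/.qwen-voice/.env.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else home

    if environ.get("DASHSCOPE_API_KEY"):
        return environ["DASHSCOPE_API_KEY"]

    candidates = [
        home / ".config" / "qwen-voice" / ".env",  # recommended
        repo_root / ".qwen-voice" / ".env",        # dev/testing
    ]
    for path in candidates:
        if path.is_file():
            values = parse_env_file(path.read_text(encoding="utf-8"))
            if values.get("DASHSCOPE_API_KEY"):
                return values["DASHSCOPE_API_KEY"]
    return None
```

Passing `environ` and `home` explicitly keeps the lookup easy to exercise against a temporary directory without touching a real key.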
python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg --timestamps --chunk-sec 3
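
The `--timestamps --chunk-sec 3` form suggests that coarse timestamps come from cutting the audio into fixed-length chunks and labelling each chunk's transcript with its time window. The Python sketch below illustrates that idea only; it is not the bundled script's implementation. The helper names are hypothetical, and `transcribe_chunk` is a caller-supplied stand-in for whatever actually sends a slice of audio to qwen3-asr-flash.

```python
import math
from typing import Callable, List, Tuple


def chunk_windows(duration_sec: float, chunk_sec: float) -> List[Tuple[float, float]]:
    """Split [0, duration_sec) into consecutive windows of chunk_sec seconds."""
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    count = math.ceil(duration_sec / chunk_sec)
    # Index-based bounds avoid accumulating floating-point drift.
    return [
        (i * chunk_sec, min((i + 1) * chunk_sec, duration_sec))
        for i in range(count)
    ]


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def timestamped_transcript(
    duration_sec: float,
    chunk_sec: float,
    transcribe_chunk: Callable[[float, float], str],
) -> List[str]:
    """Transcribe each window and prefix the text with its time range.

    transcribe_chunk(start, end) is a hypothetical callback that returns
    the transcript for that slice of the audio.
    """
    lines = []
    for start, end in chunk_windows(duration_sec, chunk_sec):
        text = transcribe_chunk(start, end)
        lines.append(f"[{format_timestamp(start)}-{format_timestamp(end)}] {text}")
    return lines
```

The timestamps are only as fine as the chunk length: a smaller `--chunk-sec` gives tighter windows, at the cost of more requests and less context per chunk.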
Notes:
python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好,我是 Pi。' --voice Cherry --out /tmp/out.ogg
python3 skills/qwen-voice/scripts/qwen_voice_clone.py --in ./voice_sample.ogg --name george --out work/qwen-voice/george.voice.json
python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好,我是 George。' --voice-profile work/qwen-voice/george.voice.json --out /tmp/out.ogg
Notes:
- .ogg output is Opus, suitable for Telegram voice messages.
- work/venv-dashscope (auto-created on first run).
- .ogg as a voice note.
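
For readers who want to see what "`.ogg` output is Opus" means in practice, here is a small Python sketch that converts an audio file to an Ogg/Opus voice note with `ffmpeg`. It assumes `ffmpeg` built with `libopus` is on `PATH`. The function names, the 32k bitrate, and the mono downmix are illustrative choices, not settings taken from the bundled `qwen_tts.py`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def opus_ogg_command(src: PathLike, dst: PathLike, bitrate: str = "32k") -> List[str]:
    """Build an ffmpeg command that encodes src as mono Opus in an Ogg container."""
    return [
        "ffmpeg",
        "-y",               # overwrite dst if it already exists
        "-i", str(src),     # input file (wav, mp3, ...)
        "-c:a", "libopus",  # Opus audio codec
        "-b:a", bitrate,    # target audio bitrate (assumed value)
        "-ac", "1",         # downmix to one channel
        str(dst),
    ]


def to_voice_note(src: PathLike, dst: PathLike) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(opus_ogg_command(src, dst), check=True)
    return Path(dst)
```

For example, `to_voice_note("/tmp/out.wav", "/tmp/out.ogg")` would produce a file in the Ogg/Opus format that Telegram expects for voice messages.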