Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

ada20204/qwen-voice

Name: qwen-voice
Author: ada20204


npx skillsauth add ada20204/qwen-voice qwen-voice


Qwen Voice (ASR + TTS)

Use the bundled scripts. Set DASHSCOPE_API_KEY in one of these files:

~/.config/qwen-voice/.env (recommended)
<repo>/.qwen-voice/.env (dev/testing)

ASR (speech → text)

Without timestamps (default)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg

With timestamps (chunk-based)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg --timestamps --chunk-sec 3
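Chunk-based timestamps can be pictured as slicing the audio into fixed windows and labeling each window's transcript with its start and end time. The sketch below uses Python's standard `wave` module and assumes the audio is already a WAV file; `chunk_wav` is a hypothetical helper, and `qwen_asr.py` may slice audio differently. With `--chunk-sec 3`, a 7-second clip yields the ranges 0 to 3 s, 3 to 6 s, and 6 to 7 s.

```python
import wave


def chunk_wav(path: str, chunk_sec: float) -> list[tuple[float, float, bytes]]:
    """Split a WAV file into fixed-length chunks.

    Returns (start_sec, end_sec, pcm_bytes) per chunk. The last chunk may be
    shorter than chunk_sec. These are chunk boundaries, not word timings.
    """
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    chunks = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, int(round(chunk_sec * rate)))
        start_frame = 0
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            n_frames = len(data) // bytes_per_frame
            chunks.append((start_frame / rate, (start_frame + n_frames) / rate, data))
            start_frame += n_frames
    return chunks
```

Each chunk would then be transcribed on its own, which is why a word that straddles a boundary can be split or garbled.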

Notes:

Timestamps come from fixed-length chunking, not word-level alignment, so they are coarse.
Input audio is converted to mono 16 kHz WAV before it is sent for recognition.
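One common way to do this conversion is ffmpeg. The sketch below assumes ffmpeg is on PATH and that 16-bit PCM is an acceptable WAV encoding; neither is confirmed by the skill, and the helper names are hypothetical. The flags used (`-y`, `-i`, `-ac`, `-ar`, `-c:a`) are standard ffmpeg options.

```python
import shutil
import subprocess


def to_mono_16k_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that decodes src and writes mono 16 kHz WAV."""
    return [
        "ffmpeg", "-y",       # overwrite dst without prompting
        "-i", src,            # input: .ogg (Opus), .mp3, .wav, ...
        "-ac", "1",           # downmix to one channel (mono)
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (assumed encoding)
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion, failing early if ffmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(to_mono_16k_wav_cmd(src, dst), check=True, capture_output=True)
```

Keeping the argument list separate from the call makes the exact command easy to log or test without running ffmpeg.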

TTS (text → speech)

Preset voice (default: Cherry)

python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 Pi。' --voice Cherry --out /tmp/out.ogg

Clone voice (create once, reuse)

Create a voice profile from an audio sample:

python3 skills/qwen-voice/scripts/qwen_voice_clone.py --in ./voice_sample.ogg --name george --out work/qwen-voice/george.voice.json

Use the cloned voice to synthesize:

python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 George。' --voice-profile work/qwen-voice/george.voice.json --out /tmp/out.ogg

Notes:

The .ogg output is Opus-encoded, which suits Telegram voice messages.
Voice cloning uses the DashScope voice customization endpoint together with a Qwen realtime TTS model.
The scripts use a local virtual environment at work/venv-dashscope, created automatically on first run.
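Before sending a file as a Telegram voice note, it can help to confirm it really is Ogg/Opus. The check below relies on two documented format details: an Ogg page starts with the capture pattern `OggS` and has a 27-byte fixed header whose last byte is the segment count, and an Opus stream's first packet begins with the magic `OpusHead` (RFC 7845). It is a quick sanity check, not a full validator, and the function names are hypothetical.

```python
OGG_CAPTURE = b"OggS"
OPUS_ID_MAGIC = b"OpusHead"


def is_ogg_opus(data: bytes) -> bool:
    """True if data begins with an Ogg page whose first packet is an Opus ID header."""
    # The fixed part of an Ogg page header is 27 bytes; byte 26 is the segment count.
    if len(data) < 27 or data[:4] != OGG_CAPTURE:
        return False
    payload_start = 27 + data[26]  # skip the segment (lacing) table
    return data[payload_start:payload_start + 8] == OPUS_ID_MAGIC


def file_is_ogg_opus(path: str) -> bool:
    with open(path, "rb") as f:
        # At most 27 + 255 header bytes plus 8 magic bytes, so 512 is enough.
        return is_ogg_opus(f.read(512))
```

For example, `file_is_ogg_opus("/tmp/out.ogg")` after a TTS run should return True.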

Typical chat workflow

When the user sends a voice message or audio file: run ASR and reply with the transcribed text.
When the user explicitly asks for a voice reply: run TTS and send the generated .ogg as a voice note.


Use Qwen (DashScope/百炼, Bailian) for speech tasks: (1) ASR: speech-to-text transcription of user audio and voice messages (Telegram .ogg/Opus, WAV, MP3) with qwen3-asr-flash, optionally with coarse timestamps via chunking; (2) TTS: text-to-speech voice replies with qwen3-tts-flash, a selectable voice (default: Cherry), and output as an .ogg voice note for Telegram.

5 stars

content-media

Updated Mar 29, 2026

Install this skill globally with skillsauth. The listing states this works with Claude Code, Cursor, and Windsurf:

npx skillsauth add ada20204/qwen-voice qwen-voice

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

Trivy, container and dependency vulnerability scanner: Clean (95%)
Semgrep, static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk), Model Context Protocol security validation: Clean (95%)
Snyk (dep), open source security scanning: Skipped (50%)
Socket.dev, supply chain security analysis: Skipped (50%)
VirusTotal, multi-engine malware detection: Skipped (50%)
CrowdStrike, advanced threat intelligence: Skipped (50%)
OSV-Scanner, Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 1, 2026, 12:04 AM | 52.9s | 9 files scanned


Related Skills

steipete/summarize

content-media

Verified, Trusted, Community

Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).

356,423 | SKILL.md | Updated Apr 13, 2026

steipete/qqbot-media

content-media

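Verified, Trusted, Community

QQBot rich-media send and receive capability. Uses the <qqmedia> tag; the system automatically detects the type (image/voice/video/file) from the file extension.

356,423 | SKILL.md | Updated Apr 13, 2026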

openclaw/summarize

content-media

Verified, Trusted, Community

Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).

353,662 | SKILL.md | Updated Apr 10, 2026

openclaw/qqbot-media

content-media

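Verified, Trusted, Community

QQBot rich-media send and receive capability. Uses the <qqmedia> tag; the system automatically detects the type (image/voice/video/file) from the file extension.

353,662 | SKILL.md | Updated Apr 10, 2026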

Download

For Claude Desktop: download the file once, then upload it in the app. No terminal is needed.

Install manually

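# Clone the repo
git clone https://github.com/ada20204/qwen-voice.git

# Copy into the Claude Code skills folder (global), creating it if missing.
# No trailing slash on the source: with BSD/macOS cp, "qwen-voice/" would copy
# only the directory's contents instead of the qwen-voice directory itself.
mkdir -p ~/.claude/skills
cp -r qwen-voice ~/.claude/skills/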
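To confirm the copy landed where Claude Code looks for it, a quick check can read the installed SKILL.md. Agent Skills files begin with YAML frontmatter that carries at least `name` and `description`; the reader below handles only simple one-line `key: value` pairs (it is not a YAML parser), assumes SKILL.md sits at the repository root as the listing suggests, and uses hypothetical helper names.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Read simple 'key: value' pairs between the leading '---' lines.

    Not a full YAML parser: multi-line, quoted, or nested values are not handled.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing '---' found


def check_install(root: Path = Path.home() / ".claude" / "skills") -> bool:
    """True if root/qwen-voice/SKILL.md exists with the expected name and a description."""
    skill_md = root / "qwen-voice" / "SKILL.md"
    if not skill_md.is_file():
        return False
    meta = read_frontmatter(skill_md)
    return meta.get("name") == "qwen-voice" and bool(meta.get("description"))
```

If `check_install()` returns False after copying, the usual culprit is a nested directory (for example `skills/qwen-voice/qwen-voice/`) or a copy of the contents without the folder.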

See the official Claude Code Skills documentation for the skills folder path.

Repository

ada20204/qwen-voice

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT