ValorInvestigator/qwen3-tts

Name: qwen3-tts
Author: ValorInvestigator

skills/qwen3-tts/SKILL.md

npx skillsauth add ValorInvestigator/claude-plugin-toolkit qwen3-tts

Qwen3-TTS Story to Audio

Converts written stories (.md or .txt) into high-quality narrated audio using Qwen3-TTS voice cloning in Levi's voice. Handles full preprocessing: strips markdown, optimizes text for TTS, intelligently chunks for minimal seams, cross-fades between chunks, and normalizes the final output.

When to Use This Skill

Trigger when the user:

- Says "narrate this", "make audio", "convert to audio", "story to audio", "read this aloud"
- Asks to generate voice audio from a story or article file
- Mentions "qwen3", "qwen tts", "voice clone"
- Wants to create an audiobook or podcast-style narration
- Says "do this one in my voice" about a text file

What This Skill Does

- Takes a story file (.md or .txt) as input
- Preprocesses the text for TTS (strips markdown, expands abbreviations, optimizes punctuation)
- Splits the text into smart chunks (targeting ~2,000-4,000 chars each, or 1-3 min of audio per chunk)
- Generates audio using Qwen3-TTS with Levi's cloned voice
- Cross-fades between chunks to eliminate seams
- Applies EBU R128 normalization (-24 LUFS)
- Outputs the final MP3 at 192kbps
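The chunking step can be illustrated with a minimal sketch. This is not the code in story_to_audio_qwen3.py; the function name `chunk_text`, the regex-based sentence splitter, and the greedy packing strategy are all assumptions chosen to show one way of keeping chunks near a target size without cutting a sentence in half.

```python
import re


def chunk_text(text: str, target_chars: int = 3000) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly target_chars.

    Hypothetical helper: the real script's splitting rules are not documented.
    """
    # Split after ., ! or ? followed by whitespace (a simple heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > target_chars:
            # Adding this sentence would overshoot, so close the chunk here.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than `target_chars` is kept whole in this sketch, so a chunk can exceed the target; the seam always falls on a sentence boundary.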

Execution Procedure

Step 1: Identify the input file

If the user has not specified a file, ask which one to convert. The story files are typically in:

D:\Bingaman Master Files Old\Home Base Claude\new story\

Step 2: Run the generation script

cd "C:\Users\Big Levi\Desktop\Claude\Chatterbox-TTS-Extended"
python story_to_audio_qwen3.py "PATH_TO_STORY_FILE"

Flags:

--dry-run : Show chunks and preprocessing without generating
--output PATH : Custom output path (default: same dir as input, .mp3)
--start-section "HEADER" : Start narrating from a specific section header
--end-section "HEADER" : Stop narrating at a specific section header (exclusive)
--skip-metadata : Strip bylines, navigation, disclaimers, author bios (default: ON)
--chunk-chars N : Target chunk size in characters (default: 3000)
--crossfade-ms N : Cross-fade duration between chunks in ms (default: 500)

Examples:

# Convert entire story
python story_to_audio_qwen3.py "D:\...\new story\STORY_1_THE_REQUEST.md"

# Convert only Sections III-V
python story_to_audio_qwen3.py "D:\...\new story\STORY_2.md" --start-section "III" --end-section "VI"

# Dry run to preview chunks
python story_to_audio_qwen3.py "D:\...\new story\STORY_3.md" --dry-run
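The second example relies on `--end-section` being exclusive: starting at "III" and ending at "VI" narrates Sections III through V. A minimal sketch of that selection logic is below. How the script actually matches headers is not documented, so matching on the first token of a markdown header (for example "III" in "# III. Middle") is an assumption, as are the names `header_label` and `select_sections`.

```python
import re

HEADER_RE = re.compile(r"^#{1,6}\s+(.*)$")


def header_label(line: str):
    """Return the first token of a markdown header line, or None."""
    match = HEADER_RE.match(line)
    if not match:
        return None
    title = match.group(1).strip()
    # "III." and "III:" both reduce to "III".
    return title.split()[0].rstrip(".:") if title else None


def select_sections(lines, start=None, end=None):
    """Keep lines from the start header up to, but not including, the end header."""
    selected = []
    active = start is None  # with no start header, narrate from the top
    for line in lines:
        label = header_label(line)
        if label is not None:
            if end is not None and active and label == end:
                break  # exclusive: the end header itself is not narrated
            if start is not None and label == start:
                active = True
        if active:
            selected.append(line)
    return selected
```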

Step 3: Monitor progress

The script prints chunk-by-chunk progress:

[1/12] (3204 chars) Section I-II... -> 142.3s audio in 165.0s (RTF 1.16x)
[2/12] (2891 chars) Section III...  -> 128.7s audio in 149.5s (RTF 1.16x)

At ~1.16x real-time factor on the RTX 3080, a 30-minute story takes ~35 minutes to generate.
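The arithmetic behind that estimate is simple enough to check by hand. In the sketch below the function names are illustrative; the real-time factor is taken as generation time divided by audio duration, which is consistent with the sample log lines above.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF as used in the progress log: time spent generating per second of audio."""
    return generation_seconds / audio_seconds


def estimate_generation_minutes(audio_minutes: float, rtf: float = 1.16) -> float:
    """Projected wall-clock minutes needed to generate audio_minutes of narration."""
    return audio_minutes * rtf


if __name__ == "__main__":
    print(f"{real_time_factor(165.0, 142.3):.2f}x")       # first sample log line
    print(f"{estimate_generation_minutes(30):.1f} min")   # 30-minute story
```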

Step 4: Report results

Tell the user:

Output file path and size
Total duration
Generation time

Key Paths

| What | Path or value |
|------|------|
| Script | C:\Users\Big Levi\Desktop\Claude\Chatterbox-TTS-Extended\story_to_audio_qwen3.py |
| Voice Reference | C:\Users\Big Levi\OneDrive\Documents\Sound Recordings\levi_voice_best_24k.wav |
| Voice Transcript | "Now this article is just the beginning. I've mentioned I'll be exposing connections between disability rights organ and other legal networks in the next article, because what I've found suggests that even the lawyers who are" |
| Story Files | D:\Bingaman Master Files Old\Home Base Claude\new story\ |
| Model | Qwen/Qwen3-TTS-12Hz-1.7B-Base (auto-downloads from HuggingFace, ~4.5 GB) |

Voice Settings

- Voice: Levi Bakke (cloned from a 12-second reference)
- Model: Qwen3-TTS-12Hz-1.7B-Base
- GPU: NVIDIA RTX 3080 (10 GB VRAM, uses ~5.2 GB)
- Generation: do_sample=True, top_k=50, temperature=0.9, repetition_penalty=1.05
- Output: 192 kbps MP3, EBU R128 normalized (-24 LUFS integrated loudness, -2 dBTP true peak, 7 LU loudness range)
- Chunking: ~3,000 chars per chunk (1-3 min audio), sentence-boundary aware
- Cross-fade: 500 ms between chunks to eliminate seams
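To make the cross-fade setting concrete, here is a minimal pure-Python sketch of a linear cross-fade between two chunks of mono samples. It is an illustration only: the script's actual fade curve, sample format, and audio library are not documented, and the name `crossfade` and its signature are assumptions.

```python
def crossfade(a: list[float], b: list[float], sample_rate: int,
              fade_ms: int = 500) -> list[float]:
    """Join two mono sample lists, overlapping the end of a with the start of b."""
    # Number of overlapping samples, capped by the length of either chunk.
    n = min(len(a), len(b), sample_rate * fade_ms // 1000)
    if n == 0:
        return a + b
    head, tail = a[:-n], a[-n:]
    mixed = []
    for i in range(n):
        # Weight of the incoming chunk rises linearly from 1/(n+1) to n/(n+1).
        w = (i + 1) / (n + 1)
        mixed.append(tail[i] * (1.0 - w) + b[i] * w)
    return head + mixed + b[n:]
```

Because the two chunks overlap, the joined audio is shorter than the sum of its parts by the fade length, so a story with many chunks ends up slightly shorter than the total of the per-chunk durations.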

Text Preprocessing (automatic)

The script automatically:

- Strips markdown headers, bold, italic, horizontal rules, and block quotes
- Removes metadata (bylines, navigation bars, disclaimers, author bios, footnotes)
- Expands common abbreviations for natural speech:
  - "DHS" -> "D.H.S" (spelled out with pauses)
  - "APS" -> "A.P.S"
  - "SOQ" -> "S.O.Q"
  - "ORS" -> "O.R.S"
  - "OAR" -> "O.A.R"
  - "HIPAA" -> "H.I.P.A.A"
  - "COVID" -> "Covid"
  - "ODHS" -> "O.D.H.S"
  - "MAR" -> "M.A.R"
- Fixes pronunciation of place names and other proper names:
  - "La Grande" -> "La Grand" (prevents French pronunciation "La Grand-eh")
  - "Lagrand" -> "La Grand"
  - "Bingaman" -> "Bing-uh-min" (prevents "Bing-a-man" mispronunciation)
- Converts em-dashes to natural pauses
- Handles quoted speech naturally
- Preserves paragraph/section pacing with appropriate silences
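A few of these rules can be sketched with regular expressions. The substitution tables below are copied from the lists above, but the function `preprocess`, the regex patterns, and the choice of a comma as the em-dash pause are assumptions; the script's real implementation may differ, and metadata removal and silence handling are left out.

```python
import re

# Substitutions copied from the lists above.
ABBREVIATIONS = {
    "DHS": "D.H.S", "APS": "A.P.S", "SOQ": "S.O.Q", "ORS": "O.R.S",
    "OAR": "O.A.R", "HIPAA": "H.I.P.A.A", "COVID": "Covid",
    "ODHS": "O.D.H.S", "MAR": "M.A.R",
}
PRONUNCIATIONS = {
    "La Grande": "La Grand", "Lagrand": "La Grand", "Bingaman": "Bing-uh-min",
}


def preprocess(text: str) -> str:
    """Hypothetical subset of the TTS text cleanup."""
    # Remove horizontal rules such as "---" or "***" on their own line.
    text = re.sub(r"^[ \t]*([-*_])(?:[ \t]*\1){2,}[ \t]*$", "", text, flags=re.MULTILINE)
    # Strip leading header (#) and block-quote (>) markers.
    text = re.sub(r"^[ \t]*(?:#{1,6}|>)[ \t]*", "", text, flags=re.MULTILINE)
    # Unwrap bold and italic markers, keeping the inner text.
    text = re.sub(r"(\*\*|__|\*|_)(.+?)\1", r"\2", text)
    # Whole-word, case-sensitive substitutions ("DHS" does not match inside "ODHS").
    for word, spoken in {**ABBREVIATIONS, **PRONUNCIATIONS}.items():
        text = re.sub(rf"\b{re.escape(word)}\b", spoken, text)
    # Turn em-dashes (U+2014) into a comma pause; the pause style is an assumption.
    text = re.sub(r"\s*\u2014\s*", ", ", text)
    return text.strip()
```

The whole-word match matters for the abbreviation table: without it, the "DHS" rule would also rewrite the tail of "ODHS" and produce a garbled spelling.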

Troubleshooting

"SoX could not be found"

This is a cosmetic warning from librosa. It does not affect generation. Ignore it.

"flash-attn is not installed"

This is harmless. The model falls back to PyTorch SDPA attention, which uses ~6.8 GB of VRAM instead of ~5.4 GB. Both fit on the RTX 3080.

CUDA out of memory

Close other GPU applications. The model needs ~5-7 GB VRAM. Check with:

python -c "import torch; print(f'{torch.cuda.mem_get_info()[0]/1024**3:.1f} GB free')"

cp1252 encoding error

The script uses ASCII-safe print statements. If you still get encoding errors, set:

set PYTHONIOENCODING=utf-8

Important Notes

- This skill is for Levi's voice ONLY. For other voices, use batch_canary_qwen3_full.py directly.
- The first run downloads the model (~4.5 GB). Subsequent runs load it from cache in ~10 seconds.
- The voice clone prompt is computed once and reused for all chunks, which keeps per-chunk overhead low.
- Each chunk generates at ~1.16x real-time on the RTX 3080 (e.g., 60s of audio takes ~70s to generate).
- Output chunks are saved individually in a subdirectory, so any section can be regenerated on its own.

Story to Audio — Qwen3-TTS Voice Clone in Levi's Voice

content-media

Updated Apr 22, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add ValorInvestigator/claude-plugin-toolkit qwen3-tts

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: Container and dependency vulnerability scanner. Clean (95%)
- Semgrep: Static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- Snyk (dep): Open source security scanning. Skipped (50%)
- Socket.dev: Supply chain security analysis. Skipped (50%)
- VirusTotal: Multi-engine malware detection. Skipped (50%)
- CrowdStrike: Advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 22, 2026, 10:59 PM (136.4s, 1 file scanned)

SKILL.md

name: qwen3-tts
description: Story to Audio — Qwen3-TTS Voice Clone in Levi's Voice
user_invocable: true

Related Skills

ValorInvestigator/skills/write-article

development

Verified, Trusted, Community

# Write Article -- Investigative Series in Levi Bakke's Voice You are ghostwriting publishable investigative journalism in Levi's voice. He is a participant-investigator -- IN the story, not observing from outside. ## BEFORE WRITING Read the style guide: [references/style-guide.md](references/style-guide.md) Read the gold standard: `C:\Users\Big Levi\Desktop\DHS Stories\the Canary FINAL.txt` ## THE WRITING PROCESS 1. **Gather** -- Read relevant timeline docs, investigation files, databases

SKILL.md, updated Apr 22, 2026

ValorInvestigator/web-search

development

Verified, Trusted, Community

Dual-engine web search using BOTH Firecrawl AND Brave Search simultaneously. ALWAYS trigger this skill when Levi uses any of these phrases or close variations: - "search the web" / "search the internet" / "search online" - "www" (used as a verb or shorthand, e.g. "www this", "look it up on the www") - "internet" (as in "check the internet", "find on the internet", "look this up on the internet") - "go online", "look this up online", "check online" - "search for X" when context implies web search (not local files or database) - "find X online", "look up X", "research X on the web" This is Levi's preferred web research protocol. Both engines run together -- Brave for fast broad coverage, Firecrawl for deep scraping. Never use just one without the other when this skill triggers.

SKILL.md, updated Apr 22, 2026

ValorInvestigator/web-scraping

development

Verified, Trusted, Community

Web scraping with anti-bot bypass, content extraction, undocumented APIs and poison pill detection. Use when extracting content from websites, handling paywalls, implementing scraping cascades or processing social media. Covers requests, trafilatura, Playwright with stealth mode, yt-dlp and instaloader patterns.

SKILL.md, updated Apr 22, 2026

ValorInvestigator/skills/text-to-voice

development

Verified, Trusted, Community

# Text to Voice -- Convert Articles to Audio Convert written articles to spoken audio (.mp3) using Google Cloud TTS with Chirp 3: HD Algieba voice. ## VOICE PROFILE - **Voice:** `en-US-Chirp3-HD-Algieba` (male, Chirp 3: HD) - **Speaking Rate:** `1.0` | **Volume Gain:** `0.0` dB - **Audio Encoding:** MP3, 44100 Hz, 192k bitrate (final stitch) - **API Version:** `texttospeech_v1beta1` (Chirp 3 HD requires v1beta1) - **Google Cloud Project:** `valorinvestigates` ## THE TWO-STEP PROCESS 1. **Rew

SKILL.md, updated Apr 22, 2026

Install manually

# Clone the repo
git clone https://github.com/ValorInvestigator/claude-plugin-toolkit.git

# Copy into Claude Code skills folder (global)
cp -r claude-plugin-toolkit/skills/qwen3-tts ~/.claude/skills/

Repository

ValorInvestigator/claude-plugin-toolkit

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT