Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

laitszkin/docs-to-voice

Name: docs-to-voice
Author: laitszkin

skills/docs-to-voice/SKILL.md

npx skillsauth add laitszkin/apollo-toolkit docs-to-voice


Docs to Voice

Standards

Evidence: Confirm project_dir, input source, mode, and environment-backed settings before generation.
Execution: Use apltk docs-to-voice to write audio plus matching timeline and subtitle files under project_dir/audio/{project_name}/.
Quality: Respect mode-specific options, sentence splitting rules, and post-process requirements such as ffmpeg for speed changes.
Output: Return the absolute output audio path together with the generated .timeline.json and .srt companions.

Workflow

Collect inputs.
- Require project_dir.
- Accept either raw text or one input text file.
- Set project_name; default to basename of project_dir.
Select mode.
- --mode say for local generation.
- --mode api for Model Studio API generation.
- If --mode is omitted, load DOCS_TO_VOICE_MODE from .env, then from shell environment variables; fall back to say.
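The fallback order for mode selection can be sketched as follows. This is a minimal illustration, not the tool's actual code: `read_dotenv` and `resolve_mode` are hypothetical names, and the sketch assumes a value in .env takes precedence over the shell environment, as the order of the bullet suggests.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

VALID_MODES = ("say", "api")


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical stand-in for a real .env loader)."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_mode(
    cli_mode: Optional[str],
    dotenv: Mapping[str, str],
    environ: Mapping[str, str],
) -> str:
    """Pick the mode: explicit flag, then .env, then shell environment, then say."""
    candidates = (
        cli_mode,
        dotenv.get("DOCS_TO_VOICE_MODE"),
        environ.get("DOCS_TO_VOICE_MODE"),
    )
    for candidate in candidates:
        if candidate:
            mode = candidate.strip().lower()
            if mode not in VALID_MODES:
                raise ValueError(f"unsupported mode: {candidate!r}")
            return mode
    return "say"  # final fallback when nothing is configured


if __name__ == "__main__":
    print(resolve_mode(None, read_dotenv(Path(".env")), os.environ))
```

Passing the .env contents and the environment in as plain mappings keeps the precedence rule easy to check in isolation.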
Prepare output path.
- Build project_dir/audio/{project_name}/.
- Create directory if it does not exist.
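As a rough sketch of the path rule above, the output directory can be derived with `pathlib`. The function name `prepare_output_dir` is hypothetical; only the project_dir/audio/{project_name}/ layout and the basename default come from the workflow.

```python
import tempfile
from pathlib import Path
from typing import Optional


def prepare_output_dir(project_dir: str, project_name: Optional[str] = None) -> Path:
    """Build project_dir/audio/{project_name}/ and create it if missing."""
    root = Path(project_dir).expanduser().resolve()
    # Default project_name to the basename of project_dir.
    name = project_name or root.name
    out_dir = root / "audio" / name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(prepare_output_dir(tmp))          # .../<tmp>/audio/<tmp basename>
        print(prepare_output_dir(tmp, "demo"))  # .../<tmp>/audio/demo
```

Resolving the path up front also makes it straightforward to report an absolute output path at the end of the run.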
Generate audio.
- say mode supports --voice, --rate, and punctuation-pause enhancement.
- api mode supports --api-endpoint, --api-model, --api-voice, and reads DASHSCOPE_API_KEY.
- api mode sends one request per sentence and concatenates all sentence audio into one final file.
- api mode auto-discovers the model's maximum input length; only sentences that exceed that limit are split.
- --max-chars (or DOCS_TO_VOICE_MAX_CHARS) overrides the sentence split limit; 0 disables chunking.
- --speech-rate (or DOCS_TO_VOICE_SPEECH_RATE) applies an optional post-process speed adjustment and requires ffmpeg when the value is not 1.
- API splitting uses the model's counting rules (for qwen3-tts, CJK characters count as 2 units).
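The splitting rules for api mode can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the sentence-ending punctuation set, and the choice to weight only ideographs in U+4E00 to U+9FFF as 2 units. The tool's real counting and splitting rules may differ.

```python
import re
from typing import List

# Assumed sentence boundaries: Latin and CJK end punctuation (simplified;
# this would also split inside decimals such as "3.14").
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")


def count_units(text: str, cjk_weight: int = 2) -> int:
    """Length in model units: CJK ideographs weigh cjk_weight, others 1."""
    return sum(cjk_weight if "\u4e00" <= ch <= "\u9fff" else 1 for ch in text)


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the end punctuation."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def chunk_sentence(sentence: str, max_units: int) -> List[str]:
    """Split only an oversized sentence; max_units of 0 disables chunking."""
    if max_units <= 0 or count_units(sentence) <= max_units:
        return [sentence]
    chunks: List[str] = []
    current, used = "", 0
    for ch in sentence:
        cost = count_units(ch)
        # Start a new chunk when the next character would exceed the limit.
        if current and used + cost > max_units:
            chunks.append(current)
            current, used = "", 0
        current += ch
        used += cost
    if current:
        chunks.append(current)
    return chunks


def plan_requests(text: str, max_units: int) -> List[str]:
    """One request per sentence, with oversized sentences split further."""
    requests: List[str] = []
    for sentence in split_sentences(text):
        requests.extend(chunk_sentence(sentence, max_units))
    return requests


if __name__ == "__main__":
    print(plan_requests("Hi. 你好世界。", 5))  # ['Hi.', '你好', '世界。']
```

Sentences that already fit within the limit pass through untouched, which matches the rule that only oversized sentences are split.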
Generate sentence-level timeline files.
- Write JSON timeline and SRT subtitle files next to audio output.
- In api mode, timeline start/end uses per-sentence audio durations whenever available.
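One way to picture the timeline step: if each sentence has a known audio duration, start and end times follow by accumulating those durations. The sketch below is hypothetical; the JSON keys (`index`, `text`, `start`, `end`) and function names are illustrative, not the tool's actual schema. Only the SRT timestamp layout (HH:MM:SS,mmm) is standard.

```python
import json
from typing import Dict, List


def build_timeline(sentences: List[str], durations: List[float]) -> List[Dict]:
    """Lay sentences end to end; each entry starts where the previous one ended."""
    timeline: List[Dict] = []
    cursor = 0.0
    for index, (text, duration) in enumerate(zip(sentences, durations), start=1):
        timeline.append(
            {
                "index": index,
                "text": text,
                "start": round(cursor, 3),
                "end": round(cursor + duration, 3),
            }
        )
        cursor += duration
    return timeline


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(timeline: List[Dict]) -> str:
    """Render timeline entries as SRT blocks separated by blank lines."""
    blocks = [
        f"{e['index']}\n{srt_timestamp(e['start'])} --> {srt_timestamp(e['end'])}\n{e['text']}\n"
        for e in timeline
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    entries = build_timeline(["Hello.", "World."], [1.5, 2.25])
    print(json.dumps(entries, ensure_ascii=False, indent=2))
    print(to_srt(entries))
```

Because both files are derived from the same entries, the JSON timeline and the SRT subtitles stay in sync with each other.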
Return completion details.
- Report absolute output audio path.

CLI reference

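references/docs-to-voice.md: the complete parameter reference for the apltk docs-to-voice tool. Read it before selecting the mode in step 2 of the workflow.

Before generating output, read references/docs-to-voice.md to learn how each mode's parameters and environment variables are configured.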

Troubleshooting

say mode: confirm that command -v say and command -v python3 both succeed.
api mode: confirm that command -v python3 succeeds and that DASHSCOPE_API_KEY is valid.
Long-text chunk merge (especially AIFF output): ffmpeg is recommended; check with command -v ffmpeg.
If output exists, use the overwrite or rename options shown in apltk docs-to-voice --help.
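The checks above could also be scripted. The following is a hedged sketch with a hypothetical `missing_prerequisites` helper; it uses `shutil.which` as the Python counterpart of command -v, and it only verifies that DASHSCOPE_API_KEY is set, not that the key is valid. The ffmpeg requirement for a speech rate other than 1 follows the workflow notes.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    mode: str,
    environ: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    speech_rate: float = 1.0,
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    required = ["python3"]
    if mode == "say":
        required.append("say")
    if speech_rate != 1.0:
        # Post-process speed changes need ffmpeg.
        required.append("ffmpeg")
    missing = [f"command not found: {cmd}" for cmd in required if which(cmd) is None]
    if mode == "api" and not environ.get("DASHSCOPE_API_KEY"):
        missing.append("DASHSCOPE_API_KEY is not set")
    return missing


if __name__ == "__main__":
    for problem in missing_prerequisites("say"):
        print(problem)
```

The `which` and `environ` parameters exist only so the check can be exercised without touching the real system.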

Convert text and document content into audio files and sentence-level subtitle timelines under project_dir/audio/{project_name}/. Supports both macOS say and Alibaba Cloud Model Studio API modes.

3 stars

development

Updated May 29, 2026

Install globally

npx skillsauth add laitszkin/apollo-toolkit docs-to-voice

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%.
Semgrep: Static code analysis for vulnerabilities. Clean, 95%.
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%.
Snyk (dep): Open source security scanning. Skipped, 50%.
Socket.dev: Supply chain security analysis. Skipped, 50%.
VirusTotal: Multi-engine malware detection. Skipped, 50%.
CrowdStrike: Advanced threat intelligence. Skipped, 50%.
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%.
OWASP Dep-Check: Skipped, 50%.

Last scanned: May 29, 2026, 7:39 AM (251.7s, 5 files scanned)

Related Skills

laitszkin/create-skill (development). Updated Jul 13, 2026.
Guides the agent through creating a new Agent Skill from scratch. Use when the user wants to build a skill, create a new skill, scaffold a skill directory, or author a SKILL.md. Do NOT use for optimising or rewriting existing skills; use 'optimise-skill' for that. Do NOT use for editing files that are already part of a skill. Do NOT use for creating non-skill content like documentation, scripts, or project files.

laitszkin/review-pr (development). Updated Jun 11, 2026.
Review a pull request: interactive PR selection via `gh`, a 4-dimension code review (hallucinated code, architecture, performance, test validity), then severity-graded comments with fix suggestions posted on the PR. Not for spec-based review; use `review` instead.

laitszkin/version-release (tools). Updated May 29, 2026.
Assists with automated version releases: syncs documentation, updates the version number, pushes the tag, and creates a GitHub Release.

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/laitszkin/apollo-toolkit.git

# Copy into Claude Code skills folder (global)
cp -r apollo-toolkit/skills/docs-to-voice ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

laitszkin/apollo-toolkit

3 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT