0xfango/listenhub-2/SKILL.md
Explain anything — turn ideas into podcasts, explainer videos, or voice narration. Use when the user wants to "make a podcast", "create an explainer video", "read this aloud", "generate an image", or share knowledge in audio/visual form. Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
npx skillsauth add openclaw/skills listenhub
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Four modes, one entry point:
- Podcast
- Explain (explainer video)
- TTS (text-to-speech)
- Image Generation
Users don't need to remember APIs, modes, or parameters. Just say what you want. </purpose>
<instructions>The scripts are the ONLY interface. Period.
┌─────────────────────────────────────────────────────┐
│  AI Agent  ──▶  ./scripts/*.sh  ──▶  ListenHub API  │
│                       ▲                             │
│                       │                             │
│             This is the ONLY path.                  │
│         Direct API calls are FORBIDDEN.             │
└─────────────────────────────────────────────────────┘
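MUST: Invoke ListenHub features only through the scripts in **/skills/listenhub/scripts/.
MUST NOT: Call the ListenHub API directly, or guess endpoints, parameters, or speakerIds.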
Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
Scripts are located at **/skills/listenhub/scripts/ relative to your working context.
Different AI clients use different dot-directories:
- Claude Code: .claude/skills/listenhub/scripts/
- Other clients: their own dot-directories (.cursor/, .windsurf/, etc.)
Resolution: Use the glob pattern **/skills/listenhub/scripts/*.sh to locate scripts reliably, or resolve from the SKILL.md file's own path.
The following are internal implementation details that AI cannot reliably know:
| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | api.marswave.ai/... | ✗ Cannot — internal to scripts |
| Endpoints | podcast/episodes, etc. | ✗ Cannot — internal to scripts |
| Speaker IDs | cozy-man-english, etc. | ✓ Call get-speakers.sh |
| Request schemas | JSON body structure | ✗ Cannot — internal to scripts |
| Response formats | Episode ID, status codes | ✓ Documented per script |
Rule: If information is not in this SKILL.md or retrievable via a script (like get-speakers.sh), assume you don't know it.
Hide complexity, reveal magic.
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences. Users only need: Say idea → wait a moment → get the link.
- User input is sent to the ListenHub API (api.marswave.ai) for processing. Do not pass sensitive or confidential information as input.
- The --source-url parameter accepts external URLs whose content is fetched and processed by the backend. Only use trusted URLs.
- The version check reads from raw.githubusercontent.com (read-only, no code execution). Set LISTENHUB_SKIP_VERSION_CHECK=1 to disable it.

The API key is stored in $LISTENHUB_API_KEY. Check on first use:
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
If setup is needed, guide the user to set $LISTENHUB_API_KEY (only the lh_sk_... part of the key).
Image generation uses the same ListenHub API key stored in $LISTENHUB_API_KEY.
Image generation output defaults to the user's Downloads directory; the path is stored in $LISTENHUB_OUTPUT_DIR.
On first image generation, the script guides the user through this configuration.
Security: Never expose full API keys in output.
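When the key has to be mentioned at all, a masked form is enough. The helper below is a minimal sketch and is not part of the ListenHub scripts. The name mask_api_key and the choice to keep the first 6 characters (the lh_sk_ prefix) plus the last 4 are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an API key in masked form for chat or log output.
# Keeps the first 6 characters (the lh_sk_ prefix on ListenHub keys) and the last 4.
mask_api_key() {
  local key="$1"
  if (( ${#key} <= 10 )); then
    printf '****\n'                       # too short to reveal any part safely
  else
    printf '%s****%s\n' "${key:0:6}" "${key: -4}"
  fi
}
```

Usage: `mask_api_key "$LISTENHUB_API_KEY"` prints something like `lh_sk_****ab12` instead of the full key.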
Auto-detect mode from user input:
→ Podcast (1-2 speakers)
Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers.
Default mode: quick unless explicitly requested.
If speakers are not specified, call get-speakers.sh and select the first speakerId
matching the chosen language.
If reference materials are provided, pass them as --source-url or --source-text.
When the user only provides a topic (e.g., "I want a podcast about X"), proceed with:
- language from the user's input,
- mode=quick,
- speakers: the first speakerId from get-speakers.sh matching the language.

→ Explain (Explainer video)
→ TTS (Text-to-speech)
TTS defaults to FlowSpeech direct for single-pass text or URL narration.
Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry.
Text-to-speech input is limited to 10,000 characters; split or use a URL when longer.
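When a URL is not available, one option is to split the text before submitting. The sketch below is a naive fixed-size split under stated assumptions. split_tts_text and TTS_CHUNKS are hypothetical names. Bash counts characters rather than bytes only when a UTF-8 locale is active, and a real split would prefer sentence boundaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split text into chunks of at most 10,000 characters.
# Naive fixed-width split; it may cut mid-sentence.
TTS_MAX_CHARS=10000

split_tts_text() {
  local text="$1" max="${2:-$TTS_MAX_CHARS}" offset=0
  TTS_CHUNKS=()                           # global array that receives the chunks
  while (( offset < ${#text} )); do
    TTS_CHUNKS+=("${text:offset:max}")    # substring of up to max characters
    (( offset += max ))
  done
}
```

Each chunk then becomes its own create-tts.sh submission, so the result is several separate narrations, not one.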
When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:
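- Mode: direct, to avoid altering content.
- Input type: URLs use type=url, plain text uses type=text.
- Speaker: call get-speakers.sh and pick the first speakerId matching the language.
- Per-line speaker assignment: use Speech with scripts.

Example guidance: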
“This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech.”
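The input-type default for ambiguous requests reduces to a prefix check. This is a sketch: tts_type_for is a hypothetical helper, and treating anything starting with http:// or https:// as a URL is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map raw user input to the FlowSpeech --type value.
# URLs become "url"; everything else is treated as plain text.
tts_type_for() {
  case "$1" in
    http://*|https://*) printf 'url\n' ;;
    *)                  printf 'text\n' ;;
  esac
}
```

A call might then look like `"$SCRIPTS/create-tts.sh" --type "$(tts_type_for "$input")" --content "$input" --language en --mode direct --speakers "$speaker"`. Passing a URL through --content is an assumption based on the text example; confirm it against the script's usage output.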
→ Image Generation
Reference Images via Image Hosts
When reference images are local files, upload to a known image host and use the direct image URL in --reference-images.
Recommended hosts: imgbb.com, sm.ms, postimages.org, imgur.com.
Direct image URLs should end with .jpg, .png, .webp, or .gif.
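A quick pre-check for the direct-URL rule can catch gallery or page links before they reach --reference-images. This is a hypothetical helper. Ignoring any ?query suffix and matching extensions case-insensitively are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for http(s) URLs ending in .jpg, .png, .webp, or .gif.
is_direct_image_url() {
  local url="${1%%\?*}"                                    # drop any ?query part
  url=$(printf '%s' "$url" | tr '[:upper:]' '[:lower:]')  # case-insensitive compare
  case "$url" in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  case "$url" in
    *.jpg|*.png|*.webp|*.gif) return 0 ;;
    *) return 1 ;;
  esac
}
```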
Default: If unclear, ask user which format they prefer.
Explicit override: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.
→ Got it! Preparing...
Mode: Two-person podcast
Topic: Latest developments in Manus AI
For URLs, identify type:
- youtu.be/XXX → convert to https://www.youtube.com/watch?v=XXX

→ Generation submitted
Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes
You can:
• Wait and ask "done yet?"
• Use check-status via scripts
• View outputs in product pages:
- Podcast: https://listenhub.ai/app/podcast
- Explain: https://listenhub.ai/app/explainer
- Text-to-Speech: https://listenhub.ai/app/text-to-speech
• Do other things, ask later
Internally remember Episode ID for status queries.
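The youtu.be rule from URL type identification can be sketched as a small shell function. normalize_youtube_url is a hypothetical helper, not part of the ListenHub scripts. Dropping a query string such as ?t=42 is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite youtu.be/XXX short links to the watch?v=XXX form.
# Other URLs pass through unchanged.
normalize_youtube_url() {
  local url="$1" id
  case "$url" in
    *youtu.be/*)
      id="${url#*youtu.be/}"   # text after the short-link host
      id="${id%%\?*}"          # drop any ?query part (assumption)
      printf 'https://www.youtube.com/watch?v=%s\n' "$id" ;;
    *)
      printf '%s\n' "$url" ;;
  esac
}
```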
When user says "done yet?" / "ready?" / "check status":
Podcast result:
✓ Podcast generated!
"{title}"
Episode: https://listenhub.ai/app/episode/{episodeId}
Duration: ~{duration} minutes
Download audio: provide audioUrl or audioStreamUrl on request
One-stage podcast creation generates an online task. When status is success, the episode detail already includes scripts and audio URLs. Download uses the returned audioUrl or audioStreamUrl without a second create call. Two-stage creation is only for script review or manual edits before audio generation.
Explain result:
✓ Explainer video generated!
"{title}"
Watch: https://listenhub.ai/app/explainer
Duration: ~{duration} minutes
Need to download audio? Just say so.
Image result:
✓ Image generated!
~/Downloads/labnana-{timestamp}.jpg
Image results are file-only and not shown in the web UI.
Important: Prioritize web experience. Only provide download URLs when user explicitly requests.
Scripts are shell-based. Locate via **/skills/listenhub/scripts/.
Dependency: jq is required for request construction.
The AI must ensure curl and jq are installed before invoking scripts.
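A minimal pre-flight check uses the shell builtin command -v. require_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail with a readable message if any tool is missing from PATH.
require_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing required tools: ${missing[*]}" >&2
    return 1
  fi
}
```

Usage before any script call: `require_tools curl jq || exit 1`.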
⚠️ Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:
- Claude Code: set run_in_background: true in the Bash tool

Invocation pattern:
$SCRIPTS/script-name.sh [args]
Where $SCRIPTS = resolved path to **/skills/listenhub/scripts/
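Resolving $SCRIPTS can follow the glob rule directly. This is a sketch assuming find is available. resolve_listenhub_scripts is a hypothetical name. It searches below a chosen root, including dot-directories such as .claude/ or .cursor/, and takes the first match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory matching */skills/listenhub/scripts/
# that contains a .sh file, searching below ROOT (default: current directory).
resolve_listenhub_scripts() {
  local root="${1:-.}" hit
  hit=$(find "$root" -path '*/skills/listenhub/scripts/*.sh' -type f -print 2>/dev/null | head -n 1)
  [ -n "$hit" ] || return 1
  dirname "$hit"
}
```

Usage: `SCRIPTS=$(resolve_listenhub_scripts "$PWD") || echo "ListenHub scripts not found" >&2`.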
One-stage creation (default path): use it unless script review or manual editing is required.
$SCRIPTS/create-podcast.sh --query "The future of AI development" --language en --mode deep --speakers cozy-man-english
$SCRIPTS/create-podcast.sh --query "Analyze this article" --language en --mode deep --speakers cozy-man-english --source-url "https://example.com/article"
Multiple --source-url and --source-text arguments are supported to combine several references in one request.
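With several references, building the argument list as a bash array keeps each URL a single argument even if it contains spaces or shell metacharacters. The sketch below assumes this pattern. add_source_urls, PODCAST_ARGS, and the example URLs are hypothetical, and SPEAKER is a placeholder to be filled from get-speakers.sh.

```shell
#!/usr/bin/env bash
# SPEAKER is a placeholder; obtain a real speakerId from get-speakers.sh.
SPEAKER="${SPEAKER:-}"
PODCAST_ARGS=(--query "Compare these two articles" --language en --mode deep --speakers "$SPEAKER")

# Hypothetical helper: append one --source-url flag per reference URL.
add_source_urls() {
  local url
  for url in "$@"; do
    PODCAST_ARGS+=(--source-url "$url")
  done
}

add_source_urls "https://example.com/article-a" "https://example.com/article-b"
# Submit (left commented so the sketch has no side effects):
# "$SCRIPTS/create-podcast.sh" "${PODCAST_ARGS[@]}"
```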
Two-stage creation (advanced path): use it only when script review or edits are explicitly requested.
The entire value of two-stage generation is human review between stages. Skipping review reduces it to one-stage with extra latency — never do this.
Stage 1: Generate text content.
$SCRIPTS/create-podcast-text.sh --query "AI history" --language en --mode deep --speakers cozy-man-english,travel-girl-english
Review Gate (mandatory): After text generation completes, the agent MUST:
1. Use check-status.sh --wait to poll until completion. On exit code 2 (timeout or rate-limited), wait briefly and retry.
2. Save two files:
   - ~/Downloads/podcast-draft-<episode-id>.md: a human-readable version assembled from the response fields (title, outline, sourceProcessResult.content, and the scripts array formatted as readable dialogue). This is for the user to review.
   - ~/Downloads/podcast-scripts-<episode-id>.json: the raw {"scripts": [...]} object extracted from the response, exactly in the format that create-podcast-audio.sh --scripts expects. This is the machine-readable source of truth for Stage 2.
3. Open the draft for the user (e.g., with the open command on macOS).
4. Wait for the user's decision:
   - Approved without changes: call create-podcast-audio.sh --episode <id> without --scripts (the server uses the original).
   - Edited: apply the edits to the JSON file and pass it via --scripts.

The agent MUST NOT proceed to Stage 2 automatically. This is a hard constraint, not a suggestion.
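Writing the two review files could look like the sketch below. It is an illustration, not the skill's own tooling: save_review_files is hypothetical. It also assumes the finished Stage 1 response arrives as JSON on stdin with its fields under .data (falling back to the top level). That layout is not documented here, so adjust the jq paths to the real output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the draft (.md) and scripts (.json) review files.
# ASSUMPTION: title, outline, sourceProcessResult and scripts live under .data
# (or at the top level); verify against real check-status.sh output.
save_review_files() {
  local episode_id="$1" out_dir="${2:-$HOME/Downloads}" response
  response=$(cat)
  mkdir -p "$out_dir" || return 1

  # Machine-readable source of truth for Stage 2: exactly {"scripts": [...]}.
  jq '{scripts: (.data.scripts // .scripts)}' <<<"$response" \
    > "$out_dir/podcast-scripts-$episode_id.json" || return 1

  # Human-readable draft: title, outline, source content, then one line per turn.
  jq -r '(.data // .) as $d
    | "# \($d.title // "Untitled")\n\n## Outline\n\($d.outline // "")\n\n## Source\n\($d.sourceProcessResult.content // "")\n\n## Dialogue\n",
      ($d.scripts[] | "**\(.speakerId):** \(.content)\n")' <<<"$response" \
    > "$out_dir/podcast-draft-$episode_id.md"
}
```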
Stage 2: Generate audio from reviewed/approved text.
# User approved without changes:
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>"
# User provided edits:
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" --scripts modified-scripts.json
$SCRIPTS/create-speech.sh --scripts scripts.json
echo '{"scripts":[{"content":"Hello","speakerId":"cozy-man-english"}]}' | $SCRIPTS/create-speech.sh --scripts -
# scripts.json format:
# {
# "scripts": [
# {"content": "Script content here", "speakerId": "speaker-id"},
# ...
# ]
# }
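Hand-written JSON like the echo one-liner breaks as soon as a line contains a double quote. Letting jq do the escaping avoids that. This is a sketch: lines_to_scripts_json is hypothetical, and the "speakerId<TAB>content" input format is an assumption chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "speakerId<TAB>content" lines on stdin and emit
# {"scripts":[{"content":...,"speakerId":...}]} with correct JSON escaping.
lines_to_scripts_json() {
  jq -R -s '{scripts: [split("\n")[]
                       | select(length > 0)
                       | split("\t")
                       | {content: (.[1:] | join("\t")), speakerId: .[0]}]}'
}
```

Usage: `printf 'cozy-man-english\tHello\n' | lines_to_scripts_json | "$SCRIPTS/create-speech.sh" --scripts -`. Pick real speakerIds via get-speakers.sh rather than hardcoding them.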
$SCRIPTS/get-speakers.sh --language zh
$SCRIPTS/get-speakers.sh --language en
Guidance:
- Call get-speakers.sh to get the available list.
- Use the first speakerId in the list whose language matches as the default voice.

Response structure (for AI parsing):
{
"code": 0,
"data": {
"items": [
{
"name": "Yuanye",
"speakerId": "cozy-man-english",
"gender": "male",
"language": "zh"
}
]
}
}
Usage: When user requests specific voice characteristics (gender, style), call this script first to discover available speakerId values. NEVER hardcode or assume speakerIds.
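Selecting the default voice only needs the documented fields data.items[].speakerId, .language and .gender. The sketch below assumes get-speakers.sh prints that JSON to stdout. pick_speaker is a hypothetical helper; an optional gender argument narrows the match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first speakerId whose language matches LANG
# (and GENDER, if given) from get-speakers.sh JSON on stdin; print nothing if none.
pick_speaker() {
  local lang="$1" gender="${2:-}"
  jq -r --arg lang "$lang" --arg gender "$gender" '
    [ .data.items[]
      | select(.language == $lang)
      | select($gender == "" or .gender == $gender) ]
    | .[0].speakerId // empty'
}
```

Usage: `speaker=$("$SCRIPTS/get-speakers.sh" --language en | pick_speaker en)`; pass a second argument such as `female` when the user asks for a specific voice.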
$SCRIPTS/create-explainer.sh --content "Introduce ListenHub" --language en --mode info --speakers cozy-man-english
$SCRIPTS/generate-video.sh --episode "<episode-id>"
$SCRIPTS/create-tts.sh --type text --content "Welcome to ListenHub" --language en --mode smart --speakers cozy-man-english
$SCRIPTS/generate-image.sh --prompt "sunset over mountains" --size 2K --ratio 16:9
$SCRIPTS/generate-image.sh --prompt "style reference" --reference-images "https://example.com/ref1.jpg,https://example.com/ref2.png"
Supported sizes: 1K | 2K | 4K (default: 2K).
Supported aspect ratios: 16:9 | 1:1 | 9:16 | 2:3 | 3:2 | 3:4 | 4:3 | 21:9 (default: 16:9).
Reference images: comma-separated URLs, maximum 14.
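These limits can be checked locally before submitting. This is a sketch: validate_image_options and join_reference_images are hypothetical helpers that encode only the sizes, ratios and 14-image cap listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reject sizes and ratios outside the documented sets.
validate_image_options() {
  local size="${1:-2K}" ratio="${2:-16:9}"
  case "$size" in
    1K|2K|4K) ;;
    *) echo "unsupported size: $size" >&2; return 1 ;;
  esac
  case "$ratio" in
    16:9|1:1|9:16|2:3|3:2|3:4|4:3|21:9) ;;
    *) echo "unsupported ratio: $ratio" >&2; return 1 ;;
  esac
}

# Hypothetical helper: join reference URLs with commas, enforcing the 14-image cap.
join_reference_images() {
  if (( $# > 14 )); then
    echo "at most 14 reference images are allowed (got $#)" >&2
    return 1
  fi
  local IFS=,          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}
```

Usage: `validate_image_options 2K 16:9 && "$SCRIPTS/generate-image.sh" --prompt "$prompt" --size 2K --ratio 16:9 --reference-images "$(join_reference_images "${urls[@]}")"`.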
# Single-shot query
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast
# Wait mode (recommended for automated polling)
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast --wait
$SCRIPTS/check-status.sh --episode "<episode-id>" --type flow-speech --wait --timeout 60
$SCRIPTS/check-status.sh --episode "<episode-id>" --type explainer --wait --timeout 600
tts is accepted as an alias for flow-speech.
--wait mode handles polling internally with configurable limits.
Agents SHOULD use --wait instead of manual polling loops. On exit code 2, wait briefly and retry the command.
| Option | Default | Description |
|---|---|---|
| --wait | off | Enable polling mode |
| --max-polls | 30 | Maximum poll attempts |
| --timeout | 300 | Maximum total wait (seconds) |
| --interval | 10 | Base poll interval (seconds) |
Exit codes: 0 = completed, 1 = failed, 2 = timeout or rate-limited (still pending, safe to retry after a short wait).
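The exit-code contract maps naturally onto a small retry wrapper. This is a sketch: wait_for_episode, the default of 3 attempts, and the RETRY_DELAY variable (15 seconds by default) are illustrative assumptions, not script defaults. The "&& rc=0 || rc=$?" form captures the status without aborting under set -e.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around check-status.sh --wait.
# 0 = completed, 1 = failed, 2 = still pending or rate-limited (pause, then retry).
wait_for_episode() {
  local episode="$1" type="$2" attempts="${3:-3}" rc i
  for (( i = 1; i <= attempts; i++ )); do
    "$SCRIPTS/check-status.sh" --episode "$episode" --type "$type" --wait && rc=0 || rc=$?
    case "$rc" in
      0) return 0 ;;                         # completed
      1) return 1 ;;                         # generation failed; do not retry
      2) sleep "${RETRY_DELAY:-15}" ;;       # pending or rate-limited; pause, then retry
      *) return "$rc" ;;                     # unexpected status; surface it
    esac
  done
  return 2                                   # still pending after all attempts
}
```

Usage: `wait_for_episode "<episode-id>" podcast`, ideally run in the background as noted for long-running tasks.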
Automatic Language Detection: Adapt output language based on user input and context.
Detection Rules:
Application:
Example:
User (Chinese): "生成一个关于 AI 的播客"
AI (Chinese): "→ 收到!准备双人播客..."
User (English): "Make a podcast about AI"
AI (English): "→ Got it! Preparing two-person podcast..."
Principle: Language is interface, not barrier. Adapt seamlessly to user's natural expression.
You are a dispatcher, not an implementer.
Your job is to:
Your job is NOT to:
ListenHub modes (passthrough):
- Call get-speakers.sh first to list options

Labnana mode (passthrough by default):
Default behavior: transparent forwarding. Pass the user's prompt directly to the script without modification.
When to offer optimization:
In this case, ask whether the user would like help enriching the prompt. Do not optimize without confirmation.
When to never modify:
If the user agrees to optimization, the following techniques are available as reference:
Style: "cyberpunk" → add "neon lights, futuristic, dystopian"; "ink painting" → add "Chinese ink painting, traditional art style"
Scene: time of day, lighting conditions, weather
Quality: "highly detailed", "8K quality", "cinematic composition"
Rules when optimizing:
→ Generation submitted, about 2-3 minutes
You can: • Wait and ask "done yet?" • Check listenhub.ai/app/library </response> </example>
<example name="explainer-request"> <user>Create an explainer video introducing Claude Code</user> <response> → Got it! Preparing explainer video...
Topic: Claude Code introduction
→ Generation submitted, explainer videos take 3-5 minutes
Includes: Script + narration + AI visuals </response> </example>
<example name="tts-request"> <user>Convert this article to speech https://blog.example.com/article</user> <response> → Got it! Parsing article...
→ TTS submitted, about 1-2 minutes
Wait a moment, or ask "done yet?" to check </response> </example>
<example name="image-generation-short-prompt"> <user>Generate an image: cyberpunk city at night</user> <response> → Short prompt detected. Would you like help enriching it with style/lighting/composition details, or use it as-is? </response> </example>
<example name="image-generation-detailed-prompt"> <user>Generate an image: "Cyberpunk city at night, neon lights reflecting on wet streets, towering skyscrapers with holographic ads, flying vehicles, cinematic composition, highly detailed, 8K quality"</user> <response> → Generating image...
Prompt: Cyberpunk city at night, neon lights reflecting on wet streets, towering skyscrapers with holographic ads, flying vehicles, cinematic composition, highly detailed, 8K quality
Resolution: 2K (16:9)
✓ Image generated! ~/Downloads/labnana-20260121-143145.jpg </response> </example>
<example name="image-with-reference"> <user>Generate an image in this style: https://example.com/style-ref.jpg, prompt: "a futuristic car"</user> <response> → Generating image with reference...
Prompt: a futuristic car
Reference images: 1
Reference image URL: https://example.com/style-ref.jpg
Resolution: 2K (16:9)
✓ Image generated! ~/Downloads/labnana-20260122-154230.jpg </response> </example>
<example name="status-check"> <user>Done yet?</user> <response> ✓ Podcast generated!
"AI Revolution: From GPT to AGI"
Listen: https://listenhub.ai/app/podcast
Duration: ~8 minutes
Need to download? Just say so. </response> </example>
</examples>