skills/ltx-video/SKILL.md
Generate videos via the LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when generating AI video from text, an image, or audio; animating a portrait; or creating a lip-sync video from an existing image and audio recording.
- Base URL: `https://api.ltx.video/v1`
- Auth: `Authorization: Bearer <API_KEY>`
- Response: MP4 binary (direct download, no polling)

| Endpoint | Input | Use |
|----------|-------|-----|
| /v1/text-to-video | prompt | Generate video from text |
| /v1/image-to-video | image_uri + prompt | Animate a still image |
| /v1/audio-to-video | audio_uri + image_uri + prompt | Lip-sync video from audio + image |
| /v1/extend | video_uri + prompt | Extend a video at start or end |
| /v1/retake | video_uri + time range | Regenerate a section of a video |
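
Every endpoint above shares the base URL, the bearer header, and a JSON body, so a small helper can build the request and catch missing inputs before anything is sent. This is a minimal sketch: `build_request` and `REQUIRED_FIELDS` are hypothetical names, and because the table does not give the field names for the retake time range, only `video_uri` is checked for `/v1/retake`.

```python
# Sketch: build (url, headers, body) for an LTX endpoint and check the
# required inputs from the endpoint table. The retake time-range field
# names are not documented here, so only video_uri is checked for it.
BASE_URL = "https://api.ltx.video/v1"

REQUIRED_FIELDS = {
    "text-to-video": ("prompt",),
    "image-to-video": ("image_uri", "prompt"),
    "audio-to-video": ("audio_uri", "image_uri", "prompt"),
    "extend": ("video_uri", "prompt"),
    "retake": ("video_uri",),  # plus a time range (field names not given)
}


def build_request(endpoint, payload, api_key):
    """Return (url, headers, body) for a POST to /v1/<endpoint>."""
    if endpoint not in REQUIRED_FIELDS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    missing = [f for f in REQUIRED_FIELDS[endpoint] if not payload.get(f)]
    if missing:
        raise ValueError(f"{endpoint} is missing: {', '.join(missing)}")
    url = f"{BASE_URL}/{endpoint}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return url, headers, dict(payload)  # copy so callers can reuse payload
```

The result plugs straight into an HTTP client, e.g. `requests.post(url, headers=headers, json=body, stream=True)`.
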

| Model | Speed | Quality |
|-------|-------|---------|
| `ltx-2-3-fast` | ~17 s | Good (use for tests) |
| `ltx-2-3-pro` | ~30-60 s | Best (use for final renders) |
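
A common way to apply the table is to iterate on a prompt with the fast model and rerun the same payload with the pro model once it looks right. The sketch below assumes that workflow; `with_model` and the stage names `"test"` and `"final"` are hypothetical, not API values.

```python
# Sketch: choose the model id by stage, following the model table
# (fast for test renders, pro for the final render).
MODELS = {
    "test": "ltx-2-3-fast",   # ~17 s, good quality
    "final": "ltx-2-3-pro",   # ~30-60 s, best quality
}


def with_model(payload, stage):
    """Return a copy of payload with "model" set for the given stage."""
    if stage not in MODELS:
        raise ValueError(f"stage must be one of {sorted(MODELS)}")
    return {**payload, "model": MODELS[stage]}
```

Returning a copy keeps the draft payload intact, so the final render uses exactly the same prompt and settings.
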

Supported resolutions:

- 1920x1080 (landscape 16:9)
- 1080x1920 (portrait 9:16; native vertical, trained on vertical data)
- 1440x1080 and 4096x2160 (text-to-video only)
- audio-to-video supports only 1920x1080 or 1080x1920
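
The resolution rules can be checked before a request is sent. This is a sketch with a hypothetical `resolution_ok` helper; the list does not state limits for image-to-video, extend, or retake, so the sketch assumes those accept only the two 1080p sizes.

```python
# Sketch of the resolution rules listed above. Endpoints other than
# text-to-video are ASSUMED to accept only the landscape/portrait 1080p pair
# (stated explicitly for audio-to-video, assumed for the rest).
BASE_RESOLUTIONS = {"1920x1080", "1080x1920"}
TEXT_TO_VIDEO_EXTRA = {"1440x1080", "4096x2160"}


def resolution_ok(endpoint, resolution):
    """Return True if the resolution is allowed for the endpoint."""
    allowed = set(BASE_RESOLUTIONS)
    if endpoint == "text-to-video":
        allowed |= TEXT_TO_VIDEO_EXTRA
    return resolution in allowed
```
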
```bash
# Text-to-video: the response body is the MP4 itself, saved with -o
curl -X POST "https://api.ltx.video/v1/text-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A man in a navy blue suit sits at a luxury restaurant table...",
    "model": "ltx-2-3-pro",
    "duration": 8,
    "resolution": "1920x1080"
  }' -o output.mp4
```
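
Because `-o` writes whatever body comes back, a failed request still leaves an `output.mp4` on disk, just not a playable one (curl's `--fail` option avoids this by exiting non-zero on HTTP errors). A quick heuristic check is that MP4 (ISO BMFF) files normally begin with an `ftyp` box, whose 4-byte type follows the 4-byte box size. The helper names below are hypothetical.

```python
def is_mp4_header(data):
    """Heuristic: MP4 files normally start with an 'ftyp' box; its type
    field occupies bytes 4..8, right after the 4-byte box size."""
    return len(data) >= 8 and data[4:8] == b"ftyp"


def looks_like_mp4(path):
    """Read the first bytes of a downloaded file and apply the check above."""
    with open(path, "rb") as f:
        return is_mp4_header(f.read(12))
```

If `looks_like_mp4("output.mp4")` is false, open the file as text: it most likely holds the API's error response.
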
```bash
# Audio-to-video: lip-sync an image to an audio recording
curl -X POST "https://api.ltx.video/v1/audio-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_uri": "https://example.com/voice.mp3",
    "image_uri": "https://example.com/portrait.jpg",
    "prompt": "A man speaks directly to camera...",
    "model": "ltx-2-3-pro",
    "resolution": "1920x1080"
  }' -o output.mp4
```
```python
import requests


def ltx_audio_to_video(audio_url, image_url, prompt, api_key,
                       model="ltx-2-3-pro", resolution="1920x1080",
                       output_path="output.mp4"):
    """POST to /v1/audio-to-video and save the returned MP4 to output_path."""
    payload = {"audio_uri": audio_url, "image_uri": image_url,
               "prompt": prompt, "model": model, "resolution": resolution}
    # The endpoint returns the MP4 bytes directly (no job polling),
    # so stream the body to disk instead of loading it into memory.
    with requests.post(
        "https://api.ltx.video/v1/audio-to-video",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json=payload,
        timeout=300,
        stream=True,
    ) as r:
        if r.status_code != 200:
            raise RuntimeError(f"LTX error {r.status_code}: {r.text}")
        with open(output_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return output_path
```
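
The helper above writes straight into `output_path`, so a dropped connection or timeout mid-download leaves a truncated MP4 behind. One option, sketched here with a hypothetical `save_chunks_atomically` helper, is to stream into a temporary file in the same directory and rename it into place only after the last chunk arrives.

```python
import os
import tempfile


def save_chunks_atomically(chunks, output_path):
    """Write an iterable of byte chunks to output_path via a temp file.

    The temp file lives in the same directory, so os.replace() is a rename
    on the same filesystem; output_path only appears once every chunk has
    been written.
    """
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    f.write(chunk)
        os.replace(tmp_path, output_path)
    except BaseException:
        os.remove(tmp_path)  # don't leave a partial .part file behind
        raise
    return output_path
```

Inside `ltx_audio_to_video`, the `with open(...)` loop would become `save_chunks_atomically(r.iter_content(chunk_size=8192), output_path)`.
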
The API fetches `audio_uri` and `image_uri` from URLs, so local files have to be uploaded to a public host first. uguu.se serves uploaded MP3s as `audio/mpeg` ✅ rather than `application/octet-stream`.

```bash
# Upload MP3 to uguu.se (replace <audio-file> with your file)
AUDIO_URL=$(curl -s -F "files[]=@<audio-file>" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Upload image (replace <image-file> with your file)
IMAGE_URL=$(curl -s -F "files[]=@<image-file>" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Resize large images before upload
ffmpeg -y -i input_4k.png -vf "scale=1920:1080" output_1080.jpg
```
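
The same steps can be done from Python. This sketch only relies on the response shape the shell one-liner already assumes (`files[0]["url"]`); `uguu_url` and `served_content_type` are hypothetical helpers, and the HEAD request assumes the host answers HEAD requests.

```python
import json
from urllib.request import Request, urlopen


def uguu_url(response_text):
    """Extract the public URL from an uguu.se upload response (files[0].url)."""
    data = json.loads(response_text)
    files = data.get("files") or []
    if not files or "url" not in files[0]:
        raise ValueError(f"unexpected upload response: {response_text[:200]}")
    return files[0]["url"]


def served_content_type(url, timeout=30):
    """Return the Content-Type header the host sends for an uploaded file."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
        return resp.headers.get("Content-Type", "")
```

Checking `served_content_type(audio_url)` for `audio/mpeg` before calling `/v1/audio-to-video` confirms the host is serving the upload with the media type rather than `application/octet-stream`.
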
Use `/v1/extend` to chain clips into a longer video.

LTX-2.3 has a much stronger text connector. Specificity wins:

- ❌ "A dramatic portrait of a man standing"
- ✅ "A man stands on a rooftop. His coat flaps in the wind. He adjusts his collar and steps forward as the camera tracks right."

More specific details that work:

- Voice and sound: "His voice is clear and warm. Restaurant ambient sound softly in the background."
- Fine detail: "Individual hair strands visible in the backlight" → now renders correctly
- Vertical video: `resolution: "1080x1920"` → trained on vertical data
- Physical actions: "He picks up the banana, raises it to his ear, and smirks"

Lip-sync prompt template:

```text
A [description of person] sits/stands [location]. He/she speaks directly
to camera, lips moving in perfect sync with his/her voice. [Gesture details].
Head stays level and gaze remains locked on camera throughout.
[Environment description softly blurred in background].
[Lighting]. [Camera: holds steady at eye level, front-on].
```
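
When generating many lip-sync clips, the template can be filled programmatically so the fixed sentences (level head, locked gaze, steady camera) stay identical across runs. `lipsync_prompt` is a hypothetical helper; it supports only "he"/"she", matching the template.

```python
POSSESSIVE = {"he": "his", "she": "her"}


def lipsync_prompt(person, posture, location, gesture, environment,
                   lighting, pronoun="he",
                   camera="Camera holds steady at eye level, front-on"):
    """Fill the lip-sync template; bracketed slots become arguments."""
    if pronoun not in POSSESSIVE:
        raise ValueError("pronoun must be 'he' or 'she'")
    return (
        f"A {person} {posture} {location}. "
        f"{pronoun.capitalize()} speaks directly to camera, lips moving in "
        f"perfect sync with {POSSESSIVE[pronoun]} voice. {gesture}. "
        "Head stays level and gaze remains locked on camera throughout. "
        f"{environment} softly blurred in background. "
        f"{lighting}. {camera}."
    )
```

For example, `lipsync_prompt("man in a navy blue suit", "sits", "at a luxury restaurant table", "He gestures lightly with one hand", "Restaurant interior", "Warm evening light")` yields a prompt ready for the `prompt` field of `/v1/audio-to-video`.
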
Custom nodes for ComfyUI (no manual API calls):
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/PauldeLavallaz/comfyui-ltx-node
```
- Nodes: LTX Text to Video, LTX Image to Video, LTX Extend Video
- Category: LTX Video
API key: the examples read it from the `LTX_API_KEY` environment variable (Paul's key is stored in `~/clawd/.env` as `LTX_API_KEY`).
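
Scripts can pick the key up the same way without hardcoding it: prefer the environment variable, then fall back to a simple `KEY=VALUE` `.env` file. This is a minimal sketch (`load_api_key` is a hypothetical helper); it handles comments, an optional `export` prefix, and surrounding quotes, but not the full dotenv syntax.

```python
import os
from pathlib import Path


def load_api_key(env_file="~/clawd/.env", name="LTX_API_KEY"):
    """Return the key from the environment, else from a KEY=VALUE .env file."""
    if os.environ.get(name):
        return os.environ[name]
    path = Path(env_file).expanduser()
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments
            if line.startswith("export "):
                line = line[len("export "):].strip()
            key, sep, value = line.partition("=")
            if sep and key.strip() == name:
                return value.strip().strip('"').strip("'")
    raise RuntimeError(f"{name} not set and not found in {path}")
```
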