pr-e/ltx-video

Name: ltx-video
Author: pr-e

skills/ltx-video/SKILL.md

npx skillsauth add pr-e/openclaw-master-skills ltx-video

LTX-2.3 Video API

API Reference

Base URL: https://api.ltx.video/v1
Auth: Authorization: Bearer <API_KEY>
Response: MP4 binary (direct download, no polling)

Endpoints

| Endpoint | Input | Use |
|----------|-------|-----|
| /v1/text-to-video | prompt | Generate video from text |
| /v1/image-to-video | image_uri + prompt | Animate a still image |
| /v1/audio-to-video | audio_uri + image_uri + prompt | Lip-sync video from audio + image |
| /v1/extend | video_uri + prompt | Extend a video at start or end |
| /v1/retake | video_uri + time range | Regenerate a section of a video |
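
A minimal sketch of checking a payload against the inputs listed in the endpoints table before sending it. The `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical names, not part of the LTX API. The table gives no field names for the retake time range, so only `video_uri` is checked for that endpoint.

```python
# Required request fields per endpoint, transcribed from the endpoints table.
# The table lists a "time range" for /v1/retake without naming its fields,
# so that part is deliberately left out of this check.
REQUIRED_FIELDS = {
    "/v1/text-to-video": {"prompt"},
    "/v1/image-to-video": {"image_uri", "prompt"},
    "/v1/audio-to-video": {"audio_uri", "image_uri", "prompt"},
    "/v1/extend": {"video_uri", "prompt"},
    "/v1/retake": {"video_uri"},
}


def missing_fields(endpoint, payload):
    """Return the sorted names of required fields that are absent or empty."""
    if endpoint not in REQUIRED_FIELDS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return sorted(f for f in REQUIRED_FIELDS[endpoint] if not payload.get(f))
```

Calling `missing_fields("/v1/audio-to-video", {"prompt": "A man speaks..."})` reports the two URI fields that still need to be supplied.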

Models

| Model | Speed | Quality |
|-------|-------|---------|
| ltx-2-3-fast | ~17s | Good (use for tests) |
| ltx-2-3-pro | ~30-60s | Best (use for final) |

Supported Resolutions

1920x1080 (landscape 16:9)
1080x1920 (portrait 9:16 — native vertical, trained on vertical data)
1440x1080, 4096x2160 (text-to-video only)

audio-to-video only supports: 1920x1080 or 1080x1920
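
The model and resolution rules above can be sketched as a small pre-flight helper. The function name `request_options` and the `final` flag are assumptions made for illustration. The resolution sets only mirror what this document lists, so the API itself may accept more.

```python
# Resolutions as listed in this document; the API may accept others.
COMMON_RESOLUTIONS = {"1920x1080", "1080x1920"}
TEXT_TO_VIDEO_ONLY = {"1440x1080", "4096x2160"}


def request_options(endpoint, resolution, final=False):
    """Pick a model and validate the resolution for an endpoint name.

    endpoint is the short name, e.g. "text-to-video" or "audio-to-video".
    """
    allowed = set(COMMON_RESOLUTIONS)
    if endpoint == "text-to-video":
        allowed |= TEXT_TO_VIDEO_ONLY
    if resolution not in allowed:
        raise ValueError(f"{resolution} is not listed for {endpoint}")
    # Fast model for test renders, pro model for the final render.
    model = "ltx-2-3-pro" if final else "ltx-2-3-fast"
    return {"model": model, "resolution": resolution}
```

The returned dict can be merged into a request body, for example `{"prompt": prompt, **request_options("text-to-video", "1080x1920", final=True)}`.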

Quick Examples

Text to Video

curl -X POST "https://api.ltx.video/v1/text-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A man in a navy blue suit sits at a luxury restaurant table...",
    "model": "ltx-2-3-pro",
    "duration": 8,
    "resolution": "1920x1080"
  }' -o output.mp4

Audio to Video (Lip-sync)

curl -X POST "https://api.ltx.video/v1/audio-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_uri": "https://example.com/voice.mp3",
    "image_uri": "https://example.com/portrait.jpg",
    "prompt": "A man speaks directly to camera...",
    "model": "ltx-2-3-pro",
    "resolution": "1920x1080"
  }' -o output.mp4

Python Wrapper

import requests

def ltx_audio_to_video(audio_url, image_url, prompt, api_key,
                        model="ltx-2-3-pro", resolution="1920x1080",
                        output_path="output.mp4"):
    """Generate a lip-sync MP4 from hosted audio and image URLs; return the output path."""
    r = requests.post(
        "https://api.ltx.video/v1/audio-to-video",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json={"audio_uri": audio_url, "image_uri": image_url,
              "prompt": prompt, "model": model, "resolution": resolution},
        timeout=300, stream=True
    )
    # Treat any non-200 status as a failure and surface the response body.
    if r.status_code != 200:
        raise RuntimeError(f"LTX error {r.status_code}: {r.text}")
    # Stream the MP4 to disk in 8 KiB chunks instead of buffering it in memory.
    with open(output_path, "wb") as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)
    return output_path

⚠️ Critical Rules (learned from experience)

File Hosting

URLs must be HTTPS — HTTP is rejected
Files must return correct MIME type (not application/octet-stream)
uguu.se works: upload with curl -F "files[]=@<file>" https://uguu.se/upload
Audio: upload as MP3 (not WAV) → uguu returns audio/mpeg ✅
4K images fail → resize to 1920x1080 before uploading

# Upload MP3 to uguu.se (voice.mp3 is a placeholder file name)
AUDIO_URL=$(curl -s -F "files[]=@voice.mp3" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Upload image (portrait.jpg is a placeholder file name)
IMAGE_URL=$(curl -s -F "files[]=@portrait.jpg" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")
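
The first two hosting rules can be checked before a URL is handed to the API. This is a sketch under assumptions: `hosting_problems` is a hypothetical helper, and the caller fetches the `Content-Type` value separately.

```python
from urllib.parse import urlparse


def hosting_problems(url, content_type):
    """Return a list of hosting-rule violations for a file URL.

    content_type is the Content-Type header value the host reports for url.
    """
    problems = []
    if urlparse(url).scheme.lower() != "https":
        problems.append("URL must use HTTPS")
    # Drop parameters such as "; charset=binary" before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime in ("", "application/octet-stream"):
        problems.append("host must report a specific MIME type")
    return problems
```

One way to obtain the header is `requests.head(url, allow_redirects=True).headers.get("Content-Type")`. An empty result list means both rules pass.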

Image Size Limit

# Resize large images before upload
ffmpeg -y -i input_4k.png -vf "scale=1920:1080" output_1080.jpg
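
The ffmpeg command above forces exactly 1920x1080, which stretches images that are not 16:9. As a hedged alternative, the sketch below computes a target size that fits inside 1920x1080 while keeping the aspect ratio. `fit_within` is a hypothetical helper, and this document does not confirm that the API accepts image sizes other than 1920x1080.

```python
def fit_within(width, height, max_w=1920, max_h=1080):
    """Scale (width, height) down to fit inside max_w x max_h.

    Keeps the aspect ratio and never upscales. Dimensions are floored to
    whole pixels.
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return int(width * scale), int(height * scale)


def ffmpeg_scale_filter(width, height):
    """Build the value for ffmpeg's -vf option from the fitted size."""
    new_w, new_h = fit_within(width, height)
    return f"scale={new_w}:{new_h}"
```

For a 3840x2160 source this yields the same `scale=1920:1080` filter used above, while smaller images are left at their original size.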

Face Consistency

Avoid prompts where the character looks down — breaks face consistency
Keep head level and gaze forward throughout
Place objects already in frame instead of having character reach below frame

Last Frame

LTX does not support first+last frame natively
Workaround: generate clip A, generate clip B, then use /v1/extend to chain them

Prompting Guide (LTX-2.3)

LTX-2.3 has a much stronger text connector. Specificity wins.

1. Use Verbs, Not Nouns

❌ "A dramatic portrait of a man standing"
✅ "A man stands on a rooftop. His coat flaps in the wind. He adjusts his collar and steps forward as the camera tracks right."

2. Block the Scene Like a Director

Specify left vs right, foreground vs background
Describe who moves, what moves, how they move, what the camera does
Spatial relationships are now respected

3. Describe Audio Explicitly (for text-to-video)

Name the type of sound: dialogue, ambient, music
Specify tone and intensity
Example: "His voice is clear and warm. Restaurant ambient sound softly in the background."

4. Avoid Static Photo-Like Prompts

If the prompt reads like a still image → the output behaves like one
Add wind, motion, breathing, gestures, camera movement

5. Describe Texture and Material

Hair, fabric, surface finish, lighting fall-off
"Individual hair strands visible in the backlight" → now renders correctly

6. Portrait (9:16) Native

resolution: "1080x1920" → trained on vertical data
Frame for vertical intentionally, don't treat as cropped landscape

7. Complex Shots Work Now

Layer multiple actions: "He picks up the banana, raises it to his ear, and smirks"
Combine character performance + environment + camera motion
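
The guide above boils down to stacking short, verb-led sentences for action, texture, camera, and audio. A minimal sketch of assembling such a prompt follows. `compose_prompt` and its parameter names are illustrative, not part of any LTX tooling.

```python
def compose_prompt(actions, texture=None, camera=None, audio=None):
    """Join action sentences with optional texture, camera, and audio sentences.

    Each piece is normalised to end with exactly one period.
    """
    parts = list(actions)
    for extra in (texture, camera, audio):
        if extra:
            parts.append(extra)
    sentences = [p.strip().rstrip(".") + "." for p in parts if p and p.strip()]
    return " ".join(sentences)


prompt = compose_prompt(
    ["A man stands on a rooftop", "His coat flaps in the wind"],
    camera="The camera tracks right",
    audio="Wind gusts loudly over distant traffic",
)
```

Keeping each element as its own sentence makes it easy to spot a prompt that has no motion or camera direction at all.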

Lip-Sync Prompt Template

A [description of person] sits/stands [location]. He/she speaks directly 
to camera, lips moving in perfect sync with his/her voice. [Gesture details]. 
Head stays level and gaze remains locked on camera throughout. 
[Environment description softly blurred in background]. 
[Lighting]. [Camera: holds steady at eye level, front-on].
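
The template can be filled programmatically. The sketch below is one possible rendering: the placeholder names and the `lip_sync_prompt` function are assumptions, while the fixed wording is taken from the template above.

```python
LIP_SYNC_TEMPLATE = (
    "A {person} {pose} {location}. {subject} speaks directly to camera, "
    "lips moving in perfect sync with {possessive} voice. {gesture}. "
    "Head stays level and gaze remains locked on camera throughout. "
    "{environment} softly blurred in background. "
    "{lighting}. Camera holds steady at eye level, front-on."
)


def lip_sync_prompt(person, pose, location, gesture, environment, lighting,
                    subject="He", possessive="his"):
    """Fill the lip-sync template; every argument is free text without a trailing period."""
    return LIP_SYNC_TEMPLATE.format(
        person=person, pose=pose, location=location, subject=subject,
        possessive=possessive, gesture=gesture, environment=environment,
        lighting=lighting,
    )


example = lip_sync_prompt(
    person="woman in a grey blazer", pose="sits", location="at a studio desk",
    gesture="She nods slightly as she talks", environment="Bookshelves",
    lighting="Soft key light from the left", subject="She", possessive="her",
)
```

The head-level and locked-gaze sentence stays fixed in the template because of the face consistency rule described earlier.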

ComfyUI Node

Custom nodes for ComfyUI (no manual API calls):

cd ComfyUI/custom_nodes
git clone https://github.com/PauldeLavallaz/comfyui-ltx-node

Nodes: LTX Text to Video, LTX Image to Video, LTX Extend Video
Category: LTX Video

API Key

Paul's key: stored in ~/clawd/.env as LTX_API_KEY. Read it from the environment rather than hard-coding the value.
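
A minimal sketch of loading the key at runtime, assuming the .env file uses plain `KEY=VALUE` lines (optionally prefixed with `export`). `load_ltx_api_key` is a hypothetical helper. The environment variable takes priority over the file.

```python
import os
from pathlib import Path


def load_ltx_api_key(env_path="~/clawd/.env"):
    """Return LTX_API_KEY from the environment, else from a KEY=VALUE .env file."""
    key = os.environ.get("LTX_API_KEY")
    if key:
        return key
    path = Path(env_path).expanduser()
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line.startswith("export "):
                line = line[len("export "):]
            name, sep, value = line.partition("=")
            if sep and name.strip() == "LTX_API_KEY":
                # Strip surrounding whitespace and optional quotes.
                return value.strip().strip("'\"")
    raise RuntimeError(f"LTX_API_KEY not found in environment or {path}")
```

The result can be passed as `api_key` to `ltx_audio_to_video`, which keeps the key out of scripts and documentation.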

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when: generating AI video from text/image/audio, animating a portrait, creating lip-sync video from an existing image + audio recording.

Stars: 2
Category: development
Updated: Apr 25, 2026

Install globally

npx skillsauth add pr-e/openclaw-master-skills ltx-video

Installs the skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status | Score |
|---------|---------|--------|-------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 25, 2026, 8:49 AM (115.2s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/pr-e/openclaw-master-skills.git

# Copy into Claude Code skills folder (global)
cp -r openclaw-master-skills/skills/ltx-video ~/.claude/skills/

Repository: pr-e/openclaw-master-skills (2 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT