.cursor/.agents/skills/video-editing/SKILL.md
AI-assisted video editing workflows for cutting, structuring, and augmenting real footage. Covers the full pipeline from raw capture through FFmpeg, Remotion, ElevenLabs, fal.ai, and final polish in Descript or CapCut. Use when the user wants to edit video, cut footage, create vlogs, or build video content.
npx skillsauth add LUAgam/stage-harness video-editingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
AI-assisted editing for real footage. Not generation from prompts. Editing existing video fast.
AI video editing is useful when you stop asking it to create the whole video and start using it to compress, structure, and augment real footage. The value is not generation. The value is compression.
Screen Studio / raw footage
→ Claude / Codex
→ FFmpeg
→ Remotion
→ ElevenLabs / fal.ai
→ Descript or CapCut
Each layer has a specific job. Do not skip layers. Do not try to make one tool do everything.
Collect the source material:
videodb skill)Output: raw files ready for organization.
Use Claude Code or Codex to:
Example prompt:
"Here's the transcript of a 4-hour recording. Identify the 8 strongest segments
for a 24-minute vlog. Give me FFmpeg cut commands for each segment."
This layer is about structure, not final creative taste.
FFmpeg handles the boring but critical work: splitting, trimming, concatenating, and preprocessing.
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4
#!/bin/bash
# cuts.txt: start,end,label
while IFS=, read -r start end label; do
ffmpeg -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt
# Create file list
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav
ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
Remotion turns editing problems into composable code. Use it for things that traditional editors make painful:
import { AbsoluteFill, Sequence, Video, useCurrentFrame } from "remotion";
export const VlogComposition: React.FC = () => {
const frame = useCurrentFrame();
return (
<AbsoluteFill>
{/* Main footage */}
<Sequence from={0} durationInFrames={300}>
<Video src="/segments/intro.mp4" />
</Sequence>
{/* Title overlay */}
<Sequence from={30} durationInFrames={90}>
<AbsoluteFill style={{
justifyContent: "center",
alignItems: "center",
}}>
<h1 style={{
fontSize: 72,
color: "white",
textShadow: "2px 2px 8px rgba(0,0,0,0.8)",
}}>
The AI Editing Stack
</h1>
</AbsoluteFill>
</Sequence>
{/* Next segment */}
<Sequence from={300} durationInFrames={450}>
<Video src="/segments/demo.mp4" />
</Sequence>
</AbsoluteFill>
);
};
npx remotion render src/index.ts VlogComposition output.mp4
See the Remotion docs for detailed patterns and API reference.
Generate only what you need. Do not generate the whole video.
import os
import requests
resp = requests.post(
f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
headers={
"xi-api-key": os.environ["ELEVENLABS_API_KEY"],
"Content-Type": "application/json"
},
json={
"text": "Your narration text here",
"model_id": "eleven_turbo_v2_5",
"voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
}
)
with open("voiceover.mp3", "wb") as f:
f.write(resp.content)
Use the fal-ai-media skill for:
Use for insert shots, thumbnails, or b-roll that doesn't exist:
generate(model_name: "fal-ai/nano-banana-pro", input: {
"prompt": "professional thumbnail for tech vlog, dark background, code on screen",
"image_size": "landscape_16_9"
})
If VideoDB is configured:
voiceover = coll.generate_voice(text="Narration here", voice="alloy")
music = coll.generate_music(prompt="lo-fi background for coding vlog", duration=120)
sfx = coll.generate_sound_effect(prompt="subtle whoosh transition")
The last layer is human. Use a traditional editor for:
This is where taste lives. AI clears the repetitive work. You make the final calls.
Different platforms need different aspect ratios:
| Platform | Aspect Ratio | Resolution | |----------|-------------|------------| | YouTube | 16:9 | 1920x1080 | | TikTok / Reels | 9:16 | 1080x1920 | | Instagram Feed | 1:1 | 1080x1080 | | X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 |
# 16:9 to 9:16 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4
# 16:9 to 1:1 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
# Smart reframe (AI-guided subject tracking)
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo
# Find silent segments (useful for cutting dead air)
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null - 2>&1 | grep silence
Use Claude to analyze transcript + scene timestamps:
"Given this transcript with timestamps and these scene change points,
identify the 5 most engaging 30-second clips for social media."
| Tool | Strength | Weakness | |------|----------|----------| | Claude / Codex | Organization, planning, code generation | Not the creative taste layer | | FFmpeg | Deterministic cuts, batch processing, format conversion | No visual editing UI | | Remotion | Programmable overlays, composable scenes, reusable templates | Learning curve for non-devs | | Screen Studio | Polished screen recordings immediately | Only screen capture | | ElevenLabs | Voice, narration, music, SFX | Not the center of the workflow | | Descript / CapCut | Final pacing, captions, polish | Manual, not automatable |
fal-ai-media — AI image, video, and audio generationvideodb — Server-side video processing, indexing, and streamingcontent-engine — Platform-native content distributiondevelopment
在 generate-test-cases 阶段之后执行,逐个验证测试用例并在失败时修复项目代码、重新编译部署、再次验证, 直到通过或达到最大修复次数。覆盖 UI / API / API+UI / 性能测试四个维度,UI 测试通过浏览器真实模拟用户操作并截图, API 测试根据项目代码生成可执行的接口脚本,性能测试调用现有性能/质量技能全量执行。 涉及真实用户登录信息(如手机号+验证码、账号密码、JWT)时必须中断要求用户提供,禁止编造无效凭证。 所有 case 状态变更必须通过 e2e-case-tracker.sh 脚本持久化,确保中途崩溃可恢复、无 case 遗漏。
development
# SKILL: e2e > **核心原则**: > 1. 测试范围跟着本次变动走。后端接口改了,对应的前端流程必须做联调验证;与本次需求无关的功能不测。对于涉及算法、转换准确率等质量敏感型需求,需额外生成专项质量测试。 > 2. **覆盖完整性优先于执行便利性**。不得以"链路复杂"、"需要外部依赖"为由跳过本次变动相关的用例;凡是受变动影响的接口和 UI 流程,都必须生成真实调用/操作用例。 > 3. **UI 测试必须模拟真实用户操作**(定位元素、点击、键入、等待渲染、断言可见文本/状态)。**禁止**将 UI 套件退化为浏览器上下文里的 `page.evaluate(fetch(...))` API 验证——那只是把 API 测试换了执行环境,没有额外价值,不算 UI 测试。 > 4. **通用性**:本 skill 不假设具体业务域,所有规则均以抽象变动面(文件、接口、页面、用户动作)为单位组织,不针对任何特定项目的数据库/领域词汇。 > 5. **E2E 套件必须验证运行时行为**。严禁把"读取源码/配置文件并做字符串/结构匹配"的检查封装成独立 E2E 套件——这类检
tools
# SKILL: deploy ## CLI Bootstrap 在执行任何 `harnessctl` 命令前,先解析本地 CLI 路径: ```bash if [ -z "${HARNESSCTL:-}" ]; then candidates=( "./stage-harness/scripts/harnessctl" "../stage-harness/scripts/harnessctl" "$(git rev-parse --show-toplevel 2>/dev/null)/stage-harness/scripts/harnessctl" ) for candidate in "${candidates[@]}"; do if [ -n "$candidate" ] && [ -x "$candidate" ]; then HARNESSCTL="$candidate" break fi done fi test -n "${HARNESSCTL:-}" && test -x "$H
tools
# SKILL: build ## CLI Bootstrap 在执行任何 `harnessctl` 命令前,先解析本地 CLI 路径: ```bash if [ -z "${HARNESSCTL:-}" ]; then candidates=( "./stage-harness/scripts/harnessctl" "../stage-harness/scripts/harnessctl" "$(git rev-parse --show-toplevel 2>/dev/null)/stage-harness/scripts/harnessctl" ) for candidate in "${candidates[@]}"; do if [ -n "$candidate" ] && [ -x "$candidate" ]; then HARNESSCTL="$candidate" break fi done fi test -n "${HARNESSCTL:-}" && test -x "$HA