.claude/skills/video-editor/SKILL.md
Edit raw screen recording videos with synced audio. Removes repeated takes, false starts, silence, filler, and [MUSIC] tags to produce a clean final cut. Use when the user provides a raw/unedited MP4 video file and wants it edited down. Triggers on: "edit this video", "cut this video", providing a raw MP4 for editing, or any request to remove bad takes from a recording.
npx skillsauth add theramjad/ray-os video-editor
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Edit raw screen recordings by transcribing, reviewing the transcript as an LLM editor, and applying cuts with ffmpeg.
MP4 → AssemblyAI (transcribe) → Claude (editorial decisions) → refine_edl.py (VAD) → ffmpeg (cut + join) → edited MP4
python3 <skill_dir>/scripts/transcribe_assemblyai.py <input.mp4> --api-key <key>
Requires ASSEMBLYAI_API_KEY env var or --api-key flag.
Uses the AssemblyAI universal-3-pro model with the language set to English.
Word-level timestamps are returned by default.
Outputs <basename>_transcript.json.
Alternative providers are available (transcribe_deepgram.py, transcribe_azure.py), but
AssemblyAI has the best timestamp accuracy (74ms average error vs 135ms for Deepgram and
106ms for Azure) and the lowest start bias (+39ms vs +58ms for Deepgram and +144ms for Azure).
These figures were measured against human-corrected ground truth on a 24min recording with 71 segment boundaries.
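As a rough illustration of what the two figures mean, the sketch below computes an average absolute error and a signed bias from paired boundary timestamps. How the benchmark actually defined these metrics is an assumption here, and the sample numbers are made up for the example, not taken from the benchmark.

```python
from statistics import mean


def boundary_errors(predicted_ms, reference_ms):
    """Return (mean absolute error, mean signed bias) in ms for paired boundaries.

    Assumed definitions: error = |predicted - reference|, bias = predicted - reference,
    so a positive bias means the provider's timestamp is late.
    """
    if len(predicted_ms) != len(reference_ms) or not predicted_ms:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - r for p, r in zip(predicted_ms, reference_ms)]
    return mean(abs(d) for d in diffs), mean(diffs)


# Illustrative values only (hypothetical, not benchmark data).
predicted = [1040, 5120, 9950]
reference = [1000, 5000, 10000]
mae, bias = boundary_errors(predicted, reference)
print(f"avg error {mae:.0f}ms, start bias {bias:+.0f}ms")
```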
python3 <skill_dir>/scripts/format_transcript.py <transcript.json> -o <formatted.txt>
Outputs timestamped lines like:
[00:08.16 - 00:12.34] Okay, so we have a lot of Cloud Code updates to go over—
[00:12.50 - 00:18.90] Okay, so we have a whole bunch of Cloud Code updates to go...
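The line format above can be reproduced with a small helper. This is a minimal sketch, assuming each utterance carries start and end times in milliseconds; the real format_transcript.py and the layout of the transcript JSON may differ, and `fmt_ts` and `format_line` are hypothetical names.

```python
def fmt_ts(ms):
    """Format milliseconds as MM:SS.cc (centiseconds), as in the sample lines."""
    total_cs = ms // 10                      # drop sub-centisecond precision
    minutes, rem = divmod(total_cs, 6000)    # 6000 centiseconds per minute
    seconds, cs = divmod(rem, 100)
    return f"{minutes:02d}:{seconds:02d}.{cs:02d}"


def format_line(start_ms, end_ms, text):
    """Build one '[start - end] text' transcript line."""
    return f"[{fmt_ts(start_ms)} - {fmt_ts(end_ms)}] {text}"


print(format_line(8160, 12340, "Okay, so we have a lot of updates to go over"))
```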
Read the formatted transcript top to bottom. A ~24min raw recording produces ~300 lines, easily reviewable in one pass. Apply these editing rules:
Last take wins. When the speaker repeats the same line or phrase multiple times (retakes), keep only the LAST complete take. Earlier attempts are always cut. Retakes are easy to spot: consecutive lines starting with the same few words (e.g., 8 lines all starting "First of all, one of the biggest..."). Always pick the last one that completes the full thought.
Cut false starts. Any sentence that trails off mid-thought ending with "—" or just stops and restarts is cut. Keep only the completed version. A line ending with "—" is almost always a false start — look for the completed version in the lines that follow. Exception: if a false start contains a useful phrase that the clean retake omits (e.g., "model system card"), extract that phrase as a micro-segment (even 1-2s) and place it before the clean retake.
Remove all non-speech. Every [MUSIC], [NOISE], [APPLAUSE], or similar tag is removed entirely.
Remove dead air. Gaps and silence between takes are cut. The final video should flow continuously from one kept segment to the next. Also cut silence WITHIN segments — if a kept segment has a 500ms+ internal pause (breath, hesitation, dead air), split it into two segments with the silence removed. Review every segment >5s and look for gaps in the transcript timestamps. Even short segments can contain splittable pauses.
Trim filler and stumbles. Remove standalone filler ("uh", "um", "like", "you know") when they appear as hesitations. Also surgically remove mid-sentence stumbles by ending a segment before the stumble and starting a new segment after the correction. Don't keep stumbles just because splitting feels awkward — the cut always works better than the stumble in the final video.
Preserve the speaker's intent. The final edit should read as if the speaker said everything correctly on the first try. Maintain the logical flow and ordering of topics.
When in doubt, keep the later version. If two takes are roughly equal quality, prefer the later one.
Merge adjacent segments. When two kept segments are very close together (gap < 300ms), merge them into one segment rather than creating a cut. This avoids micro-glitches.
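The merge rule is mechanical enough to sketch in code. The helper below is a hypothetical illustration rather than part of the skill's scripts; it assumes segments are dicts with `start_ms` and `end_ms`, as in the EDL format.

```python
def merge_close_segments(segments, max_gap_ms=300):
    """Merge kept segments whose gap to the previous segment is under max_gap_ms."""
    merged = []
    for seg in sorted(segments, key=lambda s: s["start_ms"]):
        if merged and seg["start_ms"] - merged[-1]["end_ms"] < max_gap_ms:
            # Gap is too small to be worth a cut: extend the previous segment.
            merged[-1]["end_ms"] = max(merged[-1]["end_ms"], seg["end_ms"])
        else:
            merged.append(dict(seg))  # copy so the input list is not mutated
    return merged


kept = [
    {"start_ms": 1000, "end_ms": 2000},
    {"start_ms": 2200, "end_ms": 3000},  # 200ms gap: merged into the first
    {"start_ms": 4000, "end_ms": 5000},  # 1000ms gap: stays separate
]
print(merge_close_segments(kept))
```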
Produce an EDL (edit decision list) as JSON:
{
"segments": [
{"start_ms": 68100, "end_ms": 82700},
{"start_ms": 121300, "end_ms": 160400}
]
}
Each segment is a portion of the ORIGINAL video to keep. Segments must be in chronological order and non-overlapping.
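The ordering and non-overlap constraints can be checked before any cutting happens. The following is a minimal sketch with a hypothetical `validate_edl` helper; it is not one of the skill's scripts.

```python
def validate_edl(edl):
    """Return a list of human-readable problems; an empty list means the EDL looks valid."""
    problems = []
    prev_end = None
    for i, seg in enumerate(edl.get("segments", [])):
        start, end = seg["start_ms"], seg["end_ms"]
        if start < 0 or end <= start:
            problems.append(f"segment {i}: invalid range {start}-{end}")
        if prev_end is not None and start < prev_end:
            problems.append(f"segment {i}: starts at {start}, before previous end {prev_end}")
        prev_end = end if prev_end is None else max(prev_end, end)
    return problems


edl = {"segments": [{"start_ms": 68100, "end_ms": 82700},
                    {"start_ms": 121300, "end_ms": 160400}]}
print(validate_edl(edl) or "EDL is chronological and non-overlapping")
```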
Padding: Add ~200ms padding before each segment start (to catch speech onset that
precedes the transcript timestamp). Do NOT over-pad — 400ms was found to be too much,
causing the user to trim most starts back. Add ~200ms after each segment end.
AssemblyAI start timestamps mark when a word becomes recognizable (~200-300ms after
the speaker begins the sound), so the start padding compensates for this.
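A sketch of the padding step, assuming the raw transcript boundaries are already in EDL form and the source duration is known. `pad_segments` is a hypothetical helper; merging segments that overlap after padding is a design choice made here, not something the skill prescribes.

```python
def pad_segments(segments, duration_ms, pad_start_ms=200, pad_end_ms=200):
    """Pad each segment, clamp to the source, and merge any overlaps the padding creates."""
    padded = []
    for seg in segments:  # assumed chronological, as the EDL requires
        start = max(0, seg["start_ms"] - pad_start_ms)
        end = min(duration_ms, seg["end_ms"] + pad_end_ms)
        if padded and start <= padded[-1]["end_ms"]:
            # Padding made this segment touch the previous one: join them.
            padded[-1]["end_ms"] = max(padded[-1]["end_ms"], end)
        else:
            padded.append({"start_ms": start, "end_ms": end})
    return padded


raw = [{"start_ms": 100, "end_ms": 1000},
       {"start_ms": 1300, "end_ms": 2000},
       {"start_ms": 5000, "end_ms": 5900}]
print(pad_segments(raw, duration_ms=6000))
```

Segments separated by less than the combined padding (400ms with the defaults) end up joined, which keeps the result non-overlapping.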
Write the EDL to <basename>_edl.json.
python3 <skill_dir>/scripts/refine_edl.py <input.mp4> <edl.json> [-o <refined_edl.json>]
Uses ffmpeg silencedetect to refine the EDL — no extra Python dependencies.
Boundary verification: Extends segment starts/ends outward (up to 300ms) if speech is detected at the boundary. Prevents the most common issue: clipped word onsets.
Internal pause splitting: Finds silence gaps ≥500ms inside segments and splits them. Catches ~70% of the splits the user would make manually. Segments <500ms created by splitting are dropped (noise blips between adjacent silences).
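The pause-splitting idea can be sketched as follows. This is not the code in refine_edl.py; it assumes silence intervals come from the stderr log of a command such as `ffmpeg -i input.mp4 -af silencedetect=noise=-35dB:d=0.5 -f null -`, and `parse_silences` and `split_segment` are hypothetical names.

```python
import re

# Matches "silence_start: 3.2" and "silence_end: 4.0" (but not "silence_duration").
SILENCE_RE = re.compile(r"silence_(start|end):\s*(-?\d+(?:\.\d+)?)")


def parse_silences(ffmpeg_stderr):
    """Turn silencedetect log text into a list of (start_ms, end_ms) pairs."""
    silences, start = [], None
    for kind, value in SILENCE_RE.findall(ffmpeg_stderr):
        ms = round(float(value) * 1000)
        if kind == "start":
            start = ms
        elif start is not None:
            silences.append((start, ms))
            start = None
    return silences


def split_segment(seg, silences, min_pause_ms=500, min_segment_ms=500):
    """Cut silences of at least min_pause_ms out of seg; drop pieces under min_segment_ms.

    A silence that overlaps a segment edge is clamped to the segment, so it
    trims that edge as well as splitting interior pauses.
    """
    pieces, cursor = [], seg["start_ms"]
    for s_start, s_end in sorted(silences):
        s_start, s_end = max(s_start, seg["start_ms"]), min(s_end, seg["end_ms"])
        if s_end - s_start < min_pause_ms:
            continue  # too short, or entirely outside this segment
        pieces.append((cursor, s_start))
        cursor = s_end
    pieces.append((cursor, seg["end_ms"]))
    return [{"start_ms": a, "end_ms": b} for a, b in pieces if b - a >= min_segment_ms]


log = ("[silencedetect @ 0x1] silence_start: 3.2\n"
       "[silencedetect @ 0x1] silence_end: 4.0 | silence_duration: 0.8\n")
print(split_segment({"start_ms": 1000, "end_ms": 6000}, parse_silences(log)))
```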
Options:
--noise-db -35: silence threshold (lower = more sensitive, default -35)
--min-pause-ms 500: minimum internal silence to split at
--extend-ms 300: max boundary extension
--min-segment-ms 500: drop segments shorter than this after splitting
--dry-run: show changes without writing
python3 <skill_dir>/scripts/apply_edits.py <input.mp4> <edl.json> -o <output_edited.mp4>
Cuts the original MP4 per the EDL and concatenates into the final edit.
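One common way to cut and join with ffmpeg is a single filter graph built from `trim`/`atrim` and `concat`. The sketch below only builds the command; whether apply_edits.py uses this approach is an assumption, and the helper names are hypothetical.

```python
def build_concat_filter(segments):
    """Build a filter_complex string that trims each segment and concatenates them."""
    if not segments:
        raise ValueError("EDL has no segments to keep")
    parts, labels = [], []
    for i, seg in enumerate(segments):
        start, end = seg["start_ms"] / 1000, seg["end_ms"] / 1000
        # Reset timestamps after trimming so each piece starts at zero.
        parts.append(f"[0:v]trim=start={start:.3f}:end={end:.3f},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS[a{i}]")
        labels.append(f"[v{i}][a{i}]")
    parts.append(f"{''.join(labels)}concat=n={len(segments)}:v=1:a=1[outv][outa]")
    return ";".join(parts)


def build_ffmpeg_command(input_path, segments, output_path):
    """Return the argument list for a re-encoding cut-and-join run."""
    return ["ffmpeg", "-y", "-i", input_path,
            "-filter_complex", build_concat_filter(segments),
            "-map", "[outv]", "-map", "[outa]", output_path]


edl = [{"start_ms": 68100, "end_ms": 82700}, {"start_ms": 121300, "end_ms": 160400}]
print(" ".join(build_ffmpeg_command("input.mp4", edl, "output_edited.mp4")))
```

The list can be passed to `subprocess.run(cmd, check=True)`. This approach re-encodes the video, which is slower than stream copying but allows cuts that do not fall on keyframes.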
python3 <skill_dir>/scripts/generate_edl.py <edl.json> <input.mp4> -o <output.edl>
Generates a CMX 3600 .edl file importable by Premiere Pro, DaVinci Resolve, and any NLE.
Auto-detects frame rate from the source MP4 via ffprobe.
The user can import via File > Import in Premiere Pro, match frame rate, then link the source media.
All cuts appear as events on a single timeline — ready for final review and polish.
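To show how millisecond segments map onto EDL events, here is a minimal sketch that converts them to non-drop-frame timecodes and lays them end to end on the record side. It assumes an integer frame rate, a video-only track, and a placeholder reel name `AX`; the output of generate_edl.py (header lines, reel naming, audio tracks, fractional rates such as 29.97) may differ.

```python
def to_frames(ms, fps):
    """Convert milliseconds to a whole number of frames at an integer fps."""
    return round(ms * fps / 1000)


def frames_to_tc(frames, fps):
    """Format a frame count as HH:MM:SS:FF (non-drop-frame)."""
    total_seconds, ff = divmod(frames, fps)
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def edl_events(segments, fps=30):
    """Return one event line per segment: source in/out, then record in/out."""
    lines, rec_in = [], 0
    for n, seg in enumerate(segments, start=1):
        src_in, src_out = to_frames(seg["start_ms"], fps), to_frames(seg["end_ms"], fps)
        rec_out = rec_in + (src_out - src_in)  # same length on the timeline
        tcs = " ".join(frames_to_tc(f, fps) for f in (src_in, src_out, rec_in, rec_out))
        lines.append(f"{n:03d}  AX       V     C        {tcs}")
        rec_in = rec_out
    return lines


for line in edl_events([{"start_ms": 68100, "end_ms": 82700},
                        {"start_ms": 121300, "end_ms": 160400}]):
    print(line)
```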
If the formatted transcript is too long for a single pass (>2000 lines), process it in sections.
In practice: a 24min raw recording = ~3400 words = ~300 formatted lines (one pass). A 2-hour recording would be ~2500 lines (may need 4-5 passes).
After generating the EDL, run a sanity check:
Use VAD (voice activity detection) to verify segment boundaries. Transcript timestamps alone can clip speech onset/offset. After determining segment boundaries from the transcript, use VAD on the source audio to confirm that no speech is being clipped at the start or end of each segment. If VAD detects speech energy at a segment boundary, extend the segment to include it. This prevents the most common user correction: extending starts/ends where speech was clipped.
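The boundary check can be sketched with silence intervals standing in for VAD output: a boundary that does not sit inside a detected silence is treated as having speech energy and is pushed outward to the nearest silence edge, capped at a maximum extension. This is a hypothetical illustration of the idea, not the logic of refine_edl.py.

```python
def extend_boundaries(seg, silences, max_extend_ms=300, duration_ms=None):
    """Extend a segment outward over non-silent audio, up to max_extend_ms per side.

    silences is a list of (start_ms, end_ms) intervals; anything outside them
    is assumed to be speech.
    """
    def in_silence(t):
        return any(s <= t <= e for s, e in silences)

    start, end = seg["start_ms"], seg["end_ms"]

    if not in_silence(start):
        # Speech at the start boundary: back up to the end of the previous silence.
        prev_silence_end = max((e for _, e in silences if e <= start), default=0)
        start = max(prev_silence_end, start - max_extend_ms, 0)

    if not in_silence(end):
        # Speech at the end boundary: run forward to the start of the next silence.
        next_silence_start = min((s for s, _ in silences if s >= end), default=end + max_extend_ms)
        end = min(next_silence_start, end + max_extend_ms)
        if duration_ms is not None:
            end = min(end, duration_ms)

    return {"start_ms": start, "end_ms": end}


print(extend_boundaries({"start_ms": 1000, "end_ms": 2000}, [(0, 850), (2100, 3000)]))
```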
See BENCHMARK.md for the full 4-way comparison (AssemblyAI, Deepgram, Azure, Google) tested against 71 hand-corrected segments.
After generating the .edl file with generate_edl.py, import it into the NLE (in Premiere Pro, via File > Import). The EDL also works in DaVinci Resolve (File > Import Timeline > EDL) and most other NLEs.