skills/separate-audio/SKILL.md
Text-guided audio source separation using SAM-Audio via mlx-audio
Install globally with `npx skillsauth add nuva-lab/vibecut separate-audio` (works with Claude Code, Cursor, and Windsurf).
Isolate specific sounds from audio using natural-language text prompts. The skill runs Meta's SAM-Audio model through mlx-audio for native inference on Apple Silicon (M2/M3) Macs.
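Because inference depends on mlx-audio running on Apple Silicon, a quick pre-flight check can catch a misconfigured environment before a long separation run. The sketch below is hypothetical and not part of `separate.py`. It assumes the `mlx-audio` package is importable as `mlx_audio` and treats macOS on `arm64` as Apple Silicon; it does not distinguish M1 from M2/M3.

```python
import importlib.util
import platform


def check_environment() -> dict:
    """Hypothetical pre-flight check; not part of separate.py."""
    # Apple Silicon Macs report system "Darwin" and machine "arm64"
    # (a Python running under Rosetta reports "x86_64" instead).
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    # Assumption: the mlx-audio pip package provides the module `mlx_audio`.
    has_mlx_audio = importlib.util.find_spec("mlx_audio") is not None
    return {
        "apple_silicon": on_apple_silicon,
        "mlx_audio_installed": has_mlx_audio,
        "ready": on_apple_silicon and has_mlx_audio,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```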
```bash
# Extract speaker by description
python skills/separate-audio/separate.py panel.wav --prompt "man speaking" --output speaker.wav

# Extract with time hint
python skills/separate-audio/separate.py video.mp4 --prompt "applause" --span 10.5-12.0

# Save both target and residual
python skills/separate-audio/separate.py audio.wav --prompt "woman singing" --save-residual
```
| Use case | Prompt example |
|----------|----------------|
| Extract a single speaker | "man speaking about investments" |
| Remove background music | Prompt for the music and keep the residual (`--save-residual`) |
| Isolate applause | "audience applause" |
| Clean up a panel discussion | Run multiple times with different prompts |
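For the panel-discussion case, running the script once per prompt can be scripted. This is a minimal sketch that uses only the `--prompt` and `--output` flags from the CLI examples; the helper names and the `<input stem>_<prompt slug>.wav` output naming scheme are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path

SCRIPT = "skills/separate-audio/separate.py"


def output_name(input_path: str, prompt: str) -> str:
    """Hypothetical naming scheme: <input stem>_<prompt slug>.wav."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{Path(input_path).stem}_{slug}.wav"


def build_commands(input_path: str, prompts: list[str]) -> list[list[str]]:
    """Build one separate.py invocation per prompt."""
    return [
        # sys.executable reuses the current interpreter (and its installed packages).
        [sys.executable, SCRIPT, input_path,
         "--prompt", prompt, "--output", output_name(input_path, prompt)]
        for prompt in prompts
    ]


def run_all(input_path: str, prompts: list[str]) -> None:
    for cmd in build_commands(input_path, prompts):
        subprocess.run(cmd, check=True)  # stop at the first failed separation
```

For example, `run_all("panel.wav", ["man speaking", "woman speaking"])` would write `panel_man_speaking.wav` and `panel_woman_speaking.wav`. Building the argument lists separately from running them keeps the commands easy to inspect before launching several model runs.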
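```python
# separate.py lives in skills/separate-audio/; that directory must be on sys.path
from separate import separate_audio

result = separate_audio(
    audio_path="panel.wav",
    prompt="man speaking about space",
    output_path="speaker.wav",
    span=(10.5, 12.0),  # Optional time hint: (start, end)
)
print(result["target_path"])
```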
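The CLI takes the time hint as a single `START-END` string (`--span 10.5-12.0`), while the Python API takes a `(start, end)` tuple. A hypothetical helper for converting between the two, for example when wrapping the API in your own script, might look like this. It is not taken from `separate.py` and assumes both values are non-negative numbers (presumably seconds) with start before end.

```python
def parse_span(text: str) -> tuple[float, float]:
    """Parse a 'START-END' string such as '10.5-12.0' into (start, end).

    Hypothetical helper: values are assumed non-negative, so the first
    '-' is always the separator.
    """
    start_str, sep, end_str = text.strip().partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {text!r}")
    start, end = float(start_str), float(end_str)  # raises ValueError on non-numbers
    if not 0 <= start < end:
        raise ValueError(f"invalid span {text!r}: need 0 <= start < end")
    return start, end


def format_span(span: tuple[float, float]) -> str:
    """Inverse of parse_span: (10.5, 12.0) -> '10.5-12.0'."""
    start, end = span
    return f"{start}-{end}"
```

Usage: `separate_audio(..., span=parse_span("10.5-12.0"))`. Validating the span before calling the model means a typo fails immediately instead of after the model has loaded.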
Install the dependency:

```bash
pip install mlx-audio
```

This skill is implemented but has not been extensively tested in the main video pipeline, whose primary audio workflow uses Qwen3-ForcedAligner for caption alignment. SAM-Audio is available for more advanced use cases.