plugins/ffmpeg-core/skills/ffmpeg-audio-processing/SKILL.md
Complete audio encoding and normalization system. PROACTIVELY activate for: (1) Audio codec selection (AAC, MP3, Opus, FLAC), (2) Loudness normalization (EBU R128, loudnorm), (3) Audio extraction from video, (4) Format conversion, (5) Volume adjustment and dynamics, (6) Noise reduction and EQ, (7) Channel operations (stereo/mono/surround), (8) Sample rate and bit depth conversion, (9) Audio fade in/out and crossfades, (10) Podcast and broadcast processing chains. Provides: Codec comparison tables, loudness standards reference, two-pass normalization scripts, professional mastering chains. Ensures: Broadcast-compliant audio with proper loudness and quality.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
| Task | Command |
|------|---------|
| Extract audio | ffmpeg -i video.mp4 -vn -c:a copy audio.m4a |
| Convert to MP3 | ffmpeg -i input.flac -c:a libmp3lame -q:a 2 output.mp3 |
| Normalize (EBU R128) | -af loudnorm=I=-23:LRA=7:TP=-2 |
| Podcast standard | -af loudnorm=I=-16:TP=-1.5 |
| Adjust volume | -af "volume=1.5" or -af "volume=6dB" |
| Mono to stereo | -ac 2 |
| Codec | Recommended Bitrate | Use Case |
|-------|---------------------|----------|
| AAC | 128-192k (music), 64k (speech) | Streaming, mobile |
| MP3 | 192-320k (music), 128k (speech) | Universal compatibility |
| Opus | 96-128k (music), 48k (speech) | WebM, VoIP, modern |
Use for audio-focused operations:
Complete guide to audio encoding, normalization, and professional audio workflows with FFmpeg.
| Codec | Encoder | Bitrate Range | Quality | Compatibility | Use Case |
|-------|---------|---------------|---------|---------------|----------|
| AAC | aac, libfdk_aac | 64-320 kbps | Excellent | Universal | Streaming, mobile |
| MP3 | libmp3lame | 96-320 kbps | Good | Universal | Legacy, podcasts |
| Opus | libopus | 32-256 kbps | Best | Modern | VoIP, WebM |
| FLAC | flac | ~900 kbps | Lossless | Wide | Archival |
| ALAC | alac | ~900 kbps | Lossless | Apple | Apple ecosystem |
| Vorbis | libvorbis | 64-500 kbps | Very Good | Wide | WebM, games |
| AC3 | ac3 | 192-640 kbps | Good | Universal | DVD, Blu-ray |
| EAC3 | eac3 | 192-768 kbps | Very Good | Wide | Streaming |
| xHE-AAC | - (decode only) | 12-64 kbps | Excellent | Emerging | Ultra-low bitrate |
| Use Case | AAC | MP3 | Opus |
|----------|-----|-----|------|
| Podcast/Speech | 64-96k | 96-128k | 48-64k |
| Music (Standard) | 128-192k | 192-256k | 96-128k |
| Music (High Quality) | 256-320k | 320k | 160-256k |
| Transparent Quality | 256k+ | 320k | 192k+ |
# Extract to original format (no re-encode)
ffmpeg -i video.mp4 -vn -c:a copy audio.m4a
# Extract to MP3
ffmpeg -i video.mp4 -vn -c:a libmp3lame -b:a 320k audio.mp3
# Extract to AAC
ffmpeg -i video.mp4 -vn -c:a aac -b:a 256k audio.m4a
# Extract to FLAC (lossless)
ffmpeg -i video.mp4 -vn -c:a flac audio.flac
# Extract to WAV (uncompressed)
ffmpeg -i video.mp4 -vn -c:a pcm_s16le audio.wav
# MP3 to AAC
ffmpeg -i input.mp3 -c:a aac -b:a 256k output.m4a
# WAV to MP3
ffmpeg -i input.wav -c:a libmp3lame -b:a 320k output.mp3
# FLAC to MP3
ffmpeg -i input.flac -c:a libmp3lame -b:a 320k output.mp3
# Multiple files (batch)
for f in *.flac; do
ffmpeg -i "$f" -c:a libmp3lame -b:a 320k "${f%.flac}.mp3"
done
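# AAC VBR quality with the native encoder (-q:a roughly 0.1-2, higher = better)
# The 1-5 scale belongs to libfdk_aac's -vbr option, not to -q:a
ffmpeg -i input.wav -c:a aac -q:a 2 output.m4a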
# MP3 VBR quality (0-9, lower = better)
ffmpeg -i input.wav -c:a libmp3lame -q:a 0 output.mp3
# Opus with target bitrate
ffmpeg -i input.wav -c:a libopus -b:a 128k output.opus
| Standard | Target | TP (True Peak) | Use Case |
|----------|--------|----------------|----------|
| EBU R128 | -23 LUFS | -1 dBTP | European broadcast |
| ATSC A/85 | -24 LKFS | -2 dBTP | US broadcast |
| Spotify | -14 LUFS | -1 dBTP | Streaming |
| YouTube | -14 LUFS | -1 dBTP | Video platform |
| Apple Music | -16 LUFS | -1 dBTP | Music streaming |
| Podcast | -16 to -19 LUFS | -1 dBTP | Podcast |
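The targets in the table above can be turned into `loudnorm` filter strings with a small lookup. The following is a minimal sketch: the function name `loudnorm_filter_for` and the short target keys are hypothetical, and `LRA=11` is an assumed default that the table does not specify.

```shell
# Hypothetical helper: map a delivery target to a loudnorm filter string.
# I and TP values mirror the loudness table; LRA=11 is an assumption.
loudnorm_filter_for() {
  case "$1" in
    ebu)             echo "loudnorm=I=-23:TP=-1:LRA=11" ;;
    atsc)            echo "loudnorm=I=-24:TP=-2:LRA=11" ;;
    spotify|youtube) echo "loudnorm=I=-14:TP=-1:LRA=11" ;;
    apple|podcast)   echo "loudnorm=I=-16:TP=-1:LRA=11" ;;
    *)               echo "unknown target: $1" >&2; return 1 ;;
  esac
}
```

It could then be used as `ffmpeg -i input.wav -af "$(loudnorm_filter_for spotify)" output.wav`.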
# Quick normalization (less accurate)
ffmpeg -i input.mp3 \
-af loudnorm=I=-16:TP=-1.5:LRA=11 \
output.mp3
# Pass 1: Analyze
ffmpeg -i input.mp3 \
-af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json \
-f null -
# Output will include:
# "input_i": "-25.23"
# "input_tp": "-0.50"
# "input_lra": "8.32"
# "input_thresh": "-35.87"
# "target_offset": "1.23"
# Pass 2: Normalize with measured values
ffmpeg -i input.mp3 \
-af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-25.23:measured_TP=-0.50:measured_LRA=8.32:measured_thresh=-35.87:offset=1.23:linear=true \
-ar 48000 \
output.mp3
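Copying the measured values by hand from pass 1 into pass 2 is error-prone. As a sketch, the helper below pulls a single field out of the JSON report that `loudnorm` prints with `print_format=json`. The function name `loudnorm_field` is hypothetical, and it assumes the report keeps the one-field-per-line layout shown above.

```shell
# Hypothetical helper: read ffmpeg's stderr on stdin and print the value
# of one loudnorm JSON field, e.g. input_i or target_offset.
# Assumes each field sits on its own line as:  "name" : "value",
loudnorm_field() {
  sed -n "s/^[[:space:]]*\"$1\"[[:space:]]*:[[:space:]]*\"\\([^\"]*\\)\".*/\\1/p" | head -n 1
}
```

For example, after saving the pass-1 output with `report=$(ffmpeg -i input.mp3 -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null - 2>&1)`, the measured loudness is available as `printf '%s\n' "$report" | loudnorm_field input_i`. Matching on the quoted field name keeps `input_i` from also matching other fields.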
#!/bin/bash
# loudnorm-2pass.sh
INPUT="$1"
OUTPUT="$2"
TARGET_I="${3:--16}"
TARGET_TP="${4:--1.5}"
TARGET_LRA="${5:-11}"
# Pass 1: Analyze
stats=$(ffmpeg -i "$INPUT" \
-af loudnorm=I=${TARGET_I}:TP=${TARGET_TP}:LRA=${TARGET_LRA}:print_format=json \
-f null - 2>&1 | grep -A 12 "Parsed_loudnorm")
# Extract values
input_i=$(echo "$stats" | grep input_i | tr -d '", ' | cut -d':' -f2)
input_tp=$(echo "$stats" | grep input_tp | tr -d '", ' | cut -d':' -f2)
input_lra=$(echo "$stats" | grep input_lra | tr -d '", ' | cut -d':' -f2)
input_thresh=$(echo "$stats" | grep input_thresh | tr -d '", ' | cut -d':' -f2)
offset=$(echo "$stats" | grep target_offset | tr -d '", ' | cut -d':' -f2)
# Pass 2: Normalize
ffmpeg -i "$INPUT" \
-af "loudnorm=I=${TARGET_I}:TP=${TARGET_TP}:LRA=${TARGET_LRA}:measured_I=${input_i}:measured_TP=${input_tp}:measured_LRA=${input_lra}:measured_thresh=${input_thresh}:offset=${offset}:linear=true" \
-ar 48000 \
"$OUTPUT"
# Normalize loudness with a -1 dBTP true-peak ceiling
# (when -af is given twice only the last one applies, so use a single filter chain)
ffmpeg -i input.mp3 \
-af "loudnorm=I=-16:TP=-1:LRA=11" \
output.mp3
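# Simple peak normalization: measure the peak, then apply the inverse gain
# Step 1: read max_volume from the volumedetect report (for example, -3.2 dB)
ffmpeg -i input.mp3 -af volumedetect -f null -
# Step 2: raise the level by that amount so the peak lands at 0 dBFS
ffmpeg -i input.mp3 \
-af "volume=3.2dB" \
output.mp3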
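Peak normalization needs the measured peak before a gain can be chosen. The sketch below computes that gain from the `max_volume` line that the `volumedetect` filter prints. The function name `peak_gain_db` is hypothetical, and the optional target peak (default 0 dBFS) is an illustrative parameter.

```shell
# Hypothetical helper: read volumedetect output on stdin and print the gain
# in dB needed to move the measured peak to the target peak ($1, default 0).
# Relies on the report line ending in:  max_volume: <value> dB
peak_gain_db() {
  awk -v target="${1:-0}" '/max_volume:/ { printf "%.1f\n", target - $(NF-1); exit }'
}
```

A possible use, leaving 1 dB of headroom: `gain=$(ffmpeg -i input.mp3 -af volumedetect -f null - 2>&1 | peak_gain_db -1)` followed by `ffmpeg -i input.mp3 -af "volume=${gain}dB" output.mp3`.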
# Normalize to a specific integrated loudness (loudnorm targets LUFS, not RMS)
ffmpeg -i input.mp3 \
-af "loudnorm=I=-23:LRA=7:TP=-2" \
output.mp3
The ffmpeg-normalize Python utility provides an easier interface:
# Install
pip install ffmpeg-normalize
# Basic usage (the default audio codec is PCM, so name an encoder for MP3 output)
ffmpeg-normalize input.mp3 -c:a libmp3lame -o output.mp3
# Custom target
ffmpeg-normalize input.mp3 -c:a libmp3lame -o output.mp3 -t -14
# Batch normalize (album mode - preserves relative loudness); -of sets the output folder
ffmpeg-normalize *.mp3 --batch -c:a libmp3lame -ext mp3 -of normalized/
# Use built-in presets (v1.36.0+)
ffmpeg-normalize input.mp3 --preset podcast -o output.mp3
ffmpeg-normalize *.mp3 --preset music --batch -of normalized/
# Increase volume by 50%
ffmpeg -i input.mp3 -af "volume=1.5" output.mp3
# Increase by 6dB
ffmpeg -i input.mp3 -af "volume=6dB" output.mp3
# Decrease by 3dB
ffmpeg -i input.mp3 -af "volume=-3dB" output.mp3
# Fade in 3 seconds, fade out last 3 seconds (st=57 assumes a 60-second file)
ffmpeg -i input.mp3 \
-af "afade=t=in:ss=0:d=3,afade=t=out:st=57:d=3" \
output.mp3
# Calculate fade out start automatically
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp3)
fadeout_start=$(echo "$duration - 3" | bc)
ffmpeg -i input.mp3 \
-af "afade=t=in:ss=0:d=3,afade=t=out:st=${fadeout_start}:d=3" \
output.mp3
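The fade-out start is the duration minus the fade length. As a sketch, the same arithmetic can be done with `awk` instead of `bc`, with a clamp so that files shorter than the fade do not produce a negative start time. The function name `fade_out_start` is hypothetical.

```shell
# Hypothetical helper: print the fade-out start time in seconds.
# $1 = total duration (as printed by ffprobe), $2 = fade length (default 3).
# Clamps at 0 so a clip shorter than the fade still gets a valid value.
fade_out_start() {
  awk -v dur="$1" -v fade="${2:-3}" 'BEGIN {
    s = dur - fade
    if (s < 0) s = 0
    printf "%.3f\n", s
  }'
}
```

With the `duration` variable from the snippet above, `fadeout_start=$(fade_out_start "$duration" 3)` could stand in for the `bc` line.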
# Bass boost
ffmpeg -i input.mp3 \
-af "equalizer=f=100:width_type=o:width=2:g=5" \
output.mp3
# Treble reduction
ffmpeg -i input.mp3 \
-af "equalizer=f=8000:width_type=o:width=2:g=-3" \
output.mp3
# Multi-band EQ
ffmpeg -i input.mp3 \
-af "equalizer=f=100:width_type=o:width=2:g=3,equalizer=f=1000:width_type=o:width=2:g=-2,equalizer=f=8000:width_type=o:width=2:g=2" \
output.mp3
# High-pass filter (remove below 80Hz)
ffmpeg -i input.mp3 -af "highpass=f=80" output.mp3
# Low-pass filter (remove above 8kHz)
ffmpeg -i input.mp3 -af "lowpass=f=8000" output.mp3
# Band-pass filter
ffmpeg -i input.mp3 -af "highpass=f=80,lowpass=f=12000" output.mp3
# FFT-based noise reduction
ffmpeg -i input.mp3 \
-af "afftdn=nf=-25" \
output.mp3
# With a higher noise floor and automatic noise tracking (tn=1)
ffmpeg -i input.mp3 \
-af "afftdn=nf=-20:tn=1" \
output.mp3
# Dynamic range compression
ffmpeg -i input.mp3 \
-af "acompressor=threshold=-20dB:ratio=4:attack=5:release=50" \
output.mp3
# Limiter
ffmpeg -i input.mp3 \
-af "alimiter=limit=0.9:attack=5:release=50" \
output.mp3
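# De-esser (i = intensity, m = maximum reduction, f = share of original
# frequency content to keep; all three take values from 0 to 1)
ffmpeg -i input.mp3 \
-af "deesser=i=0.4:m=0.5:f=0.5" \
output.mp3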
# Detect silence
ffmpeg -i input.mp3 \
-af silencedetect=noise=-30dB:d=0.5 \
-f null -
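# Trim leading silence and end the output at the first pause after the audio
# (use stop_periods=-1 instead to remove silence throughout the file)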
ffmpeg -i input.mp3 \
-af "silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB:stop_periods=1:stop_silence=0.5:stop_threshold=-50dB" \
output.mp3
# Average both channels
ffmpeg -i stereo.mp3 \
-af "pan=mono|c0=0.5*c0+0.5*c1" \
mono.mp3
# Use only left channel
ffmpeg -i stereo.mp3 -af "pan=mono|c0=c0" mono.mp3
# Downmix stereo to mono
ffmpeg -i stereo.mp3 -ac 1 mono.mp3
# Duplicate mono to both channels
ffmpeg -i mono.mp3 -af "pan=stereo|c0=c0|c1=c0" stereo.mp3
# Simple conversion
ffmpeg -i mono.mp3 -ac 2 stereo.mp3
# Extract left channel
ffmpeg -i stereo.mp3 \
-filter_complex "[0:a]channelsplit=channel_layout=stereo:channels=FL[left]" \
-map "[left]" left.mp3
# Extract right channel
ffmpeg -i stereo.mp3 \
-filter_complex "[0:a]channelsplit=channel_layout=stereo:channels=FR[right]" \
-map "[right]" right.mp3
# Downmix 5.1 to stereo
ffmpeg -i surround.ac3 \
-af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" \
stereo.mp3
# Extract center channel
ffmpeg -i surround.ac3 \
-filter_complex "[0:a]channelsplit=channel_layout=5.1:channels=FC[center]" \
-map "[center]" center.mp3
# Convert to 44.1kHz
ffmpeg -i input.wav -ar 44100 output.wav
# Convert to 48kHz
ffmpeg -i input.wav -ar 48000 output.wav
# High-quality resampling
ffmpeg -i input.wav \
-af "aresample=resampler=soxr:precision=33:cheby=1" \
-ar 44100 output.wav
# Convert to 16-bit
ffmpeg -i input.wav -c:a pcm_s16le output.wav
# Convert to 24-bit
ffmpeg -i input.wav -c:a pcm_s24le output.wav
# Convert to 32-bit float
ffmpeg -i input.wav -c:a pcm_f32le output.wav
# 2x speed (tempo only; atempo keeps the pitch unchanged)
ffmpeg -i input.mp3 -af "atempo=2.0" output.mp3
# 0.5x speed (slow motion)
ffmpeg -i input.mp3 -af "atempo=0.5" output.mp3
# For >2x, chain filters (older FFmpeg builds cap a single atempo at 2.0)
ffmpeg -i input.mp3 -af "atempo=2.0,atempo=2.0" output.mp3 # 4x
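Chaining can be generalized to any speed factor by splitting it into steps that each stay within the 0.5 to 2.0 range. The following is a minimal sketch with a hypothetical function name, `atempo_chain`. It prints a filter string only and does not call ffmpeg.

```shell
# Hypothetical helper: print an atempo chain for an arbitrary speed factor,
# keeping every individual atempo step between 0.5 and 2.0.
atempo_chain() {
  awk -v f="$1" 'BEGIN {
    if (f <= 0) exit 1                      # reject non-positive factors
    chain = ""
    while (f > 2.0) { chain = chain "atempo=2.0,"; f /= 2.0 }
    while (f < 0.5) { chain = chain "atempo=0.5,"; f /= 0.5 }
    printf "%satempo=%.4f\n", chain, f      # remaining factor goes last
  }'
}
```

For instance, `ffmpeg -i input.mp3 -af "$(atempo_chain 3)" output.mp3` would apply a 2.0 step followed by a 1.5 step.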
# Pitch shift using rubberband
ffmpeg -i input.mp3 \
-af "rubberband=pitch=1.5" \
output.mp3
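# Pitch shift by semitones (assumes a 44.1 kHz input; asetrate also speeds up
# playback by the same ratio, so the output is shorter)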
ffmpeg -i input.mp3 \
-af "asetrate=44100*2^(2/12),aresample=44100" \
output.mp3 # +2 semitones
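A shift of n semitones corresponds to a frequency ratio of 2^(n/12). As a sketch, the helper below builds the `asetrate`/`aresample` pair for a given semitone count and appends an `atempo` step that restores the original duration. The function name `semitone_filter` is hypothetical, the sample rate defaults to an assumed 44100 Hz, and the single `atempo` step assumes shifts of at most 12 semitones in either direction.

```shell
# Hypothetical helper: print a filter chain that shifts pitch by $1 semitones.
# $2 = input sample rate in Hz (assumed 44100 when omitted).
# asetrate raises pitch and speed together; atempo=1/ratio undoes the speed change.
semitone_filter() {
  awk -v n="$1" -v sr="${2:-44100}" 'BEGIN {
    ratio = 2 ^ (n / 12)
    printf "asetrate=%d,aresample=%d,atempo=%.6f\n", sr * ratio + 0.5, sr, 1 / ratio
  }'
}
```

A possible call is `ffmpeg -i input.mp3 -af "$(semitone_filter 2)" output.mp3`. Unlike the rubberband filter, this resampling approach does not try to preserve formants.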
# Extract 30 seconds starting at 1 minute
ffmpeg -ss 00:01:00 -i input.mp3 -t 00:00:30 -c copy output.mp3
# Extract from 1:00 to 2:30
ffmpeg -ss 00:01:00 -to 00:02:30 -i input.mp3 -c copy output.mp3
# Create file list
echo "file 'part1.mp3'" > list.txt
echo "file 'part2.mp3'" >> list.txt
# Concatenate (same format)
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp3
# Concatenate with re-encode
ffmpeg -f concat -safe 0 -i list.txt -c:a aac -b:a 256k output.m4a
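Writing the list with `echo` breaks when a filename contains a single quote. The sketch below generates the `file '...'` lines for any number of inputs and escapes embedded quotes in the form the concat demuxer expects. The function name `concat_list` is hypothetical.

```shell
# Hypothetical helper: print one concat-demuxer line per argument.
# A single quote inside a name is written as '\'' (close, escaped quote, reopen).
concat_list() {
  for name in "$@"; do
    escaped=$(printf '%s' "$name" | sed "s/'/'\\\\''/g")
    printf "file '%s'\n" "$escaped"
  done
}
```

For example, `concat_list part1.mp3 part2.mp3 > list.txt` produces the same list as the two `echo` commands above.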
# Crossfade two files (3 second overlap)
ffmpeg -i part1.mp3 -i part2.mp3 \
-filter_complex "acrossfade=d=3:c1=tri:c2=tri" \
output.mp3
For complete processing chains (podcast, music mastering, broadcast EBU R128) and fixes for common post-processing issues (loudnorm resample artifact, A/V sync, re-encoding quality loss), see references/workflows-and-troubleshooting.md.
For detailed audio analysis and measurement filters, see references/audio-analysis-measurement.md. Covers:
audio-analysis.sh aggregating astats + ebur128 + volumedetect + silencedetect

This guide covers FFmpeg audio processing. For video operations, see the fundamentals skill. For noise reduction details, see ffmpeg-noise-reduction.