plugins/ffmpeg-core/skills/ffmpeg-hardware-acceleration/SKILL.md
Complete GPU-accelerated encoding/decoding system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) NVIDIA NVENC/NVDEC encoding, (2) Intel Quick Sync Video (QSV), (3) AMD AMF encoding, (4) Apple VideoToolbox, (5) Linux VAAPI setup, (6) Vulkan Video 8.0 (FFv1, AV1, VP9, ProRes RAW), (7) VVC/H.266 hardware decoding (VAAPI/QSV), (8) GPU pipeline optimization with pad_cuda, (9) Docker GPU containers, (10) Performance benchmarking. Provides: Platform-specific commands, preset comparisons, quality tuning, full GPU pipeline examples, Vulkan compute codecs, VVC decoding, troubleshooting guides. Ensures: Maximum encoding speed with optimal quality using GPU acceleration.
npx skillsauth add JosiahSiegel/claude-plugin-marketplace ffmpeg-hardware-accelerationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Activate when GPU acceleration is needed:
GPU encoding trades some quality for massive speed. Use -cq, -qp, or -global_quality for quality control.
| Platform | Encoder | Decoder | Detect Command |
|----------|---------|---------|----------------|
| NVIDIA | h264_nvenc, hevc_nvenc, av1_nvenc | h264_cuvid, hevc_cuvid | ffmpeg -encoders \| grep nvenc |
| Intel QSV | h264_qsv, hevc_qsv, av1_qsv | h264_qsv, hevc_qsv | ffmpeg -encoders \| grep qsv |
| AMD AMF | h264_amf, hevc_amf, av1_amf | N/A (use software) | ffmpeg -encoders \| grep amf |
| Apple | h264_videotoolbox, hevc_videotoolbox | h264_videotoolbox | macOS only |
| VAAPI | h264_vaapi, hevc_vaapi, av1_vaapi | with -hwaccel vaapi | Linux only |
| Vulkan | h264_vulkan, hevc_vulkan, av1_vulkan, ffv1_vulkan | VP9, ProRes RAW (8.0+) | ffmpeg -encoders \| grep vulkan |
Current Latest: FFmpeg 8.0.1 (released 2025-11-20). Check with ffmpeg -version.
Hardware acceleration uses dedicated GPU/SoC components for video processing:
| Method | Speed | Quality | Power | Use Case | |--------|-------|---------|-------|----------| | libx264 (CPU) | 1x | Best | High | Quality-critical | | libx265 (CPU) | 0.3x | Best | Very High | Archival | | h264_nvenc | 10-20x | Good | Low | Real-time, streaming | | hevc_nvenc | 8-15x | Good | Low | 4K streaming | | h264_qsv | 8-15x | Good | Very Low | Laptop, efficiency | | h264_amf | 8-15x | Good | Low | AMD systems |
ffmpeg -hwaccels, ffmpeg -encoders | grep <api>-hwaccel <api> and -hwaccel_output_format <api> before -iscale_cuda, scale_vulkan, vpp_qsv, ...)-preset (NVENC p1-p7), -cq/-qp/-global_quality, lookahead, spatial/temporal AQffmpeg -benchmark and monitor GPU via nvidia-smi dmon, intel_gpu_top, etc.# NVIDIA NVENC
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-c:v h264_nvenc -preset p4 -b:v 5M output.mp4
# Intel QSV
ffmpeg -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 \
-c:v h264_qsv -preset medium -b:v 5M output.mp4
# AMD AMF
ffmpeg -i input.mp4 -c:v h264_amf -quality balanced -b:v 5M output.mp4
# Apple VideoToolbox
ffmpeg -i input.mp4 -c:v h264_videotoolbox -b:v 5M output.mp4
# Linux VAAPI
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi -i input.mp4 \
-c:v h264_vaapi -b:v 5M output.mp4
# Vulkan (cross-platform)
ffmpeg -init_hw_device vulkan -i input.mp4 \
-c:v h264_vulkan -b:v 5M output.mp4
ffmpeg -y -vsync 0 \
-hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf scale_cuda=1280:720 \
-c:v h264_nvenc -preset p4 -b:v 5M \
-c:a copy \
output.mp4
Omitting -hwaccel_output_format can cut throughput by up to 50% because decoded frames silently round-trip through CPU memory. See references/gpu-memory-and-troubleshooting.md for memory flow diagrams and best practices.
| Operation | NVIDIA | Intel | AMD/Linux | Cross-platform |
|-----------|--------|-------|-----------|----------------|
| Scale | scale_cuda, scale_npp | vpp_qsv, scale_qsv | scale_vaapi | scale_vulkan, scale_opencl, libplacebo |
| Overlay | overlay_cuda | - | - | overlay_vulkan, overlay_opencl |
| Deinterlace | bwdif_cuda | vpp_qsv | deinterlace_vaapi | bwdif_vulkan |
| Denoise | bilateral_cuda | - | - | nlmeans_vulkan, nlmeans_opencl |
| Chromakey | chromakey_cuda | - | - | colorkey_opencl |
| Tonemap (HDR->SDR) | - | - | tonemap_vaapi | libplacebo, tonemap_opencl |
| Pad/letterbox | pad_cuda (8.0+) | - | - | pad_opencl |
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input \
-c:v h264_nvenc -preset p3 -tune ll -zerolatency 1 -b:v 6M \
-f flv rtmp://server/live/stream
ffmpeg -i input.mp4 \
-c:v hevc_nvenc -preset p6 -tune hq \
-rc vbr -cq 22 -b:v 0 \
-rc-lookahead 32 -spatial-aq 1 \
output.mp4
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input1.mp4 -c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input2.mp4 -c:v h264_nvenc output2.mp4 &
wait
docker run --gpus all --rm -v $(pwd):/data \
jrottenberg/ffmpeg:nvidia \
-hwaccel cuda -hwaccel_output_format cuda \
-i /data/input.mp4 -c:v h264_nvenc /data/output.mp4
For deep dives on each backend, see:
references/nvidia-nvenc.mdreferences/intel-qsv.mdreferences/amd-amf-vaapi-videotoolbox.mdreferences/vulkan.mdreferences/opencl-filters.mdreferences/gpu-memory-and-troubleshooting.mddevelopment
This skill should be used when the user asks to train, debug, scale, or improve ML models. PROACTIVELY activate for: (1) PyTorch, TensorFlow/Keras, JAX, Flax, Hugging Face Trainer/Accelerate training loops, (2) distributed training, DDP/FSDP/DeepSpeed, TPU/GPU setup, (3) mixed precision AMP/bf16, gradient accumulation, checkpointing, seeding, (4) overfitting, imbalance, loss functions, regularization, LR schedules, warmup, (5) memory optimization, gradient checkpointing, offloading, quantization-aware training. Provides: reproducible training best practices across deep learning and classical ML.
development
This skill should be used when the user asks to productionize, track, version, govern, monitor, or automate ML systems. PROACTIVELY activate for: (1) MLflow, Weights & Biases, Neptune, Comet, ClearML experiment tracking, (2) model registry, model versioning, artifact lineage, reproducibility, (3) Kubeflow, SageMaker Pipelines, Vertex AI Pipelines, Azure ML pipelines, Databricks workflows, (4) CI/CD, continuous training/evaluation, A/B tests, canary/shadow deployments, (5) drift detection, model monitoring, data validation, responsible AI governance. Provides: end-to-end MLOps architecture and operational safeguards.
development
This skill should be used when the user asks to optimize, export, serve, compress, or accelerate ML inference. PROACTIVELY activate for: (1) latency, throughput, p95/p99, batching, concurrency, KV cache, memory, or cost issues, (2) quantization INT8/INT4, GPTQ, AWQ, bitsandbytes, pruning, sparsity, distillation, (3) ONNX export, ONNX Runtime, TensorRT, TorchScript, torch.compile, XLA, OpenVINO, Core ML, TFLite, (4) Triton, TorchServe, TF Serving, BentoML, Seldon, KServe configuration, (5) edge deployment, CPU/GPU/TPU/Inferentia serving. Provides: hardware-aware inference optimization and safe benchmarking.
testing
This skill should be used when the user asks to tune hyperparameters, run sweeps, optimize search spaces, or use AutoML. PROACTIVELY activate for: (1) Optuna, Ray Tune, FLAML, AutoGluon, Hyperopt, Nevergrad, KerasTuner, W&B sweeps, (2) grid search, random search, Bayesian optimization, TPE, Gaussian processes, evolutionary search, (3) ASHA, Hyperband, successive halving, multi-fidelity optimization, population-based training, (4) learning-rate finder, batch-size search, early stopping, pruning, (5) reproducible sweep design and experiment analysis. Provides: budget-aware hyperparameter search strategy.