plugins/unity-master/skills/unity-performance/SKILL.md
Unity performance optimization for slow games, stutters, and FPS drops. PROACTIVELY activate for: (1) game is slow, stuttering, or has FPS drops, (2) Unity Profiler analysis, (3) Frame Debugger and draw-call investigation, (4) batching (static, dynamic, GPU instancing, SRP Batcher), (5) object pooling to reduce GC pressure, (6) garbage collection spikes and allocation hot paths, (7) memory leaks and Memory Profiler usage, (8) LOD and occlusion culling, (9) texture compression and mipmap settings, (10) build size reduction. Provides: profiling workflow, batching patterns, object-pool implementations, GC reduction techniques, LOD/culling setup, and texture-compression matrix.
npx skillsauth add JosiahSiegel/claude-plugin-marketplace unity-performanceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematic approach to profiling and optimizing Unity games. Covers profiling tools, CPU/GPU optimization, memory management, rendering optimization, and platform-specific considerations.
| Tool | What It Shows | When to Use | |------|--------------|-------------| | Unity Profiler | CPU, GPU, memory, audio, physics per frame | First stop for any perf issue | | Frame Debugger | Draw call breakdown, shader/material state | Rendering bottlenecks | | Memory Profiler | Heap snapshots, texture/mesh memory | Memory leaks, bloat | | Profile Analyzer | Compare captures, statistical analysis | Before/after optimization | | Physics Debugger | Collider visualization, contact points | Physics performance |
If frame time > 16.6ms (60 FPS target):
CPU timeline > GPU timeline -> CPU-bound
GPU timeline > CPU timeline -> GPU-bound
GC.Alloc column shows per-frame allocations -> GC pressure
Look for spikes (single bad frames) vs. sustained high times (baseline too heavy).
| Anti-Pattern | Fix |
|-------------|-----|
| string + string in Update | Use StringBuilder or cache |
| new List<T>() every frame | Allocate once, Clear() and reuse |
| LINQ in hot paths | Replace with manual loops |
| GetComponent<T>() per frame | Cache in Awake/Start |
| GameObject.Find() per frame | Cache reference or use events |
| foreach on non-generic collections | Use for loop or generic collections |
| SendMessage() / BroadcastMessage() | Use direct calls, events, or interfaces |
| Boxing value types | Use generic collections, avoid object casts |
public class ObjectPool<T> where T : Component
{
readonly Queue<T> _pool = new();
readonly T _prefab;
readonly Transform _parent;
public ObjectPool(T prefab, int preWarm, Transform parent = null)
{
_prefab = prefab;
_parent = parent;
for (int i = 0; i < preWarm; i++)
_pool.Enqueue(CreateInstance());
}
public T Get(Vector3 position, Quaternion rotation)
{
var obj = _pool.Count > 0 ? _pool.Dequeue() : CreateInstance();
obj.transform.SetPositionAndRotation(position, rotation);
obj.gameObject.SetActive(true);
return obj;
}
public void Return(T obj)
{
obj.gameObject.SetActive(false);
_pool.Enqueue(obj);
}
T CreateInstance()
{
var obj = Object.Instantiate(_prefab, _parent);
obj.gameObject.SetActive(false);
return obj;
}
}
Pool bullets, particles, enemies, UI elements -- anything instantiated/destroyed frequently. Unity 2021+ also has UnityEngine.Pool.ObjectPool<T> built-in.
| Technique | Description |
|-----------|-------------|
| Stagger updates | Don't update all AI every frame; use tick groups |
| Distance-based LOD | Reduce update frequency for distant objects |
| Event-driven | Replace polling with events where possible |
| Disable unused scripts | enabled = false on off-screen components |
| Use InvokeRepeating | For periodic checks (cheaper than coroutine yielding) |
| Technique | How | Savings |
|-----------|-----|---------|
| Static Batching | Mark non-moving objects as Static | Combines meshes at build time |
| Dynamic Batching | Automatic for small meshes (<300 verts) | URP/Built-in only |
| GPU Instancing | Enable on materials for repeated objects | Trees, grass, rocks |
| SRP Batcher | Enabled by default in URP/HDRP | Reduces SetPass calls |
| Texture Atlasing | Combine textures into atlas | Fewer material switches |
| Mesh Combining | CombineMeshes() at runtime | Custom batching |
LOD Group Setup:
LOD 0 (0-30%): Full-detail mesh (5000 tris)
LOD 1 (30-60%): Medium mesh (2000 tris)
LOD 2 (60-90%): Low mesh (500 tris)
Culled (90%+): Not rendered
Use LOD for meshes, but also reduce script complexity, particle counts, and physics at distance.
Bake occlusion data for indoor/complex scenes. Mark large static occluders (walls, floors). Configure cell size based on scene scale. Use the Occlusion Culling window to visualize and test.
| Issue | Solution | |-------|----------| | Complex fragment shaders | Reduce texture samples, simplify math | | Overdraw | Minimize transparent objects, use opaque when possible | | Too many variants | Strip unused shader variants in build settings | | Expensive post-processing | Disable effects on mobile, use cheaper alternatives |
| Issue | Symptom | Fix |
|-------|---------|-----|
| Texture bloat | High memory, long loads | Compress textures, reduce max size per platform |
| Unloaded scenes holding refs | Memory climbs over time | Use Resources.UnloadUnusedAssets() after scene transitions |
| Addressables not released | Bundles stay in memory | Call Addressables.Release(handle) |
| Audio clips uncompressed | Huge memory footprint | Use compressed in memory for music, decompress on load for SFX |
| Mesh read/write enabled | Double memory per mesh | Disable Read/Write if not needed at runtime |
| Platform | Format | Notes | |----------|--------|-------| | PC/Console | BC7 (DXT) | Best quality/size ratio | | Android | ASTC 6x6 | Universal, scalable quality | | iOS | ASTC 6x6 | Same as Android | | WebGL | ETC2 / DXT | Depends on target GPU |
Use "Override for [Platform]" in texture import settings. Set max texture size to the minimum needed (512 for UI icons, 1024 for props, 2048 for hero assets).
| Platform | Key Constraints | |----------|----------------| | Mobile | Thermal throttling, limited memory, battery drain, fill-rate limited | | WebGL | No threads (pre-Unity 6), large download size, no compute shaders | | Console | Certification requirements, fixed hardware, memory budgets | | VR/XR | 72-90 FPS minimum, stereo rendering cost, motion sickness from drops |
Use Addressables for async asset loading to avoid load-time hitches:
// Preload during loading screen
var handle = Addressables.LoadAssetAsync<GameObject>("enemy_boss");
await handle;
// Release when done
Addressables.Release(handle);
Use Addressable groups to control bundle granularity. Mark infrequently used assets as remote/on-demand. Profile bundle memory with the Addressables Event Viewer.
references/profiling-deep-dive.md -- Advanced Profiler usage, memory profiler snapshots, frame-by-frame analysis, custom profiler markers, automated performance testing, build size analysis, ECS/DOTS performance patternsdevelopment
This skill should be used when the user asks to train, debug, scale, or improve ML models. PROACTIVELY activate for: (1) PyTorch, TensorFlow/Keras, JAX, Flax, Hugging Face Trainer/Accelerate training loops, (2) distributed training, DDP/FSDP/DeepSpeed, TPU/GPU setup, (3) mixed precision AMP/bf16, gradient accumulation, checkpointing, seeding, (4) overfitting, imbalance, loss functions, regularization, LR schedules, warmup, (5) memory optimization, gradient checkpointing, offloading, quantization-aware training. Provides: reproducible training best practices across deep learning and classical ML.
development
This skill should be used when the user asks to productionize, track, version, govern, monitor, or automate ML systems. PROACTIVELY activate for: (1) MLflow, Weights & Biases, Neptune, Comet, ClearML experiment tracking, (2) model registry, model versioning, artifact lineage, reproducibility, (3) Kubeflow, SageMaker Pipelines, Vertex AI Pipelines, Azure ML pipelines, Databricks workflows, (4) CI/CD, continuous training/evaluation, A/B tests, canary/shadow deployments, (5) drift detection, model monitoring, data validation, responsible AI governance. Provides: end-to-end MLOps architecture and operational safeguards.
development
This skill should be used when the user asks to optimize, export, serve, compress, or accelerate ML inference. PROACTIVELY activate for: (1) latency, throughput, p95/p99, batching, concurrency, KV cache, memory, or cost issues, (2) quantization INT8/INT4, GPTQ, AWQ, bitsandbytes, pruning, sparsity, distillation, (3) ONNX export, ONNX Runtime, TensorRT, TorchScript, torch.compile, XLA, OpenVINO, Core ML, TFLite, (4) Triton, TorchServe, TF Serving, BentoML, Seldon, KServe configuration, (5) edge deployment, CPU/GPU/TPU/Inferentia serving. Provides: hardware-aware inference optimization and safe benchmarking.
testing
This skill should be used when the user asks to tune hyperparameters, run sweeps, optimize search spaces, or use AutoML. PROACTIVELY activate for: (1) Optuna, Ray Tune, FLAML, AutoGluon, Hyperopt, Nevergrad, KerasTuner, W&B sweeps, (2) grid search, random search, Bayesian optimization, TPE, Gaussian processes, evolutionary search, (3) ASHA, Hyperband, successive halving, multi-fidelity optimization, population-based training, (4) learning-rate finder, batch-size search, early stopping, pruning, (5) reproducible sweep design and experiment analysis. Provides: budget-aware hyperparameter search strategy.