plugins/yzmir-neural-architectures/skills/using-neural-architectures/SKILL.md
Use when selecting or comparing neural architectures - routes by data modality (vision / sequence / graph / generative / multimodal) and constraints (dataset size, compute, latency); covers CNNs (ConvNeXt v2 / EfficientNetV2), Transformers + MoE (Mixtral / DeepSeek), SSM/Mamba, modern diffusion (SDXL/FLUX/DiT), multimodal (CLIP/SigLIP/LLaVA), SAM/SAM-2, equivariant GNNs, normalization, attention variants
<CRITICAL_CONTEXT> Architecture selection comes BEFORE training optimization. Wrong architecture = no amount of training will fix it.
This meta-skill routes you to the right architecture guidance based on:
Load this skill when architecture decisions are needed. </CRITICAL_CONTEXT>
Use this skill when:
DO NOT use for:
When in doubt: If choosing WHAT architecture to use → this skill. If training or deploying an architecture → a different pack.
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-neural-architectures/SKILL.md
Reference sheets like cnn-families-and-selection.md are at:
skills/using-neural-architectures/cnn-families-and-selection.md
NOT at:
skills/cnn-families-and-selection.md ← WRONG PATH
When you see a link like [cnn-families-and-selection.md](cnn-families-and-selection.md), read the file from the same directory as this SKILL.md.
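The path rule above can be sketched in Python. This is only an illustration under the assumption that every link target is a bare file name relative to SKILL.md; the function name `resolve_reference_sheet` is hypothetical and not part of the skill.

```python
from pathlib import PurePosixPath


def resolve_reference_sheet(skill_md_path: str, link_target: str) -> str:
    """Resolve a reference-sheet link against the directory that holds SKILL.md.

    Hypothetical helper: it assumes link targets are plain file names
    (no leading slash, no parent-directory segments).
    """
    skill_dir = PurePosixPath(skill_md_path).parent
    return str(skill_dir / link_target)


print(resolve_reference_sheet(
    "skills/using-neural-architectures/SKILL.md",
    "cnn-families-and-selection.md",
))
```

The result keeps the `using-neural-architectures/` segment, which is exactly what the WRONG PATH example drops.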
Question to ask: "What type of data are you working with?"
| Data Type | Route To | Why |
|-----------|----------|-----|
| Images (photos, medical scans, etc.) | cnn-families-and-selection.md | CNNs excel at spatial hierarchies |
| Sequences (time series, text, audio) | sequence-models-comparison.md | Temporal dependencies need sequential models |
| Graphs (social networks, molecules) | graph-neural-networks-basics.md | Graph structure requires GNNs |
| Generation task (create images, text) | generative-model-families.md | Generative models are specialized |
| Multiple modalities (text + images, audio, video) | multimodal-architectures.md | Vision-language, CLIP/SigLIP, LLaVA, Flamingo, native multimodal |
| Unclear / Generic | architecture-design-principles.md | Start with fundamentals |
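As a rough sketch, the modality table above behaves like a lookup with a fallback. The dictionary keys (`"images"`, `"sequences"`, and so on) are hypothetical labels chosen for this example; only the sheet names come from the table.

```python
# Sheet names are taken from the routing table; the keys are illustrative labels.
MODALITY_ROUTES = {
    "images": "cnn-families-and-selection.md",
    "sequences": "sequence-models-comparison.md",
    "graphs": "graph-neural-networks-basics.md",
    "generation": "generative-model-families.md",
    "multimodal": "multimodal-architectures.md",
}

# "Unclear / Generic" row: start with fundamentals.
FALLBACK_ROUTE = "architecture-design-principles.md"


def route_by_modality(modality: str) -> str:
    """Return the reference sheet for a modality label, or the fallback sheet."""
    return MODALITY_ROUTES.get(modality.strip().lower(), FALLBACK_ROUTE)


print(route_by_modality("Graphs"))
print(route_by_modality("tabular"))
```

An unrecognized label falls through to the design-principles sheet, mirroring the last row of the table.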
If any of these apply, address FIRST:
| Requirement | Route To | Priority |
|-------------|----------|----------|
| Deep network (> 20 layers) unstable | normalization-techniques.md | CRITICAL - fix before continuing |
| Need attention mechanisms | attention-mechanisms-catalog.md | Specialized component |
| Custom architecture design | architecture-design-principles.md | Foundation before specifics |
| Transformer-specific question | transformer-architecture-deepdive.md | Specialized architecture |
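One way to read "address FIRST" is as an ordering rule: any special-requirement sheet that applies is visited before the modality sheet. The sketch below assumes that reading; the flag names are hypothetical labels, and the list order simply follows the table rows.

```python
# Order follows the special-requirements table; flag names are illustrative.
SPECIAL_ROUTES = [
    ("deep_network_unstable", "normalization-techniques.md"),
    ("needs_attention", "attention-mechanisms-catalog.md"),
    ("custom_design", "architecture-design-principles.md"),
    ("transformer_question", "transformer-architecture-deepdive.md"),
]


def plan_routes(flags: set, modality_sheet: str) -> list:
    """Put every applicable special-requirement sheet ahead of the modality sheet."""
    ordered = [sheet for flag, sheet in SPECIAL_ROUTES if flag in flags]
    if modality_sheet not in ordered:
        ordered.append(modality_sheet)
    return ordered


print(plan_routes({"deep_network_unstable"}, "cnn-families-and-selection.md"))
```

With no flags set, the plan collapses to the modality sheet alone.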
Clarify BEFORE routing:
Ask:
These answers determine architecture appropriateness.
Symptoms triggering this route:
Route to: See cnn-families-and-selection.md for CNN architecture selection and comparison.
When to route here:
Clarifying questions:
Symptoms triggering this route:
Route to: See sequence-models-comparison.md for sequential model selection (RNN, LSTM, Transformer, TCN).
When to route here:
Clarifying questions:
CRITICAL: If the user asks "RNN vs LSTM", challenge the premise. Modern alternatives (Transformers, TCN) are often better.
Symptoms triggering this route:
Route to: See graph-neural-networks-basics.md for GNN architectures and graph learning.
When to route here:
Red flag: If treating graph as tabular data (extracting features and ignoring edges) → WRONG. Route to GNN skill.
Symptoms triggering this route:
Route to: See generative-model-families.md for GANs, VAEs, and Diffusion models.
When to route here:
Clarifying questions:
CRITICAL: Different generative models have VERY different trade-offs. Don't reflexively recommend GAN for "fast" — distilled diffusion is the modern fast option. Don't recommend training Stable Diffusion from scratch.
Symptoms triggering this route:
Route to: See attention-mechanisms-catalog.md for attention mechanism selection and design.
When to route here:
NOT for: General Transformer questions → transformer-architecture-deepdive.md instead
Symptoms triggering this route:
Route to: See transformer-architecture-deepdive.md for Transformer internals and implementation.
When to route here:
Cross-reference:
yzmir/llm-specialist/transformer-for-llms (LLM-specific transformers)
Symptoms triggering this route:
Route to: See normalization-techniques.md for deep network stability and normalization methods.
When to route here:
CRITICAL: This is often the ROOT CAUSE of "training won't work" - fix architecture before blaming hyperparameters.
Symptoms triggering this route:
Route to: See architecture-design-principles.md for custom architecture design fundamentals.
When to route here:
This is the foundational skill - route here if other specific skills don't match.
Default route: multimodal-architectures.md — covers contrastive (CLIP / SigLIP), generative bolt-on (LLaVA / BLIP-2 / Flamingo), and native multimodal (Chameleon / Gemini-style) recipes, plus audio and video extensions.
Only fall back to component sheets if the user is building from scratch without using foundation models:
Order matters: In 2026 the default for "vision + language" is SigLIP + MLP projector + open LLM, not designing fusion from first principles. Read the multimodal sheet before reaching for design-principles.
Example: "Select architecture AND optimize training"
Route order:
Why: Wrong architecture can't be fixed by better training.
Example: "Select architecture AND deploy efficiently"
Route order:
Deployment constraints might influence architecture choice - if so, note constraints during architecture selection.
| Symptom | Wrong Route | Correct Route | Why |
|---------|-------------|---------------|-----|
| "My transformer won't train" | transformer-architecture-deepdive.md | training-optimization | Training issue, not architecture understanding |
| "Deploy image classifier" | cnn-families-and-selection.md | ml-production | Deployment, not selection |
| "ViT vs ResNet for medical imaging" | transformer-architecture-deepdive.md | cnn-families-and-selection.md | Comparative selection, not single architecture detail |
| "Implement BatchNorm in PyTorch" | normalization-techniques.md | pytorch-engineering | Implementation, not architecture concept |
| "GAN won't converge" | generative-model-families.md | training-optimization | Training stability, not architecture selection |
| "Which optimizer for CNN" | cnn-families-and-selection.md | training-optimization | Optimization, not architecture |
Rule: Architecture pack is for CHOOSING and DESIGNING architectures. Training/deployment/implementation are other packs.
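The rule above amounts to classifying the user's intent before picking a sheet. A minimal sketch, assuming intent has already been reduced to one of a few hypothetical labels (`"choose"`, `"design"`, `"train"`, `"implement"`, `"deploy"`); the pack names are the short names used in the misroute table.

```python
THIS_PACK = "using-neural-architectures"

# Intent labels are illustrative; pack names come from the misroute table.
INTENT_TO_PACK = {
    "choose": THIS_PACK,
    "design": THIS_PACK,
    "train": "training-optimization",
    "implement": "pytorch-engineering",
    "deploy": "ml-production",
}


def pack_for_intent(intent: str) -> str:
    """Map an intent label to the pack that owns it; unknown intents need clarification."""
    key = intent.strip().lower()
    if key not in INTENT_TO_PACK:
        raise ValueError(f"Unknown intent {intent!r}: clarify before routing")
    return INTENT_TO_PACK[key]


print(pack_for_intent("deploy"))
```

Under this sketch, "GAN won't converge" is a `"train"` intent and leaves this pack, even though it names an architecture.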
If query contains these patterns, ASK clarifying questions before routing:
| Pattern | Why Clarify | What to Ask |
|---------|-------------|-------------|
| "Best architecture for X" | "Best" depends on constraints | "What are your data size, compute, and latency constraints?" |
| Generic problem description | Can't route without modality | "What type of data? (images, sequences, graphs, etc.)" |
| Latest trend mentioned (ViT, Diffusion) | Recency bias risk | "Have you considered alternatives? What are your specific requirements?" |
| "Should I use X or Y" | May be wrong question | "What's the underlying problem? There might be option Z." |
| Very deep network (> 50 layers) | Likely needs normalization first | "Are you using normalization layers? Skip connections?" |
Never guess modality or constraints. Always clarify.
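"Never guess" can be enforced as a gate: routing proceeds only when nothing is missing. The sketch below is one possible shape for that gate; the function name and its parameters are assumptions, while the question strings are quoted from the table above.

```python
from typing import Optional


def clarifying_questions(
    modality: Optional[str],
    dataset_size: Optional[int],
    compute: Optional[str],
    latency: Optional[str],
) -> list:
    """Return the questions to ask before routing; an empty list means 'safe to route'."""
    questions = []
    if modality is None:
        questions.append("What type of data? (images, sequences, graphs, etc.)")
    if dataset_size is None or compute is None or latency is None:
        questions.append("What are your data size, compute, and latency constraints?")
    return questions


# Nothing known yet: both questions come back, so do not route.
print(clarifying_questions(None, None, None, None))
```

The gate deliberately treats a missing value as unknown rather than substituting a default.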
| Trendy Architecture | When NOT to Use | Better Alternative |
|---------------------|-----------------|--------------------|
| Vision Transformers (ViT) | Small datasets (< 10k images) | CNNs (ResNet, EfficientNet) |
| Vision Transformers (ViT) | Edge deployment (latency/power) | EfficientNets, MobileNets |
| Transformers (general) | Very small datasets | RNNs, CNNs (less capacity, less overfit) |
| Diffusion Models | Real-time generation needed | Distilled diffusion (few steps) or GANs (1 forward pass vs 50-1000 steps) |
| Diffusion Models | Limited compute for training | VAEs (faster training) |
| Graph Transformers | Small graphs (< 100 nodes) | Standard GNNs (GCN, GAT) simpler and effective |
| Mamba / SSMs | Short context, strong-pretrained ecosystem matters | Transformer + GQA + FlashAttention (still dominant) |
| MoE Transformers | Below ~3-5B total params, on-device inference | Dense Transformer (active params == total params) |
| Native multimodal models | Below frontier scale, single-modality output | LLaVA-style bolt-on (SigLIP + MLP + LLM) |
| Diffusion Transformers (DiT) | Tiny generative tasks, very small datasets | U-Net latent diffusion (cheaper to train at small scale) |
| LLMs (GPT-style) for narrow classification | Tabular / closed-set tasks with clean labels | Boosted trees (XGBoost/LightGBM) or small fine-tuned encoder |
Counter-narrative: "New architecture ≠ better for your use case. Match architecture to constraints."
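Two rows of the table above carry hard thresholds (fewer than 10k images for ViT, fewer than 100 nodes for graph transformers), so they can be written as explicit checks. This is a partial sketch covering only those two rows; the function name and the architecture labels `"vit"` and `"graph-transformer"` are hypothetical.

```python
from typing import Optional


def recency_bias_warning(
    architecture: str,
    num_images: Optional[int] = None,
    num_nodes: Optional[int] = None,
) -> Optional[str]:
    """Return a caution when a trendy choice hits a 'when NOT to use' threshold."""
    arch = architecture.strip().lower()
    if arch == "vit" and num_images is not None and num_images < 10_000:
        return "Small dataset (< 10k images): consider CNNs (ResNet, EfficientNet)."
    if arch == "graph-transformer" and num_nodes is not None and num_nodes < 100:
        return "Small graph (< 100 nodes): consider standard GNNs (GCN, GAT)."
    # No threshold tripped, or the relevant size is unknown (ask, do not guess).
    return None


print(recency_bias_warning("ViT", num_images=5_000))
```

The remaining rows depend on qualitative constraints (latency, compute, ecosystem) and are better handled by clarifying questions than by numeric checks.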
Start here: What's your primary goal?
┌─ SELECT architecture for task
│ ├─ Data modality?
│ │ ├─ Images → [cnn-families-and-selection.md](cnn-families-and-selection.md)
│ │ ├─ Sequences → [sequence-models-comparison.md](sequence-models-comparison.md)
│ │ ├─ Graphs → [graph-neural-networks-basics.md](graph-neural-networks-basics.md)
│ │ ├─ Generation → [generative-model-families.md](generative-model-families.md)
│ │ ├─ Multimodal (text + image / audio / video) → [multimodal-architectures.md](multimodal-architectures.md)
│ │ └─ Unknown → [architecture-design-principles.md](architecture-design-principles.md)
│ └─ Special requirements?
│   ├─ Deep network (>20 layers) unstable → [normalization-techniques.md](normalization-techniques.md) (CRITICAL)
│   ├─ Need attention mechanism → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│   └─ None → Proceed with modality-based route
│
├─ UNDERSTAND specific architecture
│ ├─ Transformers → [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md)
│ ├─ Attention → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│ ├─ Normalization → [normalization-techniques.md](normalization-techniques.md)
│ └─ General principles → [architecture-design-principles.md](architecture-design-principles.md)
│
├─ DESIGN custom architecture
│ └─ [architecture-design-principles.md](architecture-design-principles.md) (start here always)
│
└─ COMPARE architectures
  ├─ CNNs (ResNet vs EfficientNet) → [cnn-families-and-selection.md](cnn-families-and-selection.md)
  ├─ Sequence models (RNN vs Transformer) → [sequence-models-comparison.md](sequence-models-comparison.md)
  ├─ Generative (GAN vs Diffusion) → [generative-model-families.md](generative-model-families.md)
  └─ General comparison → [architecture-design-principles.md](architecture-design-principles.md)
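The UNDERSTAND, DESIGN, and COMPARE branches of the tree above can be sketched as a nested lookup. The goal and topic keys are hypothetical labels; the SELECT branch is left out here because it depends on data modality and special requirements rather than on a single topic.

```python
DESIGN_PRINCIPLES = "architecture-design-principles.md"

# Branches copied from the decision tree; keys are illustrative labels.
DECISION_TREE = {
    "understand": {
        "transformers": "transformer-architecture-deepdive.md",
        "attention": "attention-mechanisms-catalog.md",
        "normalization": "normalization-techniques.md",
    },
    "design": {},  # always starts at the design-principles sheet
    "compare": {
        "cnns": "cnn-families-and-selection.md",
        "sequence": "sequence-models-comparison.md",
        "generative": "generative-model-families.md",
    },
}


def route_by_goal(goal: str, topic: str = "") -> str:
    """Walk one level of the tree; unmatched topics fall back to design principles."""
    key = goal.strip().lower()
    if key not in DECISION_TREE:
        raise ValueError(f"Goal {goal!r} is not covered by this sketch")
    return DECISION_TREE[key].get(topic.strip().lower(), DESIGN_PRINCIPLES)


print(route_by_goal("compare", "generative"))
```

The fallback mirrors the "General principles" and "General comparison" leaves, which both point at the design-principles sheet.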
| Rationalization | Reality | Counter |
|-----------------|---------|---------|
| "Transformers are SOTA, recommend them" | SOTA on benchmark ≠ best for user's constraints | "Ask about dataset size and compute first" |
| "User said RNN vs LSTM, answer that" | Question premise might be outdated | "Challenge: Have you considered Transformers or TCN?" |
| "Just recommend latest architecture" | Latest ≠ appropriate | "Match architecture to requirements, not trends" |
| "Architecture doesn't matter, training matters" | Wrong architecture can't be fixed by training | "Architecture is foundation - get it right first" |
| "They seem rushed, skip clarification" | Wrong route wastes more time than clarification | "30 seconds to clarify saves hours of wasted effort" |
| "Generic architecture advice is safe" | Generic = useless for specific domains | "Route to domain-specific skill for actionable guidance" |
Once architecture is chosen, route to:
Training the architecture:
→ yzmir/training-optimization/using-training-optimization
Implementing in PyTorch:
→ yzmir/pytorch-engineering/using-pytorch-engineering
Deploying to production:
→ yzmir/ml-production/using-ml-production
Dynamic/growing architectures:
→ yzmir/dynamic-architectures/using-dynamic-architectures
If problem involves:
Reinforcement learning:
→ yzmir/deep-rl/using-deep-rl FIRST
Large language models:
→ yzmir/llm-specialist/using-llm-specialist FIRST
Architecture is downstream of algorithm choice in RL and LLMs.
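The "FIRST" rule above is a load-order constraint. A minimal sketch, assuming the problem has already been tagged with hypothetical domain labels (`"reinforcement_learning"`, `"large_language_models"`); the pack paths are the ones listed above, and `using-neural-architectures` stands in for this skill.

```python
# Pack paths come from the routing list; domain keys are illustrative labels.
UPSTREAM_PACKS = {
    "reinforcement_learning": "yzmir/deep-rl/using-deep-rl",
    "large_language_models": "yzmir/llm-specialist/using-llm-specialist",
}

THIS_SKILL = "using-neural-architectures"


def load_order(problem_domains: set) -> list:
    """Load algorithm-level packs first, then this architecture skill."""
    first = [pack for domain, pack in UPSTREAM_PACKS.items() if domain in problem_domains]
    return first + [THIS_SKILL]


print(load_order({"reinforcement_learning"}))
```

For a problem with neither tag, the list contains only this skill, so routing starts here as usual.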
Use this meta-skill to:
After routing, load the appropriate specialist skill for detailed guidance:
Critical principle: Architecture comes BEFORE training. Get this right first.