cli-tool/components/skills/ai-research/model-architecture-litgpt/SKILL.md
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add davila7/claude-code-templates implementing-llms-litgpt
LitGPT provides 20+ pretrained LLM implementations with clean, readable code and production-ready training workflows.
Installation:
pip install 'litgpt[extra]'
Load and use any model:
from litgpt import LLM
# Load pretrained model
llm = LLM.load("microsoft/phi-2")
# Generate text
result = llm.generate(
"What is the capital of France?",
max_new_tokens=50,
temperature=0.7
)
print(result)
List available models:
litgpt download list
Copy this checklist:
Fine-Tuning Setup:
- [ ] Step 1: Download pretrained model
- [ ] Step 2: Prepare dataset
- [ ] Step 3: Configure training
- [ ] Step 4: Run fine-tuning
Step 1: Download pretrained model
# Download Llama 3 8B
litgpt download meta-llama/Meta-Llama-3-8B
# Download Phi-2 (smaller, faster)
litgpt download microsoft/phi-2
# Download Gemma 2B
litgpt download google/gemma-2b
Models are saved to the checkpoints/ directory.
Step 2: Prepare dataset
LitGPT supports multiple formats:
Alpaca format (instruction-response):
[
{
"instruction": "What is the capital of France?",
"input": "",
"output": "The capital of France is Paris."
},
{
"instruction": "Translate to Spanish: Hello, how are you?",
"input": "",
"output": "Hola, ¿cómo estás?"
}
]
Save as data/my_dataset.json.
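If your examples already live in Python, a minimal sketch like the following writes them in the Alpaca layout shown above. The two records and the data/my_dataset.json path mirror this guide; the key check is our own safeguard, not a LitGPT requirement.

```python
import json
import os

# Alpaca-style records: "instruction" and "output" carry the content; "input" may be empty
records = [
    {"instruction": "What is the capital of France?", "input": "",
     "output": "The capital of France is Paris."},
    {"instruction": "Translate to Spanish: Hello, how are you?", "input": "",
     "output": "Hola, ¿cómo estás?"},
]

REQUIRED_KEYS = {"instruction", "input", "output"}


def write_alpaca_json(rows, path):
    """Verify every row has the three Alpaca keys, then write the list as a JSON array."""
    for i, row in enumerate(rows):
        missing = REQUIRED_KEYS - set(row)
        if missing:
            raise ValueError(f"record {i} is missing {sorted(missing)}")
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as "¿" readable in the file
        json.dump(rows, f, ensure_ascii=False, indent=2)


write_alpaca_json(records, "data/my_dataset.json")
```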
Step 3: Configure training
# Full fine-tuning (requires 40GB+ GPU for 7B models)
litgpt finetune_full \
meta-llama/Meta-Llama-3-8B \
--data JSON \
--data.json_path data/my_dataset.json \
--train.max_steps 1000 \
--train.learning_rate 2e-5 \
--train.micro_batch_size 1 \
--train.global_batch_size 16
# LoRA fine-tuning (efficient, 16GB GPU)
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--train.max_steps 1000 \
--train.learning_rate 1e-4
Step 4: Run fine-tuning
Training saves checkpoints to out/finetune/ automatically.
Monitor training:
# View logs
tail -f out/finetune/logs.txt
# TensorBoard (if using --logger_name tensorboard)
tensorboard --logdir out/finetune/lightning_logs
LoRA fine-tuning is the most memory-efficient option.
LoRA Training:
- [ ] Step 1: Choose base model
- [ ] Step 2: Configure LoRA parameters
- [ ] Step 3: Train with LoRA
- [ ] Step 4: Merge LoRA weights (optional)
Step 1: Choose base model
For limited GPU memory (12-16GB):
Step 2: Configure LoRA parameters
# --lora_r: LoRA rank (8-64, higher = more capacity)
# --lora_alpha: LoRA scaling (typically 2×r)
# --lora_dropout: dropout on the LoRA path to prevent overfitting
# --lora_query / --lora_value / --lora_projection: apply LoRA to the query, value, and output projections
# --lora_key / --lora_mlp / --lora_head: usually not needed
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--lora_query true \
--lora_key false \
--lora_value true \
--lora_projection true \
--lora_mlp false \
--lora_head false
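Adapter size grows linearly with --lora_r. For a weight of shape d_out × d_in, LoRA trains two factors, A (r × d_in) and B (d_out × r), so it adds r·(d_in + d_out) parameters per adapted matrix. A minimal sketch of that arithmetic, assuming a hypothetical model with 32 layers, square 2560 × 2560 projections, and LoRA on the query, value, and output projections (real models with fused QKV or grouped-query attention have different shapes):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def adapter_params(n_layers: int, matrices_per_layer: int, d_model: int, r: int) -> int:
    # Hypothetical simplification: every adapted matrix is square (d_model x d_model)
    return n_layers * matrices_per_layer * lora_params(d_model, d_model, r)


counts = {
    r: adapter_params(n_layers=32, matrices_per_layer=3, d_model=2560, r=r)
    for r in (8, 16, 32, 64)
}
for r, n in counts.items():
    print(f"r={r:2d}: {n:,} trainable params ({n / counts[16]:.1f}x the r=16 adapter)")
```

Because the count is proportional to r, an r=64 adapter is about 4× the size of an r=16 one on the same layers, and disabling an adapted matrix removes its share entirely.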
LoRA rank guide:
- r=8: Lightweight, 2-4MB adapters
- r=16: Standard, good quality
- r=32: High capacity, use for complex tasks
- r=64: Maximum quality, 4× larger adapters
Step 3: Train with LoRA
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--train.epochs 3 \
--train.learning_rate 1e-4 \
--train.micro_batch_size 4 \
--train.global_batch_size 32 \
--out_dir out/phi2-lora
# Memory usage: ~8-12GB for Phi-2 with LoRA
Step 4: Merge LoRA weights (optional)
Merge LoRA adapters into base model for deployment:
litgpt merge_lora \
out/phi2-lora/final \
--out_dir out/phi2-merged
Now use merged model:
from litgpt import LLM
llm = LLM.load("out/phi2-merged")
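Merging folds the low-rank update into the base weight. Assuming the standard LoRA formulation with scaling alpha / r (our understanding of how LitGPT's LoRA layers scale the update), the merged weight is W + (alpha / r)·B·A, so the merged model produces the same output as base model plus adapter. A toy pure-Python check with made-up 2 × 3 weights and rank 1:

```python
def matmul(X, Y):
    """Plain nested-list matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y, scale=1.0):
    """Elementwise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the weight a merged checkpoint would hold."""
    return add(W, matmul(B, A), scale=alpha / r)


# Made-up toy values: W is d_out x d_in = 2 x 3, rank r = 1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
A = [[1.0, 2.0, 3.0]]        # r x d_in
B = [[0.5], [1.0]]           # d_out x r
alpha, r = 2, 1
x = [[1.0], [1.0], [1.0]]    # input as a column vector

# Unmerged: base path plus scaled adapter path
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / r)
# Merged: a single matrix multiply with the folded weight
merged = matmul(merge_lora(W, A, B, alpha, r), x)
print(merged, unmerged)
```

Both paths give the same result, which is why a merged checkpoint can be served without any LoRA-specific code.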
Pretrain a new model on your domain data.
Pretraining:
- [ ] Step 1: Prepare pretraining dataset
- [ ] Step 2: Configure model architecture
- [ ] Step 3: Set up multi-GPU training
- [ ] Step 4: Launch pretraining
Step 1: Prepare pretraining dataset
LitGPT expects tokenized data. Use prepare_dataset.py:
python scripts/prepare_dataset.py \
--source_path data/my_corpus.txt \
--checkpoint_dir checkpoints/tokenizer \
--destination_path data/pretrain \
--split train,val
Step 2: Configure model architecture
Edit a config file or use an existing one:
# config/pythia-160m.yaml
model_name: pythia-160m
block_size: 2048
vocab_size: 50304
n_layer: 12
n_head: 12
n_embd: 768
rotary_percentage: 0.25
parallel_residual: true
bias: true
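To sanity-check an architecture config before training, you can estimate its parameter count from the fields above. A rough sketch, assuming a GPT-NeoX-style block as used by Pythia (fused QKV and output projection with biases, a 4× MLP, two LayerNorms per layer, a final LayerNorm, and an untied output head). Rotary embeddings and parallel_residual add no weights, so they do not appear; mlp_ratio is our assumption, not a field from the YAML.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    vocab_size: int = 50304   # padded vocabulary size, as in the YAML
    n_layer: int = 12
    n_embd: int = 768
    bias: bool = True
    mlp_ratio: int = 4        # assumption: GPT-NeoX-style 4x MLP expansion


def count_params(cfg: ModelConfig) -> int:
    d, b = cfg.n_embd, int(cfg.bias)
    hidden = cfg.mlp_ratio * d
    attn = (d * 3 * d + b * 3 * d) + (d * d + b * d)        # fused QKV + output projection
    mlp = (d * hidden + b * hidden) + (hidden * d + b * d)  # up- and down-projection
    norms = 2 * (2 * d)                                     # two LayerNorms (weight + bias)
    per_layer = attn + mlp + norms
    embeddings = cfg.vocab_size * d                         # token embedding table
    head = cfg.vocab_size * d                               # untied LM head, no bias
    final_norm = 2 * d
    return embeddings + cfg.n_layer * per_layer + final_norm + head


print(f"{count_params(ModelConfig()):,}")
```

The total lands at about 162M, consistent with the model's "160m" name; nearly half of it sits in the embedding and output-head matrices.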
Step 3: Set up multi-GPU training
# Single GPU
litgpt pretrain \
--config config/pythia-160m.yaml \
--data.data_dir data/pretrain \
--train.max_tokens 10_000_000_000
# Multi-GPU with FSDP
litgpt pretrain \
--config config/pythia-1b.yaml \
--data.data_dir data/pretrain \
--devices 8 \
--train.max_tokens 100_000_000_000
Step 4: Launch pretraining
For large-scale pretraining on cluster:
# Using SLURM: one task per GPU, matching --devices 8
sbatch --nodes=8 --gpus-per-node=8 --ntasks-per-node=8 \
pretrain_script.sh
# pretrain_script.sh content:
#!/bin/bash
srun litgpt pretrain \
--config config/pythia-1b.yaml \
--data.data_dir /shared/data/pretrain \
--devices 8 \
--num_nodes 8 \
--train.global_batch_size 512 \
--train.max_tokens 300_000_000_000
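A quick way to sanity-check a pretraining budget is to convert the token budget into optimizer steps. A minimal sketch, assuming every sequence fills the full block_size of 2048 tokens (the Pythia block size) and that the global batch is split evenly across all GPUs; the function name is hypothetical:

```python
def pretraining_plan(max_tokens: int, global_batch_size: int, block_size: int,
                     devices: int, num_nodes: int) -> dict:
    """Back-of-the-envelope step count for a token budget (assumes full-length sequences)."""
    world_size = devices * num_nodes                  # total GPUs across all nodes
    tokens_per_step = global_batch_size * block_size  # tokens consumed per optimizer step
    return {
        "world_size": world_size,
        "tokens_per_step": tokens_per_step,
        "steps": max_tokens // tokens_per_step,       # whole optimizer steps in the budget
        "sequences_per_gpu": global_batch_size // world_size,
    }


plan = pretraining_plan(max_tokens=300_000_000_000, global_batch_size=512,
                        block_size=2048, devices=8, num_nodes=8)
for key, value in plan.items():
    print(f"{key}: {value:,}")
```

With the SLURM settings above, this works out to 286,102 steps of 1,048,576 tokens each, with 8 sequences per GPU per step.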
Export LitGPT models for production.
Model Deployment:
- [ ] Step 1: Test inference locally
- [ ] Step 2: Quantize model (optional)
- [ ] Step 3: Convert to GGUF (for llama.cpp)
- [ ] Step 4: Deploy with API
Step 1: Test inference locally
from litgpt import LLM
llm = LLM.load("out/phi2-lora/final")
# Single generation
print(llm.generate("What is machine learning?"))
# Streaming
for token in llm.generate("Explain quantum computing", stream=True):
    print(token, end="", flush=True)
# Multiple prompts (generated one at a time, not as a true batch)
prompts = ["Hello", "Goodbye", "Thank you"]
results = [llm.generate(p) for p in prompts]
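Step 2: Quantize model (optional)
Reduce memory use with minimal quality loss. LitGPT applies bitsandbytes quantization when the model is loaded, so pass --quantize at inference time rather than converting the checkpoint:
# 8-bit quantization (about 50% less weight memory than 16-bit)
litgpt generate \
out/phi2-lora/final \
--prompt "What is machine learning?" \
--precision 16-true \
--quantize bnb.int8
# 4-bit NF4 with double quantization (about 75% less weight memory than 16-bit)
litgpt generate \
out/phi2-lora/final \
--prompt "What is machine learning?" \
--precision bf16-true \
--quantize bnb.nf4-dq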
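The savings from 8-bit and 4-bit quantization follow directly from bytes per weight relative to 16-bit. A rough sketch covering weight memory only (activations, the KV cache, and quantization metadata such as NF4 scaling constants are ignored), assuming Phi-2's roughly 2.7 billion parameters:

```python
# Bytes needed to store one weight in each format
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9


n_params = 2.7e9  # assumption: Phi-2 has about 2.7B parameters
baseline = weight_memory_gb(n_params, "bf16")
for fmt in BYTES_PER_PARAM:
    gb = weight_memory_gb(n_params, fmt)
    print(f"{fmt:>4}: {gb:.2f} GB ({1 - gb / baseline:.0%} smaller than bf16)")
```

Actual GPU usage will be higher than these figures because of activations and runtime buffers, but the relative savings hold for the weights themselves.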
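Step 3: Convert to GGUF (for llama.cpp)
LitGPT's exporter produces Hugging Face-format weights rather than GGUF, so conversion takes two stages:
# 1. Convert the LitGPT checkpoint to Hugging Face weight names (writes model.pth)
litgpt convert_from_litgpt out/phi2-lora/final out/phi2-hf
# 2. Once out/phi2-hf is a complete Hugging Face model directory (config, tokenizer,
#    and weights saved in Hugging Face format), run llama.cpp's converter from a llama.cpp checkout
python convert_hf_to_gguf.py out/phi2-hf --outfile models/phi2.gguf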
Step 4: Deploy with API
from fastapi import FastAPI
from litgpt import LLM
app = FastAPI()
llm = LLM.load("out/phi2-lora/final")
@app.post("/generate")
def generate(prompt: str, max_tokens: int = 100):
    # prompt and max_tokens are read from query parameters
    result = llm.generate(
        prompt,
        max_new_tokens=max_tokens,
        temperature=0.7
    )
    return {"response": result}
# Save as api.py, then run: uvicorn api:app --host 0.0.0.0 --port 8000
Use LitGPT when:
Use alternatives instead:
Issue: Out of memory during fine-tuning
Use LoRA instead of full fine-tuning:
# Instead of litgpt finetune_full (requires 40GB+)
litgpt finetune_lora # Only needs 12-16GB
Or lower the micro-batch size; LitGPT still reaches the global batch size by accumulating gradients over micro-batches:
litgpt finetune_lora \
... \
--train.micro_batch_size 1 \
--train.global_batch_size 16
Issue: Training too slow
Flash Attention is built in and used automatically on compatible hardware:
# Already enabled by default on Ampere+ GPUs (A100, RTX 30/40 series)
# No configuration needed
Use a small micro-batch and accumulate gradients up to the global batch size:
# On a single GPU: 32 micro-batches of 1 per optimizer step (effective batch = 32)
--train.micro_batch_size 1 \
--train.global_batch_size 32
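The number of accumulation steps follows from the global batch, the micro-batch, and the device count. A minimal sketch, assuming the global batch is first split evenly across devices and then into micro-batches (our reading of how LitGPT derives it; check the training arguments in your installed version). The function name is hypothetical:

```python
def accumulation_iters(global_batch_size: int, micro_batch_size: int, devices: int = 1) -> int:
    """Micro-batches each device processes before one optimizer step."""
    if global_batch_size % devices:
        raise ValueError("global_batch_size must be divisible by devices")
    per_device = global_batch_size // devices
    if per_device % micro_batch_size:
        raise ValueError("per-device batch must be a multiple of micro_batch_size")
    return per_device // micro_batch_size


for devices in (1, 4):
    iters = accumulation_iters(32, 1, devices)
    effective = iters * 1 * devices  # accumulation steps x micro-batch x devices
    print(f"{devices} GPU(s): {iters} accumulation steps, effective batch {effective}")
```

Adding GPUs reduces the accumulation steps per device while the effective batch stays at the configured global batch size.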
Issue: Model not loading
Check model name:
# List all available models
litgpt download list
# Download if not exists
litgpt download meta-llama/Meta-Llama-3-8B
Verify checkpoints directory:
ls checkpoints/
# Should see: meta-llama/Meta-Llama-3-8B/
Issue: LoRA adapters too large
Reduce LoRA rank:
--lora_r 8 # Instead of 16 or 32
Apply LoRA to fewer layers:
# Keep LoRA on query and value only; disable the projection and MLP adapters
--lora_query true \
--lora_value true \
--lora_projection false \
--lora_mlp false
Supported architectures: See references/supported-models.md for complete list of 20+ model families with sizes and capabilities.
Training recipes: See references/training-recipes.md for proven hyperparameter configurations for pretraining and fine-tuning.
FSDP configuration: See references/distributed-training.md for multi-GPU training with Fully Sharded Data Parallel.
Custom architectures: See references/custom-models.md for implementing new model architectures in LitGPT style.