skills/vast-gpu/SKILL.md
Rent, manage, and destroy GPU instances on vast.ai. Use when user says "rent gpu", "vast.ai", "rent a server", "cloud gpu", or needs on-demand GPU without owning hardware.
Manage vast.ai GPU instance: $ARGUMENTS
Rent cheap, capable GPUs from vast.ai on demand. This skill analyzes the training task to determine GPU requirements, searches for the best-value offers, presents options with estimated total cost, and handles the full lifecycle: rent → setup → run → destroy.
Users do NOT specify GPU models or hardware. They describe the task — the skill figures out what to rent.
Prerequisites: The vastai CLI must be installed (requires Python ≥ 3.10) and authenticated:
pip install vastai
vastai set api-key YOUR_API_KEY
If your system Python is < 3.10, create a virtual environment with Python ≥ 3.10 (e.g., conda create, pyenv, uv venv) and install vastai there.
SSH public key must be uploaded at https://cloud.vast.ai/manage-keys/ BEFORE creating any instance. Keys are baked into instances at creation time — if you add a key after renting, you must destroy and re-create the instance.
All active vast.ai instances are tracked in vast-instances.json at the project root:
[
  {
    "instance_id": 33799165,
    "offer_id": 25831376,
    "gpu_name": "RTX_3060",
    "num_gpus": 1,
    "dph": 0.0414,
    "ssh_url": "ssh://root@1.208.108.242:58955",
    "ssh_host": "1.208.108.242",
    "ssh_port": 58955,
    "created_at": "2026-03-29T21:12:00Z",
    "status": "running",
    "experiment": "exp01_baseline",
    "estimated_hours": 4.0,
    "estimated_cost": 0.17
  }
]
This file is the source of truth for /run-experiment and /monitor-experiment to connect to vast.ai instances.
Analyze the task, find the best GPU, and present cost-optimized options. This is the main entry point — called directly or automatically by /run-experiment when gpu: vast is set.
Step 1: Analyze Task Requirements
Read available context to determine what the task needs:
From the experiment plan (refine-logs/EXPERIMENT_PLAN.md):
From experiment scripts (if already written):
- num_parameters, config files (model size)
- DataParallel, DistributedDataParallel, accelerate, deepspeed (multi-GPU usage)
From user description (if no plan/scripts exist):
Step 2: Determine GPU Requirements
Based on the task analysis, determine:
| Factor | How to estimate |
|--------|----------------|
| Min VRAM | Model params × 4 bytes (fp32) or × 2 (fp16/bf16) + optimizer states + activations. Rules of thumb: 7B model ≈ 16 GB (fp16), 13B ≈ 28 GB, 70B ≈ 140 GB (needs multi-GPU). ResNet/ViT ≈ 4-8 GB. Add 20% headroom. |
| Num GPUs | 1 unless: model doesn't fit in single GPU VRAM, or scripts use DDP/FSDP/DeepSpeed, or plan specifies multi-GPU |
| Est. hours | From experiment plan's cost column, or: (dataset_size × epochs) / (throughput × batch_size). Default to user estimate if available. Add 30% buffer for setup + unexpected slowdowns |
| Min disk | 20 GB base + model checkpoint size + dataset size. Default: 50 GB |
| CUDA version | Match PyTorch version. PyTorch 2.x needs CUDA ≥ 11.8. Default: 12.1 |
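The arithmetic in the table can be sketched in a few lines of Python. This is a rough illustration, not the skill's actual implementation: the function names are hypothetical, and optimizer states and activations are passed in by the caller as `extra_gb` because the table gives no formula for them.

```python
def estimate_min_vram_gb(num_params, bytes_per_param=2, extra_gb=0.0, headroom=0.20):
    """Weights (params x bytes) plus caller-supplied extras, with 20% headroom.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16.
    extra_gb: optimizer states + activations (assumption: estimated elsewhere).
    """
    weights_gb = num_params * bytes_per_param / 1e9
    return (weights_gb + extra_gb) * (1.0 + headroom)


def estimate_hours(base_hours, buffer=0.30):
    """Add the 30% buffer for setup and unexpected slowdowns."""
    return base_hours * (1.0 + buffer)


def estimate_min_disk_gb(checkpoint_gb, dataset_gb, base_gb=20.0):
    """20 GB base + checkpoint size + dataset size."""
    return base_gb + checkpoint_gb + dataset_gb


if __name__ == "__main__":
    # 7B parameters in fp16, weights only
    print(round(estimate_min_vram_gb(7e9, bytes_per_param=2), 1))
    print(round(estimate_hours(4.0), 1))
    print(estimate_min_disk_gb(checkpoint_gb=5, dataset_gb=25))
```

For a 7B model in fp16 this gives roughly 17 GB for weights alone, in line with the ≈ 16 GB rule of thumb; training with optimizer states needs a correspondingly larger `extra_gb`.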
Step 3: Search Offers
Search across multiple GPU tiers to find the best value. Always search broadly — do NOT limit to one GPU model:
# Tier 1: Budget GPUs (good for small models, fine-tuning, ablations)
vastai search offers "gpu_ram>=<MIN_VRAM> num_gpus>=<N> reliability>0.95 inet_down>100" -o 'dph+' --storage <DISK> --limit 10
# Tier 2: If VRAM > 24 GB, also search high-VRAM cards specifically
vastai search offers "gpu_ram>=48 num_gpus>=<N> reliability>0.95" -o 'dph+' --storage <DISK> --limit 5
The output is a table with columns: ID, CUDA, N (GPU count), Model, PCIE, cpu_ghz, vCPUs, RAM, Disk, $/hr, DLP (deep learning perf), score, NV Driver, Net_up, Net_down, R (reliability %), Max_Days, mach_id, status, host_id, ports, country.
The first column (ID) is the offer ID needed for vastai create instance.
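As an illustration, the Tier 1 query and command line could be assembled programmatically. The helper names below are hypothetical; the filter fields and flags are copied from the commands above, and nothing here calls the CLI.

```python
def build_offer_query(min_vram_gb, num_gpus=1, min_reliability=0.95, min_inet_down=100):
    """Build the filter string passed to `vastai search offers`."""
    return (
        f"gpu_ram>={min_vram_gb} num_gpus>={num_gpus} "
        f"reliability>{min_reliability} inet_down>{min_inet_down}"
    )


def build_search_command(query, disk_gb, limit=10):
    """Return the argv list for the search, sorted by $/hr ascending."""
    return [
        "vastai", "search", "offers", query,
        "-o", "dph+",
        "--storage", str(disk_gb),
        "--limit", str(limit),
    ]


if __name__ == "__main__":
    query = build_offer_query(min_vram_gb=24)
    print(" ".join(build_search_command(query, disk_gb=50)))
```

Returning an argv list rather than a single string avoids shell-quoting issues if the command is later passed to `subprocess.run`.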
Step 4: Present Cost-Optimized Options
Present 3 options to the user, ranked by estimated total cost:
Task analysis:
- Model: [model name/size] → estimated VRAM: ~[X] GB
- Training: ~[Y] hours estimated
- Requirements: [N] GPU(s), ≥[X] GB VRAM, ~[Z] GB disk
Recommended options (sorted by estimated total cost):
| # | GPU | VRAM | $/hr | Est. Hours | Est. Total | Reliability | Offer ID |
|---|-------------|-------|--------|------------|------------|-------------|-----------|
| 1 | RTX 3060 | 12 GB | $0.04 | ~6h | ~$0.25 | 99.4% | 25831376 | ← cheapest
| 2 | RTX 4090 | 24 GB | $0.28 | ~4h | ~$1.12 | 99.2% | 6995713 | ← best value
| 3 | A100 SXM | 80 GB | $0.95 | ~2h | ~$1.90 | 99.5% | 7023456 | ← fastest
Option 1 is cheapest overall. Option 3 finishes fastest.
Pick a number (or type a different offer ID):
Key presentation rules:
Relative speed scaling (approximate, for estimating hours across GPU tiers):
| GPU | Relative Speed (FP16) |
|-----|-----------------------:|
| RTX 3060 | 0.5× |
| RTX 3090 | 1.0× |
| RTX 4090 | 1.6× |
| A5000 | 0.9× |
| A6000 | 1.1× |
| L40S | 1.5× |
| A100 SXM | 2.0× |
| H100 SXM | 3.3× |
Use these to scale the base estimated hours across offers.
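A minimal sketch of that scaling, assuming the base estimate was made for the 1.0× card (RTX 3090) and that total cost is simply hours × $/hr. The function names and the offer dictionary shape are illustrative assumptions, not part of the skill.

```python
# Relative FP16 speeds from the table above
RELATIVE_SPEED = {
    "RTX 3060": 0.5, "RTX 3090": 1.0, "RTX 4090": 1.6, "A5000": 0.9,
    "A6000": 1.1, "L40S": 1.5, "A100 SXM": 2.0, "H100 SXM": 3.3,
}


def scale_hours(base_hours, gpu, reference="RTX 3090"):
    """Scale hours estimated on `reference` to another GPU tier."""
    return base_hours * RELATIVE_SPEED[reference] / RELATIVE_SPEED[gpu]


def rank_offers(offers, base_hours):
    """Attach est_hours and est_total to each offer, cheapest total first."""
    ranked = []
    for offer in offers:
        hours = scale_hours(base_hours, offer["gpu"])
        ranked.append({**offer, "est_hours": hours, "est_total": hours * offer["dph"]})
    return sorted(ranked, key=lambda o: o["est_total"])


if __name__ == "__main__":
    offers = [
        {"gpu": "A100 SXM", "dph": 0.95},
        {"gpu": "RTX 3060", "dph": 0.04},
        {"gpu": "RTX 4090", "dph": 0.28},
    ]
    for o in rank_offers(offers, base_hours=3.0):
        print(f"{o['gpu']}: ~{o['est_hours']:.1f}h, ~${o['est_total']:.2f}")
```

Ranking by estimated total rather than by $/hr matters because a faster, pricier card can still finish the job for less money.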
Create an instance from a user-selected offer.
Step 1: Create Instance
vastai create instance <OFFER_ID> \
--image <DOCKER_IMAGE> \
--disk <DISK_GB> \
--ssh \
--direct \
--onstart-cmd "apt-get update && apt-get install -y git screen rsync"
Default Docker image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel (override via CLAUDE.md image: field if set).
The output looks like:
Started. {'success': True, 'new_contract': 33799165, 'instance_api_key': '...'}
The new_contract value is the instance ID — save this for all subsequent commands.
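One way to pull the instance ID out of that output is sketched below. It assumes the output keeps the shape shown above, a Python-style dict literal after the word "Started."; `parse_instance_id` is a hypothetical helper name.

```python
import ast


def parse_instance_id(output):
    """Extract new_contract from `vastai create instance` output."""
    start = output.index("{")        # raises ValueError if no dict is present
    end = output.rindex("}") + 1
    result = ast.literal_eval(output[start:end])  # safe: evaluates literals only
    if not result.get("success"):
        raise RuntimeError(f"instance creation failed: {result}")
    return int(result["new_contract"])


if __name__ == "__main__":
    sample = "Started. {'success': True, 'new_contract': 33799165, 'instance_api_key': '...'}"
    print(parse_instance_id(sample))
```

`ast.literal_eval` is used instead of `json.loads` because the sample output uses single quotes and `True`, which are not valid JSON.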
Step 2: Wait for Instance Ready
Poll instance status every 20 seconds until it's running (typically takes 30-60 seconds, max ~5 minutes):
vastai show instances --raw | python3 -c "
import sys, json
instances = json.load(sys.stdin)
for inst in instances:
    if inst['id'] == <INSTANCE_ID>:
        print(inst['actual_status'])
"
Wait states: loading → running. If stuck in loading for >5 minutes, warn the user — the host may be slow or the image may be large.
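The polling loop described above might look like the following sketch. The function names are hypothetical; the status lookup reuses the `id` and `actual_status` fields from the one-liner, and the status source and sleep function are injectable so the loop can be exercised without a real instance.

```python
import json
import subprocess
import time


def fetch_status_via_cli(instance_id):
    """Return actual_status for one instance, or None if it is not listed."""
    raw = subprocess.run(
        ["vastai", "show", "instances", "--raw"],
        check=True, capture_output=True, text=True,
    ).stdout
    for inst in json.loads(raw):
        if inst["id"] == instance_id:
            return inst["actual_status"]
    return None


def wait_until_running(fetch_status, poll_seconds=20, timeout_seconds=300, sleep=time.sleep):
    """Poll until status is 'running'. Returns False once the timeout is reached."""
    waited = 0
    while True:
        if fetch_status() == "running":
            return True
        if waited >= timeout_seconds:
            return False  # caller should warn: slow host or large image
        sleep(poll_seconds)
        waited += poll_seconds
```

A caller would use it as `wait_until_running(lambda: fetch_status_via_cli(33799165))` and warn the user when it returns `False`.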
Step 3: Get SSH Connection Details
vastai ssh-url <INSTANCE_ID>
This returns a URL in the format: ssh://root@<HOST>:<PORT>
Parse out host and port from this URL. Example:
ssh://root@1.208.108.242:58955 → Host: 1.208.108.242, Port: 58955
Important: Always use vastai ssh-url to get connection details. Do NOT rely on ssh_host/ssh_port from vastai show instances, as those may point to proxy servers that differ from the direct connection endpoint.
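Parsing the URL returned by vastai ssh-url can be done with the standard library. This is a small sketch, and `parse_ssh_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def parse_ssh_url(url):
    """Split ssh://user@host:port into (user, host, port)."""
    parts = urlsplit(url.strip())
    if parts.scheme != "ssh" or not parts.hostname or parts.port is None:
        raise ValueError(f"unexpected ssh URL: {url!r}")
    return parts.username or "root", parts.hostname, parts.port


if __name__ == "__main__":
    user, host, port = parse_ssh_url("ssh://root@1.208.108.242:58955")
    print(f"ssh -p {port} {user}@{host}")
```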
Step 4: Verify SSH Connectivity
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=15 -p <PORT> root@<HOST> "nvidia-smi && echo 'CONNECTION_OK'"
If SSH fails with "Permission denied (publickey)":
If SSH fails with "Connection refused":
Step 5: Update State File
Write/update vast-instances.json with the new instance details including the ssh_url from Step 3, estimated hours and cost.
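A minimal sketch of that write/update step, assuming records are keyed by `instance_id` and the file holds a JSON list as shown earlier. `upsert_instance` is a hypothetical helper name.

```python
import json
from pathlib import Path


def upsert_instance(record, path="vast-instances.json"):
    """Insert or replace the record with the same instance_id, then rewrite the file."""
    state_path = Path(path)
    instances = json.loads(state_path.read_text()) if state_path.exists() else []
    # Drop any stale entry for this instance before appending the fresh one
    instances = [i for i in instances if i.get("instance_id") != record["instance_id"]]
    instances.append(record)
    state_path.write_text(json.dumps(instances, indent=2) + "\n")
    return instances
```

Replacing by `instance_id` keeps the file free of duplicates when the same instance is updated more than once, for example when its status changes from loading to running.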
Step 6: Report
Vast.ai instance ready:
- Instance ID: <ID>
- GPU: <GPU_NAME> x <NUM_GPUS>
- Cost: $<DPH>/hr (estimated total: ~$<TOTAL>)
- SSH: ssh -p <PORT> root@<HOST>
- Docker: <IMAGE>
To deploy: /run-experiment (will auto-detect this instance)
To destroy when done: /vast-gpu destroy <ID>
Set up the rented instance for a specific experiment. Called automatically by /run-experiment when targeting a vast.ai instance.
Step 1: Install Dependencies
ssh -p <PORT> root@<HOST> "pip install -q wandb tensorboard scipy scikit-learn pandas"
If a requirements.txt exists in the project, install that instead:
scp -P <PORT> requirements.txt root@<HOST>:/workspace/
ssh -p <PORT> root@<HOST> "pip install -q -r /workspace/requirements.txt"
Note: scp uses uppercase -P for port, while ssh uses lowercase -p.
Step 2: Sync Code
# Excludes come first: rsync applies the first matching rule, so a leading --include='*/' would override the directory excludes
rsync -avz -e "ssh -p <PORT>" \
  --exclude='*.pt' --exclude='*.pth' --exclude='*.ckpt' \
  --exclude='__pycache__' --exclude='.git' --exclude='data/' \
  --exclude='wandb/' --exclude='outputs/' \
  --include='*.py' --include='*.yaml' --include='*.yml' --include='*.json' \
  --include='*.txt' --include='*.sh' --include='*/' \
  ./ root@<HOST>:/workspace/project/
Step 3: Verify Setup
ssh -p <PORT> root@<HOST> "cd /workspace/project && python -c 'import torch; print(f\"PyTorch {torch.__version__}, CUDA: {torch.cuda.is_available()}, GPUs: {torch.cuda.device_count()}\")'"
Expected output: PyTorch 2.1.0, CUDA: True, GPUs: 1 (or more GPUs if multi-GPU instance).
Tear down a vast.ai instance to stop billing.
Step 1: Confirm Results Collected
Before destroying, check if there are experiment results to download:
ssh -p <PORT> root@<HOST> "ls /workspace/project/results/ 2>/dev/null || echo 'NO_RESULTS_DIR'"
If results exist, download them first:
rsync -avz -e "ssh -p <PORT>" root@<HOST>:/workspace/project/results/ ./results/
Also download logs:
mkdir -p ./logs && scp -P <PORT> root@<HOST>:/workspace/*.log ./logs/ 2>/dev/null
Step 2: Destroy Instance
vastai destroy instance <INSTANCE_ID>
Output: destroying instance <INSTANCE_ID>.
Destruction is irreversible — all data on the instance is permanently deleted.
Step 3: Update State File
Remove the instance from vast-instances.json or mark its status as destroyed.
Step 4: Report Cost
Calculate actual cost based on creation time and $/hr:
Instance <ID> destroyed.
- Duration: ~X.X hours
- Actual cost: ~$X.XX (estimated was $Y.YY)
- Results downloaded to: ./results/
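The cost figure in this report can be approximated from the state file's `created_at` and `dph` fields. The sketch below assumes billing is simply elapsed wall-clock hours × $/hr; the amount vast.ai actually charges may differ. `actual_cost` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def actual_cost(created_at, dph, now=None):
    """Return (hours, cost) from a UTC timestamp like 2026-03-29T21:12:00Z."""
    started = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    hours = (now - started).total_seconds() / 3600
    return hours, hours * dph


if __name__ == "__main__":
    end = datetime(2026, 3, 30, 1, 12, tzinfo=timezone.utc)
    hours, cost = actual_cost("2026-03-29T21:12:00Z", 0.0414, now=end)
    print(f"Duration: ~{hours:.1f} hours, actual cost: ~${cost:.2f}")
```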
Show all active vast.ai instances:
vastai show instances
Cross-reference with vast-instances.json for experiment associations.
Tear down all active instances (use after all experiments complete):
vast-instances.json
- Use --direct SSH when creating instances: faster than proxy SSH
- Use vastai ssh-url <ID> to get connection details: the host/port from show instances may differ
- Default image is pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel unless user specifies otherwise
- Remote working directory is /workspace/ (Docker default). Code syncs to /workspace/project/
- vast-instances.json must stay up to date: other skills depend on it
- vastai CLI requires Python ≥ 3.10: if system Python is older, use a conda env
Users only need to set gpu: vast; no hardware preferences required:
## Vast.ai
- gpu: vast # tells run-experiment to use vast.ai
- auto_destroy: true # auto-destroy after experiment completes (default: true)
- max_budget: 5.00 # optional: max total $ to spend (skill warns if estimate exceeds this)
- image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel # optional: override Docker image
The skill analyzes experiment scripts and plans to determine what GPU to rent. No need to specify GPU model, VRAM, or instance count.
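The max_budget warning could be as simple as the check below. This is only an illustration of the comparison, and `exceeds_budget` is a hypothetical helper name.

```python
def exceeds_budget(est_hours, dph, max_budget=None):
    """True when the estimated total (hours x $/hr) is above max_budget."""
    if max_budget is None:  # max_budget is optional
        return False
    return est_hours * dph > max_budget


if __name__ == "__main__":
    print(exceeds_budget(4.0, 0.28, max_budget=5.00))
    print(exceeds_budget(6.0, 0.95, max_budget=5.00))
```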
/run-experiment "train model" ← detects gpu: vast, calls /vast-gpu provision
↳ /vast-gpu provision ← analyzes task, presents options with cost
↳ user picks option ← rent + setup + deploy
↳ /vast-gpu destroy ← auto-destroy when done (if auto_destroy: true)
/vast-gpu provision ← manual: analyze task + show options
/vast-gpu rent <offer_id> ← manual: rent a specific offer
/vast-gpu list ← show active instances
/vast-gpu destroy <instance_id> ← tear down, stop billing
/vast-gpu destroy-all ← tear down everything