.cursor/skills/fork-setup/SKILL.md
Get DGX Lab running on your own DGX Spark after cloning or forking. Covers prerequisites, install, config, first run, Docker, Tailscale, and common troubleshooting. Use when setting up DGX Lab for the first time or diagnosing setup issues.
| Requirement | Check |
|-------------|-------|
| NVIDIA DGX Spark (or GPU with `nvidia-smi`) | `nvidia-smi` returns GPU info |
| Python 3.12+ | `python3 --version` |
| uv | `uv --version` |
| Bun 1.3+ | `bun --version` |
| Docker + Docker Compose (production only) | `docker compose version` |
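The checks in the table can be run in one pass. The following is a minimal sketch, not part of DGX Lab: `have` and `report_missing` are hypothetical helper names, and the script only confirms that each command is on `PATH`. It does not verify the version floors (Python 3.12+, Bun 1.3+).

```bash
# have NAME: succeeds if NAME resolves to a command on PATH.
# (Hypothetical helper, not shipped with DGX Lab.)
have() {
  command -v "$1" >/dev/null 2>&1
}

# report_missing NAME...: prints each command that is not on PATH,
# one per line. Prints nothing when everything is present.
report_missing() {
  for tool in "$@"; do
    have "$tool" || echo "$tool"
  done
}

missing=$(report_missing nvidia-smi python3 uv bun docker)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:"
  echo "$missing"
else
  echo "All prerequisite commands found."
fi
```

Docker is only needed for the production setup, so a missing `docker` is not a blocker for development.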
```bash
git clone <your-fork-url> dgx-lab && cd dgx-lab
cd backend && uv sync && cd ..
cd frontend && bun install && cd ..
make dev
```
Open http://localhost:3000 for the frontend. The backend listens on port 8000; its health endpoint is http://localhost:8000/api/health.
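To confirm the backend is up without opening a browser, you can poll the health URL from a second terminal. This is a sketch under assumptions: `wait_for_health` is a hypothetical helper, and it assumes only that the endpoint returns a successful HTTP status when the backend is ready. The response body is not inspected.

```bash
# wait_for_health URL [ATTEMPTS]: poll URL once per second until it
# answers with a non-error HTTP status, or give up after ATTEMPTS tries.
# (Hypothetical helper, not shipped with DGX Lab.)
wait_for_health() {
  url=$1
  attempts=${2:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure; -sS: quiet except for errors.
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_health http://localhost:8000/api/health 30` waits up to roughly 30 seconds for the backend started by `make dev`.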
All paths and hardware constants are set via environment variables in `backend/app/config.py`. The defaults work out of the box on a Spark:
| Env var | Default | Purpose |
|---------|---------|---------|
| `DGX_LAB_MODELS_DIR` | `~/.cache/huggingface/hub` | HuggingFace model cache |
| `DGX_LAB_EXPERIMENTS_DIR` | `~/.dgx-lab/experiments` | Logger experiment data |
| `DGX_LAB_TRACES_DIR` | `~/.dgx-lab/traces` | Agent trace JSONL files |
| `DGX_LAB_DATASETS_DIR` | `~/.dgx-lab/datasets` | Local dataset storage |
| `DGX_LAB_MEMORY_TOTAL_GB` | `128` | Total memory (GB) for fit calculations |
| `DGX_LAB_MEMORY_BW_MAX_GBS` | `273` | Memory bandwidth ceiling (GB/s) |
If your machine has different specs (e.g. a non-Spark GPU), override the memory vars:
```bash
export DGX_LAB_MEMORY_TOTAL_GB=80
export DGX_LAB_MEMORY_BW_MAX_GBS=200
```
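Before starting the app, it can help to see which memory values the current shell would pass on. The sketch below uses a hypothetical helper, `show_memory_settings`, and assumes `config.py` falls back to the documented defaults (128 and 273) when a variable is unset. It mirrors that fallback with shell parameter expansion rather than reading the backend's actual configuration.

```bash
# show_memory_settings: print the value each memory variable would have,
# using the documented default when the variable is unset or empty.
# (Hypothetical helper; the defaults are copied from the table above.)
show_memory_settings() {
  echo "DGX_LAB_MEMORY_TOTAL_GB=${DGX_LAB_MEMORY_TOTAL_GB:-128}"
  echo "DGX_LAB_MEMORY_BW_MAX_GBS=${DGX_LAB_MEMORY_BW_MAX_GBS:-273}"
}

show_memory_settings
```

For a one-off override that does not persist in the shell, the variables can also be set on the command line, for example `DGX_LAB_MEMORY_TOTAL_GB=80 make dev`.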
For production, build and start the Docker Compose stack:

```bash
make build
make up
```

The stack serves on port 80 via nginx. Stop it with `make down` and view logs with `make logs`.
```bash
# On the Spark
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
tailscale serve http://localhost:80

# From your Mac
open https://spark.your-tailnet.ts.net
```

See `docs/remote-access.md` for SSH tunnels and systemd auto-start.
| Problem | Fix |
|---------|-----|
| `uv sync` fails | Ensure Python 3.12+. Run `uv python install 3.12` if needed. |
| `bun install` fails | Ensure Bun 1.3+. Run `curl -fsSL https://bun.sh/install \| bash` to update. |
| Monitor shows "GPU not available" | Verify `nvidia-smi` works on the host. In Docker, add `deploy.resources.reservations.devices` for GPU access. |
| Frontend can't reach backend | Dev mode proxies `/api/*` via a Next.js rewrite to `localhost:8000`. Ensure the backend is running. |
| Docker build fails on frontend | Ensure `bun.lock` exists. Run `bun install` locally first to generate it. |
| Port 80 already in use | Change the nginx port in `docker-compose.yaml`: `ports: ["8080:80"]` |
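For the port conflict in the last row, you can check whether something is already listening on port 80 before running `make up`. This is a sketch that assumes a Linux host with `ss` (from iproute2) available. `listens_on` and `port_in_use` are hypothetical helper names.

```bash
# listens_on PORT: reads `ss -ltn` style output on stdin and succeeds
# if any local address in column 4 ends in :PORT.
# (Hypothetical helper, not shipped with DGX Lab.)
listens_on() {
  awk -v p=":$1" '$4 ~ (p "$") { found = 1 } END { exit !found }'
}

# port_in_use PORT: succeeds if a TCP socket is listening on PORT.
port_in_use() {
  ss -ltn 2>/dev/null | listens_on "$1"
}

if port_in_use 80; then
  echo "port 80 is already bound; remap nginx in docker-compose.yaml"
else
  echo "port 80 looks free"
fi
```

To see which process holds the port, `sudo ss -ltnp` adds the owning process to each line.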
```
backend/app/main.py             ← FastAPI app, router registration
backend/app/config.py           ← All env vars and hardware constants
backend/app/routers/*.py        ← One router per tool
frontend/apps/web/app/          ← Next.js pages (app router)
frontend/apps/web/components/   ← UI components per tool
frontend/packages/ui/           ← Shared shadcn components
docker-compose.yaml             ← frontend + backend + nginx
nginx.conf                      ← Reverse proxy config
Makefile                        ← dev, build, up, down, logs, rebuild
```