Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

BBuf/h100-sglang-diffusion

Name: h100-sglang-diffusion
Author: BBuf

skills/h100-sglang-diffusion/SKILL.md

npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS h100-sglang-diffusion

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

H100 — SGLang Diffusion

Overview

Use this skill to do SGLang diffusion development on the H100 box through h100_sglang. The default container is sglang_bbuf and the repo lives at /data/bbuf/repos/sglang.

Prefer this skill when:

Validating diffusion Triton / CUDA JIT kernels
Running diffusion model smoke tests (DiffGenerator, flux, etc.)
Comparing eager vs torch.compile diffusion performance
Verifying python[diffusion] editable install changes

This environment is already prepared:

sglang_bbuf is running on lmsysorg/sglang:dev
the repo is cloned at /data/bbuf/repos/sglang
editable installs for python[all] and python[diffusion] are already done
/data/.cache is mounted to /root/.cache
Infiniband paths are mounted for RDMA-aware workflows: /sys/class/infiniband, /dev/infiniband, and /usr/sbin/show_gids

Quick Start

Check the host, container, and GPU state.

ssh h100_sglang 'hostname && whoami'
ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'

Enter the container and confirm HF token visibility.

ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
cd /data/bbuf/repos/sglang
echo ${HF_TOKEN:+set}

If HF_TOKEN is missing, export it before any Hub-backed diffusion run:

export HF_TOKEN=<your-hf-token>
export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"

For non-interactive docker exec ... bash -lc "<cmd>" runs, export both variables inline instead of relying on shell startup:

ssh h100_sglang 'docker exec sglang_bbuf env HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> zsh -lc "..."'

Pick a free GPU.

Use a GPU with 0 utilization and only a few MiB allocated. Always set CUDA_VISIBLE_DEVICES=<gpu_id> for diffusion validation commands.

If the container is not running, start it.

ssh h100_sglang 'docker start sglang_bbuf'

Safe Remote Workflow

Inspect the repo state before editing.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git branch --show-current && git status --short"'

Fast-forward to latest clean main before creating a validation worktree.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'

Never write directly into /data/bbuf/repos/sglang when it is dirty.
Use one of these isolation strategies.

Create a detached worktree for remote-only experiments:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'

Stream the local working tree into the container (validates exactly what is local right now):

COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'

For patch-oriented validation:

fast-forward remote main
create a detached worktree from that commit
stream or git apply only the focused local diff into the worktree

This keeps /data/bbuf/repos/sglang clean while still validating the exact local delta.

Diffusion Validation Workflow

1. Syntax / Import Check

Always start here before running any GPU kernel or model test.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'

For broader coverage:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'

2. JIT Kernel Smoke

Run a targeted smoke script covering the changed primitives before any model-level test.

Cover at least these when relevant:

rms_norm_fn
RMSNorm under torch.compile
norm_infer
apply_rotary_embedding

Pipe the smoke script through docker exec -i:

ssh h100_sglang 'docker exec -i sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python python' < /path/to/local_smoke.py

3. Fused Modulation Regression

Run this after any change to jit_kernel/diffusion/triton:

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/diffusion/test_qwen_image_modulation.py -q"'

4. General Diffusion Tests

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/diffusion_test.py -q"'

5. Model-Level Smoke (`DiffGenerator`)

Only after steps 1–4 pass.

Use a real .py file with if __name__ == "__main__": guard — multiprocessing.spawn will fail if the entry point is stdin or unguarded top-level code.

# stream the script file to the container
scp /path/to/local_smoke_model.py h100_sglang:/tmp/smoke_model.py
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> PYTHONPATH=/tmp/sglang_local_validate/python zsh -lc "python /tmp/smoke_model.py"'

Treat checkpoint, dependency, and environment failures separately from code regressions.

6. Server-Level Smoke

Only attempt after model-level smoke passes.

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && python -m sglang.launch_server --model-path <model> --port 30000 &"'

Torch Compile Attribution

When a benchmark compares eager vs torch.compile, do not stop at the speedup number. Capture matching eager and compile perf dumps or profile dirs. Compare structured perf dumps with the in-repo comparator:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python python/sglang/multimodal_gen/benchmarks/compare_perf.py eager.json compile.json"'

For trace-level attribution, use llm-torch-profiler-analysis on the matching profile dirs and explain whether the gain came from fewer launches, fewer copies, or fused kernels replacing eager ATen ops.

Cleanup

ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100 /tmp/smoke_model.py'

BBuf/h100-sglang-diffusion

skills/h100-sglang-diffusion/SKILL.md

SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.

156 stars

development

Updated Apr 29, 2026

$ install --global

skillsauth

npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS h100-sglang-diffusion

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 29, 2026, 3:32 AM363.8s1 file scanned

SKILL.md

name:: h100-sglang-diffusion
description:: SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.

H100 — SGLang Diffusion

Overview

Use this skill to do SGLang diffusion development on the H100 box through h100_sglang. The default container is sglang_bbuf and the repo lives at /data/bbuf/repos/sglang.

Prefer this skill when:

Validating diffusion Triton / CUDA JIT kernels
Running diffusion model smoke tests (DiffGenerator, flux, etc.)
Comparing eager vs torch.compile diffusion performance
Verifying python[diffusion] editable install changes

This environment is already prepared:

sglang_bbuf is running on lmsysorg/sglang:dev
the repo is cloned at /data/bbuf/repos/sglang
editable installs for python[all] and python[diffusion] are already done
/data/.cache is mounted to /root/.cache
Infiniband paths are mounted for RDMA-aware workflows: /sys/class/infiniband, /dev/infiniband, and /usr/sbin/show_gids

Quick Start

Check the host, container, and GPU state.

ssh h100_sglang 'hostname && whoami'
ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'

Enter the container and confirm HF token visibility.

ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
cd /data/bbuf/repos/sglang
echo ${HF_TOKEN:+set}

If HF_TOKEN is missing, export it before any Hub-backed diffusion run:

export HF_TOKEN=<your-hf-token>
export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"

For non-interactive docker exec ... bash -lc "<cmd>" runs, export both variables inline instead of relying on shell startup:

ssh h100_sglang 'docker exec sglang_bbuf env HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> zsh -lc "..."'

Pick a free GPU.

Use a GPU with 0 utilization and only a few MiB allocated. Always set CUDA_VISIBLE_DEVICES=<gpu_id> for diffusion validation commands.

If the container is not running, start it.

ssh h100_sglang 'docker start sglang_bbuf'

Safe Remote Workflow

Inspect the repo state before editing.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git branch --show-current && git status --short"'

Fast-forward to latest clean main before creating a validation worktree.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'

Never write directly into /data/bbuf/repos/sglang when it is dirty.
Use one of these isolation strategies.

Create a detached worktree for remote-only experiments:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'

Stream the local working tree into the container (validates exactly what is local right now):

COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'

For patch-oriented validation:

fast-forward remote main
create a detached worktree from that commit
stream or git apply only the focused local diff into the worktree

This keeps /data/bbuf/repos/sglang clean while still validating the exact local delta.

Diffusion Validation Workflow

1. Syntax / Import Check

Always start here before running any GPU kernel or model test.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'

For broader coverage:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'

2. JIT Kernel Smoke

Run a targeted smoke script covering the changed primitives before any model-level test.

Cover at least these when relevant:

rms_norm_fn
RMSNorm under torch.compile
norm_infer
apply_rotary_embedding

Pipe the smoke script through docker exec -i:

ssh h100_sglang 'docker exec -i sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python python' < /path/to/local_smoke.py

3. Fused Modulation Regression

Run this after any change to jit_kernel/diffusion/triton:

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/diffusion/test_qwen_image_modulation.py -q"'

4. General Diffusion Tests

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/diffusion_test.py -q"'

5. Model-Level Smoke (`DiffGenerator`)

Only after steps 1–4 pass.

Use a real .py file with if __name__ == "__main__": guard — multiprocessing.spawn will fail if the entry point is stdin or unguarded top-level code.

# stream the script file to the container
scp /path/to/local_smoke_model.py h100_sglang:/tmp/smoke_model.py
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> PYTHONPATH=/tmp/sglang_local_validate/python zsh -lc "python /tmp/smoke_model.py"'

Treat checkpoint, dependency, and environment failures separately from code regressions.

6. Server-Level Smoke

Only attempt after model-level smoke passes.

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && python -m sglang.launch_server --model-path <model> --port 30000 &"'

Torch Compile Attribution

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python python/sglang/multimodal_gen/benchmarks/compare_perf.py eager.json compile.json"'

Cleanup

ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100 /tmp/smoke_model.py'

Related Skills

BBuf/vllm-sota-humanize-loop

development

VerifiedTrustedCommunity

Run an autonomous Humanize-governed vLLM SOTA performance loop for one LLM model: first perform the fixed fair vLLM/SGLang/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches vLLM code, optionally uses ncu-report-skill for kernel evidence, and revalidates until vLLM matches or beats the best observed framework under the same workload and SLA.

423SKILL.mdUpdated May 27, 2026

BBuf/vllm-sota-humanize-loop

BBuf/llm-pipeline-analysis

devops

VerifiedTrustedCommunity

Inspect LLM torch profiler traces at forward-pass, layer, and kernel level. Use when you need layer timings, anchor-kernel boundaries, representative kernel flows, or Perfetto time ranges.

423SKILL.mdUpdated May 21, 2026

BBuf/llm-pipeline-analysis

BBuf/sglang-sota-humanize-loop

development

VerifiedTrustedCommunity

Run an autonomous Humanize-governed SGLang SOTA performance loop for one LLM model: first perform the fixed fair SGLang/vLLM/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches SGLang code, optionally uses ncu-report-skill for kernel evidence, and revalidates until SGLang matches or beats the best observed framework under the same workload and SLA.

423SKILL.mdUpdated May 17, 2026

BBuf/sglang-sota-humanize-loop

BBuf/model-pr-history-knowledge

documentation

VerifiedTrustedCommunity

Use when an SGLang, vLLM, or TensorRT-LLM serving/model optimization task needs prior model-family PR evidence. Query and read the PR-driven history docs under model-pr-optimization-history before choosing source paths, fast paths, kernel/fusion ideas, regression risks, or validation lanes.

423SKILL.mdUpdated May 17, 2026

BBuf/model-pr-history-knowledge

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS.git

# Copy into Claude Code skills folder (global)
cp -r AI-Infra-Auto-Driven-SKILLS/skills/h100-sglang-diffusion ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

BBuf/AI-Infra-Auto-Driven-SKILLS

156 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

BBuf/h100-sglang-diffusion

$ install --global

Security Scan Results

SKILL.md

H100 — SGLang Diffusion

Overview

Quick Start

Safe Remote Workflow

Diffusion Validation Workflow

1. Syntax / Import Check

2. JIT Kernel Smoke

3. Fused Modulation Regression

4. General Diffusion Tests

5. Model-Level Smoke (DiffGenerator)

6. Server-Level Smoke

Torch Compile Attribution

Cleanup

Related Skills

BBuf/vllm-sota-humanize-loop

BBuf/llm-pipeline-analysis

BBuf/sglang-sota-humanize-loop

BBuf/model-pr-history-knowledge

BBuf/h100-sglang-diffusion

$ install --global

Security Scan Results

SKILL.md

H100 — SGLang Diffusion

Overview

Quick Start

Safe Remote Workflow

Diffusion Validation Workflow

1. Syntax / Import Check

2. JIT Kernel Smoke

3. Fused Modulation Regression

4. General Diffusion Tests

5. Model-Level Smoke (DiffGenerator)

6. Server-Level Smoke

Torch Compile Attribution

Cleanup

Related Skills

BBuf/vllm-sota-humanize-loop

BBuf/llm-pipeline-analysis

BBuf/sglang-sota-humanize-loop

BBuf/model-pr-history-knowledge

5. Model-Level Smoke (`DiffGenerator`)

5. Model-Level Smoke (`DiffGenerator`)