
BBuf/h100

Name: h100
Author: BBuf

skills/h100/SKILL.md

npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS h100


H100

Overview

Use this skill to do SGLang development on the H100 box through h100_sglang. The default container is sglang_bbuf and the repo lives at /sgl-workspace/sglang. Prefer it whenever local validation is insufficient for CUDA, Triton, diffusion pipelines, or other GPU-backed SGLang behavior.

This environment is already prepared:

sglang_bbuf is running on lmsysorg/sglang:dev
the repo is cloned at /sgl-workspace/sglang
editable installs for python[all] and python[diffusion] are already done
/root/.cache is mounted as the cache path
Infiniband paths are mounted into the container for RDMA-aware workflows: /sys/class/infiniband, /dev/infiniband, and /usr/sbin/show_gids

Hugging Face cache is already mounted, but do not assume HF_TOKEN is visible in every docker exec context. Interactive shells and non-interactive docker exec ... bash -lc "<cmd>" can behave differently. Always verify with echo ${HF_TOKEN:+set} before gated-model or Hub-backed runs.

Quick Start

Check the host, container, and GPU state.

ssh h100_sglang 'hostname && whoami'
ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'

Enter the container and repo. Pass -t to ssh so that a TTY is allocated; docker exec -it fails without one.

ssh -t h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
cd /sgl-workspace/sglang
echo ${HF_TOKEN:+set}

If HF_TOKEN is unexpectedly missing in the current shell, export it manually before Hub-backed workflows:

export HF_TOKEN=<your-hf-token>
export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"

For non-interactive docker exec ... bash -lc "<cmd>" runs, prefer exporting both variables inside the command itself instead of assuming the shell startup path will populate them.
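As a rough sketch of exporting both variables inside the command itself, the helper below builds an ssh argv for a non-interactive run. The function name `remote_exec_argv` and its parameters are hypothetical and not part of the skill; only the host, container, and variable names come from the text above.

```python
import shlex


def remote_exec_argv(cmd, hf_token, host="h100_sglang", container="sglang_bbuf"):
    """Build an ssh argv that exports both Hub token variables inside the command.

    Hypothetical helper for illustration; not part of the skill or of SGLang.
    """
    token = shlex.quote(hf_token)
    # Export explicitly instead of relying on shell startup files.
    inner = f"export HF_TOKEN={token} HUGGINGFACE_HUB_TOKEN={token}; {cmd}"
    # ssh hands the remote shell one string, so quote the docker argv once more.
    docker_cmd = shlex.join(["docker", "exec", container, "bash", "-lc", inner])
    return ["ssh", host, docker_cmd]


# Placeholder token; the check command is the one recommended above.
argv = remote_exec_argv("echo ${HF_TOKEN:+set}", "hf_example")
print(argv[2])
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Note that a token embedded in a command line may be visible in process listings on the host, so this trades some secrecy for predictability.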

Pick a free GPU.

Use a GPU with 0 utilization and only a few MiB allocated. Set CUDA_VISIBLE_DEVICES=<gpu_id> for every GPU-backed validation command.

This host currently does not provide the kill-idle helper.

Do not assume you can reclaim other users' idle allocations automatically. If the free GPU list is tight, re-check nvidia-smi, choose another GPU, or coordinate before proceeding.
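The selection rule above can be sketched as a small parser over the CSV that the nvidia-smi query in step 1 prints. The 100 MiB cutoff for "only a few MiB" is an assumption, the function name is hypothetical, and the sample rows are made up for illustration.

```python
def free_gpus(csv_text, max_util=0, max_mem_mib=100):
    """Return indices of GPUs that look idle.

    Expects rows in the order index,name,utilization.gpu,memory.used,memory.total,
    as produced with --format=csv,noheader,nounits.
    The max_mem_mib threshold is an assumed reading of "only a few MiB".
    """
    free = []
    for line in csv_text.strip().splitlines():
        index, _name, util, mem_used, _mem_total = [f.strip() for f in line.split(",")]
        if int(util) <= max_util and int(mem_used) <= max_mem_mib:
            free.append(int(index))
    return free


# Hypothetical sample output, not captured from the real host.
SAMPLE = """\
0, NVIDIA H100 80GB HBM3, 97, 70213, 81559
1, NVIDIA H100 80GB HBM3, 0, 4, 81559
2, NVIDIA H100 80GB HBM3, 0, 35120, 81559
"""

print(free_gpus(SAMPLE))  # prints [1]
```

GPU 2 is excluded even though its utilization is 0, because another process still holds memory on it. Re-run the query right before launching, since the state can change between the check and the run.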

If the container is not running, start it first.

ssh h100_sglang 'docker start sglang_bbuf'

Safe Remote Workflow

Inspect the default repo before editing it.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git branch --show-current && git status --short"'

Fast-forward /sgl-workspace/sglang to the latest clean main before creating any validation worktree.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'

Avoid writing directly into /sgl-workspace/sglang when it is dirty or when the local snapshot differs from the remote HEAD.
Prefer one of these isolation strategies.

Create a detached worktree for remote-only experiments:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'

Stream the exact local working tree into the container when validating the current local snapshot:

COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
  ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'

Use the streamed copy when the goal is "validate exactly what is in the local repo right now". For patch-oriented remote validation, another good option is:

update remote main
create a detached worktree from that clean commit
stream or apply a focused local patch diff into the worktree only

That keeps /sgl-workspace/sglang clean while still validating the exact local delta.
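One possible spelling of those three steps is sketched below as a helper that returns the shell commands to run from the local repo. The first two commands are the ones shown earlier; the third is an assumption: it diffs the local tree against `origin/main` and pipes the result into `git apply` in the worktree, which only applies cleanly if both sides have the same `origin/main` commit. The function name is hypothetical.

```python
HOST = "h100_sglang"
CONTAINER = "sglang_bbuf"
REPO = "/sgl-workspace/sglang"


def patch_validation_commands(worktree="/tmp/sglang_validate_h100", base="origin/main"):
    """Return the three shell commands, in order, to run from the local repo."""

    def in_repo(git_cmds):
        # Same wrapper shape as the commands shown earlier in this section.
        return f"ssh {HOST} 'docker exec {CONTAINER} zsh -lc \"cd {REPO} && {git_cmds}\"'"

    update_main = in_repo("git fetch origin && git checkout main && git pull --ff-only origin main")
    add_worktree = in_repo(f"git worktree add --detach {worktree} HEAD")
    # Assumption: local and remote agree on `base`; untracked files are not in the diff.
    apply_patch = f"git diff {base} | ssh {HOST} 'docker exec -i {CONTAINER} git -C {worktree} apply'"
    return [update_main, add_worktree, apply_patch]


for command in patch_validation_commands():
    print(command)
```

Printing the commands rather than executing them keeps the sketch side-effect free; review them before running, especially the diff base.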

Validation Workflow

Start with import or syntax-level checks.

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'

For diffusion-specific edits, prefer a narrower first pass:

ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'

Run targeted tests for the changed area.

ssh h100_sglang 'docker exec sglang_bbuf env PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/test.py -q"'

For diffusion changes, start with the fused modulation regression:

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/diffusion/test_qwen_image_modulation.py -q"'

For GPU-backed changes, pin a free GPU explicitly. Replace 0 in CUDA_VISIBLE_DEVICES=0 with the index of the free GPU picked in Quick Start.

ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/gpu_test.py -q"'

For kernel-heavy diffusion work, run a targeted smoke script for the changed primitives before attempting a model-level run.

Cover at least these when relevant:

rms_norm_fn
RMSNorm under torch.compile
norm_infer
apply_rotary_embedding

Pipe the script through docker exec -i ... python for pure kernel smoke.

Use a real .py file with if __name__ == "__main__": when calling DiffGenerator.from_pretrained(..., local_mode=True) or any flow that relies on multiprocessing.spawn.

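Under the multiprocessing spawn start method, each child process re-imports the main module. A script fed through stdin has no file to re-import, and unguarded top-level code runs again in every child, so both cases fail.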
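A minimal illustration of the guarded layout, assuming nothing about SGLang itself: `square` and `main` are stand-ins for the real workload (for example a script that calls DiffGenerator.from_pretrained), and only standard-library calls are used.

```python
import multiprocessing as mp


def square(x):
    # Defined at module level so spawned children can import it by name.
    return x * x


def main():
    # Request the spawn start method explicitly, independent of the platform default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]


if __name__ == "__main__":
    # Children re-import this file under a different module name, so the guard
    # keeps them from calling main() again.
    main()
```

Save this as a real .py file inside the container and run it with `python file.py`. Piping the same text to `python` over stdin leaves the children with no module file to import `square` from.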

Attempt model-level or server-level smoke only after unit, kernel, or targeted regression checks pass.

Treat checkpoint, dependency, and environment failures separately from code regressions. If a workflow reads from Hugging Face Hub, verify HF_TOKEN first and re-export it explicitly in the current shell or command when needed.

Torch Compile Attribution

When a benchmark compares eager vs torch.compile, do not stop at the speedup number. Capture matching eager and compile perf dumps or profile dirs. Compare structured perf dumps with python python/sglang/multimodal_gen/benchmarks/compare_perf.py eager.json compile.json, then use llm-torch-profiler-analysis on the matching profile dirs to explain whether the gain came from fewer launches, fewer copies, or fused kernels replacing eager ATen ops.

Cleanup

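Remove temporary validation directories when finished, then prune the stale worktree registration so that the same worktree path can be added again later.

ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100'
ssh h100_sglang 'docker exec sglang_bbuf git -C /sgl-workspace/sglang worktree prune'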

Description: SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/sgl-workspace/sglang`, and use the ready H100 remote environment for SGLang development and validation. Use when a task needs remote CUDA work, GPU-backed smoke tests, diffusion checks, or a safe remote copy instead of local-only execution.
Category: development
Stars: 156
Updated: Apr 29, 2026

Install this skill globally with one command:

npx skillsauth add BBuf/AI-Infra-Auto-Driven-SKILLS h100

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean.

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: container and dependency vulnerability scanner. Status: Clean (95%)
Semgrep: static code analysis for vulnerabilities. Status: Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Status: Clean (95%)
Snyk (dep): open source security scanning. Status: Skipped (50%)
Socket.dev: supply chain security analysis. Status: Skipped (50%)
VirusTotal: multi-engine malware detection. Status: Skipped (50%)
CrowdStrike: advanced threat intelligence. Status: Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Status: Skipped (50%)
OWASP Dep-Check. Status: Skipped (50%)

Last scanned: Apr 29, 2026, 3:32 AM · 335.6s · 1 file scanned


Related Skills

BBuf/vllm-sota-humanize-loop (development)
Run an autonomous Humanize-governed vLLM SOTA performance loop for one LLM model: first perform the fixed fair vLLM/SGLang/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches vLLM code, optionally uses ncu-report-skill for kernel evidence, and revalidates until vLLM matches or beats the best observed framework under the same workload and SLA.
423 · SKILL.md · Updated May 27, 2026

BBuf/llm-pipeline-analysis (devops)
Inspect LLM torch profiler traces at forward-pass, layer, and kernel level. Use when you need layer timings, anchor-kernel boundaries, representative kernel flows, or Perfetto time ranges.
423 · SKILL.md · Updated May 21, 2026

BBuf/sglang-sota-humanize-loop (development)
Run an autonomous Humanize-governed SGLang SOTA performance loop for one LLM model: first perform the fixed fair SGLang/vLLM/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches SGLang code, optionally uses ncu-report-skill for kernel evidence, and revalidates until SGLang matches or beats the best observed framework under the same workload and SLA.
423 · SKILL.md · Updated May 17, 2026

BBuf/model-pr-history-knowledge (documentation)
Use when an SGLang, vLLM, or TensorRT-LLM serving/model optimization task needs prior model-family PR evidence. Query and read the PR-driven history docs under model-pr-optimization-history before choosing source paths, fast paths, kernel/fusion ideas, regression risks, or validation lanes.
423 · SKILL.md · Updated May 17, 2026


Install manually


# Clone the repo
git clone https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS.git

# Copy into Claude Code skills folder (global); create it first so h100 lands inside it
mkdir -p ~/.claude/skills
cp -r AI-Infra-Auto-Driven-SKILLS/skills/h100 ~/.claude/skills/


Repository

BBuf/AI-Infra-Auto-Driven-SKILLS


Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT