amdpilot-org/amd-ci-test-bisect

Name: amd-ci-test-bisect
Author: amdpilot-org

amd-ci-test-bisect/SKILL.md

npx skillsauth add amdpilot-org/amd-skills amd-ci-test-bisect

AMD CI Test / Bisect

This skill is the AMD/ROCm adaptation of SGLang's write-sglang-test, ci-workflow-guide, and sglang-bisect-ci-regression skills.

Use it for four things:

- add a regression test for an AMD-only or ROCm-sensitive fix
- register that test with register_amd_ci(...)
- reproduce the AMD CI job locally through the upstream scripts/ci/amd/* flow
- bisect an AMD CI regression across MI325 / MI35x runners, ROCm versions, and image tags

Read references/suite-matrix.md when you need exact suite/runner mappings or local reproduction commands.

Read references/bisect-playbook.md when you are debugging a failing AMD CI run on main or nightly.

Core Rules

- register_amd_ci is AST-parsed. Keep est_time, suite, nightly, and disabled as module-level literal constants. Do not hide them behind helper functions, variables, or computed expressions.
- Only add AMD registration when AMD coverage is the point. Use register_amd_ci(...) for ROCm-only kernels, HIP/aiter paths, MI35x/gfx950 behavior, RCCL/distributed AMD paths, or an AMD regression fix. Do not duplicate backend-independent tests onto AMD just because AMD exists.
- Choose the lightest AMD suite that proves the fix. Prefer 1-GPU MI325 first. Escalate to MI35x only when the failure depends on gfx950 / MI355X / ROCm 7.2 image behavior. Escalate to 2/4/8 GPU only when the bug requires distributed state, DP/TP, or model scale.
- MI325 and MI35x are different validation targets. In upstream AMD CI, MI325-class runners are generally normalized onto mi30x images. MI35x runners use MI35x images and amd_ci_exec.sh injects GPU_ARCHS=gfx950. Treat them as separate contracts, not interchangeable hardware.
- Reproduce with the AMD CI scripts, not ad-hoc shell state. Use ensure_vram_clear.sh, amd_ci_start_container.sh, amd_ci_install_dependency.sh, and amd_ci_exec.sh. That is the closest match to what GitHub Actions actually runs.
- For regressions, separate code regressions from runner/image drift. Always record runner label, GPU family, ROCm version, and container image tag before blaming a commit.
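The literal-constant rule is easier to see with a small static parser. The sketch below is a hypothetical stand-in for a CI collector, not SGLang's actual implementation: it reads the keyword arguments of module-level register_amd_ci(...) calls with Python's ast module and rejects anything that is not a literal. The function name extract_amd_registrations and the GOOD/BAD samples are made up for illustration.

```python
import ast


def extract_amd_registrations(source: str) -> list[dict]:
    """Collect keyword arguments of module-level register_amd_ci(...) calls.

    Hypothetical stand-in for a CI collector; not SGLang's actual parser.
    """
    registrations = []
    for node in ast.parse(source).body:
        # Only bare module-level call statements are considered.
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        if not (isinstance(call.func, ast.Name) and call.func.id == "register_amd_ci"):
            continue
        kwargs = {}
        for kw in call.keywords:
            try:
                # literal_eval accepts literals only: no names, no calls.
                kwargs[kw.arg] = ast.literal_eval(kw.value)
            except ValueError as exc:
                raise ValueError(f"{kw.arg} must be a literal constant") from exc
        registrations.append(kwargs)
    return registrations


# A registration written with literals can be read without importing the file.
GOOD = 'register_amd_ci(est_time=120, suite="stage-b-test-1-gpu-small-amd")\n'
# The same registration hidden behind a variable cannot.
BAD = 'SUITE = "stage-b-test-1-gpu-small-amd"\nregister_amd_ci(est_time=120, suite=SUITE)\n'

print(extract_amd_registrations(GOOD))
```

Calling extract_amd_registrations(BAD) raises ValueError, because a static reader sees only the name SUITE, not its value. This is the kind of failure the rule is meant to prevent.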

Workflow

Phase 1: Classify the fix

Ask:

- Is the bug AMD-only, or just observed first on AMD?
- Does it depend on is_in_amd_ci(), HIP kernels, aiter, RCCL, or ROCm image contents?
- Is it MI35x/gfx950-specific, or should MI325 coverage be enough?
- Does it need 1 GPU, 2 GPU, 4 GPU, or 8 GPU to reproduce?

If the answer is "common logic, no AMD-specific path", keep the test on CPU/CUDA and do not add AMD registration just for symmetry.

Phase 2: Add the test

Default SGLang authoring rules still apply:

- use CustomTestCase
- make tearDownClass defensive
- prefer mocks/unit tests when a server is unnecessary
- place CI-discovered tests under test/registered/**

For AMD-specific behavior, the common pattern is:

from sglang.test.ci.ci_register import register_amd_ci

register_amd_ci(est_time=120, suite="stage-b-test-1-gpu-small-amd")

When the same file should run on both CUDA and AMD, register both explicitly and gate behavior inside the test body with is_in_amd_ci().

Use disabled="reason" instead of deleting coverage when a suite is temporarily too expensive or unstable.
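As a rough illustration of two of the rules above (a defensive tearDownClass, and gating inside the test body), here is a minimal sketch that runs with the standard library alone. It uses unittest.TestCase in place of CustomTestCase. Its is_in_amd_ci() is a local stand-in that assumes the SGLANG_IS_IN_CI_AMD=1 flag is the signal; the real helper may behave differently. The tolerances and the class name are hypothetical.

```python
import os
import unittest


def is_in_amd_ci() -> bool:
    # Stand-in for SGLang's helper; assumes the SGLANG_IS_IN_CI_AMD=1 flag.
    return os.environ.get("SGLANG_IS_IN_CI_AMD") == "1"


class TestRocmSensitivePath(unittest.TestCase):  # CustomTestCase in SGLang
    @classmethod
    def setUpClass(cls):
        # A real test would launch a server here and keep its handle.
        cls.process = None

    @classmethod
    def tearDownClass(cls):
        # Defensive: do not assume the handle was ever created.
        process = getattr(cls, "process", None)
        if process is not None:
            process.terminate()

    def test_tolerance(self):
        # Gate backend-specific expectations inside the test body.
        tolerance = 2e-2 if is_in_amd_ci() else 1e-2  # hypothetical values
        self.assertLess(abs(0.5 - 0.505), tolerance)
```

In a real SGLang test file, the module would also carry the register_amd_ci(...) call shown earlier, written with literal arguments.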

Phase 3: Pick the suite

Use references/suite-matrix.md.

Practical default:

- 1-GPU ROCm correctness/kernel issue: stage-a-test-1-gpu-small-amd or stage-b-test-1-gpu-small-amd
- 1-GPU heavy model / memory issue on AMD: stage-b-test-1-gpu-large-amd
- MI355X/gfx950-specific issue: stage-b-test-1-gpu-small-amd-mi35x or stage-c-test-large-8-gpu-amd-mi35x
- distributed AMD issue: stage-b-test-2-gpu-large-amd or stage-c-test-4-gpu-amd
- long model/eval coverage: nightly AMD suites
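The practical defaults above can be written as a small lookup. This is only a sketch: pick_amd_suite and its parameters are hypothetical, and it arbitrarily prefers the stage-b small suite over stage-a for the 1-GPU case. It also assumes that any multi-GPU gfx950 case maps to the 8-GPU MI35x suite. references/suite-matrix.md remains the authority.

```python
def pick_amd_suite(gpus: int = 1, needs_gfx950: bool = False, heavy: bool = False) -> str:
    """Map a classified fix onto the practical default suites (illustrative only)."""
    if needs_gfx950:
        # Assumption: anything beyond 1 GPU on MI35x goes to the 8-GPU suite.
        if gpus == 1:
            return "stage-b-test-1-gpu-small-amd-mi35x"
        return "stage-c-test-large-8-gpu-amd-mi35x"
    if gpus == 1:
        return "stage-b-test-1-gpu-large-amd" if heavy else "stage-b-test-1-gpu-small-amd"
    if gpus == 2:
        return "stage-b-test-2-gpu-large-amd"
    if gpus == 4:
        return "stage-c-test-4-gpu-amd"
    raise ValueError("no default listed; consult references/suite-matrix.md")


print(pick_amd_suite())
print(pick_amd_suite(gpus=2))
print(pick_amd_suite(needs_gfx950=True))
```

Starting from the defaults (1 GPU, no gfx950 dependency, not heavy) mirrors the rule of choosing the lightest suite that proves the fix.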

Phase 4: Reproduce locally

From a SGLang checkout:

bash scripts/ci/amd/ensure_vram_clear.sh
bash scripts/ci/amd/amd_ci_start_container.sh --rocm-version rocm720
bash scripts/ci/amd/amd_ci_install_dependency.sh
bash scripts/ci/amd/amd_ci_exec.sh -w "/sglang-checkout/test" \
  python3 run_suite.py --hw amd --suite stage-b-test-1-gpu-small-amd

Notes:

- amd_ci_start_container.sh auto-detects MI325/MI35x from the runner hostname.
- MI325/MI300 runners collapse to mi30x images; MI35x keeps mi35x.
- amd_ci_exec.sh auto-adds SGLANG_IS_IN_CI_AMD=1, SGLANG_USE_AITER=1, and on MI35x also GPU_ARCHS=gfx950.
- For disaggregated tests, use amd_ci_start_container_disagg.sh.
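When the reproduction command has to be recorded in a report, it can help to generate it rather than retype it. The sketch below only assembles the four commands shown above as argument lists and prints them. The helper name build_repro_commands is hypothetical; the script names, flags, and working directory are copied from the commands above.

```python
import shlex

SCRIPTS = "scripts/ci/amd"


def build_repro_commands(
    suite: str,
    rocm_version: str = "rocm720",
    workdir: str = "/sglang-checkout/test",
) -> list[list[str]]:
    """Return the local reproduction steps, in order, as argv lists."""
    return [
        ["bash", f"{SCRIPTS}/ensure_vram_clear.sh"],
        ["bash", f"{SCRIPTS}/amd_ci_start_container.sh", "--rocm-version", rocm_version],
        ["bash", f"{SCRIPTS}/amd_ci_install_dependency.sh"],
        [
            "bash", f"{SCRIPTS}/amd_ci_exec.sh", "-w", workdir,
            "python3", "run_suite.py", "--hw", "amd", "--suite", suite,
        ],
    ]


# Print the commands so they can be pasted into a report verbatim.
for argv in build_repro_commands("stage-b-test-1-gpu-small-amd"):
    print(shlex.join(argv))
```

To execute the steps instead of printing them, each list could be passed to subprocess.run(argv, check=True) from the root of the SGLang checkout.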

Phase 5: Bisect regressions

Use references/bisect-playbook.md.

The AMD twist is important:

- pr-test-amd.yml covers push, PR, and rerun-stage runs; it is not scheduled by cron
- nightly-test-amd-rocm720.yml is the scheduled source of truth for long AMD coverage

So:

- use push runs on main for per-commit AMD regressions
- use scheduled nightly AMD runs for large-model / MI35x / long-running regressions
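Once a known-good and a known-bad run bracket the regression, the search between them is an ordinary binary search. The sketch below is a generic illustration, not the procedure in references/bisect-playbook.md. The function name first_bad_commit is hypothetical. It assumes the commit list is ordered oldest to newest, the first entry passes, the last entry fails, and results are deterministic.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest failing commit in an oldest-to-newest list.

    Assumes commits[0] passes, commits[-1] fails, and results are monotonic.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Toy history: commits c0..c9, with the regression introduced at c6.
history = [f"c{i}" for i in range(10)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))
```

In practice is_bad would run the local reproduction for one commit. Keeping the runner, ROCm version, and image tag fixed across all probes is what lets the result be read as a code regression.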

Output Requirements

When you finish an AMD test / CI / bisect task, report:

- whether the fix is AMD-only or cross-backend
- the chosen register_amd_ci suite and why
- whether validation target is MI325/mi30x or MI35x/gfx950
- the exact local reproduction command
- if bisecting, whether the root cause is:
  - code regression
  - runner/hardware-specific
  - image / ROCm version drift
  - flaky / nondeterministic
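Separating a code regression from drift is simpler if every run is recorded with the same fields. A minimal sketch follows; the RunRecord type, the environment_drift helper, and every example value (commit names, runner labels, image tags) are made up for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    commit: str
    runner_label: str
    gpu_family: str
    rocm_version: str
    image_tag: str


def environment_drift(good: RunRecord, bad: RunRecord) -> dict[str, tuple[str, str]]:
    """Return the environment fields that differ between two runs."""
    a, b = asdict(good), asdict(bad)
    return {k: (a[k], b[k]) for k in a if k != "commit" and a[k] != b[k]}


# Hypothetical records: the failing run used a different image tag.
good = RunRecord("abc123", "example-mi325-runner", "MI325", "rocm720", "example-image:day1")
bad = RunRecord("def456", "example-mi325-runner", "MI325", "rocm720", "example-image:day2")

print(environment_drift(good, bad))
```

An empty result points toward a code regression or flakiness. A non-empty result means the runner or image difference has to be ruled out before blaming a commit.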

Minimal Checklist

- test added or updated in the right folder
- register_amd_ci(...) literals are valid for AST parsing
- suite matches the real hardware requirement
- local AMD CI reproduction command was documented
- any MI35x-only assumption is stated explicitly
- reviewer can tell whether this should also keep CUDA coverage

Add or update AMD/ROCm SGLang regression tests, choose the right MI325/MI35x CI suite, reproduce AMD CI locally with the upstream amd_ci container scripts, and bisect AMD CI regressions on main/nightly. Use when an agent-generated fix needs register_amd_ci coverage, when selecting MI325 vs MI35x runners, when debugging pr-test-amd or nightly AMD failures, or when documenting the register_amd_ci flow for ROCm-specific fixes.

Stars: 1
Category: development
Updated: Apr 25, 2026

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage shown |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 25, 2026, 5:14 AM (52.6s, 3 files scanned)

Install manually

# Clone the repo
git clone https://github.com/amdpilot-org/amd-skills.git

# Copy into the Claude Code skills folder (global); create it first if missing
mkdir -p ~/.claude/skills
cp -r amd-skills/amd-ci-test-bisect ~/.claude/skills/

Repository: amdpilot-org/amd-skills (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT