
AMD-AGI/performance-analysis

Name: performance-analysis
Author: AMD-AGI

skills/performance-analysis/SKILL.md

```
npx skillsauth add AMD-AGI/maxtext-slurm performance-analysis
```

Security Scan Results

3 of 9 scanners reported clean. Last scanned May 4, 2026, 2:48 AM (15.1 s; 1 file scanned).

- Trivy, container and dependency vulnerability scanner: Clean
- Semgrep, static code analysis for vulnerabilities: Clean
- mcp-scan (Snyk), Model Context Protocol security validation: Clean
- Snyk (dep), open source security scanning: Skipped
- Socket.dev, supply chain security analysis: Skipped
- VirusTotal, multi-engine malware detection: Skipped
- CrowdStrike, advanced threat intelligence: Skipped
- OSV-Scanner, Open Source Vulnerability database check: Skipped
- OWASP Dep-Check: Skipped

MaxText Performance Analysis

Post-training (or mid-training) analysis pipeline. Follow the workflow below from top to bottom.

Multi-job comparisons: If comparing two or more jobs (e.g., "why is job B slower than job A?"), start with skills/tsdb-diagnosis/SKILL.md (Multi-Job Comparison workflow) before running TraceLens. The TSDB reveals system-level root causes — CPU contention from RCCL resource leaks, network errors, I/O pressure, thermal throttling — that TraceLens cannot observe (it only sees GPU-side kernel timings). Only proceed to TraceLens here if the TSDB comparison is inconclusive.

Deep per-kernel analysis: When the user asks for per-kernel time breakdowns, step-time composition tables, cross-variant kernel comparisons, or whether a specific kernel is main-stream-blocking — switch to skills/profile-drill/SKILL.md. TraceLens's kernel_launchers_summary_by_category.csv has a known ~1.5×–2× inflation bug on 1-node/proc profiles (the time ms per gpu column divides by host count, not GPU count). profile-drill uses utils/profile_drill.py to read the raw xplane trace JSONs directly and avoids this bias.

Workflow

Step 1: Run the dispatcher

```
python3 utils/analyze_job.py "$JOB_WORKSPACE/<job>.log"
python3 utils/analyze_job.py "$JOB_WORKSPACE/<job_dir>/"
python3 utils/analyze_job.py "$JOB_WORKSPACE/local_2026*"
```

For running jobs, pass -f to force re-analysis (bypasses staleness check):

```
python3 utils/analyze_job.py -f "$JOB_WORKSPACE/<job>.log"
```

The dispatcher auto-detects available artifacts and runs only the relevant tools:

- Log with TGS data → `tgs_tagger.py`
- `*.xplane.pb` → `TraceLens_generate_perf_report_jax`
- `xla_dump/*.gpu_after_optimizations.txt` → `IRLens_analyze_hlo_ir.py`
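To make the artifact-to-tool mapping above concrete, here is a minimal Python sketch of that kind of detection. It is not analyze_job.py's actual logic: the helper name `detect_tools` is hypothetical, and treating any log that mentions "TGS" as "having TGS data" is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def detect_tools(job_dir: str, log_file: Optional[str] = None) -> List[str]:
    """Hypothetical sketch of artifact -> tool selection (not analyze_job.py's code)."""
    root = Path(job_dir)
    tools: List[str] = []
    # Assumption: a log "has TGS data" if it mentions "TGS" anywhere.
    if log_file and "TGS" in Path(log_file).read_text(errors="ignore"):
        tools.append("tgs_tagger.py")
    # xplane traces sit several levels deep (<run_name>/tensorboard/plugins/profile/<ts>/).
    if any(root.rglob("*.xplane.pb")):
        tools.append("TraceLens_generate_perf_report_jax")
    # Post-optimization HLO dumps; the loose glob also matches "..._gpu_after_optimizations.txt".
    if any((root / "xla_dump").glob("*gpu_after_optimizations.txt")):
        tools.append("IRLens_analyze_hlo_ir.py")
    return tools
```

Running only the tools whose inputs exist keeps a partial job (for example, one without profiling enabled) from failing the whole analysis.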

Step 2: Handle TraceLens if needed

If the dispatcher output says "TraceLens not installed" and xplane traces exist:

Check if TraceLens is already installed and patched before doing anything:
```
python3 -c "
import TraceLens.util, inspect
src = inspect.getsource(TraceLens.util.DataLoader.load_data)
assert 'xprof' in src, 'not patched'
print('TraceLens: installed and patched')
"
```
- Succeeds → TraceLens is ready. Just re-run: python3 utils/analyze_job.py -f "$JOB_WORKSPACE/<job>.log"
- ImportError → not installed. Install then patch (see below).
- AssertionError → installed but unpatched. Patch only (see below).
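The same three-way check can be written as a small Python function if you want to branch on the result in a script. This is a hypothetical helper (`tracelens_state` is not part of TraceLens). One deliberate difference from the one-liner, based on an assumption about failure modes: it uses `importlib.util.find_spec` to tell "package absent" apart from "package present but failing to import", so an unpatched install that hits the protobuf/xprof import errors is reported as needing patches rather than a reinstall.

```python
import importlib
import importlib.util
import inspect


def tracelens_state() -> str:
    """Classify TraceLens as 'not_installed', 'unpatched', or 'ready' (hypothetical helper)."""
    # find_spec locates the top-level package without executing it, so a package
    # that is present but fails to import is not mistaken for a missing one.
    if importlib.util.find_spec("TraceLens") is None:
        return "not_installed"
    try:
        util = importlib.import_module("TraceLens.util")
        src = inspect.getsource(util.DataLoader.load_data)
    except Exception:
        # Present but not importable/inspectable: most likely still needs the patches.
        return "unpatched"
    # Same marker as the one-liner: the patched load_data references xprof.
    return "ready" if "xprof" in src else "unpatched"
```

Map the result onto the steps below: `not_installed` → Install, then Patch; `unpatched` → Patch only; `ready` → re-run the dispatcher with -f.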

Install (only if import failed):

```
pip install git+https://github.com/AMD-AGI/TraceLens.git
```

Patch (only if the xprof assertion failed). Apply all patches from tracelens-patches.md — 6 files, ~13 patches. Key fixes:
- protobuf/xprof import errors (TF 2.19+ renamed tensorboard_plugin_profile to xprof)
- GPU PID remapping (xprof remaps device PIDs to 1001+; code filtering pid < 100 misses all GPU events)
- metadata_events not passed to build_tree()
- KeyError on gpu_kernel_op_cat and missing parent events for launch latency

Re-run the dispatcher with -f:

```
python3 utils/analyze_job.py -f "$JOB_WORKSPACE/<job>.log"
```

Installing and patching is a one-time step per environment. Always run the check first to avoid redundant work.

Step 3: Read results

Read the generated analysis.json programmatically. Do NOT open the raw file directly (it can exceed 40K lines because of per-step arrays); extract the key metrics instead:

```
python3 -c "
import json, sys
with open('<job_dir>/analysis.json') as f:
    d = json.load(f)
print(f'Job: {d[\"job_id\"]} | Model: {d[\"model\"]} | Nodes: {d[\"num_nodes\"]} | Status: {d[\"job_status\"][\"status\"]}')
tgs = d['tgs']
print(f'Steady TGS: {tgs[\"steady\"][\"mean\"]:.1f} (std={tgs[\"steady\"][\"std\"]:.1f}, steps {tgs[\"steady\"][\"range\"]})')
print(f'Tail   TGS: {tgs[\"tail\"][\"mean\"]:.1f} (std={tgs[\"tail\"][\"std\"]:.1f}, steps {tgs[\"tail\"][\"range\"]})')
tl = d.get('tracelens_summary', {})
if tl:
    print(f'Compute: {tl[\"computation_time\"]:.1f}% | Exposed comm: {tl[\"exposed_comm_time\"]:.1f}% | Idle: {tl[\"idle_time\"]:.2f}% | Total comm: {tl[\"total_comm_time\"]:.1f}%')
"
```

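If you query analysis.json repeatedly, the one-liner above can be turned into a reusable function that tolerates missing sections (for example, no `tracelens_summary` when TraceLens did not run). A sketch, assuming the same keys the one-liner reads; `summarize` and `summarize_file` are hypothetical names.

```python
import json
from typing import Any, Dict, List


def summarize(d: Dict[str, Any]) -> List[str]:
    """Render headline metrics from a parsed analysis.json, tolerating missing sections."""
    status = (d.get("job_status") or {}).get("status")
    lines = [
        f"Job: {d.get('job_id')} | Model: {d.get('model')} | "
        f"Nodes: {d.get('num_nodes')} | Status: {status}"
    ]
    tgs = d.get("tgs") or {}
    for phase in ("steady", "tail"):
        s = tgs.get(phase)
        if s:  # assumption: a short or still-running job may lack a phase
            lines.append(
                f"{phase.capitalize():<6} TGS: {s['mean']:.1f} "
                f"(std={s['std']:.1f}, steps {s['range']})"
            )
    tl = d.get("tracelens_summary") or {}
    if tl:
        lines.append(
            f"Compute: {tl['computation_time']:.1f}% | "
            f"Exposed comm: {tl['exposed_comm_time']:.1f}% | "
            f"Idle: {tl['idle_time']:.2f}% | Total comm: {tl['total_comm_time']:.1f}%"
        )
    return lines


def summarize_file(path: str) -> List[str]:
    with open(path) as f:
        return summarize(json.load(f))
```

Usage: `print("\n".join(summarize_file("<job_dir>/analysis.json")))`.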
For deeper TraceLens analysis, read the CSVs in <job_dir>/tracelens/<timestamp>/csvs/:

- gpu_events_averages.csv: per-GPU compute/comm/idle breakdown (averages)
- gpu_timeline.csv: per-GPU breakdown with pid
- kernel_launchers_summary_by_category.csv: time by kernel category (GEMM, NCCL, XLA fusions, etc.)
- kernel_launchers_summary.csv: time by individual kernel name

⚠️ TraceLens per-GPU CSV bias on 1-node/proc. The time ms per gpu column in the two kernel_launchers_summary*.csv files divides total kernel time by host count (typically 8), not GPU count (typically 64) — so per-GPU numbers are ~1.5×–2× inflated on 1-node/proc profiles. Percentages and category rankings are fine; absolute per-GPU kernel times are not. For kernel-time numbers you can cite (e.g. in a report or step-time composition table), use skills/profile-drill/SKILL.md instead — it reads raw xplane trace JSONs and divides by auto-detected GPUs.
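Percentages survive the bias because a constant divisor scales every row by the same factor, and that factor cancels when each row is divided by the column total. The sketch below computes per-category shares from a kernel_launchers_summary*.csv file. The category column name is not given here, so it is a required parameter, and the default time-column header "time ms per gpu" is an assumption about the exact CSV header; `category_shares` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Dict


def category_shares(csv_path: str, category_col: str,
                    time_col: str = "time ms per gpu") -> Dict[str, float]:
    """Percent of kernel time per category, largest first (hypothetical helper)."""
    totals: Dict[str, float] = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[category_col]] += float(row[time_col])
    grand = sum(totals.values())
    if grand == 0:
        return {}
    # A constant divisor (host count vs GPU count) scales every row equally and cancels here.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {cat: 100.0 * t / grand for cat, t in ranked}
```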

Step 4: Summarize findings

Present results using this structure:

| Metric | Source | What to look for |
|--------|--------|------------------|
| TGS (steady-state) | analysis.json → tgs.steady | Primary throughput metric |
| MFU | analysis.json → mfu_per_step | Model FLOPS utilization (if available) |
| GPU compute % | tracelens_summary.computation_time | Time on actual compute kernels |
| Exposed comm % | tracelens_summary.exposed_comm_time | Communication NOT overlapped with compute (lower is better) |
| Idle % | tracelens_summary.idle_time | GPU doing nothing (should be near 0) |
| Kernel breakdown | kernel_launchers_summary_by_category.csv | GEMM vs NCCL vs fusion time |
| Comm ops per step | dispatcher IRLens output | Count of all-reduce, all-gather, all-to-all, reduce-scatter |

Interpretation guidelines:

- High exposed comm % → opportunities for better comm/compute overlap
- Large per-GPU variance in compute % → load imbalance
- High idle % → scheduling or synchronization issues
- Tail TGS std much larger than steady std → periodic overhead (checkpointing, profiling)
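The guidelines above can be applied mechanically as a first pass before reading the details. In this hypothetical sketch the thresholds are deliberately required keyword arguments, because the skill does not define cut-offs; choose values that fit your cluster. Per-GPU compute percentages would come from gpu_events_averages.csv, whose column names are not specified here, so the function takes a plain list.

```python
from statistics import pstdev
from typing import Dict, List


def flag_findings(tl: Dict[str, float], steady_std: float, tail_std: float,
                  per_gpu_compute: List[float], *, exposed_comm_max: float,
                  idle_max: float, compute_spread_max: float,
                  tail_std_ratio: float) -> List[str]:
    """Apply the interpretation guidelines; all thresholds are caller-chosen."""
    flags: List[str] = []
    if tl.get("exposed_comm_time", 0.0) > exposed_comm_max:
        flags.append("high exposed comm %: room for better comm/compute overlap")
    # Population std dev of per-GPU compute % as a simple spread measure.
    if len(per_gpu_compute) > 1 and pstdev(per_gpu_compute) > compute_spread_max:
        flags.append("per-GPU compute % varies: possible load imbalance")
    if tl.get("idle_time", 0.0) > idle_max:
        flags.append("high idle %: scheduling or synchronization issues")
    if steady_std > 0 and tail_std > tail_std_ratio * steady_std:
        flags.append("tail TGS std >> steady std: periodic overhead (checkpointing, profiling)")
    return flags
```

Treat the output as pointers for where to dig, not as conclusions.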

Step 5: Ensure dashboard is running

Check the dispatcher output first: it prints a Dashboard: line at the end. If that line shows a URL marked (running), use that URL.

If the dashboard is not running, start it:

```
pip install fastapi uvicorn   # one-time
utils/perf_server.py &        # binds 127.0.0.1 by default
```

Always tell the user the dashboard URL: http://localhost:<PORT>. For remote access, instruct them to tunnel: ssh -L <PORT>:localhost:<PORT> user@host. Avoid --host 0.0.0.0 — perf_server.py has no auth.

The server auto-detects a free port starting from 8080 and auto-reloads analysis.json on each request.
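Free-port detection of the kind described can be done by trying to bind successive ports. This is a minimal sketch of the technique, not perf_server.py's implementation; `find_free_port` and its `attempts` limit are hypothetical. Note the inherent race: another process can take the port between the probe and the server's own bind.

```python
import socket


def find_free_port(start: int = 8080, host: str = "127.0.0.1", attempts: int = 100) -> int:
    """Return the first port >= start that can be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # already in use (or not permitted); try the next port
            return port  # socket closes on leaving the with-block, freeing the port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

Probing on 127.0.0.1 matches the loopback-only default above, which is why remote users need the SSH tunnel rather than a public bind.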

Reference

Job output layout

```
<JOB_WORKSPACE>/<JOB_ID>-<JOB_NAME>[-TGS_<VALUE>]/
  log -> ../<log_file>                          # symlink to log file
  analysis.json                                 # structured metrics
  xla_dump/                                     # if _env_ENABLE_XLA_DUMP=1
    module_NNNN.jit_train_step.*_gpu_after_optimizations.txt
  <run_name>/tensorboard/plugins/profile/<ts>/  # if profiler=xplane
    <hostname>.xplane.pb                        #   1-node/proc: one per host
  <run_name>/tensorboard/plugins/profile/<ts_i>/ # 1-GPU/proc (LOCAL_WORLD_SIZE ts dirs,
    <hostname>.proc<N>.xplane.pb                #   one file per host per ts;
                                                #   successive serialized writes land
                                                #   in different per-second ts dirs)
  tracelens/<ts>/csvs/*.csv                     # 1-node/proc: TraceLens output
  tracelens/<ts_i>/<hostname>.proc<N>/csvs/*.csv # 1-GPU/proc: one dir per GPU
```
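Because a job directory may or may not carry the -TGS_<VALUE> suffix, scripts that walk <JOB_WORKSPACE> need to parse names in both forms. A sketch with a regular expression, assuming JOB_ID contains no hyphen and VALUE is a plain decimal number (neither is stated above); `parse_job_dir` and `JobDir` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumptions: JOB_ID has no "-", and the optional suffix is -TGS_<decimal number>.
_JOB_DIR_RE = re.compile(
    r"^(?P<job_id>[^-]+)-(?P<job_name>.+?)(?:-TGS_(?P<tgs>\d+(?:\.\d+)?))?$"
)


class JobDir(NamedTuple):
    job_id: str
    job_name: str
    tgs: Optional[float]  # None when the name has no -TGS_ suffix


def parse_job_dir(name: str) -> Optional[JobDir]:
    m = _JOB_DIR_RE.match(name)
    if m is None:
        return None
    tgs = m.group("tgs")
    return JobDir(m.group("job_id"), m.group("job_name"), float(tgs) if tgs else None)
```

The lazy `job_name` group lets hyphenated job names through while still leaving the optional suffix to the final group.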

The .log file sits alongside the directory in <JOB_WORKSPACE>/.

When enable_checkpointing=true, profiler traces may end up in a shared directory outside the job dir. analyze_job.py parses Config param tensorboard_dir from the log to locate these. The dispatcher and perf_server.py filter profiles by job execution time window and node-0 hostname to disambiguate. In 1-GPU-per-process mode the node-0 filter name.startswith("<host>.") still matches all <host>.proc<N>.xplane.pb files, so TraceLens runs once per GPU on node 0; the multiple timestamp dirs (one per serialized write) are treated like periodic-profiling windows by the existing code.
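The node-0 filter behaves as described because both file forms share the "<host>." prefix, while the trailing dot keeps, e.g., node-1 from matching node-10. A small sketch of that prefix test (`node0_profiles` is a hypothetical name, and the hostnames in the test are made up):

```python
from typing import Iterable, List


def node0_profiles(filenames: Iterable[str], node0_host: str) -> List[str]:
    """Keep xplane files written on node 0, in both 1-node/proc and 1-GPU/proc form."""
    prefix = node0_host + "."  # trailing dot: "node-1." does not match "node-10.xplane.pb"
    return sorted(n for n in filenames
                  if n.startswith(prefix) and n.endswith(".xplane.pb"))
```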

Running individual tools directly

These are rarely needed — analyze_job.py orchestrates them. Use only for targeted re-runs.

```
# TGS tagging
utils/tag_tgs.sh <log_file_or_glob>
utils/tag_tgs.sh -f <log_file>       # force on running job

# IRLens
utils/IRLens_analyze_hlo_ir.py <hlo_file>
utils/IRLens_analyze_hlo_ir.py <hlo_file> --op communication
utils/IRLens_analyze_hlo_ir.py <hlo_file> --op computation

# TraceLens
TraceLens_generate_perf_report_jax \
    --profile_path <xplane.pb> \
    --output_csvs_dir <output_dir>/csvs

# profile_drill.py: direct per-kernel analysis from trace JSONs
# (use when TraceLens's per-GPU numbers are suspect or you need kernel-level
# ground truth; see skills/profile-drill/SKILL.md)
utils/profile_drill.py <job_dir>/.../tensorboard/plugins/profile/*/*.trace.json.gz
```

`RAY=1` Slurm log truncation

For RAY=1 jobs, the Slurm log may contain fewer training steps than actually completed due to Ray output buffering (actor stdout is forwarded asynchronously to the driver, and unflushed output is lost when the job exits). If the analysis shows suspiciously few steps (e.g., 34 out of 100) with no error or JOB SUMMARY, check ray_logs/<head_node>/worker*.out in the job directory for the authoritative step count. The analysis.json TGS/MFU metrics will be based only on what appears in the Slurm log and may undercount the actual run.
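To compare step counts between the Slurm log and the Ray worker logs, you can scan both for the highest step number. This sketch assumes MaxText-style progress lines of the form "completed step: N, ..."; if your logs differ, pass a different compiled pattern. `max_step` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Assumption: MaxText-style progress lines, e.g. "completed step: 41, seconds: ...".
STEP_RE = re.compile(r"completed step: (\d+)")


def max_step(paths: Iterable[str], pattern: re.Pattern = STEP_RE) -> Optional[int]:
    """Highest step number found across the given log files, or None if none match."""
    best: Optional[int] = None
    for path in paths:
        with open(path, errors="ignore") as f:
            for line in f:
                for m in pattern.finditer(line):
                    step = int(m.group(1))
                    best = step if best is None else max(best, step)
    return best
```

Usage: compare `max_step(["<job>.log"])` with `max_step(glob.glob("<job_dir>/ray_logs/<head_node>/worker*.out"))`; a clearly higher worker-side value points to truncation rather than an early stop.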

Running jobs

- The dispatcher detects running jobs via the JOB SUMMARY log marker and file modification time (15 min threshold).
- analyze_job.py -f bypasses the staleness check but never renames files for running jobs. Renames happen automatically on the next analysis after the job finishes.
- TraceLens needs a completed profiler trace; it is skipped if *.xplane.pb doesn't exist yet.
- IRLens works on running jobs if xla_dump/ is already populated.
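A running/finished heuristic in the spirit of the first bullet above might look like the sketch below. How analyze_job.py actually combines the marker and the modification time is not spelled out here, so treating "no JOB SUMMARY and modified within the last 15 minutes" as running is an assumption, and `looks_running` is a hypothetical name.

```python
import os
import time
from typing import Optional

STALE_AFTER_S = 15 * 60  # the 15 min threshold mentioned above


def looks_running(log_path: str, now: Optional[float] = None,
                  stale_after: float = STALE_AFTER_S) -> bool:
    """Heuristic: no JOB SUMMARY marker yet and the log was modified recently."""
    now = time.time() if now is None else now
    with open(log_path, errors="ignore") as f:
        if any("JOB SUMMARY" in line for line in f):
            return False  # summary already printed, so the job has ended
    return now - os.path.getmtime(log_path) < stale_after
```

Under this sketch, a log that never printed a summary and has gone quiet for longer than the threshold is treated as finished (for example, a crashed job), not running.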

Description: Analyze MaxText training job performance using tgs_tagger, TraceLens, and IRLens. Use when the user asks to analyze a training run, profile traces, HLO IR, TGS metrics, GPU utilization, or mentions tag_tgs, TraceLens, IRLens, xplane, or performance analysis.

Stars: 27
Category: data-ai
Updated: May 4, 2026

The npx skillsauth add command installs this skill globally in one step and works with Claude Code, Cursor, and Windsurf.



Related Skills

AMD-AGI/pre-commit-audit

Category: tools. Labels: Verified, Trusted, Community.

Comprehensive pre-commit verification checklist with five independent responsibilities. (1) Launcher path coverage - verify a change to any launcher-chain file preserves correct behavior across all 16 combinations of entry point × launch mode × stack (Steps 1-4 + 5.1). (2) Ancillary scripts smoke - syntax / help / read-only / caller checks for any `.sh` or `.py` outside the launcher chain (Step 5.2; covers analysis utilities, sourced libraries, debug helpers, sweep tooling). (3) Code quality and design review (Step 6) - propose-first surface of code smells (duplication, long functions, magic numbers, deep nesting, unclear naming, primitive obsession, etc.) and design-decay signals (5th case in a switch, N-th env-var read, hand-rolled retry loops); auto-fix mechanical findings, hold design-shaped ones for explicit go-ahead. (4) Docs / comments / format-consistency (Step 7) - check any commit for stale prose, trailing-comment alignment drift, broken anchors / missing files in links, drifted cross-references, and this skill itself drifting from the code it describes. (5) Sensitive-info leak scan (Step 8) - cluster hostnames, internal IPs, vendor mount paths, hard-coded credentials, internal job IDs; final security gate. Trigger keywords - "verify all launcher paths", "trace launcher chain", "audit entry × launch × stack", "path coverage", "(entry × launch × stack) matrix", "post-launch teardown verification", "pre-commit audit", "before commit", "ready to commit", "verify scripts / utils not broken", "smoke-test the changed scripts", "any utility script broken", "code quality", "design review", "code smells", "tighten and polish", "avoid quality decay", "revisit design choice", "scrub leaks", "check for sensitive info before commit", "any docs or skills need update", "any stale comments", "any inaccurate comments", "comment alignment", "link policy", "broken anchors". 
Use when modifying `_train.sh`, `_train_with_ray.sh`, `_ray_actor.py`, `_container.sh`, `_job.sbatch`, `_k8s_job.sh`, `in_container_run.sh`, `run_local.sh`, `submit.sh`, `k8s_submit.sh`, `utils/run_setup.sh`, `utils/ray_cluster.sh`, `utils/monkey_patch_maxtext.py`, `utils/coredump.sh`, `utils/stage_timeout.sh`, or anywhere else in the launcher chain. Also use proactively before opening any PR (Steps 5.2, 6, 7, 8 apply universally to all changes that touch code / docs / comments), when investigating a path-specific bug ("this only happens in K8s + 1-gpu-per-process"), after adding a new entry point / launch mode / stack option, after touching any analysis utility (`utils/analyze_job.py`, `utils/perf_server.py`, `utils/profile_drill.py`, `utils/slurm_job_monitor.sh`, etc.), or after editing any doc or skill in the repo (Step 7 catches cross-reference drift).

27 stars; updated May 10, 2026.

AMD-AGI/xla-tuning

Category: testing. Labels: Verified, Trusted, Community.

Find the XLA flag / NCCL env-var combination that maximizes steady-state TGS for one (model × parallelism) cell. Produces an evidence-backed leaderboard, mechanistic explanation of the winning flag, and a deployment recipe. Use when the user asks to tune XLA flags, tune NCCL, find best collective-permute / all-gather threshold, optimize FSDP/PP/TP, close a parallelism-vs-parallelism throughput gap, or sweep cross-iteration prefetch / overlap-limit / async-stream-priority knobs for a specific model.

27 stars; updated May 3, 2026.

AMD-AGI/tsdb-diagnosis

Category: testing. Labels: Verified, Trusted, Community.

Diagnose training job incidents and check cluster health using the per-job Prometheus TSDB. Use when the user asks to diagnose a failure root cause, check GPU/network health, query Prometheus metrics, investigate a hang, or when the triage skill recommends deeper TSDB analysis.

27 stars; updated Apr 23, 2026.

AMD-AGI/telegram

Category: testing. Labels: Verified, Trusted, Community.

Use Telegram as the agent's I/O channel. Once triggered, the agent enters a REPL state — reading instructions from TG, executing them, printing results back to TG, and looping. Use when the user asks to be notified, messaged, or alerted via Telegram, or wants to interact with the agent through TG. This is a cross-cutting skill — other skills (batch-sweep, model-config, job-triage) can trigger it when the user explicitly requests it.

27 stars; updated Apr 23, 2026.


Install manually

```
# Clone the repo
git clone https://github.com/AMD-AGI/maxtext-slurm.git

# Copy into Claude Code skills folder (global)
cp -r maxtext-slurm/skills/performance-analysis ~/.claude/skills/
```

Repository: AMD-AGI/maxtext-slurm (27 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
