
shaun-z/training-check

Name: training-check
Author: shaun-z

skills/skills-codex/training-check/SKILL.md

npx skillsauth add shaun-z/auto-claude-code-research-in-sleep training-check


Training Check

Periodically read WandB metrics during training to catch problems early. Do not wait until training finishes to discover it was a waste of GPU time.

Context: $ARGUMENTS

Constants

WANDB_RUN - Read from project notes or pass as entity/project/run_id.
CHECK_INTERVAL - Starts at 10 minutes, then gradually increases if consistently healthy: 10 min -> 20 min -> 30 min -> 60 min (cap).
REVIEWER_MODEL = gpt-5.4 - Used via a secondary Codex agent for ambiguous cases only.
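
The CHECK_INTERVAL ladder can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the names `CHECK_INTERVALS_MIN` and `next_interval` are hypothetical, and it assumes a single healthy check is enough to move up one rung, since the text does not say how many healthy checks count as "consistently healthy".

```python
# Ladder from the Constants section: 10 -> 20 -> 30 -> 60 minutes (cap).
CHECK_INTERVALS_MIN = (10, 20, 30, 60)


def next_interval(current_min: int, healthy: bool) -> int:
    """Return the next check interval in minutes.

    healthy=True  -> move one rung up the ladder, capped at 60.
    healthy=False -> reset to the shortest interval (10).
    Assumption: one healthy check is enough to move up a rung.
    """
    if not healthy:
        return CHECK_INTERVALS_MIN[0]
    # Pick the first rung strictly larger than the current interval.
    for candidate in CHECK_INTERVALS_MIN:
        if candidate > current_min:
            return candidate
    return CHECK_INTERVALS_MIN[-1]
```

For example, `next_interval(30, healthy=True)` gives 60, and any anomaly drops the interval back to 10.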

When to Use

After training is confirmed running (session alive, loss decreasing for the first few steps)
When the user wants recurring health checks during training
This skill checks training QUALITY, not process HEALTH. Process health (session alive, GPU utilization) belongs to watchdog-style monitoring.

Workflow

Step 1: Read WandB Metrics

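```python
import wandb

# Public API client; uses the credentials of the local WandB login.
api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")
# history() returns a sampled table of the metrics logged so far.
history = run.history()
```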

If WandB is unreachable (API error, network issue), fall back to reading the log file directly via SSH:

```bash
ssh server "tail -n 100 /path/to/training.log"
```
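
One way to express the WandB-then-log fallback in Python is sketched below. The helper names `tail_log_over_ssh` and `read_with_fallback` are hypothetical, and the two readers are passed in as callables so the sketch stays independent of how metrics are actually fetched. When both sources fail, it reports the connectivity problem instead of treating the run as broken.

```python
import shlex
import subprocess
from typing import Callable, Optional, Tuple


def tail_log_over_ssh(host: str, path: str, lines: int = 100) -> str:
    """Return the last `lines` lines of a remote log file via ssh."""
    result = subprocess.run(
        ["ssh", host, f"tail -n {lines} {shlex.quote(path)}"],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout


def read_with_fallback(
    read_wandb: Callable[[], object],
    read_log_tail: Callable[[], str],
) -> Tuple[str, Optional[object]]:
    """Try WandB first, then the log file; report which source answered.

    Returns (source, payload), where source is "wandb", "log", or
    "unreachable". Both callables are hypothetical hooks from the caller.
    """
    try:
        return "wandb", read_wandb()
    except Exception:  # API error, network issue, and so on
        pass
    try:
        return "log", read_log_tail()
    except Exception:
        # Neither source answered: a connectivity issue, not a training failure.
        return "unreachable", None
```

A caller might pass `lambda: api.run("<entity>/<project>/<run_id>").history()` as the first reader and `lambda: tail_log_over_ssh("server", "/path/to/training.log")` as the second.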

Check these signals:

Loss trend - Is training loss decreasing over the last N steps?
Eval metrics - Are evaluation metrics improving (or at least not degrading)?
NaN / Inf - Any NaN or Inf values in loss or gradients?
Spikes - Sudden large jumps in loss (>10x normal variance)?
Learning rate - Is the schedule behaving as expected?
Gradient norm - Exploding or vanishing?
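
Three of these signals can be checked on a plain list of loss values, as in the sketch below. The function names, the window size, and the exact spike rule are assumptions made for illustration. In particular, ">10x normal variance" is read here as "squared deviation from the recent mean exceeds 10 times the recent variance", which is only one possible interpretation.

```python
import math
from statistics import mean, pvariance
from typing import Optional, Sequence


def has_nan_or_inf(values: Sequence[float]) -> bool:
    """True if any value is NaN or +/-Inf."""
    return any(math.isnan(v) or math.isinf(v) for v in values)


def loss_is_decreasing(losses: Sequence[float], window: int = 5) -> Optional[bool]:
    """Compare the mean of the last `window` losses with the window before it.

    Returns None when there are fewer than 2 * window points to compare.
    """
    if len(losses) < 2 * window:
        return None
    recent = losses[-window:]
    previous = losses[-2 * window:-window]
    return mean(recent) < mean(previous)


def is_spike(history: Sequence[float], latest: float, factor: float = 10.0) -> bool:
    """Flag an upward jump whose squared deviation from the mean of
    `history` exceeds `factor` times the variance of `history`."""
    if len(history) < 2:
        return False
    baseline = mean(history)
    variance = pvariance(history)
    return latest > baseline and (latest - baseline) ** 2 > factor * variance
```

Comparing window means rather than individual steps keeps a single noisy point from flipping the trend.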

Step 2: Judgment

| Signal | Judgment | Action |
|--------|----------|--------|
| NaN/Inf in loss | Clearly bad | Stop training, investigate |
| Loss diverging (increasing for >N steps) | Clearly bad | Stop training, investigate |
| Eval metrics significantly worse than baseline | Clearly bad | Stop training, investigate |
| Loss decreasing, metrics improving | Clearly fine | Continue, increase check interval |
| Loss flat but not diverging | Unsure | -> Step 3 (secondary review) |
| Metrics noisy, can't tell trend | Unsure | -> Step 3 (secondary review) |
| Slightly worse than baseline but still early | Unsure | -> Step 3 (secondary review) |
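
The judgment table can be read as a three-way classifier. The sketch below is illustrative only: the `Signals` fields and the `judge` function are hypothetical names, and it assumes the boolean signals (for example what counts as "significantly worse") have already been computed upstream. Anything that is neither clearly bad nor clearly fine falls through to "unsure".

```python
from dataclasses import dataclass
from enum import Enum


class Judgment(Enum):
    CLEARLY_BAD = "clearly bad"
    CLEARLY_FINE = "clearly fine"
    UNSURE = "unsure"


@dataclass
class Signals:
    nan_or_inf: bool         # NaN/Inf in loss
    loss_diverging: bool     # loss increasing for more than N steps
    eval_much_worse: bool    # eval metrics significantly worse than baseline
    loss_decreasing: bool
    metrics_improving: bool


def judge(s: Signals) -> Judgment:
    """Map precomputed signals onto the rows of the judgment table."""
    if s.nan_or_inf or s.loss_diverging or s.eval_much_worse:
        return Judgment.CLEARLY_BAD
    if s.loss_decreasing and s.metrics_improving:
        return Judgment.CLEARLY_FINE
    # Flat loss, noisy metrics, slightly-worse-but-early: escalate to Step 3.
    return Judgment.UNSURE
```

Hard failures are checked first, so a run with NaN in the loss is never classified as fine even if other metrics look good.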

Step 3: Secondary Codex Judgment (only when unsure)

Only escalate when the signal is ambiguous. For clearly good or clearly bad signals, act directly.

```
spawn_agent:
  model: REVIEWER_MODEL
  reasoning_effort: high
  message: |
    TRAINING HEALTH CHECK - need your judgment on ambiguous metrics.

    Run: <entity>/<project>/<run_id>
    Current epoch/step: X / Y total
    Training loss (last 10 checkpoints): [values]
    Eval metrics (last 3 evals): [values]
    Baseline reference: [numbers from paper/reproduction]

    What I'm unsure about: [specific concern]

    Please respond with exactly one of:
    - STOP: clearly problematic, should kill training
    - CONTINUE: looks fine, check again next interval
    - WAIT: not enough data to judge, check again sooner
```

If delegation is unavailable, make a local judgment using the same rubric and mark the decision [pending external review]. In ambiguous cases with no hard failure, prefer WAIT over STOP.
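
Handling the reviewer's reply could look roughly like the sketch below. The `parse_verdict` helper is hypothetical. It assumes the reply is plain text containing the verdict as an uppercase word, and it simplifies the local-judgment fallback to "WAIT, marked as pending external review", which matches the stated preference for WAIT over STOP in ambiguous cases.

```python
import re
from typing import Optional, Tuple


def parse_verdict(reply: Optional[str]) -> Tuple[str, bool]:
    """Return (verdict, pending_external_review).

    A reply counts only if it names exactly one of STOP, CONTINUE, WAIT
    (uppercase, as requested in the prompt). Otherwise fall back to WAIT
    and flag the decision as pending external review.
    """
    if reply:
        found = set(re.findall(r"\b(STOP|CONTINUE|WAIT)\b", reply))
        if len(found) == 1:
            return found.pop(), False
    # No reply, or no single clear verdict: prefer WAIT over STOP.
    return "WAIT", True
```

Matching only the uppercase tokens avoids treating a sentence such as "do not stop yet" as a STOP verdict.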

Step 4: Act

| Decision | Action |
|----------|--------|
| Stop | Kill the training session. Save the WandB run URL, key metrics, and reason for stopping. Log to project notes for debugging. |
| Continue | Do nothing. Re-run at the next interval (increase interval if consistently healthy). |
| Wait | Do nothing but keep the current short interval (do not increase). |
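
For the Stop row, the evidence can be appended to the project notes as one JSON line per event. This is a sketch under assumptions: the `log_stop` name, the JSON-lines layout, and the field names are all hypothetical, since the skill does not prescribe a notes format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping


def log_stop(notes_path: Path, run_url: str,
             metrics: Mapping[str, float], reason: str) -> dict:
    """Append a STOP record (run URL, key metrics, reason) to the notes file."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "decision": "STOP",
        "run_url": run_url,
        "metrics": dict(metrics),
        "reason": reason,
    }
    # Append mode keeps earlier records intact.
    with notes_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Writing the record before killing the session means the evidence survives even if the shutdown itself goes wrong.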

Integration with Watchdog

training-check and watchdog-style monitoring operate at different levels:

| Layer | Tool | What it checks | Frequency |
|-------|------|----------------|-----------|
| Process health | watchdog | Session alive? GPU active? | Every 60s (continuous) |
| Training quality | training-check | Loss trend? Metrics improving? | Every 10-60 min (periodic) |

Use both together:

Watchdog catches crashes and idle GPUs immediately
training-check catches subtle quality issues (loss plateau, metric degradation)

Rules

Do not stop training on the first sign of noise - some loss spikes are normal. Look at trends over multiple checkpoints.
When stopping training, always save the WandB run URL and key metrics as evidence.
If both WandB and log files are unreachable, report the connectivity issue and try again next interval. Do not assume training is broken.
Gradually increase check interval when healthy (10 -> 20 -> 30 -> 60 min). Reset to 10 min after any anomaly.
This skill is meant to be automated via a recurring scheduler. If the user wants ongoing monitoring, set up the best local mechanism available instead of waiting for manual reruns.

Recurring Setup Example

After training is confirmed stable:
  Create a recurring job (cron, task scheduler, tmux loop, etc.)
  that runs `/training-check <entity>/<project>/<run_id>` every 10 minutes.

As the check interval increases, update the old recurring job to match the new interval.
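
If cron is the chosen scheduler, updating the job amounts to regenerating its time fields for the new interval. The sketch below covers only the schedule part. The `cron_schedule` helper is hypothetical, and the command that cron would run to trigger `/training-check` depends on the local agent setup, so it is deliberately left out.

```python
def cron_schedule(interval_min: int) -> str:
    """Return the five cron time fields for the given check interval.

    Supports only the intervals on the ladder: 10, 20, 30, and 60 minutes.
    """
    if interval_min == 60:
        return "0 * * * *"  # once per hour, on the hour
    if interval_min in (10, 20, 30):
        return f"*/{interval_min} * * * *"  # every N minutes
    raise ValueError(f"unsupported interval: {interval_min} minutes")
```

For instance, moving from the 10-minute to the 20-minute rung changes the schedule from `*/10 * * * *` to `*/20 * * * *`.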


Periodically check WandB metrics during training to catch problems early (NaN, loss divergence, idle GPUs). Avoids wasting GPU hours on broken runs. Use when training is running and you want automated health checks.

testing

Updated Apr 17, 2026


npx skillsauth add shaun-z/auto-claude-code-research-in-sleep training-check

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Mar 28, 2026, 7:25 AM (36.5s, 1 file scanned)

SKILL.md

name: training-check
description: Periodically check WandB metrics during training to catch problems early (NaN, loss divergence, idle GPUs). Avoids wasting GPU hours on broken runs. Use when training is running and you want automated health checks.
allowed-tools: Bash(*), Read, Grep, Glob, Write, Edit, Agent


Related Skills

shaun-z/paper-illustration-image2

development


Generate publication-quality academic illustrations through a local Codex app-server bridge that uses Codex native image generation. This is a separate experimental alternative to `paper-illustration`, intended for Claude Code users who want a GPT-image-style renderer without modifying the original skill.

SKILL.md, updated Apr 25, 2026

shaun-z/overleaf-sync

development


Two-way sync between a local paper directory and an Overleaf project via the Overleaf Git bridge (Premium feature). Lets you keep ARIS audit/edit workflows on the local copy while collaborators edit in the Overleaf web UI. Token never touches the agent — user does the one-time auth via macOS Keychain. Use when user says "同步 overleaf", "overleaf sync", "推送到 overleaf", "connect overleaf", "Overleaf 桥接", "pull overleaf", "push overleaf", or wants to bridge their ARIS paper directory with an Overleaf project.

SKILL.md, updated Apr 25, 2026

shaun-z/citation-audit

development


Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.

SKILL.md, updated Apr 20, 2026

shaun-z/writing-systems-papers

data-ai


Paragraph-level structural blueprint for 10-12 page systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. Provides page allocation, paragraph templates, and writing patterns. Use when user says "写系统论文", "systems paper structure", "OSDI paper", "SOSP paper", or wants fine-grained structural guidance for a systems conference submission.

SKILL.md, updated Apr 17, 2026


Install manually


```bash
# Clone the repo
git clone https://github.com/shaun-z/auto-claude-code-research-in-sleep.git

# Copy into Claude Code skills folder (global)
cp -r auto-claude-code-research-in-sleep/skills/skills-codex/training-check ~/.claude/skills/
```

See the official Claude Code Skills documentation for the skills folder path.

Repository

shaun-z/auto-claude-code-research-in-sleep

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT