
wanshuiyin/monitor-experiment

Name: monitor-experiment
Author: wanshuiyin

skills/monitor-experiment/SKILL.md

npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep monitor-experiment


Monitor Experiment Results

⏱ External cadence is appropriate here. This skill waits on an external fact (job completion or progress), so it is a natural fit for /loop or CronCreate: each wake-up reads the job status and judges only machine-checkable completion (exit code, file exists, epoch logged), never result quality. This is the additive external-wait pattern described in shared-references/external-cadence.md. If a scheduled wait ends in a verdict step (e.g., auditing the results afterwards), run that verdict once after the wait clears, not on every tick.
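A wake-up check of this kind can be sketched in Python. This is a minimal illustration that assumes the caller already has the job's exit code (None while it is still running), the expected result-file path, and the captured log text. The function name `completion_status` and the epoch-line format are hypothetical and not part of the skill. The check deliberately reports only mechanical facts and never inspects metric values.

```python
import os
import re
from typing import Optional


def completion_status(exit_code: Optional[int], result_path: str,
                      log_text: str, final_epoch: int) -> str:
    """Classify a job as 'crashed', 'done', or 'running' from mechanical signals only.

    Hypothetical helper: none of these checks looks at result quality.
    """
    if exit_code is not None and exit_code != 0:
        return "crashed"  # process ended with a non-zero exit code
    # Assumed log format: lines such as "Epoch 10/10" or "epoch: 10".
    epoch_pattern = rf"\bepoch[ :]+{final_epoch}\b"
    if (exit_code == 0
            or os.path.isfile(result_path)
            or re.search(epoch_pattern, log_text, re.IGNORECASE)):
        return "done"
    return "running"
```

A scheduled wake-up would call this on each tick and end the loop on `done` or `crashed`. Any quality verdict then runs once, outside the loop.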

Monitor: $ARGUMENTS

Workflow

Step 1: Check What's Running

SSH server:

```bash
ssh <server> "screen -ls"
```

Vast.ai instance (read ssh_host and ssh_port from vast-instances.json):

```bash
ssh -p <PORT> root@<HOST> "screen -ls"
```

Also check the vast.ai instance status:

```bash
vastai show instances
```

Modal (when gpu: modal is set in CLAUDE.md):

```bash
modal app list            # List running and recently stopped apps
modal app logs <app_id>   # Stream logs from a running app
```

Modal apps stop automatically when they finish, so an app that is no longer listed as running has already completed. Check its results with modal volume ls <volume> or in the local output.
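The Vast.ai lookup can be scripted. This sketch assumes vast-instances.json holds a JSON list of objects containing ssh_host and ssh_port (the fields named above) plus a hypothetical instance_id key. The real file written by /vast-gpu may be shaped differently.

```python
import json
from typing import Any


def load_instances(path: str = "vast-instances.json") -> list[dict[str, Any]]:
    """Load the instance records (assumed: a JSON list of objects)."""
    with open(path) as f:
        return json.load(f)


def vast_ssh_command(instances: list[dict[str, Any]], instance_id: str,
                     remote_cmd: str = "screen -ls") -> list[str]:
    """Build an argv list equivalent to: ssh -p <PORT> root@<HOST> "<remote_cmd>"."""
    for inst in instances:
        # "instance_id" is a hypothetical key; ssh_host/ssh_port are named in the skill text.
        if str(inst.get("instance_id")) == str(instance_id):
            return ["ssh", "-p", str(inst["ssh_port"]),
                    f"root@{inst['ssh_host']}", remote_cmd]
    raise KeyError(f"instance {instance_id} not found")
```

Passing the list to `subprocess.run(..., capture_output=True, text=True)` runs the check without a local shell, which avoids one layer of quoting.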

Step 2: Collect Output from Each Screen

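For each screen session, capture the last N lines (here N = 50):

```bash
ssh <server> "screen -S <name> -p 0 -X hardcopy -h /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"
```

`-p 0` preselects window 0, which a detached session may need before `-X` commands take effect. `-h` also dumps the scrollback buffer, not just the currently visible screen. If hardcopy still fails (for example, because the session has already exited), check the job's log files or any output captured with tee.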
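Progress can then be read from the captured text. The sketch below assumes the training script prints lines such as `Epoch 3/10` or `step 1200/5000`, or draws a tqdm-style bar such as ` 45%|####`. These formats and the helper name `latest_progress` are illustrative assumptions.

```python
import re
from typing import Optional

# Assumed log formats: "Epoch 3/10", "step: 1200 / 5000", "iteration 7/20".
_COUNTER = re.compile(r"\b(epoch|step|iter(?:ation)?)\s*:?\s*(\d+)\s*/\s*(\d+)",
                      re.IGNORECASE)
# Assumed tqdm-style bar: " 45%|####      | 450/1000".
_BAR_PCT = re.compile(r"(\d{1,3})%\|")


def latest_progress(screen_text: str) -> Optional[str]:
    """Return a short progress string from the most recent counter or bar, else None."""
    counters = _COUNTER.findall(screen_text)
    if counters:
        unit, done, total = counters[-1]
        pct = 100.0 * int(done) / int(total) if int(total) else 0.0
        return f"{unit.lower()} {done}/{total} ({pct:.0f}%)"
    bars = _BAR_PCT.findall(screen_text)
    if bars:
        return f"progress bar at {bars[-1]}%"
    return None
```

This covers the "still running?" check from the Key Rules: a counter below its total means the experiment is not finished.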

Step 3: Check for JSON Result Files

```bash
ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"
```

If JSON results exist, fetch them and parse the output:

```bash
ssh <server> "cat <results_dir>/<latest>.json"
```
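The parsing can be done after copying the files locally (for example, with scp) or by running the same code on the server. The sketch below assumes one of those two setups. `load_latest_result` is a hypothetical name. It picks the newest file by modification time, which mirrors `ls -lt`.

```python
import glob
import json
import os
from typing import Any, Optional


def load_latest_result(results_dir: str) -> Optional[tuple[str, Any]]:
    """Return (path, parsed_json) for the newest *.json file in results_dir, or None."""
    paths = glob.glob(os.path.join(results_dir, "*.json"))
    if not paths:
        return None
    latest = max(paths, key=os.path.getmtime)  # newest first, like `ls -lt | head -1`
    with open(latest) as f:
        return latest, json.load(f)  # raises json.JSONDecodeError on a half-written file
```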

Step 3.5: Pull W&B Metrics (when `wandb: true` in CLAUDE.md)

Skip this step entirely if wandb is not set or is false in CLAUDE.md.

Pull training curves and metrics from Weights & Biases via the W&B Python API (wandb.Api):

```bash
# List the 10 most recent runs (order='-created_at' sorts newest first; per_page only sets the fetch page size)
ssh <server> "python3 -c \"
import itertools, wandb
api = wandb.Api()
runs = api.runs('<entity>/<project>', order='-created_at', per_page=10)
for r in itertools.islice(runs, 10):
    print(f'{r.id}  {r.state}  {r.name}  {r.summary.get(\"eval/loss\", \"N/A\")}')
\""

# Print the last 10 history rows. scan_history returns only rows that log ALL requested keys,
# so query train and eval keys separately if they are logged at different steps.
# page_size is the fetch batch size, not a row limit.
ssh <server> "python3 -c \"
import wandb, json
api = wandb.Api()
run = api.run('<entity>/<project>/<run_id>')
history = list(run.scan_history(keys=['train/loss', 'eval/loss', 'eval/ppl', 'train/lr'], page_size=50))
print(json.dumps(history[-10:], indent=2))
\""

# Pull the run summary (final metrics)
ssh <server> "python3 -c \"
import wandb, json
api = wandb.Api()
run = api.run('<entity>/<project>/<run_id>')
print(json.dumps(dict(run.summary), indent=2, default=str))
\""
```

What to extract:

Training loss curve — is it converging? diverging? plateauing?
Eval metrics — loss, PPL, accuracy at latest checkpoint
Learning rate — is the schedule behaving as expected?
GPU memory — any OOM risk?
Run status — running / finished / crashed?
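The convergence question can be made concrete by comparing the mean of the last few logged losses with the window before it. This is one simple heuristic, not the skill's prescribed method. `window` and `rel_tol` are illustrative defaults.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def loss_trend(losses: Sequence[Optional[float]], window: int = 5,
               rel_tol: float = 0.01) -> str:
    """Label the recent loss trend by comparing the last `window` points with the previous `window`."""
    vals = [float(x) for x in losses if x is not None]
    if any(math.isnan(x) or math.isinf(x) for x in vals):
        return "diverged (NaN/inf)"
    if len(vals) < 2 * window:
        return "too few points"
    prev = mean(vals[-2 * window:-window])
    last = mean(vals[-window:])
    change = (last - prev) / abs(prev) if prev else 0.0
    if change < -rel_tol:
        return "converging"   # still decreasing by more than rel_tol
    if change > rel_tol:
        return "diverging"    # increasing by more than rel_tol
    return "plateauing"       # roughly flat
```

The input can be the `train/loss` values extracted from the scan_history rows pulled above.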

W&B dashboard link (include in summary for user):

https://wandb.ai/<entity>/<project>/runs/<run_id>

This gives the auto-review-loop richer signal than just screen output — training dynamics, loss curves, and metric trends over time.

Step 4: Summarize Results

Present results in a comparison table:

| Experiment | Metric | Delta vs Baseline | Status |
|-----------|--------|-------------------|--------|
| Baseline  | X.XX   | —                 | done   |
| Method A  | X.XX   | +Y.Y              | done   |
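A table in this shape can be generated from a metric dictionary. The sketch assumes one scalar metric per experiment and a named baseline row. `comparison_table` is hypothetical, and it writes the baseline's delta cell as `(baseline)`.

```python
from typing import Optional


def comparison_table(results: dict[str, float], baseline: str,
                     status: Optional[dict[str, str]] = None) -> str:
    """Render a markdown table with each experiment's metric and its delta vs the baseline."""
    status = status or {}
    base = results[baseline]
    rows = ["| Experiment | Metric | Delta vs Baseline | Status |",
            "|-----------|--------|-------------------|--------|"]
    for name, value in results.items():
        delta = "(baseline)" if name == baseline else f"{value - base:+.2f}"
        rows.append(f"| {name} | {value:.2f} | {delta} | {status.get(name, 'unknown')} |")
    return "\n".join(rows)
```

The raw metric column comes before any interpretation, in line with the first Key Rule.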

Step 5: Interpret

Compare against known baselines
Flag unexpected results (negative delta, NaN, divergence)
Suggest next steps based on findings
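The flagging step can be partly automated. Whether a negative delta is bad depends on the metric's direction (accuracy versus loss), so the sketch below takes a `higher_is_better` flag. The function name and messages are illustrative.

```python
import math
from typing import Optional


def flag_results(results: dict[str, Optional[float]], baseline: str,
                 higher_is_better: bool = True) -> dict[str, str]:
    """Return {experiment: reason} for NaN/inf metrics and results worse than the baseline."""
    base = results[baseline]
    flags: dict[str, str] = {}
    for name, value in results.items():
        if value is None or math.isnan(value) or math.isinf(value):
            flags[name] = "NaN/inf or missing metric: check training logs"
            continue
        if name == baseline or base is None or math.isnan(base) or math.isinf(base):
            continue  # nothing valid to compare against
        delta = value - base
        worse = delta < 0 if higher_is_better else delta > 0
        if worse:
            flags[name] = f"worse than baseline by {abs(delta):.2f}"
    return flags
```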

Step 6: Feishu Notification (if configured)

After results are collected, check ~/.claude/feishu.json:

If the config exists and its mode is not "off": send an experiment_done notification with the results summary table and the delta vs baseline
If the config is absent or its mode is "off": skip this step entirely (no-op)
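The configuration gate might look like the sketch below. Only the file path and the "off" mode come from the text above. Two details are assumptions: that the JSON is a top-level object with a `mode` key, and that a missing `mode` should be treated as "off". Sending the message itself is left out because the Feishu payload format is not specified here.

```python
import json
import os


def feishu_enabled(config_path: str = "~/.claude/feishu.json") -> bool:
    """True only if the config file exists, parses, and its mode is not "off"."""
    path = os.path.expanduser(config_path)
    if not os.path.isfile(path):
        return False  # config absent: skip notification (no-op)
    try:
        with open(path) as f:
            cfg = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    if not isinstance(cfg, dict):
        return False
    # Assumption: a missing "mode" key is treated like "off".
    return str(cfg.get("mode", "off")).lower() != "off"
```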

Key Rules

Always show raw numbers before interpretation
Compare against the correct baseline (same config)
Note if experiments are still running (check progress bars, iteration counts)
If results look wrong, check training logs for errors before concluding
Vast.ai cost awareness: When monitoring vast.ai instances, report the running cost (hours * $/hr from vast-instances.json). If all experiments on an instance are done, remind the user to run /vast-gpu destroy <instance_id> to stop billing
Modal cost awareness: Modal auto-scales to zero — no idle billing. When reporting results from Modal runs, note the actual execution time and estimated cost (time * $/hr from the GPU tier used). No cleanup action needed
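The cost arithmetic is hours multiplied by the hourly rate. This sketch assumes the start time is available as an ISO 8601 string and the rate as a number. The parameter names are hypothetical because the schema of vast-instances.json is not given here. The same multiplication applies to Modal, using the execution time and the GPU tier's rate.

```python
from datetime import datetime, timezone
from typing import Optional


def running_cost(start_iso: str, dollars_per_hour: float,
                 now: Optional[datetime] = None) -> tuple[float, float]:
    """Return (hours_elapsed, cost_in_dollars) for an instance started at start_iso."""
    start = datetime.fromisoformat(start_iso)
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)  # assume naive timestamps are UTC
    now = now or datetime.now(timezone.utc)
    hours = max((now - start).total_seconds(), 0.0) / 3600.0
    return hours, hours * dollars_per_hour


hours, cost = running_cost("2026-07-13T00:00:00+00:00", 0.40,
                           now=datetime(2026, 7, 13, 2, 30, tzinfo=timezone.utc))
print(f"{hours:.1f} h, ${cost:.2f}")  # 2.5 h, $1.00
```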


Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.

13,323 stars

testing

Updated Jul 13, 2026


npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep monitor-experiment

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

Trivy: container and dependency vulnerability scanner. Clean, 95%
Semgrep: static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
VirusTotal: multi-engine malware detection. Error, 70%
Snyk (dep): open source security scanning. Skipped, 50%
Socket.dev: supply chain security analysis. Skipped, 50%
CrowdStrike: advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jul 13, 2026, 4:42 AM · 154.9 s · 1 file scanned

SKILL.md

name: monitor-experiment
description: Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
argument-hint: [server-alias or screen-name]
allowed-tools: Bash(ssh *), Bash(echo *), Read, Write, Edit


Related Skills

wanshuiyin/web-debug-search

development

Verified · Trusted · Community

Search GitHub Issues and Discussions for software errors, version compatibility problems, and exact error-string matches. Use for debugging and discovery only; results are not paper-citation evidence.

13,732 · SKILL.md · Updated Jul 23, 2026


wanshuiyin/integrity-forensics

testing

Verified · Trusted · Community

Run the Anti-Autoresearch integrity-forensics sweep (span-anchored evidence ledger → GPT auditors propose findings → deterministic rules-only adjudicator) against a paper via a SHA-pinned thin launcher, then convert the verdict into a typed policy gate (BLOCK/WARN/NO_NEW_BLOCKER) and an append-only obligations ledger. Use when the user says "integrity forensics", "forensic audit this paper", "投稿前自查诚信" (pre-submission integrity self-check), "审这篇论文的诚信" (audit this paper's integrity), or says "anti-autoresearch" when the upstream repo's own skills are not installed. Also invoked by /paper-writing (submission self-forensics, default ON), /peer-review (forensic appendix), /resubmit-pipeline.

13,401 · SKILL.md · Updated Jul 13, 2026

wanshuiyin/meta-apply

testing

Verified · Trusted · Community

Privileged applier that LANDS meta-optimize / corpus-audit patches the user approved; it is the ONLY skill permitted to mutate the skill corpus from a self-modification proposal, with cross-model jury and human approval at landing. Use when the user says "meta apply", "/meta-apply", "land the staged patches", "应用优化" (apply optimizations), after a /meta-optimize run.

13,401 · SKILL.md · Updated May 31, 2026


Install manually


```bash
# Clone the repo
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git

# Copy into the Claude Code skills folder (global)
cp -r Auto-claude-code-research-in-sleep/skills/monitor-experiment ~/.claude/skills/
```


Repository

wanshuiyin/Auto-claude-code-research-in-sleep


Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
