shaun-z/monitor-experiment

Name: monitor-experiment
Author: shaun-z

skills/monitor-experiment/SKILL.md


Monitor Experiment Results

Monitor: $ARGUMENTS

Workflow

Step 1: Check What's Running

SSH server:

ssh <server> "screen -ls"

Vast.ai instance (read ssh_host, ssh_port from vast-instances.json):

ssh -p <PORT> root@<HOST> "screen -ls"

Also check vast.ai instance status:

vastai show instances

Modal (when gpu: modal in CLAUDE.md):

modal app list         # List running/recent apps
modal app logs <app>   # Stream logs from a running app

Modal apps auto-terminate when done — if it's not in the list, it already finished. Check results via modal volume ls <volume> or local output.
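The session names needed in Step 2 can be pulled out of the `screen -ls` text. The following is a minimal sketch, assuming the usual GNU screen listing in which each session sits on an indented line as `<pid>.<name>` followed by a state in parentheses. `parse_screen_sessions` and the sample listing are hypothetical and not part of the skill.

```python
import re

# Matches lines such as "\t12345.train_a\t(Detached)". The exact layout is an
# assumption about typical GNU screen output and may differ between versions.
SESSION_RE = re.compile(r"^\s+(\d+)\.(\S+)\s+.*\((Attached|Detached)\)\s*$")


def parse_screen_sessions(listing: str) -> list[dict]:
    """Return one dict per screen session found in `screen -ls` output."""
    sessions = []
    for line in listing.splitlines():
        match = SESSION_RE.match(line)
        if match:
            pid, name, state = match.groups()
            sessions.append({"pid": int(pid), "name": name, "state": state})
    return sessions


# Illustrative listing only; real output comes from: ssh <server> "screen -ls"
sample = (
    "There are screens on:\n"
    "\t12345.train_a\t(04/17/2026 09:00:00 AM)\t(Detached)\n"
    "\t12399.train_b\t(Attached)\n"
    "2 Sockets in /run/screen/S-root.\n"
)
for session in parse_screen_sessions(sample):
    print(session["name"], session["state"])
```

An empty result means no sessions matched, which is worth cross-checking against log files before concluding that nothing is running.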

Step 2: Collect Output from Each Screen

For each screen session, dump the window contents to a file and read the last 50 lines:

ssh <server> "screen -S <name> -X hardcopy /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"

If hardcopy fails, check for log files or tee output.
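A captured dump can contain blank padding lines, so the useful tail may be shorter than the raw `tail -50`. As a small sketch, a hypothetical helper `last_nonblank_lines` (not part of the skill) keeps only the most recent lines that carry text:

```python
def last_nonblank_lines(text: str, n: int = 50) -> list[str]:
    """Return the last `n` lines of `text` that are not empty after stripping
    trailing whitespace."""
    lines = [line.rstrip() for line in text.splitlines()]
    nonblank = [line for line in lines if line]
    return nonblank[-n:]


# Illustrative dump with padding lines at the end.
dump = "epoch 1 loss 2.91\nepoch 2 loss 2.40\n\n   \n"
print(last_nonblank_lines(dump, n=1))
```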

Step 3: Check for JSON Result Files

ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"

If JSON results exist, fetch and parse them:

ssh <server> "cat <results_dir>/<latest>.json"
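Once the JSON text is fetched, parsing it might look like the sketch below. It assumes the result file is a single JSON object with numeric metrics at the top level. The field names in the sample (`eval_loss`, `eval_ppl`, `steps`) are made up for illustration, since the skill does not define a result schema.

```python
import json
import math


def load_metrics(raw: str) -> dict[str, float]:
    """Parse a JSON result document and keep only top-level numeric fields."""
    data = json.loads(raw)  # json.loads accepts NaN/Infinity literals by default
    return {
        key: float(value)
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Hypothetical result payload; real content comes from <results_dir>/<latest>.json
raw = '{"experiment": "method_a", "eval_loss": 2.31, "eval_ppl": NaN, "steps": 1000}'
metrics = load_metrics(raw)
non_finite = [key for key, value in metrics.items() if not math.isfinite(value)]
print(metrics)
print("non-finite:", non_finite)
```

Listing non-finite values separately keeps NaN results visible instead of letting them slip into a summary table unnoticed.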

Step 3.5: Pull W&B Metrics (when `wandb: true` in CLAUDE.md)

Skip this step entirely if wandb is not set or is false in CLAUDE.md.

Pull training curves and metrics from Weights & Biases via its Python API:

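# List the 10 most recent runs in the project
ssh <server> "python3 -c \"
import itertools, wandb
api = wandb.Api()
# per_page is only the fetch page size, so cap the listing explicitly
runs = api.runs('<entity>/<project>', order='-created_at', per_page=10)
for r in itertools.islice(runs, 10):
    # Look the metric up outside the f-string: nested double quotes would be stripped by the remote shell
    loss = r.summary.get('eval/loss', 'N/A')
    print(f'{r.id}  {r.state}  {r.name}  {loss}')
\""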

# Pull specific metrics from a run (prints the last 10 logged rows)
ssh <server> "python3 -c \"
import wandb, json
api = wandb.Api()
run = api.run('<entity>/<project>/<run_id>')
# scan_history returns only rows that contain every requested key; page_size is just the fetch batch size
history = list(run.scan_history(keys=['train/loss', 'eval/loss', 'eval/ppl', 'train/lr'], page_size=50))
print(json.dumps(history[-10:], indent=2))
\""

# Pull run summary (final metrics)
ssh <server> "python3 -c \"
import wandb, json
api = wandb.Api()
run = api.run('<entity>/<project>/<run_id>')
print(json.dumps(dict(run.summary), indent=2, default=str))
\""

What to extract:

Training loss curve — is it converging? diverging? plateauing?
Eval metrics — loss, PPL, accuracy at latest checkpoint
Learning rate — is the schedule behaving as expected?
GPU memory — any OOM risk?
Run status — running / finished / crashed?

W&B dashboard link (include in summary for user):

https://wandb.ai/<entity>/<project>/runs/<run_id>

This gives the auto-review-loop richer signal than just screen output — training dynamics, loss curves, and metric trends over time.
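The "converging, diverging, or plateauing" question can be turned into a rough automatic check. The sketch below is one possible heuristic, not the skill's method: it compares the mean of the most recent window of losses with the mean of the window before it. The function name, window size, and tolerance are all assumptions to tune per experiment.

```python
import math


def classify_trend(losses: list[float], window: int = 10, tol: float = 0.01) -> str:
    """Label a loss series by comparing the mean of the last `window` points
    with the mean of the `window` points before them."""
    if not all(math.isfinite(x) for x in losses):
        return "diverging"  # NaN or inf anywhere is treated as divergence
    if len(losses) < 2 * window:
        return "insufficient data"
    previous = sum(losses[-2 * window:-window]) / window
    latest = sum(losses[-window:]) / window
    change = (latest - previous) / abs(previous) if previous else 0.0
    if change > tol:
        return "diverging"
    if change < -tol:
        return "converging"
    return "plateauing"


# Illustrative series; in practice, e.g. [row['train/loss'] for row in history]
falling = [1.0 / (step + 1) for step in range(20)]
print(classify_trend(falling))
```

A label like this is only a prompt to look at the curve; the raw numbers should still be shown first.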

Step 4: Summarize Results

Present results in a comparison table:

| Experiment | Metric | Delta vs Baseline | Status |
|-----------|--------|-------------------|--------|
| Baseline  | X.XX   | —                 | done   |
| Method A  | X.XX   | +Y.Y              | done   |
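Rows in this layout can be generated from collected numbers. The following is a minimal sketch: `format_table` is a hypothetical helper, the metric values are invented, and the delta is computed as a plain difference from the baseline.

```python
def format_table(baseline: float, results: dict[str, tuple[float, str]]) -> str:
    """Render a comparison table; `results` maps experiment name to
    (metric value, status)."""
    rows = [
        "| Experiment | Metric | Delta vs Baseline | Status |",
        "|-----------|--------|-------------------|--------|",
        f"| Baseline | {baseline:.2f} | - | done |",
    ]
    for name, (value, status) in results.items():
        delta = value - baseline
        rows.append(f"| {name} | {value:.2f} | {delta:+.1f} | {status} |")
    return "\n".join(rows)


# Invented numbers, for illustration only.
table = format_table(71.2, {"Method A": (73.5, "done"), "Method B": (70.9, "running")})
print(table)
```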

Step 5: Interpret

Compare against known baselines
Flag unexpected results (negative delta, NaN, divergence)
Suggest next steps based on findings
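The flagging rule above can be sketched as a small check. This is an illustrative helper, not part of the skill. It adds one assumption the text leaves implicit: whether a higher or lower value counts as better, since a negative delta is good news for a loss.

```python
import math


def flag_result(value: float, baseline: float, higher_is_better: bool = True) -> list[str]:
    """Return human-readable warnings for one result compared with its baseline."""
    if not math.isfinite(value):
        return ["non-finite value (NaN or inf)"]
    flags = []
    delta = value - baseline
    improved = delta > 0 if higher_is_better else delta < 0
    if delta != 0 and not improved:
        flags.append(f"worse than baseline by {abs(delta):.3g}")
    return flags


print(flag_result(float("nan"), 0.70))                   # non-finite metric
print(flag_result(0.60, 0.70))                           # accuracy dropped
print(flag_result(2.0, 2.5, higher_is_better=False))     # loss went down
```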

Step 6: Feishu Notification (if configured)

After results are collected, check ~/.claude/feishu.json:

Send experiment_done notification: results summary table, delta vs baseline
If config absent or mode "off": skip entirely (no-op)
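The skip rule can be expressed as a guard that runs before any notification is built. This sketch assumes the config is a JSON object with a top-level `mode` key, as the text implies. Treating a missing `mode` key or an unreadable file as "off" is an extra assumption, and `should_notify` is a hypothetical name.

```python
import json
from pathlib import Path


def should_notify(config_path: Path) -> bool:
    """Return False when the config is absent, unreadable, or has mode "off"."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # absent or malformed config: no-op
    # Assumption: a config without a "mode" key is treated like mode "off".
    return isinstance(config, dict) and config.get("mode", "off") != "off"


print(should_notify(Path.home() / ".claude" / "feishu.json"))
```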

Key Rules

Always show raw numbers before interpretation
Compare against the correct baseline (same config)
Note if experiments are still running (check progress bars, iteration counts)
If results look wrong, check training logs for errors before concluding
Vast.ai cost awareness: When monitoring vast.ai instances, report the running cost (hours * $/hr from vast-instances.json). If all experiments on an instance are done, remind the user to run /vast-gpu destroy <instance_id> to stop billing
Modal cost awareness: Modal auto-scales to zero — no idle billing. When reporting results from Modal runs, note the actual execution time and estimated cost (time * $/hr from the GPU tier used). No cleanup action needed
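Both cost rules reduce to hours multiplied by an hourly rate. The sketch below shows the arithmetic with invented timestamps and an invented rate. Where the start time and rate are stored (for example, which fields of vast-instances.json) is not specified in the text, so they are passed in as plain arguments.

```python
from datetime import datetime, timezone


def running_cost(started_at: datetime, now: datetime, dollars_per_hour: float) -> tuple[float, float]:
    """Return (elapsed hours, estimated cost in dollars)."""
    hours = (now - started_at).total_seconds() / 3600
    return hours, hours * dollars_per_hour


# Invented values, for illustration only.
start = datetime(2026, 4, 17, 8, 0, tzinfo=timezone.utc)
now = datetime(2026, 4, 17, 14, 30, tzinfo=timezone.utc)
hours, cost = running_cost(start, now, dollars_per_hour=0.40)
print(f"{hours:.1f} h at $0.40/hr = ${cost:.2f}")
```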

Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.

Category: testing

Updated Apr 17, 2026

Install (global, via skillsauth)

npx skillsauth add shaun-z/auto-claude-code-research-in-sleep monitor-experiment

This single command installs the skill globally. It works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 9, 2026, 3:08 AM (5.1s, 1 file scanned)

SKILL.md

name: monitor-experiment
description: Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
argument-hint: [server-alias or screen-name]
allowed-tools: Bash(ssh *), Bash(echo *), Read, Write, Edit

Related Skills

shaun-z/paper-illustration-image2

Category: development

Badges: Verified, Trusted, Community

Generate publication-quality academic illustrations through a local Codex app-server bridge that uses Codex native image generation. This is a separate experimental alternative to `paper-illustration`, intended for Claude Code users who want a GPT-image-style renderer without modifying the original skill.

SKILL.md, updated Apr 25, 2026

shaun-z/overleaf-sync

Category: development

Badges: Verified, Trusted, Community

Two-way sync between a local paper directory and an Overleaf project via the Overleaf Git bridge (Premium feature). Lets you keep ARIS audit/edit workflows on the local copy while collaborators edit in the Overleaf web UI. The token never touches the agent: the user does the one-time auth via macOS Keychain. Use when user says "同步 overleaf" (sync overleaf), "overleaf sync", "推送到 overleaf" (push to overleaf), "connect overleaf", "Overleaf 桥接" (Overleaf bridge), "pull overleaf", "push overleaf", or wants to bridge their ARIS paper directory with an Overleaf project.

SKILL.md, updated Apr 25, 2026

shaun-z/citation-audit

Category: development

Badges: Verified, Trusted, Community

Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (the cite is present but the cited paper does not establish the claim). Use when user says "审查引用" (review citations), "check citations", "citation audit", "verify references", "引用核对" (citation check), or before submission to ensure bibliography integrity.

SKILL.md, updated Apr 20, 2026

shaun-z/writing-systems-papers

Category: data-ai

Badges: Verified, Trusted, Community

Paragraph-level structural blueprint for 10-12 page systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. Provides page allocation, paragraph templates, and writing patterns. Use when user says "写系统论文" (write a systems paper), "systems paper structure", "OSDI paper", "SOSP paper", or wants fine-grained structural guidance for a systems conference submission.

SKILL.md, updated Apr 17, 2026

Install manually

# Clone the repo
git clone https://github.com/shaun-z/auto-claude-code-research-in-sleep.git

# Copy into Claude Code skills folder (global)
cp -r auto-claude-code-research-in-sleep/skills/monitor-experiment ~/.claude/skills/

Repository

shaun-z/auto-claude-code-research-in-sleep

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
