
paulnsorensen/test-sandbox

Name: test-sandbox
Author: paulnsorensen

claude/skills/test-sandbox/SKILL.md

npx skillsauth add paulnsorensen/dotfiles test-sandbox


/test-sandbox — Isolated Test Sandboxing

Run Python test code in an isolated, sandboxed environment without polluting the main context. Ideal for quick assertions and verification during development.

Usage

Quick Test

/test-sandbox "assert 1 + 1 == 2"

Test with Imports

/test-sandbox "from src.orders import process_order; assert process_order({}) == expected"

Test from File

/test-sandbox --file tests/test_edge_cases.py

Sweep Stale Tests

/test-sandbox --sweep

How It Works

Writes the test file to .claude/testing/test_<hash>.py (gitignored, so it stays out of version control)
Runs the test via a sub-agent: uv run pytest .claude/testing/test_<hash>.py --tb=short
Reports concisely: ✓/✗ pass count, assertions run, and only findings with confidence >= 50
Cleans up the test file (optional: pass --keep to retain it, or configure per session)

The skill delegates to sub-agents to keep your main context clean — you only see the results, not the verbose test output or implementation details.
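The write-and-run steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not the skill's actual implementation: the helper names `write_sandbox_test` and `pytest_command`, the six-character SHA-256 prefix used as the hash, and the wrapping of inline code in a single `test_snippet` function are all assumptions. Only the `.claude/testing/test_<hash>.py` path and the `uv run pytest ... --tb=short` command come from the description.

```python
import hashlib
import textwrap
from pathlib import Path

SANDBOX_DIR = Path(".claude/testing")  # directory named in the skill description


def write_sandbox_test(code: str, root: Path = Path(".")) -> Path:
    """Write inline test code to <root>/.claude/testing/test_<hash>.py."""
    # Assumption: the hash is a short digest of the snippet itself.
    digest = hashlib.sha256(code.encode("utf-8")).hexdigest()[:6]
    target_dir = root / SANDBOX_DIR
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the snippet is wrapped in one test function so pytest collects it.
    body = textwrap.indent(code.strip(), "    ")
    path = target_dir / f"test_{digest}.py"
    path.write_text(f"def test_snippet():\n{body}\n", encoding="utf-8")
    return path


def pytest_command(test_path: Path) -> list[str]:
    """Build the pytest command quoted in the description."""
    return ["uv", "run", "pytest", str(test_path), "--tb=short"]
```

Under these assumptions, `write_sandbox_test("assert 1 + 1 == 2")` creates one file under `.claude/testing/`, and passing the result of `pytest_command` to `subprocess.run` would execute it.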

Examples

Example 1: Quick Assertion

> /test-sandbox "assert 'hello'.upper() == 'HELLO'"

✓ Test passed: 1 assertion ran, all passed
Wrote: .claude/testing/test_abc123.py

Example 2: Module Test

> /test-sandbox "from src.auth import verify_token; assert verify_token('valid') is True; assert verify_token('bad') is False"

✓ Test passed: 2 assertions ran, all passed

Example 3: Test Failure

> /test-sandbox "assert 1 + 1 == 3"

✗ Test failed: AssertionError
  Expected: 3
  Actual: 2
File: .claude/testing/test_xyz789.py (not cleaned up for inspection)

Example 4: Sweep

> /test-sandbox --sweep

Cleaned 7 stale test files (> 24 hours old)
`.claude/testing/` now contains 2 recent tests
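A sweep like the one in Example 4 could be implemented along these lines. This is a minimal sketch under stated assumptions: the function name `sweep_stale_tests` is hypothetical, and file age is measured by modification time, which the skill description does not specify. Only the 24-hour threshold and the `test_*.py` naming come from the text.

```python
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_SECONDS = 24 * 60 * 60  # the 24-hour threshold quoted above


def sweep_stale_tests(sandbox: Path, now: Optional[float] = None) -> tuple[int, int]:
    """Delete test_*.py files older than 24 hours; return (removed, remaining)."""
    now = time.time() if now is None else now
    removed = remaining = 0
    for path in sorted(sandbox.glob("test_*.py")):
        # Assumption: "stale" is judged by the file's modification time.
        age = now - path.stat().st_mtime
        if age > STALE_AFTER_SECONDS:
            path.unlink()
            removed += 1
        else:
            remaining += 1
    return removed, remaining
```

The returned pair maps onto the two numbers in the sample output: files cleaned and recent tests left behind.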

Flags

| Flag | Behavior |
|------|----------|
| --sweep | Delete test files older than 24 hours. Does not run tests. |
| --file <path> | Run tests from an existing file instead of inline code. |
| --keep | Don't clean up test file after run (for inspection). |
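How the slash command parses its arguments is not documented here, so the following is only a hypothetical sketch of the flag semantics using Python's `argparse`. The names `build_parser` and `plan` are invented for illustration; the three flags and the rule that `--sweep` does not run tests come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented flags plus an optional inline snippet."""
    parser = argparse.ArgumentParser(prog="/test-sandbox")
    parser.add_argument("code", nargs="?", help="inline test code")
    parser.add_argument("--file", metavar="PATH", help="run tests from an existing file")
    parser.add_argument("--keep", action="store_true", help="keep the generated test file")
    parser.add_argument("--sweep", action="store_true", help="delete stale files; run nothing")
    return parser


def plan(argv: list[str]) -> str:
    """Return which action the flags select: 'sweep', 'file', or 'inline'."""
    args = build_parser().parse_args(argv)
    if args.sweep:
        return "sweep"  # per the table, --sweep does not run tests
    if args.file:
        return "file"
    if args.code:
        return "inline"
    raise SystemExit("nothing to run: pass inline code, --file, or --sweep")
```

Checking `--sweep` first mirrors the table's note that a sweep never runs tests, even if other arguments are present.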

Gitignore Integration

On first use, /test-sandbox automatically adds .claude/testing/ to .gitignore if not already present. No manual action needed.
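An idempotent update of this kind might look like the sketch below. The function name `ensure_gitignored` and the exact matching rule (comparing stripped lines) are assumptions for illustration, not the skill's confirmed behavior.

```python
from pathlib import Path

ENTRY = ".claude/testing/"  # the path the skill adds to .gitignore


def ensure_gitignored(repo_root: Path, entry: str = ENTRY) -> bool:
    """Append entry to .gitignore if missing; return True if the file changed."""
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in existing.splitlines()):
        return False  # already present, so repeated runs change nothing
    # Keep existing content intact and make sure the new entry starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(f"{existing}{prefix}{entry}\n", encoding="utf-8")
    return True
```

Calling it twice on the same repository writes the entry once and then reports no change, which is what makes the integration safe to run on every invocation.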

Quality

Real test runner: Uses your project's uv run pytest, not a mock runner. Respects your venv and any conftest.py at or above .claude/testing/ (such as one in the project root); fixtures defined under tests/ are not picked up (see Gotchas).
Confidence scoring: Only surfaces findings scored >= 50 (high-confidence issues).
Context discipline: Sub-agent reports summarized to ~2K max. Full details available in $TMPDIR if needed.
Fail-safe cleanup: Stale test files are swept with --sweep (24-hour age threshold).
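The confidence filter and summary cap could be sketched as follows. Everything structural here is an assumption: the `Finding` shape and the `summarize` helper are hypothetical, and "~2K" is read as 2,000 characters because the text does not give a unit. Only the threshold of 50 comes from the description.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 50  # threshold from the text above
SUMMARY_LIMIT = 2000       # assumption: "~2K" read as characters


@dataclass
class Finding:
    """Hypothetical shape of one scored finding from a sub-agent."""
    message: str
    confidence: int


def summarize(findings: list[Finding], limit: int = SUMMARY_LIMIT) -> str:
    """Keep findings at or above the threshold and cap the summary length."""
    kept = [f for f in findings if f.confidence >= CONFIDENCE_THRESHOLD]
    kept.sort(key=lambda f: f.confidence, reverse=True)
    text = "\n".join(f"[{f.confidence}] {f.message}" for f in kept)
    return text if len(text) <= limit else text[: limit - 3] + "..."
```

Sorting by confidence before truncating means that, if the cap is hit, the lowest-scored findings are the ones cut.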

When to Use This

Good for:

Quick verifications during development
Testing a new function before committing
Edge case exploration
Validation of refactoring

Not ideal for:

Long-term test suites (use tests/ directory instead)
Collaborative tests (tests belong in repo, not .claude/testing/)
Fixtures or setup that needs persistence (use conftest.py)

Tips & Tricks

Capture Output

/test-sandbox "from src.module import fn; result = fn(); assert result > 0; print(f'Result: {result}')"

Multiple Assertions

Separate with semicolons:

/test-sandbox "from mymodule import Cls; c = Cls(); assert c.x == 1; assert c.y == 2; assert c.z == 3"

Test a Refactor

/test-sandbox --file src/old_module.py  # Run old module's internal test suite

Debug a Failure

/test-sandbox --keep "assert my_fn() == expected"  # Don't delete file after failure
cat .claude/testing/test_*.py  # Inspect the generated test

Implementation

Skill: Routes test code to sub-agents (roquefort-wrecker for TDD work)
Sub-agents: Spawn in parallel for independence, write test files, run tests, score findings
Output: Only pass/fail counts + findings >= 50. Verbose output trapped in sub-agent context.
Cleanup: Automatic after run (unless --keep flag used)
Gitignore: Idempotent (safe to run multiple times)

See claude/CLAUDE.md for sub-agent delegation patterns and context discipline rules.

Gotchas

Module imports fail if PYTHONPATH doesn't include the project root; prefix the pytest command with PYTHONPATH=.
conftest.py fixtures from tests/ are not available in .claude/testing/; copy any fixtures the snippet needs
uv must be installed; fall back to python -m pytest if it is unavailable
Test files in .claude/testing/ are gitignored but accumulate; run --sweep periodically
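The first and third gotchas can be handled when building the runner invocation. The sketch below is one possible approach, not the skill's documented behavior; `runner_command` and `runner_env` are hypothetical helper names, and the `uv_path` parameter exists only so the choice can be overridden.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def runner_command(test_path: Path, uv_path: Optional[str] = None) -> list[str]:
    """Prefer `uv run pytest`; fall back to `python -m pytest` when uv is missing."""
    uv = uv_path if uv_path is not None else shutil.which("uv")
    base = [uv, "run", "pytest"] if uv else [sys.executable, "-m", "pytest"]
    return base + [str(test_path), "--tb=short"]


def runner_env(project_root: Path, base_env: Optional[dict] = None) -> dict:
    """Copy the environment with the project root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    root = str(project_root)
    env["PYTHONPATH"] = root if not existing else os.pathsep.join([root, existing])
    return env
```

A caller would pass both to `subprocess.run(runner_command(path), env=runner_env(Path.cwd()))`, so that imports such as `from src.orders import process_order` resolve from the project root.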


Run Python test code in an isolated sandbox without polluting the main context. Writes test files to .claude/testing/ (gitignored), runs via sub-agent, and reports only pass/fail counts and assertion details. Use when you want to quickly verify code without writing inline python3 -c scripts. Also supports --sweep to clean stale test files. Use when the user says "run a quick test", "verify this works", "sanity check", "test this snippet", or invokes /test-sandbox.

2 stars

development

Updated Apr 27, 2026

Install this skill globally with one command:

npx skillsauth add paulnsorensen/dotfiles test-sandbox

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 27, 2026, 10:10 AM · 74.5s · 1 file scanned

SKILL.md

name: test-sandbox
model: haiku
context: fork
allowed-tools: Read, Write, Bash(python3:*), Bash(uv:*), Bash(pytest:*), Bash(ls:*), Bash(rm:*)
description: >




Install manually


# Clone the repo
git clone https://github.com/paulnsorensen/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/claude/skills/test-sandbox ~/.claude/skills/


Repository

paulnsorensen/dotfiles

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT