outlinedriven/tests-adversarial

Name: tests-adversarial
Author: outlinedriven

skills/tests-adversarial/SKILL.md

npx skillsauth add outlinedriven/odin-codex-plugin tests-adversarial

Adversarial Testing — Think Like the Attacker

Every line of code makes assumptions. Your job is to find them and violate them — systematically, not randomly. The goal is distrust, not coverage. A passing test suite proves nothing if it only tests the happy path.

The Adversarial Mindset

Every input is a lie. Callers will send garbage, nulls, negative numbers, empty strings, and types that satisfy the compiler but violate intent.
Implicit contracts are targets. If the code assumes ordering, uniqueness, non-emptiness, or positive values without enforcing it — that is your entry point.
The system is your adversary. Files disappear, connections drop, clocks jump, memory runs out, permissions change between check and use.
Passing tests prove little. They show only that the happy path works; adversarial tests show that the sad paths do not silently corrupt data.

Assumption Hunting (Core Technique)

For every function or module under test, ask these six questions:

What does it assume about inputs? Violate each assumption: values that coerce to an unintended type, boundary values, null/nil/None, empty collections, maximum-size payloads.
What does it assume about ordering? Reorder arguments, reverse sequences, interleave concurrent calls, call methods out of lifecycle order.
What does it assume about timing? Delay responses past timeouts, deliver results before the consumer is ready, inject clock skew, expire tokens mid-operation.
What does it assume about state? Start from half-initialized state, corrupt shared state mid-operation, test post-error recovery state, double-close resources.
What does it assume about resources? Exhaust file descriptors, fill disk, revoke permissions, return allocation failures, saturate connection pools.
What does it assume will NOT happen? Make it happen. Concurrent modification during iteration, recursive re-entry, self-referential data, stack overflow via deep nesting.
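
A minimal sketch of assumption hunting in Python. The `mean_latency` function and the `expect_error` helper are hypothetical, not part of the skill. The function's implicit assumptions (non-empty input, numeric samples, non-negative values, no NaN) are each written down and violated once.

```python
import math


def mean_latency(samples_ms):
    """Hypothetical function under test: average of latency samples.

    Each implicit assumption is enforced explicitly, so a violation
    raises a descriptive error instead of producing a wrong number.
    """
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    for s in samples_ms:
        if isinstance(s, bool) or not isinstance(s, (int, float)):
            raise TypeError(f"sample {s!r} is not a number")
        if math.isnan(s) or s < 0:
            raise ValueError(f"sample {s!r} is not a valid latency")
    return sum(samples_ms) / len(samples_ms)


def expect_error(exc_type, fn, *args):
    """Return True if fn(*args) raises exc_type, False if it returns normally."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


# One violation per documented assumption, keyed by what it violates.
violations = {
    "non_empty": expect_error(ValueError, mean_latency, []),
    "numeric": expect_error(TypeError, mean_latency, ["12"]),
    "non_negative": expect_error(ValueError, mean_latency, [-1]),
    "not_nan": expect_error(ValueError, mean_latency, [float("nan")]),
}
```

A `False` entry in `violations` means the function accepted input that breaks a documented assumption.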

Attack Vectors (Thinking Prompts)

Data:

Zero, negative, MAX_INT, NaN, Infinity, negative zero
Empty string, null bytes in strings, multi-byte Unicode (emoji, RTL, ZWJ sequences)
Empty collections, single-element, collections at capacity
Encode a value, corrupt one byte, decode it
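
Some of the data prompts above can be sketched with the Python standard library. The wire format here (a JSON payload followed by a 4-byte CRC32) is an assumption made for illustration, as are the `encode` and `decode` names.

```python
import json
import math
import zlib

# NaN is not equal to itself, so equality-based validation misses it.
nan = float("nan")
nan_equals_itself = (nan == nan)

# Negative zero compares equal to zero but keeps its sign bit.
neg_zero = -0.0
neg_zero_equals_zero = (neg_zero == 0.0)
neg_zero_sign = math.copysign(1.0, neg_zero)


def encode(record):
    """Hypothetical wire format: JSON payload followed by a 4-byte CRC32."""
    payload = json.dumps(record).encode("utf-8")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def decode(blob):
    """Reject blobs whose checksum does not match the payload."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain a checksum")
    payload, checksum = blob[:-4], blob[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch: payload is corrupt")
    return json.loads(payload.decode("utf-8"))


# Encode a value, corrupt one byte, decode it.
blob = encode({"name": "caf\u00e9 \U0001F600", "qty": 0})
corrupted = bytearray(blob)
corrupted[3] ^= 0xFF  # flip every bit of one payload byte
```

Calling `decode(bytes(corrupted))` should raise `ValueError`. A decoder without the checksum might instead return a different but well-formed record, which is the silent corruption this technique is meant to expose.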

State:

Double-close, use-after-free/dispose, read-after-error
Concurrent mutation during iteration or serialization
Half-written state from interrupted operation (crash mid-transaction)
State machine receiving events for a different state
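
A sketch of two state violations, double-close and use-after-close, against a hypothetical `Connection` class. The class and its `close_count` counter exist only for this example.

```python
class Connection:
    """Hypothetical resource with an explicit lifecycle: open -> closed."""

    def __init__(self):
        self.state = "open"
        self.close_count = 0  # how many times the resource was released

    def send(self, data):
        if self.state != "open":
            raise RuntimeError(f"send() called in state {self.state!r}")
        return len(data)

    def close(self):
        # Idempotent close: a second call is a no-op rather than a crash
        # or a double release of the underlying resource.
        if self.state == "closed":
            return
        self.state = "closed"
        self.close_count += 1


def test_double_close_releases_once():
    conn = Connection()
    conn.close()
    conn.close()
    assert conn.close_count == 1


def test_send_after_close_fails_loudly():
    conn = Connection()
    conn.close()
    try:
        conn.send(b"late")
    except RuntimeError as exc:
        assert "closed" in str(exc)
    else:
        raise AssertionError("send() after close() was silently accepted")


test_double_close_releases_once()
test_send_after_close_fails_loudly()
```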

Environment:

File not found, permission denied, disk full, read-only filesystem
Network timeout, connection reset, DNS failure, partial write
Clock jumps (forward 1 hour, backward 5 minutes, NTP correction)
OOM at the worst possible moment (during cleanup/rollback)
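
Clock jumps are easiest to test when the code takes its time source as a parameter. The sketch below assumes that design; `FakeClock` and `Lease` are hypothetical names. It jumps the clock forward one hour and then backward past the grant time.

```python
class FakeClock:
    """Test double for a time source; a test can jump it in either direction."""

    def __init__(self, now=1_000.0):
        self.now = now

    def __call__(self):
        return self.now

    def jump(self, seconds):
        self.now += seconds


class Lease:
    """Hypothetical lease that expires ttl seconds after it was granted."""

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock
        self.granted_at = clock()

    def is_valid(self):
        elapsed = self.clock() - self.granted_at
        # A negative elapsed time means the clock moved backward;
        # treat the lease as invalid instead of silently extending it.
        return 0 <= elapsed < self.ttl


clock = FakeClock()
lease = Lease(ttl=60, clock=clock)
valid_at_start = lease.is_valid()

clock.jump(3600)  # forward 1 hour
valid_after_forward_jump = lease.is_valid()

clock.jump(-3600 - 300)  # back to 5 minutes before the grant time
valid_after_backward_jump = lease.is_valid()
```

Treating a backward jump as invalid is one possible policy, chosen here so the failure is visible. Whether that is right for a given system is a design decision the test should force into the open.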

Protocol:

Out-of-order messages, duplicate delivery, missing acknowledgment
Partial writes (half a JSON object, truncated protobuf)
Version mismatch between client and server
Request after connection close, response after timeout already fired
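
Duplicate delivery and partial writes can be sketched against a hypothetical `Inbox` consumer. The message shape (`id` and `amount` fields) is an assumption for this example.

```python
import json


class Inbox:
    """Hypothetical message consumer that tolerates duplicates and truncation."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def receive(self, raw):
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"malformed message: {exc.msg}") from exc
        if not isinstance(msg, dict) or "id" not in msg or "amount" not in msg:
            raise ValueError("message is missing required fields")
        if msg["id"] in self.seen_ids:
            return False  # duplicate delivery: do not apply it twice
        self.seen_ids.add(msg["id"])
        self.total += msg["amount"]
        return True


inbox = Inbox()
full = json.dumps({"id": "m1", "amount": 5})

first = inbox.receive(full)
second = inbox.receive(full)  # duplicate delivery of the same message

try:
    inbox.receive(full[: len(full) // 2])  # half a JSON object
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
```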

The No-Cheating Rule

Test through the public API only. If you need private access to break it, the abstraction is leaking — file that as a finding.
If a scenario is "impossible," prove it with types or contracts. If you cannot prove it, it is not impossible — test it.
Every test scenario must be production-plausible. Cosmic rays flipping bits are not plausible; a user pasting 10 MB into a text field is.
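
The oversized paste can be exercised through a public entry point alone. In this sketch `accept_comment` and the 64 KiB limit are hypothetical.

```python
MAX_COMMENT_BYTES = 64 * 1024  # hypothetical limit for this sketch


def accept_comment(text):
    """Hypothetical public entry point: reject oversized input up front."""
    size = len(text.encode("utf-8"))
    if size > MAX_COMMENT_BYTES:
        raise ValueError(f"comment is {size} bytes; limit is {MAX_COMMENT_BYTES}")
    return text.strip()


pasted = "x" * (10 * 1024 * 1024)  # a user pasting 10 MB into a text field

try:
    accept_comment(pasted)
    oversized_rejected = False
    error_message = ""
except ValueError as exc:
    oversized_rejected = True
    error_message = str(exc)
```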

Writing Strategy

Read the code. Understand what it does, not what the docs say it does.
List assumptions. Write them down explicitly — one per line, no hedging.
Write violation tests. One test per assumption. Name it after what it violates: test_rejects_negative_quantity, test_handles_empty_result_set, test_recovers_from_mid_write_crash.
Verify error quality. When the code fails, does it produce a meaningful error? Silent corruption is worse than a crash.
Test boundaries from both sides. If the limit is 100, test 99, 100, and 101. If the limit is 0, test -1, 0, and 1.
Run sanitizers and race detectors. After writing tests, run them under ASan, MSan, TSan, Go's -race flag, Rust's Miri, or your language's equivalent. Tests that pass without sanitizers may hide undefined behavior.
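
A short sketch of the naming and boundary steps above, assuming a hypothetical `validate_quantity` function with an inclusive range of 0 to 100. Each limit is tested from both sides, and each test checks that the error message names the violated limit.

```python
MIN_QUANTITY = 0    # hypothetical lower limit
MAX_QUANTITY = 100  # hypothetical upper limit


def validate_quantity(qty):
    """Accept integers in MIN_QUANTITY..MAX_QUANTITY; raise a descriptive error otherwise."""
    if isinstance(qty, bool) or not isinstance(qty, int):
        raise TypeError(f"quantity must be an int, got {type(qty).__name__}")
    if qty < MIN_QUANTITY:
        raise ValueError(f"quantity must be at least {MIN_QUANTITY}, got {qty}")
    if qty > MAX_QUANTITY:
        raise ValueError(f"quantity must be at most {MAX_QUANTITY}, got {qty}")
    return qty


def rejection_message(value):
    """Return the error message if value is rejected, or None if it is accepted."""
    try:
        validate_quantity(value)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


def test_accepts_values_at_and_inside_limits():
    for value in (0, 1, 99, 100):
        assert rejection_message(value) is None


def test_rejects_negative_quantity():
    assert "at least 0" in rejection_message(-1)


def test_rejects_quantity_above_limit():
    assert "at most 100" in rejection_message(101)


def test_rejects_non_integer_quantity():
    assert "must be an int" in rejection_message("5")


test_accepts_values_at_and_inside_limits()
test_rejects_negative_quantity()
test_rejects_quantity_above_limit()
test_rejects_non_integer_quantity()
```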

Validation Gates

| Gate | Condition |
|------|-----------|
| Assumptions documented | Every implicit assumption in the code under test is written down |
| Violations tested | Each documented assumption has at least one test that violates it |
| Errors are meaningful | Every failure path produces a descriptive error, not silence or a generic message |
| Sanitizers pass | All tests pass under sanitizers / race detectors with zero warnings |

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All assumptions identified, violated, and handled; error paths produce meaningful output |
| 1 | Untested assumptions remain: some assumptions lack violation tests |
| 2 | Silent failures found: code swallows errors or produces wrong output without signaling |
| 3 | Crashes or panics discovered: unhandled exceptions, segfaults, or undefined behavior found |
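
One way to turn a run's findings into these exit codes is sketched below. The `Findings` type is hypothetical, and the rule that the highest applicable code wins is an assumption; the table does not say how to combine conditions.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical summary of one adversarial testing pass."""
    untested_assumptions: int = 0
    silent_failures: int = 0
    crashes: int = 0


def exit_code(findings):
    """Map findings to the exit codes in the table above.

    Assumption: when several conditions hold, the highest code wins.
    """
    if findings.crashes:
        return 3
    if findings.silent_failures:
        return 2
    if findings.untested_assumptions:
        return 1
    return 0
```

For example, `exit_code(Findings(untested_assumptions=2, silent_failures=1))` returns 2 under this precedence rule.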

Write adversarial tests that intentionally stress failure paths. Use when hardening error handling, stress-testing assumptions, validating boundary behavior, or hunting silent failures.

11 stars

testing

Updated Apr 27, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add outlinedriven/odin-codex-plugin tests-adversarial

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 27, 2026, 9:22 AM (140.1s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/outlinedriven/odin-codex-plugin.git

# Copy into Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r odin-codex-plugin/skills/tests-adversarial ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

outlinedriven/odin-codex-plugin

11 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT