gtylee/modelgas/skills/tests

Name: modelgas/skills/tests
Author: gtylee

modelgas/skills/tests/SKILL.md

npx skillsauth add gtylee/codexgas modelgas/skills/tests

Skill: Test Writing

Purpose

Design and generate a validation test suite that assesses conceptual soundness, implementation correctness, numerical stability, and outcome reasonableness.

This skill converts model risk into executable tests.

Inputs

Required IR fields:

methodology outputs
ALW outputs
code evidence snippets

Skill data inputs:

test_matrix.yaml (required test categories and patterns)

Outputs

A test plan matrix (test name, purpose, category)
Generated pytest files with executable tests
Dataset requests (schema-only, no data values)
Acceptance criteria placeholders linked to ongoing performance monitoring (OPM) thresholds
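As an illustration of the first output, the test plan matrix can be sketched as a list of rows carrying the three columns named above (test name, purpose, category). The `PlanRow` class, the `build_matrix` helper, and the example test names and category labels are assumptions made for this sketch; the real categories would come from test_matrix.yaml.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlanRow:
    """One row of the plan matrix (hypothetical shape)."""
    name: str
    purpose: str
    category: str


def build_matrix(rows):
    """Return the plan as plain dicts, rejecting duplicate test names."""
    names = [row.name for row in rows]
    if len(names) != len(set(names)):
        raise ValueError("duplicate test name in plan")
    return [asdict(row) for row in rows]


# Example rows; names and categories are illustrative only.
plan = build_matrix([
    PlanRow("test_discount_factor_monotone_in_rate",
            "Discount factor must fall as the rate rises",
            "conceptual_soundness"),
    PlanRow("test_seeded_simulation_repeatable",
            "Same seed must reproduce the same result",
            "numerical_stability"),
])
```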

Rules

Evidence & uncertainty (non-negotiable)

Every materially non-trivial claim (including why a test exists) must be supported by evidence IDs.
If a test cannot be specified from evidence, mark it “Not evidenced” and add an unknown stating what is missing.
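The evidence rule can be sketched as a small check applied to each proposed test. The field names `evidence_ids` and `missing`, and the function `evidence_status`, are hypothetical; the actual output schema is not shown in this document.

```python
def evidence_status(test_spec):
    """Return (status, unknowns) for one proposed test.

    A test with at least one evidence id is treated as evidenced.
    Otherwise it is marked "Not evidenced" and an unknown is recorded
    that states what is missing.
    """
    evidence_ids = test_spec.get("evidence_ids") or []
    if evidence_ids:
        return "Evidenced", []
    missing = test_spec.get("missing", "evidence motivating this test")
    return "Not evidenced", [f"{test_spec['name']}: missing {missing}"]


status, unknowns = evidence_status(
    {"name": "test_rate_floor", "evidence_ids": [], "missing": "documented rate floor"}
)
```

Here `status` is "Not evidenced" and `unknowns` holds a single entry naming the missing rate floor documentation.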

Coverage & traceability

Tests must be aligned with identified assumptions and weaknesses (ALW).
Each ALW weakness should map to at least one proposed test, or include an explicit reason it cannot be tested.
For each test, cite: (a) the ALW item(s) it targets and (b) evidence motivating it.
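A minimal sketch of the coverage rule follows: every ALW weakness needs either a test that targets it or an explicit reason it cannot be tested. The identifiers (`W1`, `E4`), the `alw_ids` field, and the function name are assumptions for illustration.

```python
def uncovered_weaknesses(alw_items, tests, untestable_reasons):
    """Return ALW ids that have neither a targeting test nor a stated reason."""
    covered = {alw_id for test in tests for alw_id in test["alw_ids"]}
    return sorted(
        item
        for item in alw_items
        if item not in covered and not untestable_reasons.get(item)
    )


alw_items = ["W1", "W2", "W3"]
tests = [{"name": "test_rate_floor", "alw_ids": ["W1"], "evidence_ids": ["E4"]}]
untestable_reasons = {"W2": "requires production data that is not available"}

gaps = uncovered_weaknesses(alw_items, tests, untestable_reasons)
```

With these inputs `gaps` contains only `W3`, the one weakness with no test and no stated reason.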

Determinism & robustness

Prefer property-based and monotonicity tests where possible.
Set seeds for stochastic components; if not possible, explain why and use statistical assertions with tolerances.
Avoid brittle “golden output” snapshots unless the model is deterministic and numerically stable.
Separate correctness tests from performance/stability tests.
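The sketch below shows what these rules might look like in pytest-style tests, using only the standard library. `discount_factor` and `simulate_mean` are hypothetical stand-ins for model code, not functions from the skill or from any model under validation. The monotonicity test draws seeded random inputs as a lightweight substitute for a property-based testing library.

```python
import math
import random
import statistics


def discount_factor(rate, years):
    """Hypothetical model function: continuously compounded discount factor."""
    return math.exp(-rate * years)


def simulate_mean(seed, n=10_000):
    """Hypothetical stochastic component: mean of n standard normal draws."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))


def test_discount_factor_is_monotone_in_rate():
    # Property: for fixed maturity, a higher rate gives a lower discount factor.
    rng = random.Random(12345)  # seeded so any failure is reproducible
    for _ in range(200):
        low = rng.uniform(0.0, 0.2)
        high = low + rng.uniform(1e-6, 0.2)
        years = rng.uniform(0.1, 30.0)
        assert discount_factor(high, years) < discount_factor(low, years)


def test_seeded_simulation_is_repeatable():
    # Same seed, same result: exact equality is safe for a seeded generator.
    assert simulate_mean(seed=7) == simulate_mean(seed=7)


def test_simulation_mean_within_statistical_tolerance():
    # The standard error of the mean is 1/sqrt(n); allow five standard errors
    # rather than pinning a golden value.
    n = 10_000
    assert abs(simulate_mean(seed=7, n=n)) < 5 / math.sqrt(n)
```

None of the tests compares against a stored snapshot; each asserts a property that should hold for any correct implementation.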

Data requests

If a test cannot be written without data, request schema only (no concrete values).
Explicitly state required fields, shapes, units, and acceptable ranges if evidenced.
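A schema-only dataset request could be represented as below. The dataset name, field names, units, and ranges are invented for illustration, and the `FieldSpec` and `DatasetRequest` classes are assumptions rather than part of the skill's schema. No concrete data values appear, only the structure a test would need.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str
    unit: Optional[str] = None
    # Filled in only when an acceptable range is evidenced.
    valid_range: Optional[Tuple[float, float]] = None


@dataclass(frozen=True)
class DatasetRequest:
    dataset: str
    shape: str  # prose description of the expected layout
    fields: Tuple[FieldSpec, ...]

    def field_names(self):
        return [field.name for field in self.fields]


request = DatasetRequest(
    dataset="loan_history",
    shape="one row per loan per month",
    fields=(
        FieldSpec("loan_id", "string"),
        FieldSpec("balance", "float", unit="USD", valid_range=(0.0, float("inf"))),
        FieldSpec("annual_rate", "float", unit="fraction per year"),
    ),
)
```

Leaving `valid_range` as `None` for `annual_rate` records that no range was evidenced, instead of guessing one.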

Code quality

Generated code must be syntactically valid pytest and runnable in isolation.
Use tolerances consistent with numerical noise; avoid false precision.
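As a small example of a tolerance consistent with numerical noise: summing ten weights of 0.1 in floating point does not give exactly 1.0, so an exact equality assertion would fail for a reason unrelated to the model. The helper `weight_total` is a hypothetical stand-in for model code.

```python
import math


def weight_total(weights):
    """Hypothetical model helper: total of portfolio weights."""
    return sum(weights)


def test_weights_sum_to_one():
    total = weight_total([0.1] * 10)
    # Compare with a relative tolerance instead of `total == 1.0`,
    # which is False here because of floating-point rounding.
    assert math.isclose(total, 1.0, rel_tol=1e-12)
```

The tolerance is set a few orders of magnitude above double-precision rounding error, which is tight enough to catch a real defect without asserting false precision.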

JSON / schema contract

Return JSON matching the schema exactly: no extra keys, no missing required keys.
Use explicit null/sentinel only where allowed by the schema.
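The "no extra keys, no missing required keys" rule can be checked mechanically. The key names in `REQUIRED_KEYS` are placeholders chosen to mirror the Outputs list; the real schema is not shown in this document, so treat them as assumptions.

```python
import json

# Hypothetical top-level keys; substitute the keys from the actual schema.
REQUIRED_KEYS = frozenset(
    {"test_plan", "pytest_files", "dataset_requests", "acceptance_criteria"}
)


def check_exact_keys(payload, required=REQUIRED_KEYS, optional=frozenset()):
    """Return (missing, extra) top-level keys for a JSON object string."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise TypeError("top-level JSON value must be an object")
    keys = set(data)
    missing = sorted(required - keys)
    extra = sorted(keys - required - optional)
    return missing, extra


missing, extra = check_exact_keys('{"test_plan": [], "notes": "draft"}')
```

A response passes only when both lists are empty. A full JSON Schema validator would also check types and nested objects, which this sketch does not attempt.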

System Prompt

You are a model validation engineer writing tests for a financial model. Design tests that would catch real failures, not just pass happy paths.

User Prompt Template

Using the model IR and ALW:

Propose a structured test plan across validation dimensions.
Generate pytest test code where feasible.
Identify required datasets by schema only.
Define acceptance criteria placeholders.

Return JSON matching the schema exactly.

Post-run Checks

Generated files contain valid Python.
Test coverage maps to ALW items.
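The first post-run check can be sketched with the standard library `ast` module, which parses source text without executing it. The function name and the in-memory `files` mapping are assumptions; a real pipeline would read the generated files from disk.

```python
import ast


def python_syntax_errors(files):
    """Map file name to an error message for each source that fails to parse."""
    errors = {}
    for name, source in files.items():
        try:
            ast.parse(source, filename=name)
        except SyntaxError as exc:
            errors[name] = f"line {exc.lineno}: {exc.msg}"
    return errors


generated = {
    "test_ok.py": "def test_a():\n    assert 1 == 1\n",
    "test_bad.py": "def test_b(:\n    pass\n",
}
errors = python_syntax_errors(generated)
```

Parsing confirms only that the files are syntactically valid Python. It does not show that the tests import cleanly or pass, so it complements running pytest and does not replace it.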

development

Updated Apr 21, 2026

Install globally with skillsauth:

npx skillsauth add gtylee/codexgas modelgas/skills/tests

This installs the skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%.
Semgrep: Static code analysis for vulnerabilities. Clean, 95%.
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%.
Snyk (dep): Open source security scanning. Skipped, 50%.
Socket.dev: Supply chain security analysis. Skipped, 50%.
VirusTotal: Multi-engine malware detection. Skipped, 50%.
CrowdStrike: Advanced threat intelligence. Skipped, 50%.
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%.
OWASP Dep-Check: Skipped, 50%.

Last scanned: Apr 21, 2026, 6:59 PM (119.7s, 3 files scanned)

Related Skills

gtylee/modelgas/skills/risk_tiering (development; SKILL.md updated Apr 21, 2026)

Skill: Risk Tiering. Purpose: Determine the governance risk tier of the model by assessing its financial impact, operational reliance, usage pattern, implementation complexity, and strength of existing risk mitigations. This skill establishes the downstream control requirements for all other skills. Inputs: required IR fields are project metadata, symbols and public interfaces, imports and dependencies, commentary_md, and evidence_index. Skill data inputs: rubric.yaml (axis definitions, ...

gtylee/modelgas/skills/remediation_pack (development; SKILL.md updated Apr 21, 2026)

Skill: Remediation Pack. Purpose: Convert CodexGAS findings into implementable remediation artifacts (patches, tests, config/control updates) that close evidenced weaknesses and governance gaps. Inputs: required inputs are prior skill outputs (1–7 at minimum; include 8/9 if present), remediation rules (`data/remediation_rules.yaml`), optional `human_declarations` (only if provided; do not invent), and the IR evidence index (for evidence ids). Outputs: Produce a remediation pack containing: ...

gtylee/modelgas/skills/prod_controls (development; SKILL.md updated Apr 21, 2026)

Skill: Production Control. Purpose: Define technical controls that ensure safe, observable, and auditable operation of the model in batch and service environments. This skill turns governance into runtime behavior. Inputs: required IR fields are model interfaces, deployment assumptions, and risk tier output. Skill data inputs: monitors.yaml (control patterns and snippets). Outputs: logging and lineage requirements, monitoring hooks (input/output, drift, failures), audit artifacts, ...

gtylee/modelgas/skills/opm (development; SKILL.md updated Apr 21, 2026)

Skill: OPM Tailoring. Purpose: Define ongoing performance monitoring (OPM) metrics and thresholds that are proportional to the model’s risk tier and usage. This skill operationalizes “model performance” in production. Inputs: required IR fields are risk tier output, test outputs (especially metrics), and model usage characteristics. Skill data inputs: thresholds.yaml (default metrics and bands per tier). Outputs: selected monitoring metrics, thresholds (green/amber/red), Breach defi...

Install manually

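# Clone the repo
git clone https://github.com/gtylee/codexgas.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r codexgas/modelgas/skills/tests ~/.claude/skills/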

For the skills folder location, see the official Claude Code Skills documentation.

Repository: gtylee/codexgas

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT