
santosomar/test-suite-prioritizer

Name: test-suite-prioritizer
Author: santosomar

skills/testing/test-suite-prioritizer/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills test-suite-prioritizer


Test Suite Prioritizer

If the suite takes 40 minutes and fails at minute 38, you wasted 38 minutes. Run the test that's going to fail first. Prioritization is predicting failures and front-loading them.

Signals for priority

| Signal | Why it predicts failure | How to get it |
| ------ | ----------------------- | ------------- |
| Covers changed code | This change might have broken it | Coverage map + git diff --name-only |
| Failed recently | What failed yesterday fails today | CI history: last N runs |
| Flaky | Runs early → flakes detected early, can be retried | CI history: pass/fail variance |
| Fast | More tests per minute of budget | Duration from last run |
| High code coverage (this test) | Covers more → more chance of catching something | Per-test coverage |
| Co-change with modified files | Historically changes with these files | git log correlation |

Combine into a score. Sort. Run in order.

Scoring formula (starting point — tune it)

priority(test) =
    10.0 * covers_changed_lines(test, diff)       # binary: 1 if any overlap
  +  3.0 * recent_failure_rate(test, last_20_runs)
  +  1.0 * flake_rate(test)
  +  0.1 * (1.0 / (duration_seconds(test) + 1))   # tie-break: faster first
  -  5.0 * is_quarantined(test)                   # known-broken go last

The covers_changed_lines weight dominates: if you changed foo.py, tests that cover foo.py run first. Everything else is tie-breaking.
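The formula above can be sketched in Python as follows. The `Signals` record, its field names, and the four sample tests are assumptions made for illustration; only the weights come from the formula.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-test inputs; fields mirror the terms of the formula."""
    name: str
    covers_changed_lines: bool
    recent_failure_rate: float  # failures / runs over the last 20 runs
    flake_rate: float
    duration_seconds: float
    is_quarantined: bool = False


def priority(t: Signals) -> float:
    # Same weights as the starting-point formula; tune them for your suite.
    return (
        10.0 * float(t.covers_changed_lines)
        + 3.0 * t.recent_failure_rate
        + 1.0 * t.flake_rate
        + 0.1 * (1.0 / (t.duration_seconds + 1))
        - 5.0 * float(t.is_quarantined)
    )


def prioritize(tests):
    # Highest score runs first.
    return sorted(tests, key=priority, reverse=True)


tests = [
    Signals("test_slow_unrelated", False, 0.0, 0.0, 30.0),
    Signals("test_covers_change", True, 0.0, 0.0, 2.0),
    Signals("test_recently_failing", False, 0.5, 0.1, 1.0),
    Signals("test_quarantined", True, 0.0, 0.0, 1.0, is_quarantined=True),
]
order = [t.name for t in prioritize(tests)]
print(order)
```

With these weights a quarantined test that covers the change still outranks an unrelated test with recent failures, because 10.0 - 5.0 is larger than anything the failure and flake terms can contribute. If that is not the intent, the quarantine penalty is the weight to adjust.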

Building the change → test map

You need: which tests cover which lines. Per-test coverage:

| Ecosystem | How |
| --------- | --- |
| Python | pytest-testmon maintains the map incrementally; or coverage run --parallel per test + coverage combine |
| Java | JaCoCo per-test (tricky, needs agent per test); or use git diff → changed classes → tests importing them (approximation) |
| JS | jest --changedSince=<ref> does this natively |
| Go | go test -run with package granularity; or gotestsum --junitfile + parse |

The map is expensive to build from scratch (run every test in isolation, collect coverage). Build it once, update incrementally.
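Once per-test coverage exists, finding the tests that touch a change is an index lookup. The sketch below assumes the coverage data has already been exported into a plain dictionary; that shape, the line numbers, and the `billing/invoice.py` path are hypothetical, not the output format of any particular tool.

```python
from collections import defaultdict


def invert_coverage(per_test):
    """Turn {test: {file: set(lines)}} into {(file, line): set(tests)}."""
    index = defaultdict(set)
    for test, files in per_test.items():
        for path, lines in files.items():
            for line in lines:
                index[(path, line)].add(test)
    return index


def affected_tests(index, changed):
    """changed: {file: iterable of changed line numbers} taken from the diff."""
    hit = set()
    for path, lines in changed.items():
        for line in lines:
            hit |= index.get((path, line), set())
    return hit


# Hypothetical coverage data for three tests.
per_test = {
    "test_session_refresh": {"auth/session.py": set(range(40, 61))},
    "test_session_expiry": {"auth/session.py": {58, 59}},
    "test_unrelated_billing": {"billing/invoice.py": {1, 2, 3}},
}
index = invert_coverage(per_test)
affected = sorted(affected_tests(index, {"auth/session.py": range(45, 61)}))
print(affected)
```

A map built before the change has no entries for a newly added file, so new files probably need a separate rule, for example running the tests added in the same change.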

Tiered execution

Don't just reorder — cut off:

| Tier | What | Runs on | Time budget |
| ---- | ---- | ------- | ----------- |
| Smoke | Top-20 by priority | Every commit | < 2 min |
| Affected | Everything covering changed code | Every PR | < 10 min |
| Full | Everything, priority-ordered | Merge to main | Whatever it takes |
| Nightly | Full + slow integration + flake retry loop | Cron | Hours OK |

Fail fast at each tier. Smoke fail → don't run affected. Affected fail → don't run full.
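The fail-fast rule can be sketched as a loop over tiers that stops at the first failing test. This reads "fail fast" as stopping immediately rather than finishing the tier, and it chains the tiers in one run for illustration even though they normally fire on different triggers. `run_tiers` and the `run_test` callback are hypothetical names.

```python
def run_tiers(tiers, run_test):
    """tiers: list of (tier_name, tests in priority order).
    run_test: callable taking a test name and returning True on pass.
    Stops at the first failure, so later tiers never start."""
    seen = set()
    executed = []
    for tier_name, tests in tiers:
        for test in tests:
            if test in seen:
                continue  # already passed in an earlier tier
            seen.add(test)
            executed.append(test)
            if not run_test(test):
                return {
                    "passed": False,
                    "failed_tier": tier_name,
                    "failed_test": test,
                    "executed": executed,
                }
    return {"passed": True, "failed_tier": None, "failed_test": None, "executed": executed}


tiers = [
    ("smoke", ["test_a", "test_b"]),
    ("affected", ["test_a", "test_b", "test_c"]),
    ("full", ["test_a", "test_b", "test_c", "test_d"]),
]
# Pretend test_c is the only failing test.
result = run_tiers(tiers, run_test=lambda name: name != "test_c")
print(result["failed_tier"], result["executed"])
```

Here the failure surfaces in the affected tier and `test_d` never runs.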

Worked example

Change: PR touches auth/session.py (lines 45–60) and auth/tokens.py (new file).

Coverage map says:

| Test | Covers session.py 45–60 | Covers tokens.py | Last 20 runs | Duration |
| ---- | ----------------------- | ---------------- | ------------ | -------- |
| test_session_refresh | Yes | No | 20/20 pass | 0.3s |
| test_token_issuance | No | Yes | (new) | 0.1s |
| test_session_expiry | Yes (line 58) | No | 18/20 pass | 0.2s |
| test_login_e2e | Yes (indirectly) | Yes | 20/20 pass | 12s |
| test_unrelated_billing | No | No | 20/20 pass | 0.4s |
| ...300 more unrelated tests... | | | | |

Priority order:

1. test_session_expiry: covers change, recent failures (2/20), fast → score ≈ 10.4 (10.0 + 3.0 × 0.1 + 0.1/1.2, taking flake_rate as 0)
2. test_token_issuance: covers new file, no history, fastest → ≈ 10.09
3. test_session_refresh: covers change, clean, fast → ≈ 10.08
4. test_login_e2e: covers both, but slow → ≈ 10.008
5–305. Everything else (scores < 1)

Affected tier runs 1–4 (12.6s total). If they pass, high confidence the change is safe. Full suite runs on merge.

Flake handling

Flaky tests are a prioritization headache: run them early (catch flakes fast, retry within budget) or run them late (don't block good changes on noise)?

Answer: run them early, with automatic retry. A flake that fails-then-passes-on-retry cost you 5 seconds. A flake that fails at minute 38 with no retry budget cost you 38 minutes.
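A minimal sketch of "run early, with automatic retry": rerun a failing test a bounded number of times and record whether it recovered. The `run_with_retry` helper and its `max_retries` parameter are assumptions for illustration, not part of any test runner.

```python
def run_with_retry(run_test, max_retries=2):
    """run_test: zero-argument callable returning True on pass.
    Returns 'pass', 'flaky' (failed, then passed on a retry), or 'fail'."""
    if run_test():
        return "pass"
    for _ in range(max_retries):
        if run_test():
            return "flaky"
    return "fail"


# Simulated outcomes: the first attempt fails, the retry passes.
outcomes = iter([False, True])
status = run_with_retry(lambda: next(outcomes))
print(status)
```

Recording "flaky" separately from "pass" keeps the flake_rate signal in the scoring formula fed with fresh data.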

Do not

Do not trust static import-based "affected test" approximations for dynamic languages. getattr(module, name)() isn't an import. Use real coverage.
Do not let the priority order stagnate. Rebuild it when coverage changes significantly (nightly at minimum).
Do not skip the full suite entirely. Prioritization is for fast feedback, not for replacing comprehensive checks. Something always slips past "affected."
Do not deprioritize a test into oblivion. A test that's always last never runs if the budget is tight — at that point delete it or fix it.
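The first rule can be illustrated with a small experiment: a static scan of import statements only sees what is spelled out in the source. The snippet below is a sketch with an invented `SOURCE` string; the scanner uses the standard `ast` module and is not how any specific tool works.

```python
import ast

# Invented module source: one module is loaded dynamically at runtime.
SOURCE = '''
import json
import importlib

def load(name):
    module = importlib.import_module("xml.dom")
    return getattr(module, name)()
'''


def static_imports(source):
    """Collect module names that appear in import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


found = static_imports(SOURCE)
print(sorted(found))
```

The scan reports `importlib` and `json` but not `xml.dom`, and it cannot tell which attribute `getattr(module, name)()` ends up calling. Measured coverage records what actually executed.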

Output format

## Change
<diff summary — files, line ranges>

## Coverage map freshness
Last rebuilt: <when>
Staleness: <N tests may have stale coverage>

## Priority order (top 20)
| Rank | Test | Score | Covers change | Recent fails | Duration |
| ---- | ---- | ----- | ------------- | ------------ | -------- |

## Tiers
| Tier | Tests | Est. duration | Run condition |
| ---- | ----- | ------------- | ------------- |

## Run plan
<pytest/jest/mvn command(s) — ordered>

## Flakes
<tests with flake_rate > 0 — retry policy>
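The "Priority order" section of this template could be filled in by a small renderer such as the sketch below. The row dictionary keys (`test`, `score`, `covers_change`, `recent_fails`, `duration_s`) and the sample rows are hypothetical; only the column headers come from the template.

```python
def render_priority_table(rows, limit=20):
    """rows: dicts already sorted by descending score."""
    lines = [
        f"## Priority order (top {limit})",
        "| Rank | Test | Score | Covers change | Recent fails | Duration |",
        "| ---- | ---- | ----- | ------------- | ------------ | -------- |",
    ]
    for rank, row in enumerate(rows[:limit], start=1):
        covers = "Yes" if row["covers_change"] else "No"
        lines.append(
            f"| {rank} | {row['test']} | {row['score']:.2f} | {covers} "
            f"| {row['recent_fails']} | {row['duration_s']}s |"
        )
    return "\n".join(lines)


# Invented sample rows.
rows = [
    {"test": "test_example_fast", "score": 10.05, "covers_change": True,
     "recent_fails": "0/20", "duration_s": 1.0},
    {"test": "test_example_slow", "score": 0.01, "covers_change": False,
     "recent_fails": "0/20", "duration_s": 9.0},
]
report = render_priority_table(rows)
print(report)
```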


Orders tests so failures surface earliest — runs tests covering changed code first, historically flaky/failing tests early, and slow low-value tests last. Use when the suite is too slow to run in full on every change, when CI feedback takes too long, or when deciding what to run in a smoke-test tier.

development

Updated Apr 13, 2026


npx skillsauth add santosomar/general-secure-coding-agent-skills test-suite-prioritizer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | % |
| ------ | ------- | ----------- | --- |
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 13, 2026, 4:39 AM · 60.0s · 1 file scanned

SKILL.md

name: test-suite-prioritizer
description: Orders tests so failures surface earliest: runs tests covering changed code first, historically flaky/failing tests early, and slow low-value tests last. Use when the suite is too slow to run in full on every change, when CI feedback takes too long, or when deciding what to run in a smoke-test tier.
license: Apache-2.0
category: testing
suite: general-secure-coding-agent-skills
version: 0.3.0
related: coverage-enhancer, test-deduplicator




Install manually


# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/testing/test-suite-prioritizer ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT