framework/engineering/skills/flowai-jit-review/SKILL.md
Use when the user asks for a JIT review, wants to catch hidden regressions in a staged/unstaged diff or commit range, or says "check my changes for hidden bugs". Synthesizes ephemeral Catching JiTTests (pass on parent, fail on diff). Not for fixing known failing tests (see flowai-fix-tests) or general code review (flowai-review).
Adapts Meta's Intent-Aware JiTTests methodology to flowai. On a diff (staged, unstaged, or commit-range), synthesize ephemeral tests that pass on the parent revision and fail on the diff revision — the only objective signal that you caught a real regression. Tests are discarded by default.
Preconditions: the AGENTS.md Development Commands section MUST declare a `test` (or `check`) command. If absent, STOP with:
JIT-review requires a declared `test` or `check` command in AGENTS.md Development Commands. Aborting.
Do NOT guess a runner (`npm test`, `pytest`, `go test`, etc.).
git MUST be available. If the repo is a shallow clone or `git worktree` is unsupported, fall back to reconstructing parent files via `git show HEAD:<path>` into a scratch dir.

Resolve the diff and its parent revision:
- staged → `git diff --cached`
- unstaged → `git diff`
- `<sha>..<sha>` range → that range
- parent revision: `HEAD` (or `<range-start>^` for a range)

Check out the parent with `git worktree add <SCRATCH>/jit-parent-<session-id> <parent-sha>`. If `git worktree` fails, fall back to copying each changed file via `git show <parent-sha>:<path>` into `<SCRATCH>/jit-parent-<session-id>/`.

Gather the author's stated intent:
- commit messages: `git log -1 --pretty=%B <parent-sha>..HEAD` (or the range's commit messages);
- PR body: `gh pr view --json body` IF the `gh` CLI is available AND the branch has a PR. If `gh` is missing or errors, proceed silently: the PR body is a bonus, not a requirement.

Run the declared `test` command inside the parent worktree. If it fails, STOP with:
Pre-existing test failures on the parent revision. JIT review requires a green parent baseline. Please fix those first.
For each changed hunk, record explicitly, in prose:
Cap the total list at 5 intents. If the diff implies more, merge related intents or drop the least-risky.
For each intent, list up to 3 hypotheses of the form:
"If the author, while trying to do X, had slipped on Y, the code would now fail at Z."
Risks must be diff-specific — not generic code smells ("null deref", "unhandled exception") unless the diff directly exposes that risk.
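The intent and risk bookkeeping above can be modeled as two small records, sketched below in Python. All names are hypothetical. The sketch enforces the two caps (5 intents, 3 hypotheses per intent) and renders the fixed hypothesis template. Merging related intents is a judgment call and is not modeled. Only "drop the least-risky" is, through a caller-supplied score.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

MAX_INTENTS = 5            # cap on inferred intents
MAX_RISKS_PER_INTENT = 3   # cap on hypotheses per intent


@dataclass
class Risk:
    goal: str     # X: what the author was trying to do
    slip: str     # Y: the plausible mistake
    failure: str  # Z: where the code would now fail

    def hypothesis(self) -> str:
        """Render the risk in the fixed hypothesis template."""
        return (
            f"If the author, while trying to {self.goal}, had slipped on "
            f"{self.slip}, the code would now fail at {self.failure}."
        )


@dataclass
class Intent:
    summary: str
    invariants: list[str]  # expected pre/post-conditions
    risks: list[Risk] = field(default_factory=list)

    def add_risk(self, risk: Risk) -> bool:
        """Append a hypothesis unless the per-intent cap is reached."""
        if len(self.risks) >= MAX_RISKS_PER_INTENT:
            return False
        self.risks.append(risk)
        return True


def cap_intents(intents: list[Intent], risk_score: Callable[[Intent], float]) -> list[Intent]:
    """Keep the riskiest MAX_INTENTS intents; risk_score is a hypothetical judge."""
    return sorted(intents, key=risk_score, reverse=True)[:MAX_INTENTS]
```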
Generate one mutant per risk: a tiny patch on the diff-side file that models the concrete failure mode. Typical mutations tied to common risks:
- comparison flips: `>` ↔ `>=`, `<` ↔ `<=`, `==` ↔ `!=`
- guard clauses: `if (x == null) return …`
- `else` bodies
- off-by-one: `i < n` ↔ `i <= n`, `length - 1` ↔ `length`
- argument order: `f(a, b)` ↔ `f(b, a)`

Edge case: if the diff is a pure deletion of code, no behavioral mutant applies. Record "no behavioral risks to probe" and jump to stage 9.
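The sketch below is a deliberately naive, text-level illustration of two of the mutation families above: comparison flips and the `length - 1` → `length` off-by-one. It is an assumption, not how flowai produces mutants. A real tool would more likely patch the AST. Guard, `else`, and argument-order mutations are left out, and generics such as `Array<T>` would be mis-mutated. Each mutant changes exactly one operator occurrence.

```python
import re

# Look-arounds skip `=>`, `->`, `>>`, `<<` and fragments of longer operators.
COMPARISON = re.compile(r"(?<![=<>!\-])(===|!==|<=|>=|==|!=|<|>)(?![=<>])")
FLIPS = {
    ">": ">=", ">=": ">",
    "<": "<=", "<=": "<",
    "==": "!=", "!=": "==",
    "===": "!==", "!==": "===",
}
OFF_BY_ONE = re.compile(r"\blength - 1\b")


def comparison_mutants(line: str) -> list[str]:
    """One mutant per comparison-operator occurrence, flipped per FLIPS."""
    return [
        line[: m.start()] + FLIPS[m.group(0)] + line[m.end():]
        for m in COMPARISON.finditer(line)
    ]


def off_by_one_mutants(line: str) -> list[str]:
    """Turn each `length - 1` into `length`, one mutant per occurrence."""
    return [
        line[: m.start()] + "length" + line[m.end():]
        for m in OFF_BY_ONE.finditer(line)
    ]
```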
For each mutant, write ONE test that:
Tests are written to an ephemeral directory. Apply ALL of:
- … `tests/`, NOT next to the file under test.
- Pick a scratch location that the host runner is configured to pick up, typically one of:
  - `$(mktemp -d)/jit-tests-<session-id>/` (preferred when the project's `test` command can be invoked with an explicit path);
  - `.flowai/jit/<session-id>/`: add the dir to `.gitignore` first (check whether the `.gitignore` entry exists; if missing, append it).
- … `.gitignore` or under system temp.
- … `save`.
- … `discard`: the directory is considered owned by the skill.

Drop, in order:
Use this Markdown skeleton verbatim. Never collapse the Intents list into per-test bullets: it MUST appear as its own top-level enumerated section before any catching tests, so the author can audit whether the skill read the diff correctly.
## JIT Review: <N> catching tests
### Intents (inferred from diff, ≤5)
1. **<intent-1>** — invariants: <expected pre/post-conditions>
2. **<intent-2>** — invariants: ...
...
### Catching tests (pass on parent, fail on diff)
1. **<test-name>** — risk: <short risk description>
- Intent ref: <#N from Intents list>
- Mutant killed: <yes / no / skipped:degraded>
- Failure on diff: <one-line assertion output>
- Location: <source file>:<line>
2. ...
### Uncovered risks (no catching test generated)
- <risk> — reason: <why no test — e.g. non-deterministic, external I/O>
### Degradation notes
- Mutant-prove skipped: single test run exceeded 30 s threshold.
### Next
- save <test-name> → <proposed destination path>
- save all → <proposed destination directory>
- discard all → <scratch path>
Ranking: top 5 catching tests by severity × uniqueness, where severity is
"how plausible and impactful the caught regression is" and uniqueness is
"how many of the current catching tests assert a distinct symptom".
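Below is a minimal Python sketch of the severity × uniqueness ranking. It relies on two assumptions that the text does not fix. Severity is a judged number in 0..1. Uniqueness is `1 / (number of catching tests that assert the same symptom)`, so duplicate symptoms split their weight. `CatchingTest` and `rank` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CatchingTest:
    name: str
    severity: float  # judged plausibility x impact of the caught regression, 0..1
    symptom: str     # short label of the failure the test asserts


def rank(tests: list[CatchingTest], top: int = 5) -> list[CatchingTest]:
    """Order by severity x uniqueness and keep the first `top` tests."""
    shared = Counter(t.symptom for t in tests)

    def score(t: CatchingTest) -> float:
        # Assumed uniqueness: 1 / number of tests asserting the same symptom.
        return t.severity / shared[t.symptom]

    return sorted(tests, key=score, reverse=True)[:top]
```

With this scoring, two tests that both assert the same boundary symptom each count for half. A lone test with a distinct symptom can therefore outrank a more severe but duplicated one.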
Time-budget degradation: if a single invocation of the test command on
the smallest scope takes > 30 seconds, SKIP stage 7(c) for the rest of the
session. Flag this in the "Degradation notes" section. The catching
invariant (pass on parent, fail on diff) is still preserved; only the
mutant kill-rate metric is lost.
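The time-budget rule and the catching invariant can be sketched together in Python. The sketch uses `subprocess.run`'s `timeout` so that an over-budget run is killed rather than waited out. Treating a timeout as "> 30 seconds" is an approximation. The "killed means the test fails once the mutant is applied" reading is an assumption, and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys

BUDGET_SECONDS = 30.0  # single-run threshold from the text


def exceeds_budget(cmd: list[str], budget: float = BUDGET_SECONDS) -> bool:
    """Run the smallest-scope test command once; True if it outlives the budget."""
    try:
        # Output and exit code are ignored: only the duration matters here.
        subprocess.run(cmd, capture_output=True, timeout=budget)
    except subprocess.TimeoutExpired:
        return True  # run() kills the child before raising
    return False


def mutant_status(
    parent_passed: bool,
    diff_passed: bool,
    mutant_passed: bool | None,
    degraded: bool,
) -> str | None:
    """Value for the report's "Mutant killed" field, or None if not a catching test."""
    if not parent_passed or diff_passed:
        return None  # violates "pass on parent, fail on diff": not reported
    if degraded:
        return "skipped:degraded"  # stage 7(c) skipped for the rest of the session
    # Assumption: "killed" means the test fails once the mutant is applied.
    return "yes" if mutant_passed is False else "no"


if __name__ == "__main__":
    # Hypothetical smallest-scope command; substitute the project's declared `test`.
    print(exceeds_budget([sys.executable, "-c", "pass"]))
```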
After the report, prompt the user interactively:
Save which tests?
- save <test-name> → move one test into the main test tree
(I will propose a destination path)
- save all → move every catching test
- discard all → delete the scratch directory
Handle the response:
- `save <name>`: propose a destination based on the file under test (e.g. alongside `discount_test.ts`). Ask for confirmation before moving. On approval, `git mv` or equivalent, then stage the file.
- `save all`: same, iterated.
- `discard all` (default on timeout or ambiguous answer): delete the entire scratch directory. Do NOT leave stray files.

- … `save`. Every generated test must start in the ephemeral directory.
- If AGENTS.md declares no `test` command, stop (see Preconditions).
- … `save`.
- … `discard`.