agentic/code/addons/rlm/skills/rlm-search/SKILL.md
Run the full Recursive Language Model pipeline — prep, fan out across chunks, and recursively synthesize until results fit one context window
Install this skill globally with one command: `npx skillsauth add jmagly/aiwg rlm-search`. Works with Claude Code, Cursor, and Windsurf.
The full Recursive Language Model pipeline in one command. Prepares content if needed, fans the query out across all chunks, and recursively synthesizes results until they fit in a single context window. Use this when you need to answer a question against content too large to read at once.
Alternate expressions and non-obvious activations:
| Pattern | Example | Action |
|---------|---------|--------|
| Whole-repo search | "search the entire codebase for all usages of deprecated API" | rlm-search "..." --source . |
| Directory search | "recursively search src/ for logging calls" | --source src/ |
| File search | "use RLM to analyze this 5000-line file" | --source path/to/file.ts |
| Budget limit | "search but cap at 200k tokens" | --budget 200000 |
| Depth limit | "search up to 2 levels deep" | --depth 2 |
| Skip re-prep | "search using the existing prep" | No re-prep if manifest exists |
When triggered:
1. **Extract query and source**: identify the natural language query and the source path (file or directory). The default source is `.` (the current directory).
2. **Check for existing prep**: look for a valid manifest in `.aiwg/rlm-prep/` matching the source. Reuse it only when the prep index covers every source file, manifest, and chunk. If the prep is missing, stale, incomplete, or was produced by an older prep run that dropped files fitting in a single chunk, rebuild it with `rlm-prep`.
3. **Initial fanout (level 1)**: dispatch the query across all chunks, up to `--max-parallel` subagents at a time. Collect results with provenance.
4. **Check synthesis fit**: measure the total size of all level-1 results. If they fit in a single context window, synthesize directly (base case). If not, recurse.
5. **Recursive reduction**: split the level-1 results into a new set of chunks and fan out again. Each level-N subagent synthesizes one batch of level-(N-1) answers. Repeat until the output fits in one window.
6. **Final synthesis**: produce a single coherent answer from the last reduction level. Include provenance: trace each claim back to a source file and line range.
7. **Cost summary**: report total tokens consumed, number of subagents launched, recursion depth reached, and an estimated USD cost.
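The level-1 fanout with provenance can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `Chunk`, `Fragment`, `fan_out`, and the `ask` callback are hypothetical names, and a thread pool stands in for subagent dispatch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    """Hypothetical shape of one prepared chunk."""
    chunk_id: str
    source: str       # file the chunk was cut from
    line_start: int
    line_end: int
    text: str


@dataclass
class Fragment:
    """One level-1 answer plus the provenance needed for final synthesis."""
    answer: str
    source: str
    line_start: int
    line_end: int


def fan_out(
    query: str,
    chunks: List[Chunk],
    ask: Callable[[str, Chunk], Optional[str]],
    max_parallel: int = 4,
) -> List[Fragment]:
    """Run `ask` on every chunk, at most `max_parallel` at a time.

    `ask` is a stand-in for a subagent call; it returns None for "no match".
    """
    def run(chunk: Chunk) -> Optional[Fragment]:
        answer = ask(query, chunk)
        if answer is None:
            return None
        return Fragment(answer, chunk.source, chunk.line_start, chunk.line_end)

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run, chunks))  # map() keeps chunk order
    return [r for r in results if r is not None]
```

Carrying the source path and line range on every fragment is what lets the final synthesis trace each claim back to its origin.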
Level 0 (root query)
└── Level 1 fanout: N subagents (one per chunk)
    ├── chunk-0001 → answer fragment A
    ├── chunk-0002 → answer fragment B
    ├── chunk-0003 → (no match)
    └── chunk-0004 → answer fragment C

If A + B + C fit in one window:
    └── Synthesize → Final Answer ✓

If A + B + C do NOT fit:
    Level 2 fanout: chunk the level-1 results
    ├── [A + B] → synthesis fragment 1
    └── [C] → synthesis fragment 2
        └── Synthesize fragments 1 + 2 → Final Answer ✓
The default --depth 3 means the pipeline will recurse at most 3 times before forcing synthesis even if results are large.
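The fit check, the regrouping of results, and the depth cap described above could look something like the sketch below. All names here (`estimate_tokens`, `pack_batches`, `reduce_results`, the `synthesize` callback) are hypothetical, the 4-characters-per-token estimate is an assumption, and the depth counting (level-1 fanout counts as depth 1) is one reading of the sample outputs rather than documented behavior.

```python
from typing import Callable, List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude size estimate: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def pack_batches(results: List[str], window: int) -> List[List[str]]:
    """Greedily group results so each batch's estimated size stays within `window`."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for item in results:
        size = estimate_tokens(item)
        if current and used + size > window:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += size
    if current:
        batches.append(current)
    return batches


def reduce_results(
    query: str,
    results: List[str],
    synthesize: Callable[[str, List[str]], str],
    window: int,
    max_depth: int = 3,
    depth: int = 1,
) -> Tuple[str, int]:
    """Return (final_answer, depth_reached).

    Base case: results fit in one window, or the depth cap forces synthesis.
    Otherwise: synthesize each batch and recurse on the smaller result set.
    """
    total = sum(estimate_tokens(r) for r in results)
    if total <= window or depth >= max_depth:
        return synthesize(query, results), depth
    next_level = [synthesize(query, batch) for batch in pack_batches(results, window)]
    return reduce_results(query, next_level, synthesize, window, max_depth, depth + 1)
```

The depth cap also acts as a termination guard: if a synthesis step fails to shrink its input, the recursion still stops at `max_depth` instead of looping.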
RLM Search Complete
Query: "Where is rate limiting implemented?"
Source: src/ | Chunks: 47 | Depth reached: 1 | Subagents: 47
Answer:
Rate limiting is implemented in three places:
1. **API gateway level** — `src/gateway/rate-limit.ts` (lines 12-45) applies
a sliding window limiter using Redis. Limits are configured per route in
`config/rate-limits.yaml`.
2. **Auth service** — `src/auth/middleware.ts` (lines 88-102) imposes a
per-IP limit of 10 login attempts per minute using an in-memory store.
3. **WebSocket connections** — `src/realtime/server.ts` (lines 231-248)
limits new connections per second to prevent connection floods.
Cost summary: 47 subagents, 184,320 tokens (~$0.18), 1 synthesis pass
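As a rough illustration of how such a summary line might be assembled: the sample above (184,320 tokens, about $0.18) is consistent with a flat rate of $1.00 per million tokens, but that rate is inferred from the sample and not documented. `CostSummary` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CostSummary:
    """Hypothetical container for the figures reported at the end of a run."""
    subagents: int
    tokens: int
    depth: int
    usd_per_million_tokens: float = 1.00  # assumed rate, inferred from the sample

    @property
    def usd(self) -> float:
        return self.tokens / 1_000_000 * self.usd_per_million_tokens

    def line(self) -> str:
        return (
            f"Cost summary: {self.subagents} subagents, "
            f"{self.tokens:,} tokens (~${self.usd:.2f}), depth {self.depth}"
        )


print(CostSummary(subagents=47, tokens=184_320, depth=1).line())
```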
- `<query>`: Natural language question or task (required)
- `--source <file|dir>`: Source content to search (default: `.`)
- `--depth N`: Maximum recursion depth before forcing synthesis (default: 3)
- `--max-parallel N`: Maximum parallel subagents per level (default: 4, bounded by context budget). The CLI also accepts the alias `--parallel`.
- `--budget N`: Token budget for the entire operation (default: 500000)

Prep coverage note: files that fit within one chunk are still written to a manifest and included in the search plan. Existing prep indexes are validated before reuse, so older partial indexes are rebuilt automatically.
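The coverage rule in the prep note can be illustrated with a small check. This is only a sketch: the real on-disk layout under `.aiwg/rlm-prep/` is not described here, so the `index` mapping (source file to chunk IDs) and the `prep_is_reusable` function are hypothetical, and staleness detection is left out.

```python
from typing import Dict, List, Set


def prep_is_reusable(
    source_files: Set[str],
    index: Dict[str, List[str]],
    chunks_on_disk: Set[str],
) -> bool:
    """Return True only if every source file is indexed and all its chunks exist.

    `index` maps each source file to the chunk IDs cut from it (assumed shape).
    A file that fits in one chunk must still appear, with exactly one chunk ID.
    """
    for path in source_files:
        chunk_ids = index.get(path)
        if not chunk_ids:
            # File missing from the index, or indexed with no chunks:
            # treat the prep as partial and rebuild.
            return False
        if any(cid not in chunks_on_disk for cid in chunk_ids):
            return False
    return True
```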
User: "search the entire codebase for where authentication tokens are validated"
Action: Check for existing prep of ., fanout across all chunks, synthesize.
Response:
RLM Search Complete
Query: "where are authentication tokens validated?"
Source: . | Chunks: 84 | Depth: 1 | Subagents: 84
Answer:
Token validation occurs at two layers:
1. **HTTP middleware** — `src/auth/middleware.ts` lines 34-67: the
`validateToken` function decodes and verifies JWTs using the
`jsonwebtoken` library, checking signature and expiry.
2. **GraphQL context** — `src/graphql/context.ts` lines 18-31: calls
`validateToken` on every request and attaches the decoded payload
to the GraphQL execution context.
Cost: 84 subagents, 241,800 tokens (~$0.24)
User: "use RLM to find all compliance-relevant data handling in the entire codebase"
Action:
aiwg rlm-search "find all places where PII or sensitive data is stored, transmitted, or logged" --source .
Level-1 produces 28 matching fragments totaling 40,000 tokens (too large for one pass). Level-2 reduces to 4 synthesis fragments, then final synthesis produces the answer.
Response: "Depth reached: 2. Found 14 locations across 9 files. [Full provenance-tagged answer]"
User: "deep search src/payments/ for Stripe webhook handling, cap at 100k tokens"
Action:
aiwg rlm-search "how are Stripe webhooks handled?" \
--source src/payments/ \
--budget 100000 \
--max-parallel 4
Response: If the budget would be exceeded, the pipeline pauses and reports: "Budget checkpoint: 82,400 tokens used. Continue (remaining budget: 17,600)? [y/n]"
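A budget checkpoint of this kind might be tracked as in the sketch below. `BudgetTracker` and its methods are hypothetical names, and the rule "pause when the next dispatch's estimate would exceed the budget" is an assumption about when the prompt appears.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetTracker:
    """Hypothetical running tally of tokens spent against --budget."""
    budget: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.budget - self.used

    def record(self, tokens: int) -> None:
        """Add the tokens consumed by a finished subagent."""
        self.used += tokens

    def checkpoint(self, next_estimate: int) -> Optional[str]:
        """Return a confirmation prompt if the next dispatch would overrun, else None."""
        if self.used + next_estimate <= self.budget:
            return None
        return (
            f"Budget checkpoint: {self.used:,} tokens used. "
            f"Continue (remaining budget: {self.remaining:,})? [y/n]"
        )


tracker = BudgetTracker(budget=100_000)
tracker.record(82_400)
print(tracker.checkpoint(next_estimate=20_000))
```

Checking before each dispatch, rather than after, keeps the run from silently overshooting the cap.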
User: "use RLM to analyze this 8,000-line migration file for rollback risk"
Action:
aiwg rlm-search "identify any irreversible operations with no rollback path" \
--source db/migrations/0099_big_schema.sql \
--depth 2
Response: Preps the single file into ~40 chunks, fans out, synthesizes. Reports all DROP, TRUNCATE, and ALTER TABLE ... DROP COLUMN statements with line numbers.
User: "quick RLM search: where is the database connection string set?"
Action:
aiwg rlm-search "where is the database connection string configured?" \
--source . \
--depth 1 \
--max-parallel 8
Response: Forces synthesis at depth 1 — faster but may miss cross-chunk context. Reports results within a single fanout pass.
If the user's intent is ambiguous: