agentic/code/addons/rlm/skills/chunk/SKILL.md
Split a file into overlapping chunks suitable for parallel fanout processing and emit a manifest describing each chunk
npx skillsauth add jmagly/aiwg chunkInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Split a file into overlapping chunks suitable for parallel fanout processing. Produces numbered chunk files and a manifest.json describing each chunk's location, line range, and overlap metadata.
Alternate expressions and non-obvious activations:
| Pattern | Example | Action |
|---------|---------|--------|
| Chunk file | "chunk this file" | Apply semantic-boundary strategy, write to .aiwg/rlm-chunks/ |
| Size override | "chunk into 100-line pieces" | --size 100 |
| Overlap override | "chunk with 50-line overlap" | --overlap 50 |
| Fixed count | "split into fixed-size chunks" | --strategy fixed-count |
| JSON output | "chunk as JSON" | --format json |
| Custom directory | "chunk into tmp/chunks/" | --output tmp/chunks/ |
| Dry run | "how would this file be chunked?" | Read file, describe strategy, no writes |
When triggered:
Parse arguments — identify source file, strategy, size, overlap, format, and output directory from user input.
Read the source file — determine total line count and content type (code, markdown, prose, config).
Select chunking strategy:
semantic-boundary (default) — split at headings (##, ###), blank lines between sections, function/class definitions, or import blocks. Preserves logical units.fixed-count — fixed number of lines per chunk regardless of content. Use when content has no clear structure.adaptive — measure content density (code density, average line length) and shrink chunk size for dense regions, expand for sparse ones.Apply overlap — each chunk includes the last --overlap lines of the previous chunk and the first --overlap lines of the next. This ensures queries that span chunk boundaries are answerable from either side.
Write output:
chunk-0001.txt, chunk-0002.txt, etc.chunks.json with each chunk's content embedded as a field.manifest.json regardless of format.Report result — print chunk count, output directory, and manifest path.
{
"source": "src/auth/middleware.ts",
"source_lines": 842,
"strategy": "semantic-boundary",
"chunk_size": 200,
"overlap": 20,
"format": "text",
"output_dir": ".aiwg/rlm-chunks/middleware-ts/",
"created_at": "2026-04-01T14:23:00Z",
"chunks": [
{
"id": "chunk-0001",
"file": ".aiwg/rlm-chunks/middleware-ts/chunk-0001.txt",
"start_line": 1,
"end_line": 218,
"overlap_start": 0,
"overlap_end": 20,
"boundary_type": "function",
"boundary_label": "validateToken()"
},
{
"id": "chunk-0002",
"file": ".aiwg/rlm-chunks/middleware-ts/chunk-0002.txt",
"start_line": 199,
"end_line": 412,
"overlap_start": 20,
"overlap_end": 20,
"boundary_type": "class",
"boundary_label": "AuthMiddleware"
}
]
}
<file> — Source file to chunk (required)--size N — Target chunk size in lines (default: 200). For adaptive, this is the base size before density adjustments.--overlap N — Lines of overlap on each side of a chunk boundary (default: 20)--strategy semantic-boundary|fixed-count|adaptive — Chunking strategy (default: semantic-boundary)--format json|text — Output format (default: text)--output <dir> — Output directory (default: .aiwg/rlm-chunks/<filename>/)User: "chunk src/auth/middleware.ts"
Action:
aiwg chunk src/auth/middleware.ts
Response: "Split middleware.ts (842 lines) into 5 chunks using semantic-boundary strategy. Overlap: 20 lines. Manifest: .aiwg/rlm-chunks/middleware-ts/manifest.json"
User: "chunk this file into 100-line pieces with 30-line overlap for the RLM fanout"
Action:
aiwg chunk src/core/parser.ts --size 100 --overlap 30
Response: "Split parser.ts (1,240 lines) into 14 chunks. 100-line target, 30-line overlap. Manifest: .aiwg/rlm-chunks/parser-ts/manifest.json"
User: "split config/nginx.conf into fixed chunks"
Action:
aiwg chunk config/nginx.conf --strategy fixed-count --size 150
Response: "Split nginx.conf (620 lines) into 5 fixed-count chunks. Manifest: .aiwg/rlm-chunks/nginx-conf/manifest.json"
User: "chunk the migration SQL file as JSON"
Action:
aiwg chunk db/migrations/0042_schema.sql --format json --output .aiwg/rlm-chunks/migration/
Response: "Split 0042_schema.sql (380 lines) into 2 JSON chunks. Output: .aiwg/rlm-chunks/migration/chunks.json. Manifest: .aiwg/rlm-chunks/migration/manifest.json"
If the user's intent is ambiguous:
.aiwg/rlm-chunks/ or a custom directory?"data-ai
Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.
data-ai
Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report — totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.
testing
Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.
data-ai
Compute an entity's publication trajectory — per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.