plugins/research/skills/induct-research/SKILL.md
Induct research sources (issue, file, directory, or URI) into a research corpus — reads, annotates, and files structured induction tasks. Like address-issues but for research.
Process one or more research sources — an issue, a file, a directory of papers, or a URI — and file structured induction tasks into a research repository so nothing gets lost. The analogue of address-issues for research corpora.
As of ADR-021, `induct-research` delegates core ingest mechanics to the semantic memory kernel.

Delegation pattern:

- `induct-research` retains its public name and interactive research-induction UX
- `memory-ingest --consumer research-complete`
- `ingestRequires: ["grade-quality"]`
- `ingestRequires: ["provenance"]`
- @-mentions per consumer schema

What changed: The ingest pipeline (source reading, page creation, index update, log append) is now handled by `memory-ingest`. This skill adds the research-specific quality and citation layers on top.
Backward compatibility: No UX changes. Existing invocations work identically.
@agentic/code/addons/semantic-memory/skills/memory-ingest/SKILL.md
- `.aiwg/research/queue/`
- `/induct-research <target>` → direct invocation

`<target>` (required): What to induct. Five formats are accepted:
| Format | Example | Behavior |
|--------|---------|----------|
| Directory | .aiwg/research/queue/ | Read all .md files in the directory |
| Single file | .aiwg/research/queue/ref-dapper.md | Induct one source |
| URI | https://arxiv.org/abs/2307.09288 | Fetch and induct the paper at that URL |
| Directory glob | papers/**/*.pdf | Induct all matched files recursively |
| Issue reference | gitea:roctinam/research#42 | Read the issue body as a research stub |
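The table implies a simple ordering of checks for deciding what kind of `<target>` was passed. Below is a minimal Python sketch of that classification, assuming URIs are recognized by scheme, issue references by a `forge:owner/repo#N` shape, and globs by wildcard characters. The `classify_target` name, the regex, and the `github:` prefix are illustrative assumptions, not the skill's documented implementation.

```python
import re
from pathlib import Path

# Hypothetical issue-reference syntax, modeled on "gitea:roctinam/research#42".
# The "github:" prefix is an assumption; only the gitea form appears above.
ISSUE_REF = re.compile(r"^(gitea|github):[\w.-]+/[\w.-]+#\d+$")
GLOB_CHARS = ("*", "?", "[")


def classify_target(target: str) -> str:
    """Map a <target> string to 'uri', 'issue', 'glob', 'directory', or 'file'."""
    if target.startswith(("http://", "https://")):
        return "uri"
    if ISSUE_REF.match(target):
        return "issue"
    if any(ch in target for ch in GLOB_CHARS):
        return "glob"
    # A trailing slash or an existing directory both count as a directory target.
    if target.endswith("/") or Path(target).is_dir():
        return "directory"
    return "file"


for t in [".aiwg/research/queue/", "papers/**/*.pdf", "gitea:roctinam/research#42"]:
    print(t, "->", classify_target(t))
```

The URI check comes first because a URL could otherwise be mistaken for a glob if it happened to contain `?`.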
`--repo <dest>` (optional): Where to file induction tasks. Accepts the same three formats as `--induct-research` in issue-planner:
| Format | Example | Behavior |
|--------|---------|----------|
| File path | --repo .aiwg/research/inducted/ | Write task .md files locally |
| URI | --repo https://git.integrolabs.net/roctinam/research | File issues to that Gitea/GitHub/Jira instance |
| Named MCP | --repo gitea | Use mcp__gitea__issue_write directly |
| Named MCP | --repo codehound | Register in Hound search index |
Falls back to the `AIWG_RESEARCH_REPO` environment variable if `--repo` is omitted.
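A sketch of how `--repo` resolution with the environment-variable fallback might work, returning a backend name plus the destination. The set of known Gitea hosts and the Jira host heuristic are assumptions (the skill only says "known Gitea instances" and "Jira pattern"). `resolve_destination` is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse

# Named MCP services from the table above.
NAMED_MCP = {"gitea", "codehound", "github"}
# Assumption: known Gitea hosts would come from configuration; this one is
# taken from the example URI above.
KNOWN_GITEA_HOSTS = {"git.integrolabs.net"}


def resolve_destination(
    repo: Optional[str], env: Mapping[str, str] = os.environ
) -> Tuple[str, str]:
    """Return (backend, destination) for --repo, falling back to AIWG_RESEARCH_REPO."""
    dest = repo or env.get("AIWG_RESEARCH_REPO")
    if not dest:
        raise ValueError("no --repo given and AIWG_RESEARCH_REPO is unset")
    if dest in NAMED_MCP:
        return "mcp", dest
    if dest.startswith(("http://", "https://")):
        host = urlparse(dest).hostname or ""
        if host in KNOWN_GITEA_HOSTS:
            return "gitea", dest
        if host == "github.com":
            return "github", dest
        # Hypothetical Jira heuristic; the real "jira pattern" is not specified.
        if "jira" in host or host.endswith(".atlassian.net"):
            return "jira", dest
        raise ValueError(f"unrecognized tracker host: {host!r}")
    # Anything else is treated as a local directory for task .md files.
    return "local", dest
```

Raising on an unknown host is a design choice for the destination side. For a *source* URI, the pseudocode below instead falls back to fetching it as a web resource.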
`--dry-run` (optional): List what would be inducted and where, without writing or filing anything.

`--priority high|medium|low` (optional): Override the suggested priority for all inducted items. Default: assessed per source.

`--tag <topic>` (optional): Apply a topic tag to all inducted items. Repeatable: `--tag llm --tag evaluation`.

`--recursive` (optional): When the target is a directory, recurse into subdirectories. Default: top-level only.
- `<target>`: determine the input type (file, directory, URI, issue ref)
- Supported file types: `.md`, `.pdf`, `.txt`, `.yaml`

Found 9 sources to induct:
3 Markdown stubs (.aiwg/research/queue/)
4 PDF papers (papers/2024/)
2 URI references
Skipping 1 (already inducted: REF-042)
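Directory discovery combines the supported-extension filter with the `--recursive` flag, and skips sources that are already inducted. A minimal sketch using `pathlib`, assuming deduplication by file stem. That dedup key is a hypothetical simplification; the real skill reports matches by REF-XXX identifier.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

SUPPORTED_SUFFIXES = {".md", ".pdf", ".txt", ".yaml"}


def discover_sources(
    root: str, recursive: bool = False, already_inducted: Iterable[str] = ()
) -> Tuple[List[Path], List[Path]]:
    """Return (to_induct, skipped) for supported files under root."""
    seen = set(already_inducted)
    base = Path(root)
    # --recursive switches from a top-level glob to a recursive one.
    candidates = base.rglob("*") if recursive else base.glob("*")
    to_induct: List[Path] = []
    skipped: List[Path] = []
    for path in sorted(candidates):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        # Hypothetical dedup key: the file stem, not a REF-XXX lookup.
        (skipped if path.stem in seen else to_induct).append(path)
    return to_induct, skipped
```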
CRITICAL: Never write analysis docs from metadata or abstracts alone. The pipeline is: acquire full content → read full content → write analysis doc.
This was learned from a session where 88 of 120 papers were inducted as shallow stubs written from arXiv abstract pages — not the actual papers. See #817.
For each source, ensure full content is available before analysis:
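For PDFs / full papers:

1. Run `/research-acquire <url> --extract-text` to download the PDF to `sources/pdfs/full/` and extract the full text to `sources/text/`.
2. If acquisition fails, mark `acquisition-failed` in frontmatter, file a stub with `status: pending-acquisition`, and skip to the next source. Do NOT write a full analysis doc from the abstract alone.

For URIs (web sources):

- Fetch the full page and save it to `sources/web/<slug>.html`.
- If the URI is a paper's landing page, use `/research-acquire` to get the actual PDF; do not analyze from the landing page HTML.

For Markdown stubs (from issue-planner queue files):

- If the stub references a source URL, acquire that source before analysis.

For issue references:

- Read the issue body (via `mcp__gitea__issue_read` or the `gh` CLI) as a research stub.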
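The acquire-before-analyze rule is effectively a gate: a source proceeds to analysis only if full content came back. Below is a minimal sketch in which a caller-supplied `acquire` callable stands in for `/research-acquire`. The `AcquisitionResult` type and the `acquisition-failed` frontmatter key are hypothetical shapes chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AcquisitionResult:
    source: str
    full_text: Optional[str]
    frontmatter: dict = field(default_factory=dict)

    @property
    def ready_for_analysis(self) -> bool:
        return self.full_text is not None


def acquire_before_analyze(
    source: str, acquire: Callable[[str], Optional[str]]
) -> AcquisitionResult:
    """Gate analysis on full content; `acquire` stands in for /research-acquire."""
    try:
        text = acquire(source)
    except OSError:
        text = None  # network or disk failures count as a failed acquisition
    if not text:
        # File a stub instead of analyzing from an abstract or landing page.
        return AcquisitionResult(
            source, None, {"status": "pending-acquisition", "acquisition-failed": True}
        )
    return AcquisitionResult(source, text)
```

An empty string is treated the same as a failure, so a blank extraction never reaches the analysis step.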
Only after full content is acquired, run analysis:
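For PDFs / full papers (with full text available):

- Read the full text, extract key claims, and assess GRADE.
- Search `.aiwg/research/` for related REF-XXX files.

For web sources (with full content saved):

- Read the full saved page and assess credibility.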
Quality gate: If the resulting analysis doc is under 80 lines, flag it as a potential stub. Either the source content wasn't fully read or the analysis was superficial. Consider re-running with explicit instructions to read the full text.
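The quality gate is a simple line-count threshold. A sketch that flags analysis docs under 80 lines, assuming raw lines (blank lines included) are counted, since the skill does not say otherwise:

```python
from typing import Dict, List

STUB_LINE_THRESHOLD = 80  # "under 80 lines" per the quality gate


def flag_potential_stubs(
    docs: Dict[str, str], threshold: int = STUB_LINE_THRESHOLD
) -> List[str]:
    """Return the names of analysis docs with fewer than `threshold` lines."""
    return sorted(name for name, text in docs.items() if len(text.splitlines()) < threshold)


docs = {"REF-101.md": "line\n" * 40, "REF-102.md": "line\n" * 120}
print(flag_potential_stubs(docs))  # ['REF-101.md']
```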
For each analyzed source, file one induction task using the standard template.
Induction task body:
## Reference Induction
**Source**: <URL, file path, or issue reference>
**Type**: <paper | blog | docs | repo | spec | stub | issue>
**GRADE**: <A | B | C | D | unassessed>
**Priority**: <high | medium | low>
**Tags**: <topic1>, <topic2>
## Summary
<2–3 sentences: what this source covers and why it's relevant>
## Key Claims / Findings
- <Specific claim or finding>
- <Specific claim or finding>
- <Specific claim or finding>
## Relevance to Corpus
<How this relates to existing research — cross-references to REF-XXX if applicable>
## Induction Checklist
- [ ] Read full source
- [ ] Extract key insights as Zettelkasten notes
- [ ] Cross-reference with existing corpus
- [ ] Assign REF-XXX identifier
- [ ] Tag with topic taxonomy
- [ ] Assess with /research-quality
- [ ] Archive with /research-archive (if paper/PDF)
- [ ] Add to citation graph with /research-cite
## Origin
- Surfaced by: <issue-planner | manual | other>
- Surfaced for: <objective or context>
- Induction date: <YYYY-MM-DD>
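When filing many tasks, it helps to generate the body from structured data so the enumerated fields stay valid. A sketch that renders the template above, rejecting Type, GRADE, or Priority values outside the template's listed options. The `InductionTask` dataclass and `render_task` are hypothetical helpers, not part of the skill.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

TYPES = {"paper", "blog", "docs", "repo", "spec", "stub", "issue"}
GRADES = {"A", "B", "C", "D", "unassessed"}
PRIORITIES = {"high", "medium", "low"}
CHECKLIST = [
    "Read full source",
    "Extract key insights as Zettelkasten notes",
    "Cross-reference with existing corpus",
    "Assign REF-XXX identifier",
    "Tag with topic taxonomy",
    "Assess with /research-quality",
    "Archive with /research-archive (if paper/PDF)",
    "Add to citation graph with /research-cite",
]


@dataclass
class InductionTask:
    source: str
    kind: str  # rendered as **Type**
    priority: str
    summary: str
    claims: List[str]
    grade: str = "unassessed"
    tags: List[str] = field(default_factory=list)
    relevance: str = ""
    surfaced_by: str = "manual"
    surfaced_for: str = ""
    induction_date: date = field(default_factory=date.today)

    def __post_init__(self) -> None:
        # Reject values outside the template's allowed enumerations.
        if self.kind not in TYPES or self.grade not in GRADES or self.priority not in PRIORITIES:
            raise ValueError(f"invalid type/GRADE/priority: {self.kind}/{self.grade}/{self.priority}")


def render_task(t: InductionTask) -> str:
    lines = [
        "## Reference Induction",
        f"**Source**: {t.source}",
        f"**Type**: {t.kind}",
        f"**GRADE**: {t.grade}",
        f"**Priority**: {t.priority}",
        f"**Tags**: {', '.join(t.tags)}",
        "## Summary",
        t.summary,
        "## Key Claims / Findings",
        *(f"- {claim}" for claim in t.claims),
        "## Relevance to Corpus",
        t.relevance,
        "## Induction Checklist",
        *(f"- [ ] {item}" for item in CHECKLIST),
        "## Origin",
        f"- Surfaced by: {t.surfaced_by}",
        f"- Surfaced for: {t.surfaced_for}",
        f"- Induction date: {t.induction_date.isoformat()}",
    ]
    return "\n".join(lines)
```

GRADE defaults to `unassessed` because scoring happens later with `/research-quality`.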
Filing based on the `--repo` target:

- File path: write `induct-<slug>.md` to the destination directory
- Gitea: `mcp__gitea__issue_write` with label `research-induction`
- GitHub: `gh issue create --label research-induction`
- Jira: `POST /rest/api/2/issue` with issue type `Task`
- Codehound: register in the Hound search index

After creating each new literature note, update the broader corpus with bidirectional cross-references. This is what makes a corpus compound rather than merely accumulate.
For each newly inducted source:

1. Search existing findings for topically related REF-XXX notes, matching on tags and key claims.
2. Add "Related Sources" cross-references:
   - In the new note, add a `## Related Sources` section listing existing REF-XXX notes and how they relate (confirms, contradicts, extends, prerequisite).
   - In each related existing note, add the new source to its `## Related Sources` section with the relationship type.
3. Flag contradictions or confirmations:
   - Contradictions: add a contradiction marker to both notes.
   - Confirmations: add a confirms marker.
4. Update synthesis documents in `.aiwg/research/synthesis/`.
Example cross-reference entry:
## Related Sources
- **REF-034** — Confirms: both identify prompt injection as the primary attack vector for LLM agents
- **REF-042** — Extends: this source adds quantitative benchmarks missing from REF-042's qualitative analysis
- **REF-067** — Contradicts: claims agent sandboxing overhead is <5%, while REF-067 measured 15-20%
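"Bidirectional" means every link is recorded on both notes. A minimal in-memory sketch of that bookkeeping follows. Each link keeps the relation as asserted by the new note, plus a direction flag, so no inverse relation labels need to be invented. `REF-100` is a hypothetical new note, and `Link`, `add_cross_reference`, and `contradiction_flags` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Set

RELATIONS = {"confirms", "contradicts", "extends", "prerequisite"}


@dataclass(frozen=True)
class Link:
    other: str      # REF-XXX at the far end of the link
    relation: str   # one of RELATIONS, as asserted by the newly inducted note
    outgoing: bool  # True on the asserting note, False on the back-link


Graph = DefaultDict[str, Set[Link]]


def add_cross_reference(graph: Graph, new_ref: str, existing_ref: str, relation: str) -> None:
    """Record the relation on both notes so the link is bidirectional."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    graph[new_ref].add(Link(existing_ref, relation, outgoing=True))
    graph[existing_ref].add(Link(new_ref, relation, outgoing=False))


def contradiction_flags(graph: Graph) -> Set[str]:
    """Notes that need a contradiction marker: both ends of each 'contradicts' link."""
    return {
        ref for ref, links in graph.items()
        if any(link.relation == "contradicts" for link in links)
    }


graph: Graph = defaultdict(set)
add_cross_reference(graph, "REF-100", "REF-042", "extends")
add_cross_reference(graph, "REF-100", "REF-067", "contradicts")
print(sorted(contradiction_flags(graph)))  # ['REF-067', 'REF-100']
```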
Batch optimization: When inducting multiple sources in a batch, defer cross-referencing until all new notes are created, then run a single fan-out pass across all new + existing notes. This avoids redundant searches.
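One way to make the deferred fan-out a single pass is to build one tag index over all new and existing notes, then read candidate pairs from it. A sketch under that assumption is below. It matches on shared tags only; the claim matching the skill also mentions is omitted, and `fan_out_candidates` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def fan_out_candidates(
    new_notes: Dict[str, Set[str]], existing_notes: Dict[str, Set[str]]
) -> Set[Tuple[str, str]]:
    """Unordered pairs (sorted tuples) sharing a tag, with at least one new note per pair."""
    # One inverted index (tag -> notes) over new + existing notes, built once.
    index: Dict[str, Set[str]] = defaultdict(set)
    for ref, tags in {**existing_notes, **new_notes}.items():
        for tag in tags:
            index[tag].add(ref)
    pairs: Set[Tuple[str, str]] = set()
    for ref, tags in new_notes.items():
        for tag in tags:
            for other in index[tag]:
                if other != ref:
                    pairs.add(tuple(sorted((ref, other))))
    return pairs
```

Sorting each pair means a link between two new notes is proposed once, not twice.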
Skip conditions: skip cross-referencing when:

- `--dry-run` is set

## Induction Summary
| # | Source | Type | Priority | Filed At |
|---|--------|------|----------|----------|
| 1 | RFC 9110 HTTP Semantics | spec | high | gitea#301 |
| 2 | "Dapper" Google Tracing Paper | paper | high | gitea#302 |
| 3 | opentelemetry.io/docs | docs | medium | gitea#303 |
| 4 | github.com/jaegertracing/jaeger | repo | medium | gitea#304 |
| 5 | arxiv.org/abs/2012.15161 | paper | low | gitea#305 |
...
Inducted: 9
Skipped: 1 (already present)
Destination: gitea:roctinam/research
Next steps:
- /research-acquire <URL> for any paper that needs PDF download
- /research-document to annotate inducted sources
- /research-quality to score GRADE for each inducted item
resolve_target(target):
if target starts with "http://" or "https://":
host = extract_host(target)
if host matches known_gitea_instances: use mcp__gitea__issue_write
if host == "github.com": use gh CLI
if host matches jira pattern: use Jira REST API
else: fetch as web resource, induct as URI reference
elif target matches "gitea:<owner>/<repo>#<n>":
fetch issue via mcp__gitea__issue_read
elif target is a named MCP service ("gitea", "codehound", "github"):
use that service's write/register tool directly
elif target is a file path:
if path is directory: glob for .md/.pdf/.txt files
if path is a file: induct single source
When target is a directory, process all supported files:
/induct-research papers/2024/ --repo gitea --tag llm --recursive
⏳ Scanning papers/2024/ (recursive)...
Found 23 PDF files
Found 7 Markdown stubs
Found 2 YAML records
Deduplicating against gitea:roctinam/research...
Skipping 4 (already inducted)
⏳ Analyzing 28 sources (parallel agents)...
✓ Batch A (7 sources): complete
✓ Batch B (7 sources): complete
✓ Batch C (7 sources): complete
✓ Batch D (7 sources): complete
⏳ Filing 28 induction tasks to gitea:roctinam/research...
✓ Inducted: 28 | Skipped: 4 | Total: 32
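The batched analysis in this run (28 sources in four batches of 7) can be sketched as chunking plus a worker pool, with one worker standing in for one parallel agent. The batch size of 7 and the 4 workers simply mirror the example output and are assumptions. `analyze` is a hypothetical per-source callable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def chunk(items: Sequence[S], size: int) -> List[List[S]]:
    """Split items into consecutive batches of at most `size`."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def analyze_in_batches(
    sources: Sequence[S],
    analyze: Callable[[S], R],
    batch_size: int = 7,
    workers: int = 4,
) -> List[R]:
    """Run one worker per batch and flatten results in input order."""
    batches = chunk(sources, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields batch results in submission order, not completion order.
        per_batch = pool.map(lambda batch: [analyze(s) for s in batch], batches)
        return [result for batch in per_batch for result in batch]
```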
issue-planner --induct-research <target> calls this skill's Phase 3 (filing) logic directly after Phase 2 research synthesis. The references are the URLs and sources discovered during the parallel research pass.
/induct-research can also be invoked standalone to process:
/induct-research .aiwg/research/queue//induct-research https://arxiv.org/abs/2307.09288/induct-research ~/Downloads/papers/ --repo giteainduct-research <target>
│
├── Phase 1: Source discovery
│ ├── File/directory: glob + read
│ ├── URI: WebFetch + classify
│ └── Issue ref: mcp__gitea__issue_read or gh CLI
├── Phase 2: Source acquisition (acquire before analyze)
│ ├── PDF/paper → /research-acquire --extract-text
│ ├── URI → WebFetch full page → /research-acquire if paper
│ ├── Stub with URL → acquire referenced source
│ └── Skip analysis if acquisition fails (mark pending-acquisition)
├── Phase 2.5: Per-source analysis (on full content only)
│ ├── PDF agent → read full text, extract claims + GRADE
│ ├── Web agent → read full saved page, assess credibility
│ ├── Stub agent → parse relevance summary
│ └── Quality gate: flag docs under 80 lines as potential stubs
├── Phase 3: Induction task filing
│ ├── File path → write .md task files
│ ├── Gitea URI/MCP → mcp__gitea__issue_write
│ ├── GitHub URI → gh issue create
│ └── Codehound MCP → register in search index
├── Phase 3.5: Cross-reference fan-out
│ ├── Search existing findings by tags + claims
│ ├── Add bidirectional Related Sources sections
│ ├── Flag contradictions / confirmations
│ └── Update synthesis documents
└── Phase 4: Summary report
This skill's persistence flows through resolveStorage('research'). On the default fs backend the research corpus lives at .aiwg/research/. Heavy artifacts (papers, archived sources) can move to a secondary drive by setting roots.research in .aiwg/storage.config (one of the headline #934 use cases).
aiwg research-store path # resolved root
aiwg research-store list --prefix sources/
aiwg research-store get sources/paper-123.md