agentic/code/frameworks/research-complete/skills/corpus-export/SKILL.md
Package corpus subsets as distribution archives. Filter by cluster, topic, REF range, or custom expression; bundles PDFs, analysis, citations, and BibTeX into tar.gz.
npx skillsauth add jmagly/aiwg corpus-exportInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Package corpus subsets as distribution archives. Selects papers by cluster, topic, REF range, or custom filter and bundles all artifacts (PDF, analysis doc, citation sidecar, web source, BibTeX) into a portable archive with manifest.
/corpus-export--cluster <name>Select all papers in a named cluster (from /research-gap-detect).
/corpus-export --cluster "Agentic Canon"
--refs <range>Explicit REF range or list. Supports ranges, multi-ranges, and individual IDs.
/corpus-export --refs REF-016:REF-024,REF-121
/corpus-export --refs REF-016,REF-018,REF-024
--topic <name>Select all papers tagged with a specific topic.
/corpus-export --topic "GUI Agents"
--filter <expr>Custom filter expression (frontmatter field comparisons).
/corpus-export --filter "year>=2023 AND incoming>=10"
/corpus-export --filter "grade=High AND tag:reproducibility"
--output <path> (optional)Output archive path. Default: .aiwg/research/exports/corpus-<selector>-<date>.tar.gz.
--format tar.gz|zip (optional)Archive format. Default: tar.gz.
--include (optional, repeatable)Artifact types to include. Defaults: pdf,analysis,citations,bibtex.
Available: pdf, text, web, analysis, citations, bibtex, metadata, provenance.
--dry-run (optional)List what would be included without creating the archive.
Resolve the selection criteria to a list of REF-XXX identifiers:
--cluster: look up cluster in citation-network index, return member REFs--refs: parse range expression--topic: scan findings frontmatter for matching tags--filter: evaluate expression against frontmatterReport resolved selection:
Selection: "Agentic Canon" cluster
Papers: 17 (REF-001, REF-016, REF-018, REF-024, ...)
For each selected REF, gather the configured artifact types from canonical locations:
REF-016:
✓ PDF: sources/pdfs/full/REF-016-autogen.pdf (2.4 MB)
✓ Analysis: findings/REF-016-autogen.md (287 lines)
✓ Citations: documentation/citations/REF-016.md (43 outgoing, 12 incoming)
✓ BibTeX: citations/bibtex/REF-016.bib
✗ Web: no web source (PDF primary)
✓ Metadata: sources/metadata/REF-016.yaml
Flag missing artifacts:
REF-299:
✗ PDF: MISSING (acquisition failed)
✓ Analysis: findings/REF-299-stub.md (22 lines — STUB)
...
Write a MANIFEST.md to the archive root describing the export:
# Corpus Export Manifest
**Date**: 2026-04-13
**Selector**: --cluster "Agentic Canon"
**Papers**: 17
**Total size**: 48.3 MB
## Contents
| REF | Title | Year | GRADE | PDF | Analysis | Citations |
|-----|-------|------|-------|-----|----------|-----------|
| REF-016 | AutoGen | 2023 | High | ✓ | 287 lines | 43/12 |
| REF-018 | Multi-Agent Debate | 2024 | High | ✓ | 312 lines | 28/17 |
...
## Missing Artifacts
- REF-299: PDF missing (acquisition failed)
- REF-312: Analysis doc is a skeleton (<40 lines)
## Provenance
Generated by `corpus-export` v1.0 from corpus at:
- Fixity manifest: .aiwg/research/fixity-manifest.json (checksum: abc123...)
- Citation graph: indices/citation-network.md (generated 2026-04-13T10:00Z)
Create the archive with structure:
corpus-agentic-canon-2026-04-13.tar.gz
├── MANIFEST.md
├── pdfs/
│ ├── REF-016-autogen.pdf
│ ├── REF-018-multi-agent-debate.pdf
│ └── ...
├── findings/
│ ├── REF-016-autogen.md
│ └── ...
├── citations/
│ ├── REF-016.md
│ └── ...
├── bibtex/
│ ├── REF-016.bib
│ └── all.bib # concatenated bibliography
└── README.md # extraction + usage instructions
Corpus Export Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Selector: --cluster "Agentic Canon"
Papers selected: 17
Artifacts bundled: 68 files
Missing artifacts: 2 (reported in MANIFEST.md)
Archive: .aiwg/research/exports/corpus-agentic-canon-2026-04-13.tar.gz
Size: 48.3 MB
SHA-256: abc123def456...
Contents:
17 PDFs (45.1 MB)
17 analysis docs (1.2 MB)
17 citation sidecars (0.8 MB)
17 BibTeX entries + all.bib (50 KB)
1 MANIFEST.md (4 KB)
1 README.md (2 KB)
Share a cluster with collaborators without sharing the entire corpus.
/corpus-export --cluster "Agentic Canon"
Package the corpus state referenced by a paper for reproducibility.
/corpus-export --refs REF-016:REF-024 --include pdf,analysis,citations,provenance
Export everything on a specific topic for a focused review.
/corpus-export --topic "Evaluation" --filter "year>=2023"
Export only high-quality sources.
/corpus-export --filter "grade=High"
| Component | Relationship |
|-----------|-------------|
| research-gap-detect | Provides --cluster names |
| corpus-index-build | Provides topic and metadata for selection |
| research-quality-audit | Flags missing/skeleton artifacts in manifest |
| research-cite | Generates BibTeX entries bundled in export |
| Media curator /acquire | Source of PDF files packaged into export |
# Export a named cluster
/corpus-export --cluster "Agentic Canon"
# Export a REF range
/corpus-export --refs REF-016:REF-024,REF-121
# Export by topic
/corpus-export --topic "GUI Agents"
# Filter: recent high-grade papers with many citations
/corpus-export --filter "year>=2023 AND grade=High AND incoming>=10"
# Preview without creating archive
/corpus-export --cluster "Agentic Canon" --dry-run
# Minimal export (analysis docs only)
/corpus-export --topic "Reproducibility" --include analysis,citations
# Custom output path
/corpus-export --refs REF-001:REF-100 --output /tmp/first-100.tar.gz
data-ai
Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.
data-ai
Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report — totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.
testing
Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.
data-ai
Compute an entity's publication trajectory — per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.