agentic/code/frameworks/research-complete/skills/research-acquire/SKILL.md
Download research papers and extract metadata
npx skillsauth add jmagly/aiwg research-acquire
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Download research papers from public repositories and extract metadata.
When invoked, perform automated paper acquisition:
1. Identify Source
   - .aiwg/research/sources/
2. Download Paper
   - .aiwg/research/sources/[ref-id].pdf
3. Extract Metadata
4. Generate Frontmatter
5. Extract Full Text (default, unless --no-extract-text)
   - .aiwg/research/sources/text/REF-XXX.txt
   - full_text_available: false in frontmatter
6. Create Finding Document
   - .aiwg/research/findings/REF-XXX-[slug].md from template
7. Post-Acquisition
   - .aiwg/research/acquisition-log.yaml

[identifier] - DOI, arXiv ID, or URL (required)
--output [path] - Custom output location (default: auto-generate)
--ref-id [REF-XXX] - Specific REF-XXX identifier (default: auto-assign)
--extract-text - Extract full text to .txt file for analysis (default: enabled; use --no-extract-text to skip)
--no-metadata - Skip metadata enrichment
--force - Re-download even if paper exists

# Acquire by DOI
/research-acquire 10.48550/arXiv.2308.08155
# Acquire by arXiv ID
/research-acquire arXiv:2308.08155
# Acquire with custom identifier
/research-acquire https://arxiv.org/pdf/2308.08155.pdf --ref-id REF-022
# Acquire with full text extraction
/research-acquire 10.1145/3377811.3380330 --extract-text
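
The three identifier forms accepted above (DOI, arXiv ID, URL) have to be told apart before anything can be downloaded. The following is a minimal sketch of one way that classification might work. The function name `classify_identifier` and the regular expressions are illustrative assumptions, not the skill's actual resolver; the only external fact relied on is that arXiv-issued DOIs use the `10.48550/arXiv.` prefix.

```python
import re

# Hypothetical patterns for illustration; the skill's real resolver is not shown here.
ARXIV_DOI = re.compile(r"^10\.48550/arXiv\.(?P<id>\d{4}\.\d{4,5}(v\d+)?)$", re.IGNORECASE)
ARXIV_ID = re.compile(r"^arXiv:(?P<id>\d{4}\.\d{4,5}(v\d+)?)$", re.IGNORECASE)
DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind, normalized value) for a DOI, arXiv ID, or URL."""
    text = identifier.strip()
    # URLs are passed through untouched.
    if text.lower().startswith(("http://", "https://")):
        return ("url", text)
    # Both "arXiv:NNNN.NNNNN" and the arXiv DOI form reduce to a bare arXiv ID.
    match = ARXIV_ID.match(text) or ARXIV_DOI.match(text)
    if match:
        return ("arxiv", match.group("id"))
    # Any other DOI is kept as-is for a later resolution step.
    if DOI.match(text):
        return ("doi", text)
    raise ValueError(f"Unrecognized identifier: {identifier!r}")
```

Under these assumptions, the first two usage examples above normalize to the same arXiv ID, which is one way a duplicate check against the existing corpus could be made independent of how the paper was requested.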
Acquiring Paper: 10.48550/arXiv.2308.08155
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Resolving identifier
✓ DOI resolved to arXiv:2308.08155
✓ Paper not found in corpus
Step 2: Downloading PDF
✓ Downloaded from arxiv.org (2.4 MB)
✓ Saved to .aiwg/research/sources/REF-022.pdf
✓ Checksum: a1b2c3d4e5f6...
Step 3: Extracting metadata
✓ Title: AutoGen: Enabling Next-Gen LLM Applications...
✓ Authors: Wu, Q., Bansal, G., Zhang, J., et al. (9 authors)
✓ Year: 2023
✓ Source: arXiv preprint
✓ Citations: 234 (as of 2026-02-03)
Step 4: Creating finding document
✓ Generated .aiwg/research/findings/REF-022-autogen.md
✓ Frontmatter populated
✓ Template sections added
Step 5: Updating corpus
✓ Added to fixity manifest
✓ Updated INDEX.md
✓ Logged acquisition
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Acquisition complete!
REF-ID: REF-022
Title: AutoGen: Enabling Next-Gen LLM Applications...
File: .aiwg/research/sources/REF-022.pdf
Finding: .aiwg/research/findings/REF-022-autogen.md
Next Steps:
1. /research-quality REF-022 - Assess evidence quality
2. /research-document REF-022 - Create detailed summary
3. /research-cite REF-022 - Generate citation
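
The sample output reports a checksum for the downloaded PDF and an entry in a fixity manifest. The sketch below shows how such a checksum might be computed and later re-verified. The use of SHA-256 and the helper names `file_checksum` and `is_unchanged` are assumptions; the document does not say which hash algorithm or manifest format the skill uses.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded: str) -> bool:
    """Compare a file's current digest with a previously recorded one (assumed manifest value)."""
    return file_checksum(path) == recorded
```

Reading in chunks rather than loading the whole file matters little for a 2.4 MB paper, but it keeps the helper safe for larger archived sources.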
All acquisitions create provenance records:
# .aiwg/research/provenance/records/REF-022-acquisition.yaml
entity:
  id: "urn:aiwg:artifact:.aiwg/research/sources/REF-022.pdf"
  type: "research_paper"
activity:
  id: "urn:aiwg:activity:acquisition:REF-022:001"
  type: "acquisition"
  started_at: "2026-02-03T12:00:00Z"
  ended_at: "2026-02-03T12:00:15Z"
agent:
  id: "urn:aiwg:agent:research-acquisition-agent"
  type: "aiwg_agent"
source:
  identifier: "10.48550/arXiv.2308.08155"
  url: "https://arxiv.org/pdf/2308.08155.pdf"
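
As a rough illustration of how the fields in this record relate to each other, the sketch below assembles the same structure from a REF-ID and a source identifier. The function `build_provenance` is hypothetical. It assumes that the keys nest under `entity`, `activity`, `agent`, and `source`, and that the trailing `001` in the activity ID is a zero-padded sequence number; neither is stated in the document.

```python
import json
from datetime import datetime, timezone


def build_provenance(ref_id: str, identifier: str, url: str,
                     started: datetime, ended: datetime, sequence: int = 1) -> dict:
    """Assemble an acquisition provenance record shaped like the example above."""
    pdf_path = f".aiwg/research/sources/{ref_id}.pdf"
    stamp = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamps with a literal Z suffix
    return {
        "entity": {"id": f"urn:aiwg:artifact:{pdf_path}", "type": "research_paper"},
        "activity": {
            # Assumption: the last URN segment is a per-REF sequence number.
            "id": f"urn:aiwg:activity:acquisition:{ref_id}:{sequence:03d}",
            "type": "acquisition",
            "started_at": started.astimezone(timezone.utc).strftime(stamp),
            "ended_at": ended.astimezone(timezone.utc).strftime(stamp),
        },
        "agent": {"id": "urn:aiwg:agent:research-acquisition-agent", "type": "aiwg_agent"},
        "source": {"identifier": identifier, "url": url},
    }


if __name__ == "__main__":
    record = build_provenance(
        "REF-022",
        "10.48550/arXiv.2308.08155",
        "https://arxiv.org/pdf/2308.08155.pdf",
        datetime(2026, 2, 3, 12, 0, 0, tzinfo=timezone.utc),
        datetime(2026, 2, 3, 12, 0, 15, tzinfo=timezone.utc),
    )
    # JSON is used here only to avoid a third-party YAML dependency.
    print(json.dumps(record, indent=2))
```

Pass timezone-aware datetimes; a naive datetime would be interpreted as local time by `astimezone`.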
This skill's persistence flows through resolveStorage('research'). On the default fs backend the research corpus lives at .aiwg/research/. Heavy artifacts (papers, archived sources) can move to a secondary drive by setting roots.research in .aiwg/storage.config (one of the headline #934 use cases).
aiwg research-store path # resolved root
aiwg research-store list --prefix sources/
aiwg research-store get sources/paper-123.md
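
To make the root override concrete, here is a minimal sketch of how a `roots.research` setting might redirect the corpus. It is not the real `resolveStorage` implementation: the helper names `resolve_root` and `source_path` are hypothetical, and the configuration is assumed to have already been parsed into a dictionary, since the on-disk format of .aiwg/storage.config is not described here.

```python
from pathlib import Path

# Default root for the fs backend, as stated in the text above.
DEFAULT_ROOTS = {"research": Path(".aiwg/research")}


def resolve_root(area: str, config: dict) -> Path:
    """Return the storage root for an area, honoring an optional roots.<area> override."""
    override = config.get("roots", {}).get(area)
    return Path(override) if override else DEFAULT_ROOTS[area]


def source_path(ref_id: str, config: dict) -> Path:
    """Locate a paper's PDF under the resolved research root."""
    return resolve_root("research", config) / "sources" / f"{ref_id}.pdf"
```

With an empty configuration, `source_path("REF-022", {})` stays under .aiwg/research/sources/; with `{"roots": {"research": "/mnt/data/research"}}` the same call points at the secondary drive, and callers do not change.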