agentic/code/frameworks/research-complete/skills/research-archive/SKILL.md
Package research artifacts for long-term archival
Package research artifacts for long-term preservation following archival best practices.
When invoked, create archival packages:
1. Identify Artifacts
2. Validate Integrity
3. Create Archive Package
4. Generate Archival Metadata
5. Verify Package
6. Store and Register
- [ref-id or --all] - Specific paper or entire corpus (required)
- --format [bagit|zip|tar] - Archive format (default: bagit)
- --output [path] - Custom output location (default: .aiwg/research/archives/)
- --verify - Perform integrity verification after creation
- --compression [none|gzip|bzip2] - Compression method (default: gzip)
- --include-notes - Include literature notes in package
- --metadata-only - Create metadata package without PDFs

BagIt is the Library of Congress standard for digital preservation:
REF-022-archive/
├── bagit.txt                    # BagIt declaration
├── bag-info.txt                 # Package metadata
├── manifest-sha256.txt          # Checksums for data files
├── tagmanifest-sha256.txt       # Checksums for tag files
└── data/
    ├── REF-022.pdf              # Source paper
    ├── REF-022-autogen.md       # Finding document
    ├── metadata.yaml            # Extracted metadata
    ├── provenance.yaml          # Provenance records
    └── quality-assessment.yaml  # GRADE assessment
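To make the layout above concrete, here is a minimal sketch of how such a bag could be assembled with only the Python standard library. It assumes the `data/` directory has already been populated; the helper names `sha256_of`, `write_manifest`, and `make_bag` are hypothetical and are not part of the skill, which may well rely on an existing tool such as `bagit.py` instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path, files: list[Path], name: str) -> None:
    """Write one '<digest>  <path relative to the bag>' line per file."""
    lines = [
        f"{sha256_of(f)}  {f.relative_to(bag_dir).as_posix()}" for f in sorted(files)
    ]
    (bag_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_bag(bag_dir: Path, bag_info: dict[str, str]) -> None:
    """Turn a directory with a populated data/ folder into a BagIt bag."""
    # Bag declaration (BagIt 1.0).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Package metadata as 'Label: value' lines.
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{key}: {value}\n" for key, value in bag_info.items()),
        encoding="utf-8",
    )
    # Payload manifest covers every file under data/.
    payload = [p for p in (bag_dir / "data").rglob("*") if p.is_file()]
    write_manifest(bag_dir, payload, "manifest-sha256.txt")
    # Tag manifest covers the tag files, including the payload manifest.
    tags = [bag_dir / n for n in ("bagit.txt", "bag-info.txt", "manifest-sha256.txt")]
    write_manifest(bag_dir, tags, "tagmanifest-sha256.txt")
```

Calling `make_bag(Path("REF-022-archive"), {"External-Identifier": "REF-022"})` on a prepared directory would produce the four tag files shown in the tree.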
# Archive single paper in BagIt format
/research-archive REF-022
# Archive with verification
/research-archive REF-022 --verify
# Archive entire corpus
/research-archive --all --format bagit
# Archive with custom output
/research-archive REF-022 --output /backup/research-archives/
# Create metadata-only archive for sharing
/research-archive REF-022 --metadata-only --format zip
# Archive multiple papers
/research-archive REF-001 REF-013 REF-022 --format tar
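The flag combinations used in these invocations can be modelled with `argparse`. This is only a sketch of the documented argument semantics: the skill itself is invoked as a slash command, and the function names `build_parser` and `parse_args` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented arguments and their defaults."""
    parser = argparse.ArgumentParser(prog="research-archive")
    parser.add_argument("ref_ids", nargs="*", help="One or more REF identifiers")
    parser.add_argument("--all", action="store_true", help="Archive the entire corpus")
    parser.add_argument("--format", choices=["bagit", "zip", "tar"], default="bagit")
    parser.add_argument("--output", default=".aiwg/research/archives/")
    parser.add_argument("--verify", action="store_true")
    parser.add_argument(
        "--compression", choices=["none", "gzip", "bzip2"], default="gzip"
    )
    parser.add_argument("--include-notes", action="store_true")
    parser.add_argument("--metadata-only", action="store_true")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse arguments and enforce that a ref-id or --all is present."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.all and not args.ref_ids:
        parser.error("provide at least one ref-id or --all")
    return args
```

For example, `parse_args(["REF-022", "--metadata-only", "--format", "zip"])` yields a namespace with `metadata_only` set and `format` equal to `"zip"`, while the remaining options keep their documented defaults.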
Creating Archive: REF-022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Collecting artifacts
✓ Source PDF: .aiwg/research/sources/REF-022.pdf (2.4 MB)
✓ Finding: .aiwg/research/findings/REF-022-autogen.md (12 KB)
✓ Metadata: Extracted from frontmatter
✓ Provenance: .aiwg/research/provenance/records/REF-022-acquisition.yaml
✓ Quality: .aiwg/research/quality-assessments/REF-022-assessment.yaml
✓ Literature notes: .aiwg/research/literature-notes/REF-022-notes.md
Step 2: Validating integrity
✓ PDF checksum verified: a1b2c3d4e5f6...
✓ All files present and intact
✓ Metadata complete
✓ Provenance chain validated
Step 3: Creating BagIt package
✓ BagIt structure created
✓ Files copied to data/ directory
✓ SHA-256 checksums generated
✓ bag-info.txt created with metadata
✓ Package size: 2.5 MB
Step 4: Generating archival metadata
✓ Dublin Core record created
✓ PREMIS preservation metadata added
✓ Inventory generated: 6 files
Step 5: Verifying package
✓ BagIt validation passed
✓ All checksums verified
✓ Package structure correct
✓ Test extraction successful
Step 6: Registering archive
✓ Saved to: .aiwg/research/archives/REF-022-archive-20260203.bag
✓ Registered in archival index
✓ Archival report generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Archive created successfully!
Package: .aiwg/research/archives/REF-022-archive-20260203.bag
Format: BagIt (Library of Congress standard)
Size: 2.5 MB
Files: 6
Contents:
- REF-022.pdf (source paper)
- REF-022-autogen.md (finding document)
- metadata.yaml (frontmatter + enrichment)
- provenance.yaml (acquisition + documentation history)
- quality-assessment.yaml (GRADE assessment)
- literature-notes.md (synthesis notes)
Verification: PASSED
Retrieval Instructions:
1. Validate: bagit.py --validate REF-022-archive-20260203.bag
2. Restore to corpus: /research-restore REF-022-archive-20260203.bag
Archive Report: .aiwg/research/archives/REF-022-archive-20260203-report.md
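The checksum verification reported in Step 5 amounts to recomputing each payload digest and comparing it with the manifest. The following sketch shows that idea under the assumption that the package is an unpacked directory containing `manifest-sha256.txt`; `verify_bag` is a hypothetical helper, not the skill's actual implementation.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every checksum matched."""
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is '<digest> <whitespace> <relative path>'.
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

A full BagIt validation would also check the tag manifest and the bag declaration; this sketch covers only the payload files.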
Each archive includes comprehensive metadata:
# bag-info.txt (BagIt metadata)
Source-Organization: AIWG Research Corpus
Organization-Address: https://github.com/jmagly/aiwg
Contact-Name: AIWG Archival Agent
Contact-Email: [email protected]
External-Description: Research paper archive for REF-022 (AutoGen)
Bagging-Date: 2026-02-03
Bag-Size: 2.5 MB
Payload-Oxum: 2621440.6
External-Identifier: REF-022
Internal-Sender-Identifier: ai-writing-guide/research-corpus
Internal-Sender-Description: AIWG Research Framework
# Dublin Core metadata
dc:title: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
dc:creator: Wu, Qingyun; Bansal, Gagan; Zhang, Jieyu; et al.
dc:date: 2023
dc:identifier: 10.48550/arXiv.2308.08155
dc:type: Conference Paper
dc:format: application/pdf
dc:language: en
# PREMIS preservation metadata
premis:objectIdentifier: REF-022
premis:originalName: REF-022.pdf
premis:fixity: sha256:a1b2c3d4e5f6...
premis:dateCreatedByApplication: 2026-02-03T12:00:00Z
premis:preservationLevel: bit-level
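The `Payload-Oxum` value in `bag-info.txt` encodes the payload as `<total bytes>.<file count>`, so `2621440.6` means 2,621,440 bytes across 6 files. A minimal sketch of how it could be computed, assuming an unpacked bag directory (the function name `payload_oxum` is hypothetical):

```python
from pathlib import Path


def payload_oxum(bag_dir: Path) -> str:
    """Return '<octet count>.<stream count>' for the files under data/."""
    files = [p for p in (bag_dir / "data").rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{total_bytes}.{len(files)}"
```

Comparing this value with the one recorded in `bag-info.txt` gives a quick completeness check before running the slower per-file checksum verification.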
All archives are tracked in .aiwg/research/archives/archive-index.yaml:
archives:
  - archive_id: REF-022-archive-20260203
    ref_id: REF-022
    created_at: "2026-02-03T14:30:00Z"
    format: bagit
    size_bytes: 2621440
    file_count: 6
    checksum: "sha256:xyz789..."
    location: ".aiwg/research/archives/REF-022-archive-20260203.bag"
    verified: true
    last_verified: "2026-02-03T14:30:15Z"
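As an illustration of how an index entry relates to the package name, the sketch below derives `archive_id` and `location` from the reference ID and creation date, then renders the entry as a YAML list item. The functions `build_index_entry` and `to_yaml_item` are hypothetical, and the sketch assumes `created_at` is a UTC timestamp and that new entries start unverified.

```python
import json
from datetime import datetime, timezone


def build_index_entry(
    ref_id: str,
    size_bytes: int,
    file_count: int,
    checksum: str,
    created_at: datetime,
) -> dict:
    """Build one archive-index entry; created_at is assumed to be in UTC."""
    archive_id = f"{ref_id}-archive-{created_at:%Y%m%d}"
    return {
        "archive_id": archive_id,
        "ref_id": ref_id,
        "created_at": created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format": "bagit",
        "size_bytes": size_bytes,
        "file_count": file_count,
        "checksum": f"sha256:{checksum}",
        "location": f".aiwg/research/archives/{archive_id}.bag",
        "verified": False,  # Assumption: set to true only after verification.
        "last_verified": None,
    }


def to_yaml_item(entry: dict) -> str:
    """Render an entry as a YAML list item; JSON scalars are valid YAML scalars."""
    lines = [f"{key}: {json.dumps(value)}" for key, value in entry.items()]
    return "  - " + "\n    ".join(lines) + "\n"
```

Rendering values with `json.dumps` quotes every string, which is slightly more verbose than the hand-written example but avoids a dependency on a YAML library.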
Archive entire corpus:
/research-archive --all
Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Archiving Entire Research Corpus
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Found 47 papers in corpus
Progress: [████████████████████] 47/47 (100%)
Summary:
✓ 47 papers archived
✓ Total size: 142.8 MB
✓ All packages verified
✓ Archival index updated
Archive Bundle: .aiwg/research/archives/corpus-archive-20260203.tar.gz
Manifest: .aiwg/research/archives/corpus-manifest-20260203.yaml
Individual Archives:
REF-001-archive-20260203.bag
REF-002-archive-20260203.bag
...
REF-047-archive-20260203.bag
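The corpus bundle listed above could be produced by collecting the individual `.bag` packages into one gzip-compressed tarball. This sketch uses the standard `tarfile` module; `bundle_archives` is a hypothetical helper, and it assumes every package in the archives directory has a name ending in `.bag`, whether it is a directory or a single file.

```python
import tarfile
from pathlib import Path


def bundle_archives(archives_dir: Path, bundle_path: Path) -> list[str]:
    """Add every .bag package in archives_dir to a .tar.gz bundle.

    Returns the package names in the order they were added.
    """
    packages = sorted(p for p in archives_dir.iterdir() if p.name.endswith(".bag"))
    with tarfile.open(bundle_path, "w:gz") as tar:
        for package in packages:
            # arcname keeps paths inside the bundle relative to the package name.
            tar.add(package, arcname=package.name)
    return [p.name for p in packages]
```

The returned list of names could serve as the basis for the accompanying corpus manifest.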
Archived packages can be restored using /research-restore:
/research-restore REF-022-archive-20260203.bag
Archives follow best practices from:
All archives undergo validation: