plugins/media-curator/skills/verify-archive/SKILL.md
Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance
npx skillsauth add jmagly/aiwg verify-archiveInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Detect bit rot, tampering, and transfer errors in media archives through cryptographic checksum verification. Maintain provenance records and fixity information for long-term preservation.
Media archives degrade over time through:
This command generates self-verifying checksum manifests, performs integrity verification, and optionally creates W3C PROV provenance records with PREMIS fixity metadata.
| Parameter | Required | Description |
|-----------|----------|-------------|
| archive_path | Yes | Path to media archive directory |
| --generate | No | Generate new CHECKSUMS.sha256 manifest |
| --verify | No | Verify existing manifest against files |
| --provenance | No | Generate PROVENANCE.jsonld with PREMIS fixity |
| --fix | No | Regenerate manifest after archive changes |
Mutually exclusive modes: Use --generate for initial setup, --verify for routine checks, --fix after making changes.
Self-verifying manifest with cryptographic integrity protection:
# MANIFEST_HASH: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Generated: 2026-02-14T18:45:22.387654321Z
# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum
1a2b3c4d... ./audio/episode-001.opus
5e6f7g8h... ./audio/episode-002.opus
9i0j1k2l... ./video/recording-2026-02-14.mp4
3m4n5o6p... ./playlists/favorites.m3u
7q8r9s0t... ./README.md
Format specification:
# MANIFEST_HASH: <sha256> - SHA-256 hash of manifest content (lines 4+)# Generated: <timestamp> - ISO 8601 UTC timestamp with nanosecond precision# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum - Self-verification command<hash> <path> - File checksums in sha256sum formatSelf-verification property: Modifying any hash entry invalidates the manifest hash. The manifest is tamper-evident.
Coverage: ALL files in the archive are checksummed:
Exclusions: Only CHECKSUMS.sha256 itself is excluded to prevent circular dependency.
Verify manifest has not been tampered with (sub-second):
cd /path/to/archive
# Extract manifest hash from header
EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}')
# Compute hash of manifest content (lines 4+)
ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}')
# Compare
if [ "$EXPECTED" = "$ACTUAL" ]; then
echo "✓ Manifest integrity verified"
else
echo "✗ Manifest has been tampered with"
exit 1
fi
Use case: Daily automated checks to detect manifest corruption without verifying all files.
Verify all files match their checksums (minutes to hours depending on archive size):
cd /path/to/archive
# First verify manifest integrity
EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}')
ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}')
if [ "$EXPECTED" != "$ACTUAL" ]; then
echo "✗ Manifest integrity check failed - stopping"
exit 1
fi
echo "✓ Manifest integrity verified"
# Then verify all files
tail -n +4 CHECKSUMS.sha256 | sha256sum -c
Use case: Weekly/monthly deep verification to detect bit rot.
Show only files that failed verification:
cd /path/to/archive
tail -n +4 CHECKSUMS.sha256 | sha256sum -c --quiet
Exit codes:
0 - All files verified successfully1 - One or more files failed verificationUse case: Cron jobs and automated monitoring.
All timestamps use ISO 8601 UTC with nanosecond precision:
Format: YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ
Example: 2026-02-14T18:45:22.387654321Z
Generation command:
date -u +%Y-%m-%dT%H:%M:%S.%NZ
Requirements:
Z), never local timezoneaiwg verify-archive /path/to/archive --generate
Steps:
CHECKSUMS.sha256)CHECKSUMS.sha256VERIFY.md with human-readable instructionsBash implementation:
ARCHIVE_PATH="/path/to/archive"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ)
cd "$ARCHIVE_PATH"
# Generate checksums (exclude manifest itself)
find . -type f ! -name "CHECKSUMS.sha256" -print0 | \
sort -z | \
xargs -0 sha256sum > /tmp/checksums.tmp
# Compute manifest hash
MANIFEST_HASH=$(sha256sum /tmp/checksums.tmp | awk '{print $1}')
# Write final manifest with self-verifying header
{
echo "# MANIFEST_HASH: $MANIFEST_HASH"
echo "# Generated: $TIMESTAMP"
echo "# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum"
cat /tmp/checksums.tmp
} > CHECKSUMS.sha256
rm /tmp/checksums.tmp
echo "✓ Generated CHECKSUMS.sha256 with $MANIFEST_HASH"
aiwg verify-archive /path/to/archive --verify
Steps:
MANIFEST_HASH from headeraiwg verify-archive /path/to/archive --fix
Use case: After adding/removing/modifying files in the archive.
Steps:
CHECKSUMS.sha256 to CHECKSUMS.sha256.bak--generate)Example output:
Backed up existing manifest to CHECKSUMS.sha256.bak
Regenerating checksums...
Changes detected:
Added: ./audio/episode-003.opus
Modified: ./README.md (hash changed)
Removed: ./audio/episode-001-draft.opus
✓ Generated new CHECKSUMS.sha256
Human-readable verification instructions placed in archive root:
# Archive Integrity Verification
This archive contains a self-verifying checksum manifest for detecting corruption, tampering, or transfer errors.
## Quick Verification (Manifest Integrity)
Verify the manifest has not been tampered with (sub-second):
\`\`\`bash
cd "$(dirname "$0")"
EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}')
ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}')
[ "$EXPECTED" = "$ACTUAL" ] && echo "✓ Manifest integrity verified" || echo "✗ Manifest tampered"
\`\`\`
## Full Verification (All Files)
Verify all files match their checksums:
\`\`\`bash
cd "$(dirname "$0")"
tail -n +4 CHECKSUMS.sha256 | sha256sum -c
\`\`\`
## Archive Information
- **Generated**: {TIMESTAMP}
- **File count**: {FILE_COUNT}
- **Total size**: {TOTAL_SIZE}
- **Manifest hash**: {MANIFEST_HASH}
## Recommended Verification Schedule
- **Daily**: Quick manifest integrity check
- **Weekly**: Full file verification
- **After transfer**: Full verification immediately after copying/downloading
- **Before backup**: Verify source integrity before creating backup
## If Verification Fails
1. **Manifest integrity failure**: Manifest has been tampered with or corrupted. Regenerate from source.
2. **File verification failure**: File has been corrupted or modified. Restore from backup or regenerate.
## Regeneration
After making changes to the archive, regenerate the manifest:
\`\`\`bash
aiwg verify-archive . --fix
\`\`\`
---
*Self-verifying checksums using SHA-256 (NIST FIPS 180-4)*
Template variables:
{TIMESTAMP} - ISO 8601 generation timestamp{FILE_COUNT} - Number of files in manifest{TOTAL_SIZE} - Total archive size in human-readable format{MANIFEST_HASH} - SHA-256 hash of manifest contentW3C PROV-O format with PREMIS fixity metadata:
{
"@context": {
"prov": "http://www.w3.org/ns/prov#",
"premis": "http://www.loc.gov/premis/rdf/v3/",
"schema": "http://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@graph": [
{
"@id": "urn:uuid:ARCHIVE_UUID",
"@type": ["prov:Entity", "premis:IntellectualEntity"],
"prov:generatedAtTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:22.387654321Z"
},
"premis:hasFixity": {
"@type": "premis:Fixity",
"premis:hasMessageDigest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"premis:hasMessageDigestAlgorithm": "SHA-256",
"premis:hasMessageDigestOriginator": "aiwg verify-archive v2026.2.6",
"premis:fixityCheckDateTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:22.387654321Z"
}
},
"schema:contentUrl": "./CHECKSUMS.sha256"
},
{
"@id": "urn:uuid:ACTIVITY_UUID",
"@type": "prov:Activity",
"prov:startedAtTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:20.123456789Z"
},
"prov:endedAtTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:22.387654321Z"
},
"prov:wasAssociatedWith": {
"@id": "urn:aiwg:agent:media-curator"
}
},
{
"@id": "urn:aiwg:agent:media-curator",
"@type": "prov:SoftwareAgent",
"schema:name": "AIWG Media Curator",
"schema:softwareVersion": "2026.2.6"
}
]
}
Key relationships:
premis:hasFixity - Links entity to fixity informationpremis:hasMessageDigest - SHA-256 hash with sha256: prefixpremis:fixityCheckDateTime - When verification was performedprov:wasAssociatedWith - Agent that performed verificationUse case: Long-term digital preservation requiring audit trails and provenance chains.
| Standard | Purpose | Reference | |----------|---------|-----------| | SHA-256 | Cryptographic hash function | NIST FIPS 180-4 | | ISO 8601 | Timestamp format | ISO 8601:2019 | | PREMIS 3.0 | Preservation metadata | Library of Congress | | W3C PROV-O | Provenance ontology | W3C Recommendation 2013 | | JSON-LD 1.1 | Linked data format | W3C Recommendation 2020 |
Missing archive: Exit with error if archive path does not exist.
No CHECKSUMS.sha256: For --verify, exit with error. For --generate, create new manifest.
Hash mismatch: Report which files failed, exit code 1.
Permission errors: Report files that cannot be read, continue verification for accessible files.
Corrupt manifest: If manifest cannot be parsed, exit with error and suggest regeneration.
When running --generate or --fix:
When running --provenance:
Initial setup:
aiwg verify-archive ~/media/podcast-archive --generate
Routine verification:
aiwg verify-archive ~/media/podcast-archive --verify
After adding new episodes:
aiwg verify-archive ~/media/podcast-archive --fix
Generate with provenance:
aiwg verify-archive ~/media/podcast-archive --generate --provenance
Automated monitoring (cron):
0 2 * * * aiwg verify-archive /media/archives/podcast --verify --quiet || echo "Archive verification failed" | mail -s "Alert" [email protected]
data-ai
Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.
data-ai
Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report — totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.
testing
Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.
data-ai
Compute an entity's publication trajectory — per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.