plugins/media-curator/skills/gap-documentation/SKILL.md
Document missing content in media archives with standardized GAP-NOTE markers that guide acquisition and measure completeness
npx skillsauth add jmagly/aiwg gap-documentationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A systematic approach to documenting missing content in media archives using standardized gap markers that guide acquisition, measure completeness, and enable parallel collection workflows.
GAP-NOTE.md files serve as placeholders in directories where content is expected but not yet acquired. They transform empty directories from ambiguous voids into actionable instructions, creating a clear separation between "not yet collected" and "intentionally excluded."
This pattern emerged from field testing during music archive curation (issue #81), where agents needed explicit guidance on acquisition priorities and strategies without requiring central coordination.
# GAP-NOTE: {Directory Purpose}
## Expected Content
{What should be here}
## Acquisition Strategy
{How to obtain it}
## Priority
{HIGH | MEDIUM | LOW}
## Sources
- {Source 1}
- {Source 2}
Title Line (# GAP-NOTE: ...)
# GAP-NOTE: BBC Radio Sessions (1990-1995)Expected Content
Acquisition Strategy
Priority
Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqbCreated when establishing directory structure but before content acquisition:
# Create directory and gap marker
mkdir -p "Artist Name/Radio Sessions/BBC Radio 1"
cat > "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md" <<'INNER_EOF'
# GAP-NOTE: BBC Radio 1 Sessions
## Expected Content
Lossless audio recordings of all BBC Radio 1 sessions (1990-1995)
with metadata: broadcast date, presenter, tracklist, duration
## Acquisition Strategy
1. Query BBC Sounds API for artist sessions
2. Download via yt-dlp with metadata extraction
3. Transcode to FLAC if necessary
4. Extract and verify tracklist from audio
## Priority
HIGH
## Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqb
- https://archive.org/details/bbcradio1sessions
INNER_EOF
Agents read GAP-NOTE.md as instructions during acquisition:
# Find high-priority gaps
find . -name "GAP-NOTE.md" -exec grep -l "^## Priority$" {} \; | \
xargs grep -A1 "^## Priority$" | grep "HIGH" | cut -d: -f1
# Read acquisition strategy
grep -A20 "^## Acquisition Strategy$" "path/to/GAP-NOTE.md"
Once content acquired, GAP-NOTE.md is removed or replaced:
# Option 1: Remove when content present
rm "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md"
# Option 2: Replace with catalog
mv "Artist Name/Radio Sessions/BBC Radio 1/GAP-NOTE.md" \
"Artist Name/Radio Sessions/BBC Radio 1/CATALOG.md"
Criteria:
Examples:
Expected timeframe: Hours to days
Criteria:
Examples:
Expected timeframe: Days to weeks
Criteria:
Examples:
Expected timeframe: Weeks to months (or indefinite)
Count remaining gaps to measure collection progress:
# Total gap count
find . -name "GAP-NOTE.md" | wc -l
# Gaps by priority
echo "=== Collection Gaps by Priority ==="
echo -n "HIGH: "
grep -r "^## Priority" --include="GAP-NOTE.md" . | grep -A1 "^## Priority$" | grep -c "HIGH"
echo -n "MEDIUM: "
grep -r "^## Priority" --include="GAP-NOTE.md" . | grep -A1 "^## Priority$" | grep -c "MEDIUM"
echo -n "LOW: "
grep -r "^## Priority" --include="GAP-NOTE.md" . | grep -A1 "^## Priority$" | grep -c "LOW"
Calculate completion based on gap ratio:
#!/bin/bash
# Calculate archive completeness
TOTAL_DIRS=$(find . -type d | wc -l)
GAP_DIRS=$(find . -name "GAP-NOTE.md" | wc -l)
COMPLETE_DIRS=$((TOTAL_DIRS - GAP_DIRS))
PERCENT=$((COMPLETE_DIRS * 100 / TOTAL_DIRS))
echo "Archive Completeness: $PERCENT%"
echo " Complete: $COMPLETE_DIRS/$TOTAL_DIRS directories"
echo " Gaps remaining: $GAP_DIRS"
Generate structured report for review:
#!/bin/bash
# Generate gap report with locations and priorities
echo "# Archive Gap Report"
echo "Generated: $(date -I)"
echo ""
for priority in HIGH MEDIUM LOW; do
echo "## $priority Priority"
find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
if grep -A1 "^## Priority$" "$file" | grep -q "$priority"; then
dir=$(dirname "$file")
title=$(grep "^# GAP-NOTE:" "$file" | sed 's/# GAP-NOTE: //')
echo "- **$dir**"
echo " $title"
fi
done
echo ""
done
Multiple agents can work on gaps simultaneously by claiming GAP-NOTE files:
# Add agent claim to gap note
echo "" >> "path/to/GAP-NOTE.md"
echo "## Claimed By" >> "path/to/GAP-NOTE.md"
echo "Agent: $AGENT_ID" >> "path/to/GAP-NOTE.md"
echo "Timestamp: $(date -Iseconds)" >> "path/to/GAP-NOTE.md"
Follow the acquisition strategy documented in the GAP-NOTE:
# Read strategy section
STRATEGY=$(sed -n '/^## Acquisition Strategy$/,/^##/p' "path/to/GAP-NOTE.md" | \
grep -v "^##")
# Execute steps (example)
while IFS= read -r step; do
echo "Executing: $step"
# ... implement step execution ...
done <<< "$STRATEGY"
# Verify content acquired
if [ -f "expected-content-file.flac" ]; then
# Move GAP-NOTE to completion log
mkdir -p .archive-logs/completed-gaps
mv "path/to/GAP-NOTE.md" ".archive-logs/completed-gaps/$(date -I)-gap-filled.md"
echo "Gap filled successfully"
else
# Update GAP-NOTE with failure info
echo "" >> "path/to/GAP-NOTE.md"
echo "## Acquisition Attempt" >> "path/to/GAP-NOTE.md"
echo "Failed: $(date -Iseconds)" >> "path/to/GAP-NOTE.md"
echo "Reason: Content not available from listed sources" >> "path/to/GAP-NOTE.md"
fi
# GAP-NOTE: BBC Radio 1 Sessions (1990-1995)
## Expected Content
Lossless audio recordings (FLAC/ALAC) of all BBC Radio 1 sessions
between 1990-1995, including:
- Full session audio
- Broadcast date and presenter name
- Complete tracklist with song titles
- Duration and audio quality metadata
## Acquisition Strategy
1. Query BBC Sounds API: `https://api.bbc.co.uk/sounds/search?q={artist}`
2. Filter results for Radio 1 sessions in date range
3. Download via yt-dlp: `yt-dlp --extract-audio --audio-format flac {url}`
4. Extract metadata using `ffprobe` and validate against expected format
5. Organize into subdirectories by broadcast date
## Priority
HIGH
## Sources
- https://www.bbc.co.uk/sounds/brand/b006wkqb
- https://archive.org/details/bbcradio1sessions
- https://musicbrainz.org (for session metadata validation)
# GAP-NOTE: Album Artwork Collection
## Expected Content
High-resolution album covers (minimum 1400x1400px, JPEG or PNG)
for all studio albums, EPs, and major singles:
- Front cover (required)
- Back cover (optional)
- CD/vinyl labels (optional)
- Embedded artwork extracted from audio files
## Acquisition Strategy
1. Check MusicBrainz Cover Art Archive API
2. Query Discogs API for releases: `https://api.discogs.com/artists/{id}/releases`
3. Download largest available image from each release
4. Fallback: extract embedded artwork from FLAC files using `metaflac --export-picture-to=cover.jpg`
5. Validate dimensions and format
## Priority
MEDIUM
## Sources
- https://coverartarchive.org/release/{mbid}
- https://api.discogs.com/releases/{id}
- Embedded in existing FLAC files
# GAP-NOTE: Interview Archive
## Expected Content
Transcripts of all major interviews (print, radio, video) from 1988-2000:
- Full text transcription
- Interview date and publication/broadcast source
- Interviewer name
- Topic tags (e.g., "album recording", "tour", "influences")
## Acquisition Strategy
1. Search online archives: Rock's Backpages, music magazine databases
2. For video/audio interviews: use Whisper API for transcription
3. Manual transcription for short clips (<5 minutes)
4. Cross-reference dates with tour/album timeline for validation
5. Format as markdown with YAML frontmatter
## Priority
LOW
## Sources
- https://www.rocksbackpages.com
- https://archive.org/details/audio (radio interviews)
- YouTube channels (fan uploads, use Whisper for transcription)
# Function to scaffold new gap note
create_gap_note() {
local dir="$1"
local title="$2"
local priority="${3:-MEDIUM}"
mkdir -p "$dir"
cat > "$dir/GAP-NOTE.md" <<INNER_EOF
# GAP-NOTE: $title
## Expected Content
{Describe what content should be here}
## Acquisition Strategy
1. {Step 1}
2. {Step 2}
## Priority
$priority
## Sources
- {Source URL 1}
INNER_EOF
echo "Created gap note: $dir/GAP-NOTE.md"
}
# Usage
create_gap_note "Artist/Radio Sessions/NPR Tiny Desk" \
"NPR Tiny Desk Concert" \
"HIGH"
# Check all gaps and report status
check_gaps() {
echo "=== Archive Gap Status ==="
echo "Generated: $(date)"
echo ""
find . -name "GAP-NOTE.md" -print0 | sort -z | while IFS= read -r -d '' file; do
dir=$(dirname "$file")
title=$(grep "^# GAP-NOTE:" "$file" | sed 's/# GAP-NOTE: //')
priority=$(grep -A1 "^## Priority$" "$file" | tail -n1 | tr -d ' ')
claimed=$(grep -q "^## Claimed By$" "$file" && echo "[CLAIMED]" || echo "")
printf "%-60s [%s] %s\n" "$dir" "$priority" "$claimed"
done
}
# Archive completed gap notes
archive_completed_gaps() {
mkdir -p .archive-logs/completed-gaps
find . -name "GAP-NOTE.md" -print0 | while IFS= read -r -d '' file; do
dir=$(dirname "$file")
# Check if directory has actual content now
content_count=$(find "$dir" -type f ! -name "GAP-NOTE.md" | wc -l)
if [ "$content_count" -gt 0 ]; then
echo "Gap filled: $dir"
mv "$file" ".archive-logs/completed-gaps/$(date -I)-$(basename "$dir" | tr ' ' '-').md"
fi
done
}
The Media Curator framework can automatically detect gaps:
# Detect expected-but-missing directories
aiwg media-curator detect-gaps --archive /path/to/archive
# Fill high-priority gaps
aiwg media-curator fill-gaps --priority HIGH --parallel 3
# Generate gap report
aiwg media-curator gap-report --format markdown > GAPS.md
@$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/catalog-generation.md - Replacing GAP-NOTEs with catalogs@$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/archive-organization.md - Directory structure conventionsdata-ai
Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.
data-ai
Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report — totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.
testing
Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.
data-ai
Compute an entity's publication trajectory — per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.