agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md
W3C PROV-O patterns for tracking media derivation chains and production history
npx skillsauth add jmagly/aiwg Provenance TrackingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
W3C PROV-compliant provenance tracking for media archives using the Entity-Activity-Agent model in JSON-LD format.
A media file — the thing being tracked.
A transformation — download, transcode, tag, extract audio.
A tool or person — yt-dlp, ffmpeg, opustags, the curator.
Media files often go through multiple transformations:
YouTube video (Entity)
→ downloaded by yt-dlp (Activity, Agent: yt-dlp)
→ raw .mkv file (Entity)
→ audio extracted by ffmpeg (Activity, Agent: ffmpeg)
→ raw .opus file (Entity)
→ tagged by opustags (Activity, Agent: opustags)
→ final .opus file (Entity)
Each step is recorded with timestamps, tool versions, and parameters.
{
"@context": {
"prov": "http://www.w3.org/ns/prov#",
"schema": "http://schema.org/",
"premis": "http://www.loc.gov/premis/rdf/v3/",
"dc": "http://purl.org/dc/terms/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@graph": [
{
"@id": "urn:archive:entity:collection",
"@type": "prov:Collection",
"dc:title": "Artist Name - Complete Collection",
"prov:generatedAtTime": "2026-01-26T01:33:51.302755851Z",
"schema:numberOfItems": 1109,
"schema:size": "94GB",
"premis:hasFixity": {
"@type": "premis:Fixity",
"premis:messageDigestAlgorithm": {
"@id": "http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/sha256"
},
"premis:messageDigestOriginator": "sha256sum (GNU coreutils)",
"schema:url": "CHECKSUMS.sha256",
"schema:numberOfItems": 1109
}
},
{
"@id": "urn:archive:activity:acquisition",
"@type": "prov:Activity",
"prov:startedAtTime": "2026-01-24T15:00:00.000000000Z",
"prov:endedAtTime": "2026-01-26T01:33:51.302755851Z",
"prov:wasAssociatedWith": [
{"@id": "urn:archive:agent:yt-dlp"},
{"@id": "urn:archive:agent:ffmpeg"},
{"@id": "urn:archive:agent:opustags"}
],
"prov:generated": {"@id": "urn:archive:entity:collection"}
},
{
"@id": "urn:archive:agent:yt-dlp",
"@type": ["prov:SoftwareAgent", "prov:Agent"],
"schema:name": "yt-dlp",
"schema:softwareVersion": "2024.12.23",
"schema:description": "Video/audio downloader"
},
{
"@id": "urn:archive:agent:ffmpeg",
"@type": ["prov:SoftwareAgent", "prov:Agent"],
"schema:name": "ffmpeg",
"schema:softwareVersion": "7.1",
"schema:description": "Audio/video transcoder"
},
{
"@id": "urn:archive:agent:opustags",
"@type": ["prov:SoftwareAgent", "prov:Agent"],
"schema:name": "opustags",
"schema:softwareVersion": "1.9.0",
"schema:description": "Opus metadata tagger"
}
]
}
For detailed tracking, record provenance per file:
{
"@id": "urn:archive:entity:car-radio-opus",
"@type": "prov:Entity",
"dc:title": "Car Radio",
"schema:encodingFormat": "audio/opus",
"prov:wasDerivedFrom": {
"@id": "urn:archive:entity:car-radio-mkv"
},
"prov:wasGeneratedBy": {
"@id": "urn:archive:activity:extract-audio-001",
"@type": "prov:Activity",
"prov:used": {"@id": "urn:archive:entity:car-radio-mkv"},
"prov:wasAssociatedWith": {"@id": "urn:archive:agent:ffmpeg"},
"prov:startedAtTime": "2026-01-25T10:15:30.000000000Z",
"schema:description": "ffmpeg -i car-radio.mkv -vn -acodec libopus -b:a 128k car-radio.opus"
}
}
All timestamps MUST use ISO 8601 UTC with nanosecond precision:
Format: YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ
Example: 2026-01-26T01:33:51.302755851Z
Command: date -u +%Y-%m-%dT%H:%M:%S.%NZ
yt-dlp --version
ffmpeg -version | head -1 | awk '{print $3}'
opustags --version 2>&1 | head -1
date -u +%Y-%m-%dT%H:%M:%S.%NZ
find . -type f ! -name "CHECKSUMS.sha256" ! -name "PROVENANCE.jsonld" | wc -l
du -sh . | cut -f1
| Standard | Version | Usage | |----------|---------|-------| | W3C PROV-O | 2013-04-30 | Provenance ontology | | PREMIS | 3.0 | Preservation fixity metadata | | Dublin Core | 1.1 | Descriptive metadata | | Schema.org | Latest | Structured data | | JSON-LD | 1.1 | Serialization format | | ISO 8601 | 2019 | Timestamps |
data-ai
Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.
data-ai
Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report — totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.
testing
Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.
data-ai
Compute an entity's publication trajectory — per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.