docs/ai-context/archive/cursor-skills/dpla-oai-harvest-watch/SKILL.md
Watch an OAI harvest log and report set-by-set progress and an ETA (for hubs using harvest.setlist). Use when the user asks to watch OAI harvest progress, track sets, estimate completion, or monitor a long OAI harvest.
Uses scripts/status/watch-oai-harvest.py to parse ListRecords&set=... transitions in a harvest log file.
# Export vars from .env so child processes (harvest + watcher) see them
set -a
source .env
set +a
HUB=<hub>
LOG="logs/harvest-${HUB}-$(date +%Y%m%d_%H%M%S).log"
CONF="${I3_CONF:-$HOME/dpla/code/ingestion3-conf/i3.conf}"
# Start harvest with log capture
./scripts/harvest.sh "$HUB" 2>&1 | tee "$LOG"
# In another terminal: watch set progress (total auto-parsed from i3.conf when possible)
./venv/bin/python scripts/status/watch-oai-harvest.py --log="$LOG" --conf="$CONF" --hub="$HUB"
# If the hub does not have harvest.setlist in i3.conf, you can provide a manual total:
# ./venv/bin/python scripts/status/watch-oai-harvest.py --log="$LOG" --total=<n>
The watcher relies on log lines that include set= (i.e., a harvest.setlist in i3.conf).
If the log has no ListRecords&set=... lines, use orchestrator status instead: ./scripts/status/ingest-status.sh --watch
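The set-transition parsing described above can be illustrated with a minimal sketch. This is not the code in scripts/status/watch-oai-harvest.py. The log line shape, the function names set_transitions and estimate_remaining_seconds, and the linear ETA model are all assumptions made for illustration.

```python
import re
from urllib.parse import unquote

# Assumption: harvest log lines contain the request URL, so a set shows up
# as "ListRecords&set=<name>" followed by "&" or whitespace.
SET_PATTERN = re.compile(r"ListRecords&set=([^&\s]+)")


def set_transitions(lines):
    """Return set names in the order they first appear in the log."""
    seen = []
    for line in lines:
        match = SET_PATTERN.search(line)
        if match is None:
            continue  # not a ListRecords request for a set
        name = unquote(match.group(1))  # set names may be URL-encoded
        if name not in seen:
            seen.append(name)
    return seen


def estimate_remaining_seconds(sets_started, total_sets, elapsed_seconds):
    """Linear ETA: assume each remaining set takes the average time so far.

    The most recently started set is treated as still in progress, so only
    sets_started - 1 sets count as completed. Returns None when there is
    not enough information to estimate.
    """
    completed = sets_started - 1
    if completed <= 0 or total_sets <= 0:
        return None
    seconds_per_set = elapsed_seconds / completed
    remaining = max(total_sets - completed, 0)
    return seconds_per_set * remaining


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "INFO GET https://example.org/oai?verb=ListRecords&set=alpha&metadataPrefix=oai_dc",
        "INFO GET https://example.org/oai?verb=ListRecords&set=col%3Abeta&metadataPrefix=oai_dc",
        "INFO heartbeat",
    ]
    started = set_transitions(sample)
    print(started, estimate_remaining_seconds(len(started), 10, 120.0))
```

A linear estimate is crude when set sizes vary widely, so treat the ETA as a rough guide rather than a deadline.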