docs/ai-context/archive/cursor-skills/dpla-ingest-debug/SKILL.md
Debug and fix DPLA hub ingestion failures (harvest/mapping/enrichment/jsonl/s3-sync/anomaly). Use when user asks why a hub failed, to debug an ingest failure, check an escalation report, or retry a failed hub/stage.
npx skillsauth add dpla/ingestion3 dpla-ingest-debug
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Quickly identify what stage failed, find the relevant logs/escalation report, apply a targeted fix, and re-run only the necessary steps.
For any commands that depend on project env (especially the orchestrator), run source .env first so JAVA_HOME, DPLA_DATA, I3_CONF, and SLACK_WEBHOOK are available.
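The environment step above can be sketched as a small pre-flight check. This is a minimal sketch, not part of the repo: `check_ingest_env` is a hypothetical helper name, and it assumes `.env` contains plain shell assignments (with or without `export`).

```shell
# Hypothetical helper: source an env file and report which of the four
# variables named above are still unset. Returns non-zero if any is missing.
check_ingest_env() {
  env_file="${1:-.env}"
  # Give a bare filename a ./ prefix so "." does not search PATH for it.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  if [ ! -f "$env_file" ]; then
    echo "missing env file: $env_file" >&2
    return 1
  fi
  # set -a exports every variable the file assigns, even without "export".
  set -a
  . "$env_file"
  set +a
  missing=0
  for name in JAVA_HOME DPLA_DATA I3_CONF SLACK_WEBHOOK; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From repo root, `check_ingest_env && ./venv/bin/python -m scheduler.orchestrator.main --retry-failed` would only start the orchestrator once all four variables are present.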
ls -lt data/escalations/ | head
# then open the relevant failures-<run_id>.md
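To skip the manual lookup, the newest report can be picked by modification time. A minimal sketch, assuming reports follow the `failures-<run_id>.md` naming shown above and that run IDs contain no newlines; `latest_escalation_report` is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the most recently modified
# failures-*.md report, or nothing if the directory has none.
latest_escalation_report() {
  dir="${1:-data/escalations}"
  # ls -t sorts newest first; head keeps only the most recent report.
  ls -t "$dir"/failures-*.md 2>/dev/null | head -n 1
}
```

For example, `less "$(latest_escalation_report)"` opens the latest report when run from repo root.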
./scripts/status/ingest-status.sh --watch
# or for one hub:
./scripts/status/ingest-status.sh <hub>
ls -lt logs/ | head
Look for one of: harvest, mapping, enrichment, jsonl, sync, anomaly.
If you have only a hub name and “it failed”, the quickest approach is:
logs/status/<hub>.status (JSON), and
logs/.
All commands below assume you’re running from repo root.
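Since the per-hub status file is JSON, pretty-printing it is a quick first look. A minimal sketch, assuming `python3` is on the PATH; `show_hub_status` is a hypothetical helper name, and no particular field names are assumed because the source does not list them.

```shell
# Hypothetical helper: pretty-print logs/status/<hub>.status.
# The optional second argument is the repo root (defaults to the current directory).
show_hub_status() {
  hub="$1"
  root="${2:-.}"
  status_file="$root/logs/status/${hub}.status"
  if [ ! -f "$status_file" ]; then
    echo "no status file: $status_file" >&2
    return 1
  fi
  # json.tool ships with the Python standard library and also validates the JSON.
  python3 -m json.tool "$status_file"
}
```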
Harvest failed:
./scripts/harvest.sh <hub>
Mapping/enrichment/jsonl failed (re-run remap):
./scripts/remap.sh <hub>
S3 sync failed (or you want to re-sync after a successful run):
./scripts/s3-sync.sh <hub>
Orchestrator retry (failed hubs from last run):
./venv/bin/python -m scheduler.orchestrator.main --retry-failed
Orchestrator retry (one hub):
./venv/bin/python -m scheduler.orchestrator.main --hub=<hub>
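The stage-to-script mapping above can be written down as a lookup. This is a sketch only: `rerun_command_for_stage` is a hypothetical helper name, it prints the command rather than running it, and it deliberately has no entry for `anomaly` because the text above lists no single re-run script for that stage.

```shell
# Hypothetical helper: print the re-run command for a failed stage and hub.
rerun_command_for_stage() {
  stage="$1"
  hub="$2"
  case "$stage" in
    harvest)
      echo "./scripts/harvest.sh $hub" ;;
    mapping|enrichment|jsonl)
      # All three are covered by a remap.
      echo "./scripts/remap.sh $hub" ;;
    sync|s3-sync)
      echo "./scripts/s3-sync.sh $hub" ;;
    *)
      echo "no single re-run script listed for stage: $stage" >&2
      return 1 ;;
  esac
}
```

Printing the command first makes it easy to review before running it, for example `rerun_command_for_stage mapping <hub>`.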
pgrep -fl 'sbt|java.*ingestion3'
kill <pid> (avoid broad pkill patterns unless you’re sure).
After re-running, verify _SUCCESS markers and counts:
# Examples
ls "$DPLA_DATA/<hub>/harvest"/*/_SUCCESS
ls "$DPLA_DATA/<hub>/mapping"/*/_SUCCESS
ls "$DPLA_DATA/<hub>/jsonl"/*/_SUCCESS
cat "$DPLA_DATA/<hub>/mapping"/*/_SUMMARY | head
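The marker checks above can be looped over the three stages. A minimal sketch, assuming the `$DPLA_DATA/<hub>/<stage>/<run>/_SUCCESS` layout shown in the examples; `check_success_markers` is a hypothetical helper name. It only confirms that at least one run directory per stage has a marker, not that the newest run does.

```shell
# Hypothetical helper: report whether each stage has at least one _SUCCESS marker.
# Usage: check_success_markers <hub> [data_root]   (data_root defaults to $DPLA_DATA)
check_success_markers() {
  hub="$1"
  data_root="${2:-${DPLA_DATA:-}}"
  status=0
  for stage in harvest mapping jsonl; do
    # The glob matches any run directory under the stage.
    if ls "$data_root/$hub/$stage"/*/_SUCCESS >/dev/null 2>&1; then
      echo "ok: $stage"
    else
      echo "missing _SUCCESS: $stage"
      status=1
    fi
  done
  return "$status"
}
```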