docs/ai-context/archive/cursor-skills/dpla-run-ingest/SKILL.md
Execute a single-hub or manual ingest by following the correct runbook and scripts. Use when the user says run ingest for [hub], harvest [hub], remap [hub], or run the full pipeline for [hub]. Ensures harvest type is identified, correct runbook and scripts are used, and outputs are verified.
Run an ingest for a specific hub using the right runbook and scripts, then verify output. Use this for single-hub or manual runs (for multi-hub/scheduled runs, use the orchestrator instead).
Environment: Scripts that source common.sh (harvest.sh, ingest.sh, remap.sh, etc.) automatically load $I3_HOME/.env when present, so JAVA_HOME, DPLA_DATA, I3_CONF, SLACK_WEBHOOK, etc. are set before the JAR is built or the pipeline runs. You do not need to run source .env separately. Full checklist: AGENTS.md § Environment and build.
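To illustrate the "load .env when present" behavior described above, here is a minimal sketch. It is not the actual contents of common.sh: the function names `load_env_if_present` and `missing_vars` are hypothetical, and the sketch assumes .env holds plain KEY=value shell assignments.

```shell
# Hypothetical sketch, not the real common.sh.
# Load an env file if it exists, exporting every variable it assigns.
# ASSUMPTION: the file contains plain KEY=value shell assignments.
load_env_if_present() {
  _env_file="${1:-${I3_HOME:-.}/.env}"
  if [ -f "$_env_file" ]; then
    set -a            # export every variable assigned while sourcing
    . "$_env_file"
    set +a
  fi
}

# Print the name of each listed variable that is unset or empty.
missing_vars() {
  for _name in "$@"; do
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      printf '%s\n' "$_name"
    fi
  done
}
```

Calling `load_env_if_present` and then `missing_vars JAVA_HOME DPLA_DATA I3_CONF` would list any of those variables that are still empty, which can be a quick sanity check when a run fails early.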
Build: When you run `./scripts/harvest.sh` (or `ingest.sh`, `remap.sh`, etc.), `run_entry` in `common.sh` runs `sbt assembly` if the JAR is missing or if any Scala source is newer than the JAR. So "harvest indiana" will use current code without a separate build step. (You can still run `sbt assembly` first to avoid a build delay on the first harvest.)

1. Identify the harvest type in i3.conf (`$I3_CONF`, default `~/dpla/code/ingestion3-conf/i3.conf`): `<hub>.harvest.type`. Values: `localoai`, `api`, `file`, `nara.file.delta`.
2. Run the script that matches the request:
   - `./scripts/ingest.sh <hub>`
   - `./scripts/harvest.sh <hub>`
   - `./scripts/remap.sh <hub>`
   - `./scripts/harvest/nara-ingest.sh <nara-export.zip>`
3. Verify: `_SUCCESS` in the step output dirs; `_MANIFEST` / `_SUMMARY` for counts.
4. Sync to S3: `./scripts/s3-sync.sh <hub>`.

- `--output` must be `$DPLA_DATA` (the data root), never `$DPLA_DATA/<hub>`. Scripts handle this. OutputHelper builds paths as `rootPath / shortName / activity / timestamp-schema`.
- Use `./venv/bin/python` for Python; use `./scripts/` scripts from repo root. AWS: `--profile dpla`.

```shell
# Check a step completed
ls $DPLA_DATA/<hub>/harvest/<timestamped-dir>/_SUCCESS
ls $DPLA_DATA/<hub>/mapping/<timestamped-dir>/_SUCCESS
ls $DPLA_DATA/<hub>/jsonl/<timestamped-dir>/_SUCCESS

# Record counts
cat $DPLA_DATA/<hub>/harvest/<timestamped-dir>/_MANIFEST
cat $DPLA_DATA/<hub>/mapping/<timestamped-dir>/_SUMMARY
```
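The same marker-file checks can be wrapped in small helpers. The sketch below is illustrative only: `step_status` and `latest_step_dir` are hypothetical names, not scripts in the repo, and it assumes timestamped directory names sort chronologically, so the last match is the newest run.

```shell
# Classify one timestamped step directory by its marker files.
# Prints one of: complete | incomplete | missing
step_status() {
  if [ ! -d "$1" ]; then
    echo missing
  elif [ -e "$1/_SUCCESS" ]; then
    echo complete
  else
    echo incomplete   # e.g. only _temporary is present
  fi
}

# Print the newest timestamped directory for <data-root> <hub> <step>.
# ASSUMPTION: directory names begin with a sortable timestamp, so the
# last entry of the sorted glob expansion is the most recent run.
latest_step_dir() {
  _latest=""
  for _dir in "$1/$2/$3"/*/; do
    if [ -d "$_dir" ]; then
      _latest="${_dir%/}"
    fi
  done
  if [ -n "$_latest" ]; then
    printf '%s\n' "$_latest"
  else
    return 1
  fi
}
```

For example, `step_status "$(latest_step_dir "$DPLA_DATA" indiana harvest)"` prints `complete` only when the newest harvest directory for that hub contains `_SUCCESS`. The helpers only report status; they never delete anything.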
Incomplete runs (e.g. _temporary but no _SUCCESS) should be deleted before retrying.
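The harvest-type lookup in the steps above can also be scripted. This is a rough sketch under a stated assumption: that i3.conf stores the key on a single line in the form `<hub>.harvest.type = "value"` (quotes optional). The helper names `harvest_type` and `is_known_harvest_type` are hypothetical; adjust the parsing if your i3.conf is laid out differently.

```shell
# Print the harvest type configured for a hub.
# ASSUMPTION: the key sits on one line as  <hub>.harvest.type = "value"
# Usage: harvest_type <hub> [path-to-i3.conf]
harvest_type() {
  _hub="$1"
  _conf="${2:-${I3_CONF:-$HOME/dpla/code/ingestion3-conf/i3.conf}}"
  sed -n "s/^[[:space:]]*${_hub}\.harvest\.type[[:space:]]*[=:][[:space:]]*\"\{0,1\}\([^\"[:space:]]*\)\"\{0,1\}.*\$/\1/p" "$_conf" | head -n 1
}

# Succeed only for the harvest types listed in this document.
is_known_harvest_type() {
  case "$1" in
    localoai|api|file|nara.file.delta) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller could run `type=$(harvest_type indiana)` and stop early if `is_known_harvest_type "$type"` fails, rather than picking a runbook for a value the document does not list.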
Before running:

- Run `sbt assembly` so the fat JAR reflects the current code (or confirm no Scala changes since last build).
- Confirm `SLACK_WEBHOOK` is set (or plan to email [email protected] on failure).

After a run:

- Check outputs (`_SUCCESS` files in `harvest/`, `mapping/`, `enrichment/`, `jsonl/`).
- If the enrichment and JSONL steps are still outstanding, run `./scripts/enrich.sh <hub>` then `./scripts/jsonl.sh <hub>`.

| Resource | Path |
|----------|------|
| Runbook index and mapping | runbooks/README.md |
| Script reference | scripts/SCRIPTS.md |
| Agent guide | AGENTS.md |
| Config | i3.conf at $I3_CONF |
| Debug ingest failures | .cursor/skills/dpla-ingest-debug/SKILL.md |
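For the pre-run item about confirming there are no Scala changes since the last build, the rule this document describes (rebuild when the JAR is missing or any Scala source is newer than it) can be checked by hand. The sketch below is not the real `run_entry`; `jar_is_stale` is a hypothetical helper, and the JAR path and source directory are whatever your checkout uses.

```shell
# Succeeds (exit 0) when a rebuild is needed: the JAR is missing, or at
# least one .scala file under the source directory is newer than the JAR.
# Usage: jar_is_stale <path-to-jar> <source-dir>
jar_is_stale() {
  _jar="$1"
  _src="$2"
  if [ ! -f "$_jar" ]; then
    return 0
  fi
  # find -newer compares modification times against the JAR
  [ -n "$(find "$_src" -type f -name '*.scala' -newer "$_jar" | head -n 1)" ]
}

# Example (JAR_PATH is a placeholder, substitute the JAR your build produces):
#   if jar_is_stale "$JAR_PATH" src; then sbt assembly; fi
```

Comparing modification times is cheap, but a fresh checkout or a branch switch can touch files without changing them, so this check may trigger rebuilds that are not strictly necessary.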