
dpla/dpla-orchestrator

Name: dpla-orchestrator
Author: dpla

docs/ai-context/archive/cursor-skills/dpla-orchestrator/SKILL.md

npx skillsauth add dpla/ingestion3 dpla-orchestrator

Security Scan Results

3 of 9 scanners reported clean

Clean: Trivy, container and dependency vulnerability scanner
Clean: Semgrep, static code analysis for vulnerabilities
Clean: mcp-scan (Snyk), Model Context Protocol security validation
Skipped: Snyk (dep), open source security scanning
Skipped: Socket.dev, supply chain security analysis
Skipped: VirusTotal, multi-engine malware detection
Skipped: CrowdStrike, advanced threat intelligence
Skipped: OSV-Scanner, Open Source Vulnerability database check
Skipped: OWASP Dep-Check

DPLA Orchestrator

Purpose

Run and monitor the Python orchestrator that drives the full ingestion pipeline (harvest → mapping → enrichment → JSONL → anomaly → S3 sync) for one or more hubs, with Slack notifications and parallel execution.

When to Use

"Run the orchestrator"
"Parallel ingest"
"Ingest status"
"Run hubs X, Y, Z"
"Orchestrator dry-run"
"Retry failed hubs"
"What's running in the orchestrator?"

Environment: Always run source .env from the repo root before starting the orchestrator, so that JAVA_HOME, SLACK_WEBHOOK, and the other variables are set. Also make sure the fat JAR is current: from the repo root, run source .env and then sbt assembly before starting the orchestrator (or confirm there have been no Scala changes since the last build). Full checklist: AGENTS.md § Environment and build.
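The two environment checks above can be scripted as a quick pre-flight step. This is a minimal bash sketch. check_env_vars and jar_is_current are hypothetical helpers, not part of the ingestion3 repo. The JAR path and the src directory in the example are placeholders, because the actual sbt assembly output path depends on the build. The freshness test only compares file modification times, so treat it as a hint rather than proof that the JAR matches the sources.

```bash
# Hypothetical pre-flight helpers (not part of the ingestion3 repo).

# check_env_vars NAME...: succeed only if every named shell variable is set and non-empty.
check_env_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# jar_is_current JAR SRC_DIR: succeed if JAR exists and no *.scala file under SRC_DIR
# has a modification time newer than the JAR's.
jar_is_current() {
  local jar=$1 src_dir=$2
  if [ ! -f "$jar" ]; then
    echo "no JAR at $jar; run sbt assembly" >&2
    return 1
  fi
  if [ -n "$(find "$src_dir" -name '*.scala' -newer "$jar" -print -quit)" ]; then
    echo "Scala sources changed since $jar was built; run sbt assembly" >&2
    return 1
  fi
}

# Example (from the repo root; the JAR path is a placeholder for your assembly output):
#   source .env
#   check_env_vars JAVA_HOME SLACK_WEBHOOK && jar_is_current path/to/assembly.jar src
```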

Always Use the Project Venv

# From repo root; source .env for JAVA_HOME, SLACK_WEBHOOK, etc.
source .env
./venv/bin/python -m scheduler.orchestrator.main [options]

Do not use system python3; use ./venv/bin/python so dependencies and environment are correct.
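A small shell function can make the venv rule hard to forget. It always runs the orchestrator with the repo's ./venv/bin/python after sourcing .env. This is a sketch, not something the repo provides. The function name orch and the INGESTION3_ROOT variable are assumptions, and set -a is a defensive choice in case .env assigns variables without export.

```bash
# Hypothetical wrapper (not in the repo). INGESTION3_ROOT is an assumed variable
# that you point at your ingestion3 checkout.
orch() {
  local root=${INGESTION3_ROOT:-}
  if [ -z "$root" ]; then
    echo "set INGESTION3_ROOT to the ingestion3 repo root" >&2
    return 1
  fi
  if [ ! -x "$root/venv/bin/python" ]; then
    echo "no venv interpreter at $root/venv/bin/python; create the project venv first" >&2
    return 1
  fi
  # Subshell: the cd and the variables from .env do not leak into your shell.
  (
    cd "$root" || exit 1
    set -a              # export everything .env defines, even lines without 'export'
    . ./.env || exit 1
    set +a
    exec ./venv/bin/python -m scheduler.orchestrator.main "$@"
  )
}

# Example: orch --hub=wisconsin,p2p --dry-run
```

Running inside a subshell keeps the caller's working directory and environment unchanged. Only the orchestrator process sees the .env values.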

Main Commands

| Goal | Command |
|------|---------|
| Current month, all scheduled hubs | ./venv/bin/python -m scheduler.orchestrator.main |
| Specific hubs | ./venv/bin/python -m scheduler.orchestrator.main --hub=wisconsin,p2p |
| Parallel (2–3 hubs at once) | ./venv/bin/python -m scheduler.orchestrator.main --hub=wi,va,mn --parallel=3 |
| Specific month | ./venv/bin/python -m scheduler.orchestrator.main --month=2 |
| Preview only (no run) | ./venv/bin/python -m scheduler.orchestrator.main --dry-run |
| Retry last run's failures | ./venv/bin/python -m scheduler.orchestrator.main --retry-failed |
| Skip harvest (reuse data) | ./venv/bin/python -m scheduler.orchestrator.main --hub=wisconsin --skip-harvest |
| Skip S3 sync | ./venv/bin/python -m scheduler.orchestrator.main --hub=wisconsin --skip-s3-sync |
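Because multi-hub runs can take many hours, it can help to preview a command with --dry-run before launching it for real. The bash helper below is a hypothetical sketch, not part of the repo. It assumes that --dry-run can be appended to any of the invocations in the table above, which the table does not state explicitly.

```bash
# Hypothetical helper (not in the repo): run a command with --dry-run appended,
# then ask for confirmation before running the same command for real.
preview_then_run() {
  local answer
  "$@" --dry-run || { echo "dry-run failed; not starting the real run" >&2; return 1; }
  printf 'Start the real run? [y/N] ' >&2
  read -r answer || answer=
  case $answer in
    y|Y) "$@" ;;
    *) echo "aborted" >&2; return 1 ;;
  esac
}

# Example:
#   preview_then_run ./venv/bin/python -m scheduler.orchestrator.main --hub=wi,va,mn --parallel=3
```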

Checking Status

Per-hub status is written to logs/status/<hub>.status (JSON). Use the status script:

# Table view
./scripts/status/ingest-status.sh

# Auto-refresh (e.g. every 30s)
./scripts/status/ingest-status.sh --watch

# Specific hubs
./scripts/status/ingest-status.sh wisconsin p2p

# Verbose (stage history, durations)
./scripts/status/ingest-status.sh -v

# JSON (for scripting)
./scripts/status/ingest-status.sh --json
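The per-hub files in logs/status/ can also be used to spot hubs that have gone quiet, without parsing their JSON. This bash sketch is hypothetical and relies only on file modification times. A quiet status file does not prove a hub is stuck, because a long harvest may update its status rarely. Use the output as a prompt to look closer with ./scripts/status/ingest-status.sh -v.

```bash
# Hypothetical helper (not in the repo): print the hub names whose <hub>.status file
# has not been modified in more than MINUTES minutes (default 60).
stale_hubs() {
  local status_dir=${1:-logs/status} minutes=${2:-60}
  find "$status_dir" -maxdepth 1 -name '*.status' -mmin +"$minutes" -print |
    while read -r f; do basename "$f" .status; done |
    sort
}

# Example (from the repo root): stale_hubs logs/status 120
```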

Key Locations

| Resource | Path |
|----------|------|
| Orchestrator entry | scheduler/orchestrator/main.py |
| Config | scheduler/orchestrator/config.py; .env for SLACK_WEBHOOK, JAVA_HOME |
| Per-hub status | logs/status/<hub>.status |
| Escalation reports | data/escalations/failures-<run_id>.md |
| Email drafts (after run) | logs/hub-emails-<run_id>/ |
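Escalation reports and email drafts are both keyed by run_id, so one can be found from the other. This bash sketch is hypothetical. It picks the newest failures-<run_id>.md by modification time and prints the matching hub-emails-<run_id>/ path, assuming both names use the same run_id string. A failure report presumably exists only for runs that had failures, so its run_id may belong to an earlier run than the most recent one.

```bash
# Hypothetical helper (not in the repo): newest escalation report plus the
# email-draft directory that shares its run_id. Assumes file names without newlines.
latest_run_artifacts() {
  local esc_dir=${1:-data/escalations} log_dir=${2:-logs} report run_id
  report=$(ls -t "$esc_dir"/failures-*.md 2>/dev/null | head -n 1)
  if [ -z "$report" ]; then
    echo "no escalation reports in $esc_dir" >&2
    return 1
  fi
  run_id=$(basename "$report" .md)   # failures-<run_id>
  run_id=${run_id#failures-}         # <run_id>
  echo "report: $report"
  echo "emails: $log_dir/hub-emails-$run_id/"
}
```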

Long-Running Runs

Harvests can run 12–24 hours. Use tmux or nohup so the run survives disconnection:

# tmux (recommended; reattach with: tmux attach -t ingest)
tmux new -s ingest
cd /path/to/ingestion3 && source .env
./venv/bin/python -m scheduler.orchestrator.main --hub=wisconsin,p2p --parallel=2
# Ctrl-B, D to detach

# nohup (fire and forget)
nohup ./venv/bin/python -m scheduler.orchestrator.main --hub=wi,p2p --parallel=2 \
  > logs/orchestrator-$(date +%Y%m%d_%H%M%S).log 2>&1 &
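With the nohup form there is no session to reattach to, so it helps to record the process ID at launch. The bash sketch below is a hypothetical variant of the nohup launch above. The helper names and the logs/orchestrator.pid file name are assumptions, not repo conventions. kill -0 only tests whether the PID exists, and PIDs can be reused after a process exits, so treat this as a quick check rather than a guarantee.

```bash
# Hypothetical variant of the nohup launch (not in the repo) that also records the PID.
start_detached() {
  local log
  log="logs/orchestrator-$(date +%Y%m%d_%H%M%S).log"
  mkdir -p logs
  nohup "$@" > "$log" 2>&1 &
  echo "$!" > logs/orchestrator.pid
  echo "started PID $! (log: $log)"
}

# is_running [PIDFILE]: succeed while the recorded process still exists.
is_running() {
  local pidfile=${1:-logs/orchestrator.pid}
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Example (from the repo root, after source .env):
#   start_detached ./venv/bin/python -m scheduler.orchestrator.main --hub=wi,p2p --parallel=2
#   is_running && tail -f "$(ls -t logs/orchestrator-*.log | head -n 1)"
```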

Pipeline Stages (per hub)

Prepare (file hubs: download from S3)
Harvest
Mapping
Enrichment
JSONL export
Anomaly detection → S3 sync

Slack notifications go to #tech-alerts; when configured, hub-complete notifications also go to #tech. Failure reports are written to data/escalations/.

Reference

Full runbook: GOLDEN_PATH.md
Agent runbooks and notify policy: AGENTS.md


Run or monitor the DPLA Python ingest orchestrator. Use when the user says run orchestrator, parallel ingest, ingest status, run hubs, orchestrator dry-run, or retry failed hubs. Covers venv, main entry point, status script, and logs.

Category: development. 33 stars. Updated May 26, 2026.


Last scanned: May 26, 2026, 3:38 AM (196.4 s, 1 file scanned)


Related Skills

dpla/dpla-hub-info (data-ai, updated Apr 16, 2026)
Show key i3.conf config for a hub (provider, harvest.type, harvest.endpoint, schedule, email, setlist). Use when user asks for hub config, harvest type/endpoint, who gets emails, schedule months, or OAI setlist details.

dpla/dpla-community-webs-ingest (development, updated Apr 15, 2026)
Run Community Webs ingest. Use when the user says harvest community-webs, run community-webs ingest, export community webs, or process community webs DB.

dpla/dpla-verify-and-notify (testing, updated May 26, 2026)
Verify ingest outcomes and send failure or status notifications to Slack or [email protected]. Use when the user asks to verify the ingest, check if it succeeded, notify about a failure, or post to tech-alerts.

dpla/dpla-staged-report (business, updated May 26, 2026)
Report which hubs have new JSONL staged in S3 for a given month, and optionally post the report to Slack. Use when user asks what hubs are staged/ready for indexing, /ingest staged, or what changed this month in S3.

Install manually

# Clone the repo
git clone https://github.com/dpla/ingestion3.git

# Copy into Claude Code skills folder (global)
cp -r ingestion3/docs/ai-context/archive/cursor-skills/dpla-orchestrator ~/.claude/skills/


Repository

dpla/ingestion3

33 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT