docs/ai-context/archive/cursor-skills/dpla-s3-and-aws/SKILL.md
Run S3 sync and AWS data operations for DPLA ingestion using the correct profile and scripts. Use when the user says sync to S3, check S3 sync, upload to S3, AWS bucket, or check JSONL sync.
Perform S3 and AWS data operations for the ingestion pipeline using the project's profile and scripts so credentials and buckets are correct.
Environment: from the repo root, run source .env before any script that needs DPLA_DATA or the AWS environment variables.
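A minimal sketch of that step, assuming .env sits at the repo root and defines DPLA_DATA. The `load_env` name and the `set -a` export behaviour are illustrative choices here, not something the project scripts are known to do.

```shell
# load_env DIR: source DIR/.env and confirm DPLA_DATA is set afterwards.
# Hypothetical helper; only `.env` and `DPLA_DATA` come from the text above.
load_env() {
  env_file="$1/.env"
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file" >&2
    return 1
  fi
  set -a            # export every variable assigned while sourcing
  . "$env_file"
  set +a
  if [ -z "${DPLA_DATA:-}" ]; then
    echo "DPLA_DATA is not set after sourcing $env_file" >&2
    return 1
  fi
}
```

Calling `load_env .` from the repo root before any script fails early with a readable message instead of a confusing error later in the run.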
All AWS CLI commands must use:
aws ... --profile dpla
Scripts in scripts/ use AWS_PROFILE=dpla by default (see scripts/common.sh / SCRIPTS.md). When invoking aws directly, always add --profile dpla.
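One way to make the profile hard to forget in an interactive shell is a thin wrapper, sketched below. The `dpla_aws` name is hypothetical and is not part of the repo; only the `--profile dpla` flag comes from the rule above.

```shell
# dpla_aws ARGS...: run the AWS CLI with the project profile appended.
# Hypothetical convenience wrapper, not a project script.
dpla_aws() {
  aws "$@" --profile dpla
}

# Example (bucket name is a placeholder):
#   dpla_aws s3 ls s3://bucket-name/
```

The wrapper passes every argument through unchanged, so it behaves like the plain `aws` command with the profile fixed.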
Use the project script (handles anomaly detection and paths):
./scripts/s3-sync.sh <hub>
# Optional: sync a specific subdir
./scripts/s3-sync.sh <hub> <subdir>
The script uses the correct bucket and prefix. Do not bypass it with a raw aws s3 sync unless you have a specific reason, and if you do, pass --profile dpla.
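When several hubs need syncing, the script can be called once per hub. The loop below is a sketch under that assumption; `sync_hubs` and the `SYNC_SCRIPT` override are illustrative names, and the hub arguments are placeholders.

```shell
# Path to the project sync script; the override variable is a hypothetical convenience.
SYNC_SCRIPT="${SYNC_SCRIPT:-./scripts/s3-sync.sh}"

# sync_hubs HUB...: run the sync script once per hub, stopping at the first failure.
sync_hubs() {
  for hub in "$@"; do
    echo "syncing $hub"
    if ! "$SYNC_SCRIPT" "$hub"; then
      echo "sync failed for $hub; stopping" >&2
      return 1
    fi
  done
}

# Example: sync_hubs <hub-a> <hub-b>
```

Stopping at the first failure keeps a blocked or failed sync visible instead of burying it under later output.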
To see how local JSONL exports compare to S3:
./scripts/status/check-jsonl-sync.sh
The script picks up the AWS profile from its environment; see scripts/SCRIPTS.md for options.
The orchestrator (and s3-sync.sh when used in that flow) runs anomaly checks before syncing. If counts or failure rates change sharply, sync may be blocked (critical) or proceed with a warning. Escalation and Slack alerts are sent; see AGENTS.md and GOLDEN_PATH.md.
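To illustrate the blocked-versus-warning distinction, here is a rough sketch that classifies a change in record count. The thresholds (10% and 25%), the zero-baseline rule, and the `classify_change` name are all assumptions made for this example; the project's actual checks and limits live in its own scripts and may differ.

```shell
# Assumed thresholds, expressed as percent change from the previous count.
WARN_PCT=10
CRITICAL_PCT=25

# classify_change PREV CURR: print ok, warning, or critical.
# Hypothetical helper; not the project's anomaly detection.
classify_change() {
  prev=$1
  curr=$2
  if [ "$prev" -eq 0 ]; then
    echo "warning"    # no baseline to compare against (assumed policy)
    return 0
  fi
  diff=$(( curr - prev ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  pct=$(( diff * 100 / prev ))    # integer percent, rounded down
  if [ "$pct" -ge "$CRITICAL_PCT" ]; then
    echo "critical"
  elif [ "$pct" -ge "$WARN_PCT" ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

Under these assumed limits, a caller would block the sync on `critical` and continue with a logged warning on `warning`.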
If you must run aws directly (e.g. list bucket, copy one file):
aws s3 ls s3://bucket-name/ --profile dpla
aws s3 cp local s3://bucket/key --profile dpla
Never omit --profile dpla.
| Resource | Path |
|----------|------|
| Script reference | scripts/SCRIPTS.md (s3-sync.sh, check-jsonl-sync.sh) |
| Agent / notify policy | AGENTS.md |