Emu_bulk_upload_FMNH/SKILL.md
Help museum insect curators bulk upload specimen data to the Emu database. Maps any input format to Emu's template, matches localities to existing records, finds parent sites, creates bulk upload tables, and walks users through the upload process.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add brunoasm/my_claude_skills emu-bulk-upload
Help users prepare and upload entomological specimen data to the Emu collection database at the Field Museum of Natural History (FMNH). The process starts by mapping whatever data the user provides into Emu's standard format, then works through each Emu module (Sites, Events, Catalog) to match existing records, create new ones, and produce properly formatted tables for bulk upload.
- references/Emu_upload_default.xlsx: The Emu upload template (49 columns, 3 modules)
- references/emu_field_reference.md: Complete field reference with descriptions, export name mappings, and normalization rules
- references/phase1_sites.md: Sites phase detailed reference (scripts, matching, export guide)
At the start of a session, search the working directory for files the user may have already placed there:
- .xlsx, .csv, .tsv, .txt, or other tabular formats
- .csv files that may contain site/event data
If any are found, list them and confirm with the user:
I found these files in the working directory:
- specimen_data.csv (csv, 145 KB)
- US_sites_export.csv (csv, 12 MB)
Which of these should I work with? Is specimen_data.csv your specimen data and US_sites_export.csv an Emu sites export?
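The discovery step above can be sketched as follows. This is a minimal illustration, not part of the skill's scripts: the function names and the extension list are assumptions based on the formats named above.

```python
from pathlib import Path

# Assumption: these are the extensions worth listing; extend as needed.
TABULAR_EXTENSIONS = {".xlsx", ".csv", ".tsv", ".txt"}


def human_size(num_bytes: int) -> str:
    """Format a byte count as a short KB/MB string."""
    if num_bytes >= 1024 * 1024:
        return f"{num_bytes / (1024 * 1024):.0f} MB"
    return f"{num_bytes / 1024:.0f} KB"


def discover_tabular_files(directory: str = ".") -> list:
    """Return (name, extension, size) tuples for candidate files, sorted by name."""
    found = []
    for path in sorted(Path(directory).iterdir()):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in TABULAR_EXTENSIONS:
            found.append((path.name, suffix.lstrip("."), human_size(path.stat().st_size)))
    return found


if __name__ == "__main__":
    for name, ext, size in discover_tabular_files("."):
        print(f"{name} ({ext}, {size})")
```

The listing only gathers candidates; which file is the specimen data and which is an Emu export is still confirmed with the user.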
Before any work, ask the user:
Do you have Emu bulk upload privileges?
- Yes
- No (not sure = no)
Record the answer. It determines how upload steps are handled later (see references/phase1_sites.md § "Upload instructions").
Also ask once at session start:
Are the coordinates in your data from your own sampling metadata (i.e. primary coordinates, recorded at the sampling site), or inherited from named places (centroids / generic coordinates)?
- Primary — from sampling metadata
- Inherited / centroid
If primary, the skill applies the unnamed-Precise-Locality rule: each specimen row with coordinates produces an unnamed Precise Locality node whose parent is the most specific named node (Village/Town/etc.). If inherited, coordinates stay attached to the named node directly.
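A minimal sketch of the rule above, assuming a site is described as a list of (level, name) pairs ordered from broadest to most specific. The function name and the dictionary keys are hypothetical and do not come from the skill's scripts.

```python
def build_locality_chain(named_levels, lat, lon, coords_are_primary):
    """Build an ordered list of nodes; each node's parent is the node before it.

    named_levels: list of (level, name) pairs, broadest first (assumed input shape).
    """
    nodes = [{"level": level, "name": name, "lat": None, "lon": None}
             for level, name in named_levels]
    if lat is None or lon is None or not nodes:
        return nodes  # nothing to attach
    if coords_are_primary:
        # Primary coordinates: add an unnamed Precise Locality child
        # under the most specific named node.
        nodes.append({"level": "Precise Locality", "name": None, "lat": lat, "lon": lon})
    else:
        # Inherited / centroid coordinates stay on the named node itself.
        nodes[-1]["lat"], nodes[-1]["lon"] = lat, lon
    return nodes
```

Keeping the parent relationship implicit in list order is a simplification for illustration; a real upload table needs explicit parent references.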
This step takes the user's data in any format and maps it into the Emu template format (references/Emu_upload_default.xlsx).
The user's data may arrive in any tabular format. Consult references/emu_field_reference.md to understand all Emu fields, then present the mapping as a table for user review:
Here's how I'd map your columns to Emu fields:
| Your column | Emu field | User-friendly name |
|---|---|---|
| lat | LatLatitude_nesttab | Latitude |
| long | LatLongitude_nesttab | Longitude |
| state | LocProvinceStateTerritory_tab | Province/State |
| county | LocDistrictCountyShire_tab | County |
| locality | LocPreciseLocation | Precise Location |
| species | IdeTaxonRef_tab.irn | Taxon |
| collector | ColParticipantRef_tab(2).irn | Collectors |
| unmapped | (none) | (columns with no Emu equivalent) |
Does this look right? Any corrections?
Wait for user confirmation before proceeding. Unmapped columns are preserved but not used in Emu uploads.
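Once confirmed, a mapping like the one in the table can be applied row by row. The sketch below is illustrative only: `apply_mapping` is a hypothetical helper, and the dictionary simply restates part of the example table.

```python
# Confirmed mapping from the user's column names to Emu field names
# (taken from the example table; a real session builds this from the user's file).
COLUMN_MAP = {
    "lat": "LatLatitude_nesttab",
    "long": "LatLongitude_nesttab",
    "state": "LocProvinceStateTerritory_tab",
    "county": "LocDistrictCountyShire_tab",
    "locality": "LocPreciseLocation",
}


def apply_mapping(rows, column_map):
    """Split each input row into Emu-named fields and preserved unmapped columns."""
    mapped, unmapped = [], []
    for row in rows:
        mapped.append({column_map[k]: v for k, v in row.items() if k in column_map})
        unmapped.append({k: v for k, v in row.items() if k not in column_map})
    return mapped, unmapped
```

Returning the unmapped columns separately keeps them available without letting them leak into the Emu upload.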
If the file already has the 3-row header structure (Row 1 = friendly names, Row 2 = Emu field names, Row 3 = example) with correct Emu field names, skip the mapping and confirm:
Your file is already in Emu template format. I found N data rows with columns for [Sites, Events, Catalog] fields.
Once the mapping is confirmed, write a Python script on the fly to transform the user's data into the Emu template format:
- Apply fill colors: green (FFCCFFCC) for Sites, gray (FFC0C0C0) for Events, tan (FFFFCC99) for Catalog
Save the script to /tmp/emu_transform.py and run it. Save the output to /tmp/emu_user_data.xlsx.
Reference scripts/parse_user_data.py for patterns on reading xlsx files with openpyxl and applying cell colors.
After transformation, report:
Match user localities to existing Emu site records, create new records where needed, and obtain site IRNs.
Detailed reference: references/phase1_sites.md
From the transformed user xlsx (/tmp/emu_user_data.xlsx), extract the site columns (green-filled: hierarchy, elevation, coordinates, site number).
python3 scripts/parse_user_data.py /tmp/emu_user_data.xlsx /tmp/emu_user_sites.json
Report: number of specimens, site columns detected, sample data.
Ask for the Emu sites export (CSV). If identified during file discovery, use it directly. If the user doesn't have one:
- See references/phase1_sites.md § "Choosing search criteria"
- See references/phase1_sites.md § "Quick summary"
- See references/phase1_sites.md § "Step-by-step screenshot guide". Show one step, wait for confirmation, then proceed.
Parse the export:
python3 scripts/parse_emu_export.py <emu_export.csv> /tmp/emu_index.json
Report: records loaded, coordinate coverage.
python3 scripts/deduplicate_sites.py /tmp/emu_user_sites.json /tmp/emu_dedup.json
python3 scripts/match_sites.py /tmp/emu_dedup.json /tmp/emu_index.json /tmp/emu_match.json
Report: "Your N specimens contain M unique sites."
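The idea behind the deduplication step can be sketched like this. It is not the code in scripts/deduplicate_sites.py; the function names, the key fields, and the normalization (trim and lowercase) are assumptions chosen for illustration.

```python
def deduplicate_sites(specimen_sites, key_fields):
    """Group specimen rows that share the same normalized site key.

    Returns one entry per unique site, with the row indices that use it.
    """
    unique = {}
    for row_number, site in enumerate(specimen_sites):
        key = tuple(
            "" if site.get(field) is None else str(site.get(field)).strip().lower()
            for field in key_fields
        )
        entry = unique.setdefault(key, {"site": site, "rows": []})
        entry["rows"].append(row_number)
    return list(unique.values())


def summarize(specimen_sites, unique_sites):
    """Produce the report line used after deduplication."""
    return f"Your {len(specimen_sites)} specimens contain {len(unique_sites)} unique sites."
```

Keeping the row indices on each unique site makes it possible to write the site IRN back to every specimen row later.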
Review matches using your judgment (see references/phase1_sites.md § "Match review guidelines").
python3 scripts/find_parents.py /tmp/emu_match.json /tmp/emu_index.json /tmp/emu_parents.json
Review parent results (see references/phase1_sites.md § "Parent rules").
The output also includes a chain array for each unmatched site listing every intermediate level the user specified (default status needs_creation). For each intermediate, call match_named_node (from scripts/match_sites.py) against the Emu index — if it finds an existing node with no lower-level data, flip that chain entry's status to exists and record the IRN.
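The status flip described above might look like the sketch below. The real signature of match_named_node is not shown here, so the lookup is passed in as a callable (`find_existing`, a hypothetical stand-in), and the `level`, `name`, and `irn` keys are assumed.

```python
def resolve_chain(chain, find_existing):
    """Mark chain entries that already exist in Emu.

    find_existing(level, name) -> IRN string or None. This is a hypothetical
    stand-in for match_named_node, whose actual interface may differ.
    """
    for entry in chain:
        if entry.get("status", "needs_creation") != "needs_creation":
            continue  # already resolved
        irn = find_existing(entry["level"], entry["name"])
        if irn is not None:
            entry["status"] = "exists"
            entry["irn"] = irn
    return chain
```

Entries left as needs_creation after this pass are the ones that go on to the rank lookup.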
For each distinct (level, name) pair remaining as needs_creation, call:
python3 scripts/osm_rank_lookup.py "<name>" "<country>" --parent "<parent name>" [--lat L --lon M]
- confidence == high → auto-apply the suggested rank, note it in the summary.
- confidence == medium or low, or no result → present candidates to the user for confirmation. If no OSM hit at all, fall back to pd2/pd3/pd4 by depth.
- For each needs_creation node, show its rank, name, and proposed parent, and ask the user to approve before including it in the upload plan.
The coordinates for "Paradise": do they refer to the named area broadly (town center / approximate location), or to a specific sampling site within the area?
- Broadly (keep as a single named node with coords)
- Specific site (split into named parent + unnamed Precise Locality child)
Apply the chosen treatment to that chain.
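The confidence handling for rank lookups can be summarized as a small decision function. This is a sketch under assumptions: the shape of the lookup result (a dictionary with a `confidence` key, or None when there is no hit) and the action labels are hypothetical, not the output format of scripts/osm_rank_lookup.py.

```python
def decide_rank_action(result):
    """Map a rank-lookup result to the next step.

    result: dict with a "confidence" key, or None when there was no OSM hit
    (assumed shape).
    """
    if result is None:
        # No hit at all: the user still confirms, with a depth-based
        # pd2/pd3/pd4 fallback offered as the suggestion.
        return "ask_user_with_depth_fallback"
    if result.get("confidence") == "high":
        return "auto_apply"
    # medium, low, or anything unexpected goes to the user.
    return "ask_user"
```

Routing unexpected confidence values to the user keeps the automatic path limited to the one case the rules explicitly allow.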
Assemble /tmp/emu_site_chains.json (see scripts/generate_bulk_upload.py docstring). Then:
python3 scripts/generate_bulk_upload.py /tmp/emu_site_chains.json /tmp/emu_upload/
Output is always CSV (sites_upload_batch_N.csv). Multi-batch output uses __PENDING_Bx_Ry__ placeholders in PolParentRef.irn; after each batch uploads and new IRNs come back, substitute and proceed to the next batch.
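Placeholder substitution between batches could be done along these lines. The sketch assumes that x and y in `__PENDING_Bx_Ry__` are integers (batch number and row number) and that new IRNs are available as a `(batch, row) -> IRN` mapping; both are assumptions, as is the helper name.

```python
import re

# Assumed placeholder shape: __PENDING_B<batch>_R<row>__ with integer indices.
PENDING = re.compile(r"__PENDING_B(\d+)_R(\d+)__")


def substitute_pending(rows, new_irns, column="PolParentRef.irn"):
    """Replace resolved placeholders in the parent column; leave others untouched.

    rows: list of dicts (e.g. from csv.DictReader).
    new_irns: dict mapping (batch, row) -> IRN string returned by Emu.
    """
    updated = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's data
        match = PENDING.fullmatch(row.get(column) or "")
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            if key in new_irns:
                row[column] = new_irns[key]
        updated.append(row)
    return updated
```

Placeholders that point at batches not yet uploaded stay in place, so the same function can be rerun after each batch.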
Handle upload based on user privileges (see references/phase1_sites.md § "Upload instructions").
Once all sites have IRNs:
python3 scripts/finalize_user_table.py /tmp/emu_user_data.xlsx <irn_mapping.json> /tmp/emu_user_data_with_irns.xlsx
Create irn_mapping.json with format: {"row_irn_map": {"4": "123456", ...}}
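A small helper can write the mapping file in that format. The function name is hypothetical; the only thing taken from the text above is the `row_irn_map` structure with string keys and string values.

```python
import json


def write_irn_mapping(row_to_irn, path):
    """Write {"row_irn_map": {"<row>": "<irn>", ...}} to path and return the payload."""
    payload = {"row_irn_map": {str(row): str(irn) for row, irn in row_to_irn.items()}}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return payload
```

Converting both rows and IRNs to strings matches the example format even when the values were collected as integers.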
Report the output file path.
Match collecting events (date, collector, method, habitat) to site IRNs from Phase 1.
Reference: references/phase2_events.md (to be created)
Event fields are gray-colored columns in the template. See references/emu_field_reference.md § "Collection Events module".
Upload specimen catalog records linking to event records from Phase 2.
Reference: references/phase3_catalog.md (to be created)
Catalog fields are tan/orange-colored columns in the template. See references/emu_field_reference.md § "Catalog module".
Taxonomy, collector parties, and other dependent records.
Present numbered choices on separate lines:
1. Yes
2. No
For near-match review:
**Site 3**: "Cochise County" vs "Chochise County" (score 87, likely typo)
1. Accept match (use Emu record IRN 45678)
2. Reject match (create new record)
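For intuition about near-match scores, the sketch below computes a 0 to 100 similarity with Python's standard `difflib`. This is not necessarily the scorer used by scripts/match_sites.py, so its numbers will generally differ from the score shown in the example above.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two names, ignoring case and outer spaces."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return round(ratio * 100)


if __name__ == "__main__":
    print(similarity_score("Cochise County", "Chochise County"))
```

A score in this range only flags a candidate; the accept or reject decision still goes to the user as a numbered choice.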
- Write output files to /tmp/ with a descriptive name, for example:
- "/tmp/emu_user_specimens.xlsx"
- "/tmp/emu_upload/sites_upload_batch_1.csv"