FindEmail Skill

When to Use

User says "find email", "enrich emails", "get contacts" AND names a specific .xlsx filename
User wants to add email addresses to a prospect/company list, named explicitly
DO NOT auto-trigger just because there are .xlsx files in the input folder. The skill requires an explicit filename to prevent accidental processing of the wrong file.

How to Invoke

A specific filename is REQUIRED. Bare filenames are resolved against the fixed input folder C:\Users\dansk\Claude\FindEmail\ — no full path needed.

python find_email.py "myfile.xlsx"                # Resolved against C:\Users\dansk\Claude\FindEmail\myfile.xlsx
python find_email.py "C:\full\path\file.xlsx"     # Absolute path still works (override)
python find_email.py "myfile.xlsx" --no-title-filter   # Generic mode — return any contacts
python find_email.py "myfile.xlsx" --dry-run           # Show stats only, no API calls
python find_email.py "myfile.xlsx" --max-contacts 3    # Override contacts per company (default: 2)
python find_email.py "myfile.xlsx" --yes               # Skip confirmation prompt
python find_email.py "myfile.xlsx" --no-ddg            # Skip DuckDuckGo, use Apollo for domain lookup
python find_email.py "myfile.xlsx" -y --max-contacts 1 # Combine flags

# Single domain lookup (no Excel file needed — file argument omitted):
python find_email.py --domain acme.com                        # Find ETS contacts at acme.com
python find_email.py --domain acme.com --company "Acme Ltd"  # Add company name for better results
python find_email.py --domain acme.com --no-title-filter     # Any contacts, not just ETS roles
python find_email.py --domain acme.com --max-contacts 3      # Get up to 3 contacts

If you run python find_email.py with NO file and NO --domain, the script lists the available files in the input folder and exits with an error — by design, no auto-pick.
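The resolution rule above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `resolve_input_path` is hypothetical, and `PureWindowsPath` is used only so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Fixed input folder as documented above.
INPUT_FOLDER = PureWindowsPath(r"C:\Users\dansk\Claude\FindEmail")


def resolve_input_path(arg: str) -> PureWindowsPath:
    """Hypothetical helper: resolve a bare filename against the input folder."""
    candidate = PureWindowsPath(arg)
    if candidate.is_absolute():
        # An absolute path overrides the input folder.
        return candidate
    return INPUT_FOLDER / candidate


print(resolve_input_path("myfile.xlsx"))
print(resolve_input_path(r"C:\full\path\file.xlsx"))
```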

CLI Flags

| Flag | Description |
|------|-------------|
| --no-title-filter | Disable ETS title filtering; returns any contacts |
| --max-contacts N | Override max contacts per company (default: 2) |
| --dry-run | Scan file, show stats, exit without API calls |
| --yes / -y | Skip the "Continue?" confirmation prompt |
| --no-ddg | Skip DuckDuckGo domain lookup; falls back to Apollo org enrich |

Title Filtering (v1.1)

By default the skill filters Apollo results to ETS-relevant job titles only:

Environmental & Regulatory: ETS Compliance, Regulatory Affairs, Environmental Manager, Carbon Manager
Finance & Risk: Group Treasurer, Tax Director, Risk Manager, Financial Controller
Energy & Procurement: Energy Procurement, Energy Manager, Utilities Manager, CPO, Commodity Buyer
Strategic & ESG: Chief Sustainability Officer, Decarbonization, ESG, Net Zero

Use --no-title-filter for generic prospect lists not related to EU ETS/carbon.
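A title filter of this kind might look like the sketch below. The keywords come from the categories listed above, but the matching rule (case-insensitive substring) and the names `ETS_TITLE_KEYWORDS`, `is_ets_title`, and `filter_contacts` are assumptions made for illustration; the script may match titles differently.

```python
# Keywords taken from the categories above; the substring rule is an assumption.
ETS_TITLE_KEYWORDS = [
    "ets compliance", "regulatory affairs", "environmental manager", "carbon manager",
    "group treasurer", "tax director", "risk manager", "financial controller",
    "energy procurement", "energy manager", "utilities manager", "cpo", "commodity buyer",
    "chief sustainability officer", "decarbonization", "esg", "net zero",
]


def is_ets_title(title: str) -> bool:
    """Return True if the job title contains any ETS-relevant keyword."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in ETS_TITLE_KEYWORDS)


def filter_contacts(contacts, title_filter=True):
    """Keep only ETS-relevant contacts unless the filter is disabled."""
    if not title_filter:
        return list(contacts)  # behaves like --no-title-filter
    return [c for c in contacts if is_ets_title(c.get("title", ""))]
```

Plain substring matching is permissive (a short keyword such as "cpo" can match inside longer words), so a real filter would likely need word-boundary checks.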

Quality Filters (v1.3, updated v1.7)

Generic email filter: Automatically skips role-based emails (info@, sales@, contact@, support@, admin@, office@, hello@, etc.). Always on.
Domain match validation (v1.7): Domain mismatches are NO LONGER discarded — emails are kept with a comment in the "Comments" column explaining the mismatch (e.g. parent/subsidiary company). User can review and decide.
Both filters report counts in the summary output.
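The two checks can be illustrated with a short sketch. The prefix set covers only the examples named above (the real list is longer, per the "etc."), and treating a subdomain of the company domain as a match is an assumption; the function names are hypothetical.

```python
# Only the prefixes named above; the script's actual list is longer.
GENERIC_PREFIXES = {"info", "sales", "contact", "support", "admin", "office", "hello"}


def is_generic_email(email: str) -> bool:
    """True for role-based addresses such as info@ or sales@."""
    local_part = email.split("@", 1)[0].lower()
    return local_part in GENERIC_PREFIXES


def domain_mismatch_comment(email: str, company_domain: str) -> str:
    """Return a Comments-column note on mismatch, or an empty string on match."""
    email_domain = email.rsplit("@", 1)[-1].lower()
    target = company_domain.lower()
    # Assumption: a subdomain of the company domain counts as a match.
    if email_domain == target or email_domain.endswith("." + target):
        return ""
    return f"Domain mismatch: {email_domain} differs from {target} (may be parent/subsidiary)"
```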

Comments Column (v1.7)

A "Comments" column is added to the output. It flags issues per contact:

Domain mismatch: email domain differs from company domain — may be parent/subsidiary
Apollo locked: contact found but email not revealed — check Apollo credits
Contacts with no issues have an empty Comments cell.

Credit Estimate & Confirmation (v1.3)

Before processing, the script calculates estimated API calls (companies_to_process × 2) and asks Continue? [Y/n]. Use --yes to skip the prompt for automation.
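As a worked example of the estimate, 120 companies left to process gives 120 × 2 = 240 estimated API calls. The sketch below shows the arithmetic and one plausible reading of the `[Y/n]` prompt, where an empty reply means yes; the function names and that default-yes behavior are assumptions.

```python
def estimate_api_calls(companies_to_process: int, calls_per_company: int = 2) -> int:
    """Estimated API calls, using the companies_to_process x 2 rule above."""
    return companies_to_process * calls_per_company


def is_confirmed(reply: str) -> bool:
    """Interpret a reply to 'Continue? [Y/n]'; an empty reply is assumed to mean yes."""
    return reply.strip().lower() in ("", "y", "yes")


print(estimate_api_calls(120))  # 240
```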

Dry-Run Mode (v1.3)

--dry-run scans the file and reports: total companies, already enriched (skip count), companies to process, estimated API calls. Exits without calling any APIs or modifying files.

Domain-First Pipeline (v1.4)

The pipeline minimizes paid API calls by resolving domains via free methods first, then using Hunter (cheaper) before Apollo (expensive):

Domain resolution (stops at first success):

Spreadsheet domain column (free — already exists)
DuckDuckGo search (free, no API key)
Apollo org enrichment (paid, cached)
guess_domain() heuristic (free)

Email discovery (stops at first success):

Hunter.io domain search (cheaper)
Lusha.com contact search (enabled by default since v1.7)
Apollo people search (expensive — last resort)

Expected savings: ~60-70% fewer Apollo API calls vs v1.3.
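The "stops at first success" behavior can be sketched as an ordered fallback chain. The resolvers below are stubs standing in for the real lookups, and every name here (`first_success`, `DOMAIN_STEPS`, the stub functions) is hypothetical; the point is only that later, paid steps are never called once an earlier step succeeds.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[str], Optional[str]]]


def first_success(company: str, steps: List[Step]) -> Tuple[Optional[str], Optional[str]]:
    """Run steps in order and stop at the first one that returns a value."""
    for name, step in steps:
        result = step(company)
        if result:
            return result, name
    return None, None


calls = []  # records which stub resolvers were actually invoked


def spreadsheet(company: str) -> Optional[str]:
    calls.append("spreadsheet")
    return None  # no domain column value for this row


def ddg(company: str) -> Optional[str]:
    calls.append("ddg")
    return "acme.com"  # free lookup succeeds


def apollo_org(company: str) -> Optional[str]:
    calls.append("apollo_org")
    return "acme.com"  # paid lookup, should not be reached here


DOMAIN_STEPS = [("spreadsheet", spreadsheet), ("ddg", ddg), ("apollo_org", apollo_org)]

domain, source = first_success("Acme Ltd", DOMAIN_STEPS)
print(domain, source, calls)
```

The same helper would apply to the email-discovery order (Hunter, then Lusha, then Apollo) by passing a different list of steps.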

API Dependencies

DuckDuckGo (free) — Domain discovery via duckduckgo-search library
Hunter.io (primary for emails) — Domain Search (cheaper per call)
Apollo.io (fallback) — Org Enrichment + People Search (selective use)
API keys must be set as Windows environment variables: APOLLO_API_KEY, HUNTER_API_KEY
Python package: pip install duckduckgo-search
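A quick way to check that both keys are present before a run is sketched below. It only tests that the variables are set and non-empty; it does not make the validating test call mentioned in the workflow. The function name `missing_api_keys` is hypothetical.

```python
import os

# Environment variable names as documented above.
REQUIRED_KEYS = ("APOLLO_API_KEY", "HUNTER_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```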

Workflow

Validate API keys exist and work (test call)
Resolve the named .xlsx file against the input folder (an explicit filename is required; there is no auto-pick)
Create backup: {name}-backup{YYYYMMDD}.xlsx
Smart-detect header row (scan rows 1-10)
Auto-detect columns: company, domain, country, email
Add 4 columns if not present: First Name, Last Name, Email1, Email2
Pre-scan: count companies to process, show estimate, ask confirmation
For each company (max 2 contacts by default):
- Domain: spreadsheet → DDG → Apollo org enrich → guess
- Emails: Hunter first → Lusha → Apollo people search fallback
- Filter generic emails → validate domain match
- Fill contacts into rows for that company
Incremental save every 50 companies
Print summary with counts, domain source breakdown, and API usage

Input Folder

C:\Users\dansk\Claude\FindEmail\

Drop your .xlsx here, then pass its filename on the command line. The skill does not auto-pick a file.

Output Format

Backup: {original}-backup{YYYYMMDD}.xlsx (same folder)
Enriched: {original}-enriched.xlsx (same folder)
6 new columns appended: First Name, Last Name, Email1, Company Domain, Role (Title), Comments
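The two naming patterns can be reproduced with a short sketch. The helper names are hypothetical, and the example assumes `{original}` means the filename without its extension.

```python
from datetime import date
from pathlib import PurePath


def backup_name(original: str, today: date) -> str:
    """Build {original}-backup{YYYYMMDD}.xlsx from the input filename."""
    path = PurePath(original)
    return f"{path.stem}-backup{today:%Y%m%d}{path.suffix}"


def enriched_name(original: str) -> str:
    """Build {original}-enriched.xlsx from the input filename."""
    path = PurePath(original)
    return f"{path.stem}-enriched{path.suffix}"


print(backup_name("prospects.xlsx", date(2026, 4, 27)))  # prospects-backup20260427.xlsx
print(enriched_name("prospects.xlsx"))                   # prospects-enriched.xlsx
```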

Data Quality (v1.9)

Blocked email domains: Emails from data provider domains (e.g. @zoominfo.com) are automatically filtered out — these are employees of the data provider, not your target contacts. Add more domains to BLOCKED_EMAIL_DOMAINS in the code as needed.
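A blocklist check along these lines might look like the following sketch. Only `BLOCKED_EMAIL_DOMAINS` is named in this document; its type (a set here), the single entry shown, and the helper `is_blocked_email` are assumptions.

```python
# Only the example domain named above; the real constant may hold more entries.
BLOCKED_EMAIL_DOMAINS = {"zoominfo.com"}


def is_blocked_email(email: str) -> bool:
    """True when the address belongs to a blocked data-provider domain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in BLOCKED_EMAIL_DOMAINS
```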

Domain mismatch handling: Emails from a different domain than the target company are kept but flagged with a comment. The email may still be correct (subsidiaries, holding groups, rebranded companies).

No tool gives 100% accuracy — best practice: Apollo/Hunter for leads → LinkedIn to confirm → email verifier (NeverBounce, ZeroBounce) to validate.

Re-run Behavior

If Email1 column already exists, rows with a filled Email1 are skipped to save API credits.
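The skip rule amounts to a simple filter over rows. The sketch below represents rows as dictionaries keyed by column name, which is an assumption for illustration; the script works on the spreadsheet directly.

```python
def rows_to_process(rows):
    """Return only rows whose Email1 cell is missing or blank."""
    return [row for row in rows if not str(row.get("Email1") or "").strip()]


rows = [
    {"Company": "Acme Ltd", "Email1": "jane@acme.com"},  # already enriched, skipped
    {"Company": "Globex", "Email1": ""},                 # blank, processed
    {"Company": "Initech"},                              # no value, processed
]
print([row["Company"] for row in rows_to_process(rows)])  # ['Globex', 'Initech']
```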

Forensic Bundle (v2.0)

Every run produces a timestamped subfolder under C:\Users\dansk\Claude\FindEmail\FindEmail-Reports\. The bundle is the source of truth for diagnosing the skill's miss rate over time and lets a future implementer (or a re-imported run six months later) replay the exact pipeline state without re-spending API credits.

FindEmail-Reports/
└── 2026-04-27_05-05-38/
    ├── report.txt          — human summary + new failure-class breakdown
    ├── run-config.json     — every CLI flag, skill version, input file sha256
    ├── input-snapshot.xlsx — bytes of the input at run-start (replay-safe)
    ├── output.xlsx         — copy of the enriched output at run-end
    └── trace.jsonl         — one JSON record per company with full pipeline trace

Domain confidence + row coloring (v2.0)

Each resolved domain carries a confidence score based on which resolver produced it:

spreadsheet → 1.00 (Danny supplied — ground truth)
apollo_org → 0.90 (Apollo matched the company)
ddg → 0.70 (DuckDuckGo top organic)
guess → 0.05 (guess_domain() pattern substitution)

Rows in output.xlsx are colored based on domain confidence — applied during format_output() after sorting/dedup so the color tracks the data:

No color when confidence ≥ 0.80 (clean win — the skill did its job)
Yellow (FFEB9C) when 0.30 ≤ confidence < 0.80 (review recommended)
Red (FFC7CE) when confidence < 0.30 (likely false positive — verify before outreach)
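The score table and color thresholds above map onto a small lookup, sketched here. The names `DOMAIN_CONFIDENCE` and `row_fill` are hypothetical; the values and hex colors are the ones listed above.

```python
from typing import Optional

# Confidence per resolver, as listed above.
DOMAIN_CONFIDENCE = {"spreadsheet": 1.00, "apollo_org": 0.90, "ddg": 0.70, "guess": 0.05}


def row_fill(confidence: float) -> Optional[str]:
    """Return the hex fill color for a row, or None for no color."""
    if confidence >= 0.80:
        return None        # clean win
    if confidence >= 0.30:
        return "FFEB9C"    # yellow: review recommended
    return "FFC7CE"        # red: likely false positive


for resolver, score in DOMAIN_CONFIDENCE.items():
    print(resolver, score, row_fill(score))
```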

The Comments column is only populated when there's something to flag (domain mismatch, low/medium confidence, Apollo lock, generic email). High-confidence wins keep the Comments cell empty — no editorial markup on rows that worked.

Rejection-reason taxonomy (v2.0)

Replaces the lossy single "not_found" bucket. report.txt now prints a failure-class breakdown using these canonical reasons:

| Reason | Meaning |
|---|---|
| domain_resolution_failed | no resolver returned a domain |
| domain_low_confidence | only guess() resolved; too risky to trust |
| domain_mismatch_filtered | provider returned email with mismatched domain |
| generic_email_only | only role-based emails available |
| title_filter_dropped_all | Apollo had candidates but ETS title filter rejected all |
| all_providers_zero | all providers returned 0 candidates |
| lusha_rate_limited | Lusha quota / rate limit hit |
| shell_company | input matched shell-company / SPV heuristic (reserved for v2.1) |
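A failure-class breakdown of this kind can be computed by counting the `rejection_reason` field across per-company records. The sketch below is an assumption about how such a tally could be produced, not the code behind report.txt; the sample records are made up for illustration.

```python
from collections import Counter


def failure_breakdown(records):
    """Count records per rejection reason, ignoring successful ones (reason is null)."""
    return Counter(r["rejection_reason"] for r in records if r.get("rejection_reason"))


# Illustrative records only; real ones come from the run's trace.
sample = [
    {"company_input": "A", "rejection_reason": None},
    {"company_input": "B", "rejection_reason": "generic_email_only"},
    {"company_input": "C", "rejection_reason": "all_providers_zero"},
    {"company_input": "D", "rejection_reason": "generic_email_only"},
]

for reason, count in failure_breakdown(sample).most_common():
    print(f"{reason}: {count}")
```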

JSONL trace schema (v2.0)

trace.jsonl — one record per company. Key fields:

{
  "run_id": "2026-04-27_05-05-38",
  "company_idx": 4,
  "company_input": "Abbott Laboratories",
  "row_indices": [5, 6, 7],
  "domain_attempts": [
    {"resolver": "ddg", "result": "abbott.com"}
  ],
  "domain_resolved": "abbott.com",
  "domain_resolved_via": "ddg",
  "domain_confidence": 0.70,
  "email_attempts": [
    {"provider": "hunter", "results_count": 3}
  ],
  "winning_provider": "Hunter",
  "contacts_count": 3,
  "contacts": [{"first_name": "...", "last_name": "...", "email": "...", "title": "..."}],
  "status": "found_personal",
  "rejection_reason": null,
  "duration_ms": 740,
  "timestamp": "2026-04-27T05:06:18.123Z",
  "skill_version": "2.0"
}

Designed so a downstream filter (e.g. before SuperAGI import) can reject rows where domain_confidence < 0.30 AND status == "found_personal" — the false-positive class identified in research brief 2026-04-27-research-findemail-forensic-logging.md.
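A downstream filter of that shape could be sketched as follows, using only the `status` and `domain_confidence` fields shown in the schema. The function names are hypothetical, and treating a missing confidence value as "not flagged" is an assumption.

```python
import json


def is_likely_false_positive(record: dict, threshold: float = 0.30) -> bool:
    """Flag records that found a contact on a low-confidence domain."""
    confidence = record.get("domain_confidence")
    return (
        record.get("status") == "found_personal"
        and confidence is not None
        and confidence < threshold
    )


def split_trace(lines):
    """Split JSONL lines into (kept, rejected) lists of records."""
    kept, rejected = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        (rejected if is_likely_false_positive(record) else kept).append(record)
    return kept, rejected


# Illustrative lines; in practice, iterate over the open trace.jsonl file.
sample_lines = [
    '{"company_input": "Abbott Laboratories", "domain_confidence": 0.70, "status": "found_personal"}',
    '{"company_input": "Example Co", "domain_confidence": 0.05, "status": "found_personal"}',
]
kept, rejected = split_trace(sample_lines)
print(len(kept), len(rejected))  # 1 1
```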

Verification Checklist

[ ] python find_email.py --help — all flags visible
[ ] python find_email.py "myfile.xlsx" --dry-run — shows stats, no API calls
[ ] API keys set as env vars and validated
[ ] Backup file created
[ ] Generic emails filtered (info@, sales@, etc.)
[ ] Domain mismatches kept and flagged in the Comments column
[ ] --max-contacts 1 limits to 1 per company
[ ] Confirmation prompt appears before processing (unless --yes)
[ ] Re-run skips already-enriched rows
[ ] Incremental saves work
