skills/apple-mail/mail-core/SKILL.md
Background technical knowledge for reading Apple Mail data locally on macOS. Auto-loaded when any mail-* skill executes. Contains architecture, query patterns, and parsing techniques for ~/Library/Mail/V10/.
npx skillsauth add aashari/ai-agent-skills mail-coreInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
~/Library/Mail/V10/
├── MailData/Envelope Index ← SQLite, central index of ALL mail across all accounts
└── <UUID>/ ← one directory per synced account
└── <Mailbox>.mbox/.../Data/[0/5/1/]Messages/<ROWID>.emlx
The Data/ subfolder depth is unpredictable (Data/Messages/, Data/1/Messages/, Data/0/4/1/Messages/, Data/0/5/1/Messages/). Always locate by ROWID using find:
find ~/Library/Mail/V10/ -name "<ROWID>.emlx" 2>/dev/null | head -1
Two emlx variants exist:
<ROWID>.emlx — full message downloaded (headers + complete body)<ROWID>.partial.emlx — IMAP lazy-load: headers + truncated body (Apple Mail downloaded the start but not everything). Body is usually readable; just may cut off near the end.The shared parser (~/.claude/skills/_mail-shared/parser.py) tries full emlx first, then falls back to partial automatically. Do not treat partial files as "not cached" — read them and extract what body is available.
DB="$HOME/Library/Mail/V10/MailData/Envelope Index"
sqlite3 "$DB" "..."
| Table | Key Columns |
|---|---|
| messages | ROWID, date_received (Unix epoch), date_sent (Unix epoch), sender→addresses, subject→subjects, mailbox→mailboxes, read, flagged, deleted, size, conversation_id, is_urgent, automated_conversation, unsubscribe_type, list_id_hash, summary→summaries |
| mailboxes | ROWID, url (format: imap://UUID/INBOX, ews://UUID/Inbox), total_count (reliable), unread_count (EWS only, IMAP always 0) |
| addresses | ROWID, address (email string), comment (display name) |
| subjects | ROWID, subject (text) |
| recipients | message, address, type (0=To, 1=CC) |
| attachments | message, name (filename) |
| summaries | ROWID, summary (full body text — Apple-extracted, sparse: ~5% of recent messages) |
| conversations | thread grouping via conversation_id on messages |
automated_conversation:
0 = unclassified (includes real humans AND some bots — not a clean human flag)
1 = transactional automated (Jira, Slack notifications — can accept replies)
2 = bulk automated (newsletters, monitoring, bank notifications, CI/CD, marketing)
unsubscribe_type:
0 = no unsubscribe header (humans, transactional)
1 = has List-Unsubscribe RFC header (newsletters, marketing — most reliable)
2,3,6,7 = other list/subscription header variants
NULL = not yet classified (treat like 0 for filtering)
Use the right filter for the context:
automated_conversation = 0 AND unsubscribe_type = 0automated_conversation IN (0,1) AND unsubscribe_type = 0automated_conversation != 2automated_conversation = 2 OR unsubscribe_type > 0unsubscribe_type > 0Note: automated_conversation = 1 includes Jira, Slack notifications — they can be action items but humans don't expect a reply to them. Use = 0 for "needs human reply" contexts.
mailboxes.total_count — accurate all-time total per mailbox, no COUNT(*) neededmailboxes.unread_count — only accurate for EWS/Exchange; IMAP accounts always 0. Use COUNT(*) WHERE read=0 for IMAP unread.messages.summary FK → summaries.summary text — fast body preview without emlx parsing, but only populated for ~5% of messages (mostly last 2-3 days). Use as fast-path when available.Numeric hash of the mailing list ID. 39k+ rows populated. Same hash = same mailing list. Use GROUP BY list_id_hash to find distinct mailing lists and their volume.
date_received is standard Unix epoch (NOT Apple's 2001 Core Data offset):
datetime(m.date_received, 'unixepoch', 'localtime')
-- Last N hours:
WHERE m.date_received >= (strftime('%s','now') - 3600 * N)
-- Yesterday:
WHERE datetime(m.date_received,'unixepoch','localtime') >= date('now','-1 day')
AND datetime(m.date_received,'unixepoch','localtime') < date('now')
SELECT datetime(m.date_received,'unixepoch','localtime') as dt,
s.subject, a.address as sender, a.comment as sender_name,
mb.url as mailbox, m.ROWID, m.read, m.flagged, m.size
FROM messages m
JOIN subjects s ON m.subject = s.ROWID
JOIN addresses a ON m.sender = a.ROWID
JOIN mailboxes mb ON m.mailbox = mb.ROWID
WHERE m.deleted = 0
AND mb.url NOT LIKE '%Spam%'
AND mb.url NOT LIKE '%Trash%'
AND mb.url NOT LIKE '%Junk%'
AND mb.url NOT LIKE '%Draft%'
ORDER BY m.date_received DESC;
imap://UUID/... → Gmail or IMAPews://UUID/... → Exchange/Outlook (work)ls ~/Library/Mail/V10/ | grep -v MailData
SELECT a.address, COUNT(*) as cnt
FROM messages m
JOIN mailboxes mb ON m.mailbox = mb.ROWID
JOIN recipients r ON r.message = m.ROWID
JOIN addresses a ON r.address = a.ROWID
WHERE mb.url LIKE '%UUID_HERE%' AND r.type IN (0,1)
GROUP BY a.address ORDER BY cnt DESC LIMIT 3;
import email, re
def parse_emlx(path):
with open(path, 'rb') as fh:
fh.readline() # skip byte count (emlx-specific)
content = fh.read()
msg = email.message_from_bytes(content)
body = ""
for part in msg.walk():
ct = part.get_content_type()
if ct in ('text/plain', 'text/html'):
payload = part.get_payload(decode=True)
if payload:
b = payload.decode('utf-8', errors='replace')
if len(b) > len(body):
body = b # prefer the largest part
# Strip HTML
body = re.sub(r'<style[^>]*>.*?</style>', ' ', body, flags=re.DOTALL|re.IGNORECASE)
body = re.sub(r'<[^>]+>', ' ', body)
body = re.sub(r' ', ' ', body)
body = re.sub(r'&', '&', body)
body = re.sub(r'&#\d+;', '', body)
return re.sub(r'\s+', ' ', body).strip()
Prefer column-based filters over address regex — they are indexed and more accurate:
automated_conversation = 2 — bulk automated (newsletters, monitoring, CI/CD, bank notifications)unsubscribe_type > 0 — newsletter/mailing list (has unsubscribe header)mb.url NOT LIKE '%Spam%' etc.)For subject-based noise: PROBLEM:, OK:, ALARM:, CRIT: prefixes indicate monitoring alerts.
Note: searchable_messages.message_body_indexed is an INTEGER flag (indexing status), NOT body text. Do not use it for content retrieval.
A reusable parser lives at ~/.claude/skills/_mail-shared/parser.py. Use it with:
python3 ~/.claude/skills/_mail-shared/parser.py <ROWID1> [ROWID2 ...]
Returns JSON array with body text and attachment filenames per ROWID.
data-ai
Show work emails only, filtered to Exchange/EWS accounts and corporate email domains. Digest with priorities. Use when user asks about work email, work inbox, or wants to separate work from personal mail. Arguments: optional date range or "today", "yesterday", "this week".
testing
Intelligent inbox triage — surface the most important emails across all accounts, prioritized by urgency and requiring attention. Use when user wants a smart overview of what needs their attention, asks "what's important in my email", or wants help deciding what to read first. Arguments: optional time window (default: last 48 hours) or account filter.
data-ai
Find flight bookings, hotel reservations, travel itineraries, and booking confirmations from email. Use when user asks about upcoming trips, travel plans, booking references, flight details, or hotel reservations. Arguments: optional destination, airline, date range, or booking service.
testing
Show who sends the most email, communication frequency analysis, and relationship mapping. Use when user asks who emails them most, top contacts, communication patterns, or wants to understand their email social graph. Arguments: optional time range (default: last 90 days), account filter, or "humans only" to exclude automated senders.