skills/inbound/inbound-processing/SKILL.md
Receive, parse, and process incoming email via provider webhooks. Use when setting up inbound email handling, parsing MIME messages, extracting content from replies, detecting threads, filtering spam on inbound, or routing incoming messages.
Receive incoming email, parse it into structured data, and route it to the right place.
- domain-authentication - SPF/DKIM/DMARC setup that affects inbound auth verification
- reply-classification - classifying reply intent (interested, OOO, objection, etc.)
- thread-management - maintaining full conversation context across messages
- webhook-processing - general webhook handling patterns (retries, idempotency)
- email-security - injection attacks, content sanitization, phishing prevention
- bounce-handling - processing delivery failures from outbound sends

When someone sends an email to your domain, it hits an MX server. You have two options:
The provider handles MX record reception, MIME parsing, spam pre-filtering, and delivers a clean JSON payload to your endpoint. You handle business logic.
The cleanest developer experience for inbound. Postmark parses emails and POSTs JSON to your webhook URL.
Setup:
Key payload fields:
{
"From": "[email protected]",
"FromFull": { "Email": "[email protected]", "Name": "Jane Smith" },
"To": "[email protected]",
"ToFull": [{ "Email": "[email protected]", "Name": "" }],
"Subject": "Re: Your proposal",
"TextBody": "Looks great, let's schedule a call.",
"HtmlBody": "<html>...</html>",
"MessageID": "<[email protected]>",
"Headers": [
{ "Name": "In-Reply-To", "Value": "<[email protected]>" },
{ "Name": "References", "Value": "<[email protected]>" },
{ "Name": "Authentication-Results", "Value": "spf=pass; dkim=pass; dmarc=pass" }
],
"Attachments": [
{
"Name": "proposal.pdf",
"Content": "base64-encoded-content",
"ContentType": "application/pdf",
"ContentLength": 54321
}
],
"MailboxHash": "ref123"
}
MailboxHash trick: Postmark parses the + portion of the To address into MailboxHash. Send from [email protected], and when the reply comes back, MailboxHash is userId123. Use this for stateless thread/user association without database lookups.
Retry behavior: Postmark retries on non-2xx responses. Return 200 quickly and process asynchronously.
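As a minimal sketch of working with the payload shape shown above, the following pulls out the fields that matter for threading. The function names `header_lookup` and `thread_hints` and the sample addresses are illustrative assumptions, not part of Postmark's API; only the JSON keys come from the example payload.

```python
from typing import Any


def header_lookup(payload: dict[str, Any]) -> dict[str, str]:
    """Flatten the Headers list into a lowercase-keyed dict.

    If a header name repeats, the last occurrence wins.
    """
    return {h["Name"].lower(): h["Value"] for h in payload.get("Headers", [])}


def thread_hints(payload: dict[str, Any]) -> dict[str, Any]:
    """Collect the values later used for thread linking."""
    headers = header_lookup(payload)
    return {
        "message_id": payload.get("MessageID"),
        "in_reply_to": headers.get("in-reply-to"),
        # References is a whitespace-separated list of Message-IDs.
        "references": headers.get("references", "").split(),
        # An empty MailboxHash means no +tag was present.
        "mailbox_hash": payload.get("MailboxHash") or None,
    }


# Hypothetical sample mirroring the payload fields above.
sample = {
    "MessageID": "<reply-1@example.com>",
    "MailboxHash": "ref123",
    "Headers": [
        {"Name": "In-Reply-To", "Value": "<orig-1@yourdomain.example>"},
        {"Name": "References", "Value": "<root@yourdomain.example> <orig-1@yourdomain.example>"},
    ],
}
hints = thread_hints(sample)
```

Lowercasing header names up front avoids case-mismatch bugs, since header names are case-insensitive.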
SendGrid's Inbound Parse posts email data as multipart/form-data, not JSON. This catches people off guard.
Setup:
- Point your domain's MX record at mx.sendgrid.net (priority 10)

Key form fields:
| Field | Content |
|-------|---------|
| from | Sender address |
| to | Recipient address |
| subject | Subject line |
| text | Plain text body |
| html | HTML body |
| envelope | JSON string with actual SMTP envelope sender/recipients |
| headers | Full raw headers as a single string |
| attachments | Number of attachments |
| attachment1, attachment2... | File uploads |
Important: The headers field is a raw string, not parsed JSON. You need to parse it yourself to extract In-Reply-To, References, and Authentication-Results.
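One way to parse that raw header string is Python's standard `email` package. This is a sketch: the sample header block is invented for illustration, and how you read the `headers` form field depends on your web framework.

```python
from email import policy
from email.parser import HeaderParser


def parse_header_block(raw_headers: str):
    """Parse a raw RFC 5322 header block into a case-insensitive mapping.

    policy.default unfolds continuation lines for us.
    """
    return HeaderParser(policy=policy.default).parsestr(raw_headers)


# Hypothetical value of the `headers` form field.
raw = (
    "In-Reply-To: <abc@example.com>\r\n"
    "References: <root@example.com>\r\n"
    " <abc@example.com>\r\n"
    "Authentication-Results: mx.example.net; spf=pass; dkim=pass\r\n"
)

headers = parse_header_block(raw)
in_reply_to = str(headers["In-Reply-To"])
references = str(headers["References"]).split()
auth_results = str(headers["Authentication-Results"])
```

For headers that legitimately repeat (such as Received), `headers.get_all("Received")` returns every occurrence.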
Raw mode: If you need the full raw MIME message (for your own parsing or archival), enable "Post the raw, full MIME message" in settings. The raw message arrives in the email field.
Mailgun's Routes feature is the most flexible for pattern-based inbound routing.
Setup:
Route matching examples:
# Match a specific address
match_recipient("[email protected]") -> forward("https://your-api.com/webhooks/support")
# Catch-all for a domain
match_recipient(".*@yourdomain.com") -> forward("https://your-api.com/webhooks/inbound")
# Match by header
match_header("subject", ".*urgent.*") -> forward("https://your-api.com/webhooks/urgent")
Payload: Mailgun POSTs multipart/form-data with fields like sender, recipient, subject, body-plain, body-html, stripped-text (body without quoted parts), stripped-html, and Message-Id.
Stripped content: Mailgun is the only major provider that strips quoted reply text for you automatically. The stripped-text and stripped-html fields contain only the new content, not the quoted thread below. This saves you from implementing your own reply stripping.
SES is the most powerful option but requires the most assembly. It does not POST webhooks - it stores raw messages and notifies you.
Setup:
Architecture pattern:
Email arrives
-> SES Receipt Rule matches recipient
-> Store raw MIME in S3
-> Publish SNS notification
-> Lambda triggered by SNS
-> Parse MIME from S3
-> Process and route
Key considerations:
When to use SES: When you need raw MIME access, want to store every message in S3 for compliance, or are already deep in the AWS ecosystem. Not recommended if you just want parsed JSON.
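The Lambda step in the pattern above first has to find where SES stored the message. The sketch below assumes the SNS message body is the SES receipt notification as JSON, with `receipt.action.bucketName` and `receipt.action.objectKey` set for an S3 action; verify the field names against the AWS documentation for your setup. The function name and sample event are hypothetical.

```python
import json


def locate_raw_messages(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for each stored message in an SNS-triggered event."""
    locations = []
    for record in event.get("Records", []):
        # SNS delivers the SES notification as a JSON string.
        notification = json.loads(record["Sns"]["Message"])
        action = notification.get("receipt", {}).get("action", {})
        if action.get("type") == "S3":
            locations.append((action["bucketName"], action["objectKey"]))
    return locations


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps(
                    {
                        "notificationType": "Received",
                        "receipt": {
                            "action": {
                                "type": "S3",
                                "bucketName": "inbound-mail",
                                "objectKey": "raw/abc123",
                            }
                        },
                    }
                )
            }
        }
    ]
}
found = locate_raw_messages(sample_event)
```

From there, the handler would fetch the object with an S3 client and pass the bytes to a MIME parser.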
If you are processing raw email (from SES, or using raw mode on other providers), you need to understand MIME structure.
A typical email with HTML body and attachments has this MIME tree:
multipart/mixed
+-- multipart/alternative
| +-- text/plain (plain text body)
| +-- multipart/related
| +-- text/html (HTML body)
| +-- image/png (inline image, referenced by Content-ID)
+-- application/pdf (attachment)
Key multipart types:
| Type | Purpose |
|------|---------|
| multipart/mixed | Top-level container when message has attachments |
| multipart/alternative | Same content in multiple formats (text + HTML) |
| multipart/related | HTML body with inline resources (images referenced by cid:) |
Parse in this order:
1. Read the top-level Content-Type. If it is multipart/*, descend into parts.
2. In multipart/alternative, prefer text/html for rendering, keep text/plain as fallback.
3. In multipart/related, the first part is the HTML body. Subsequent parts are inline resources. Match them using Content-ID headers (the HTML references them as src="cid:image001").
4. In multipart/mixed, iterate children. Parts with Content-Disposition: attachment are attachments. Parts with Content-Disposition: inline are inline content.
5. Decode each part according to its Content-Transfer-Encoding (usually base64 or quoted-printable).

Inline images use the Content-ID header to create a reference that the HTML body can embed:
Content-Type: image/png
Content-ID: <[email protected]>
Content-Disposition: inline
Content-Transfer-Encoding: base64
The HTML body references this as <img src="cid:[email protected]">. When processing inbound HTML, you can either:
- Replace cid: references with data URIs (for immediate display)
- Store the images separately and rewrite the src attributes to point to them

The Content-Type header specifies the charset: Content-Type: text/plain; charset=utf-8. Common charsets you will encounter:
- utf-8 - the standard, handles everything
- iso-8859-1 / latin1 - Western European, still common in legacy systems
- windows-1252 - Microsoft's extension of ISO-8859-1
- iso-2022-jp - Japanese email, especially from older systems

Always normalize to UTF-8 after decoding. Libraries like iconv-lite (Node.js) or Python's built-in codecs handle this.
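A minimal sketch of charset normalization with Python's built-in codecs follows. The fallback order (declared charset, then utf-8, then windows-1252) is an assumption chosen for illustration, and `decode_body` is a hypothetical helper name.

```python
import codecs
from typing import Optional


def decode_body(payload: bytes, declared_charset: Optional[str]) -> str:
    """Decode body bytes to a Python str, tolerating wrong or unknown charsets."""
    for name in (declared_charset, "utf-8", "windows-1252"):
        if not name:
            continue
        try:
            codecs.lookup(name)  # raises LookupError for unknown charset labels
        except LookupError:
            continue
        try:
            return payload.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what we can and mark undecodable bytes.
    return payload.decode("utf-8", errors="replace")


latin = decode_body("café".encode("iso-8859-1"), "iso-8859-1")
undeclared = decode_body(b"caf\xe9", None)
unknown = decode_body(b"hello", "x-not-a-charset")
japanese = decode_body("こんにちは".encode("iso-2022-jp"), "iso-2022-jp")
```

Once the text is a `str`, writing it out with `.encode("utf-8")` gives the normalized form.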
Don't write your own MIME parser. Use battle-tested libraries:
| Language | Library | Notes |
|----------|---------|-------|
| Node.js | mailparser (from Nodemailer) | Full-featured, handles edge cases well |
| Node.js | postal-mime | Lightweight, works in workers/edge |
| Python | email (stdlib) | Built-in, handles most cases |
| Go | net/mail + mime/multipart | Standard library, lower-level |
| Ruby | mail gem | Mature, widely used |
| C#/.NET | MimeKit | The gold standard for .NET MIME parsing |
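To illustrate the parse order with the Python stdlib `email` package from the table, here is a sketch that walks a MIME tree and separates bodies, attachments, and inline images. The `extract_parts` function and its result layout are assumptions for illustration; real mail has more edge cases (nested messages, missing dispositions, unknown charsets).

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_parts(raw: bytes) -> dict:
    """Walk the MIME tree and sort leaf parts into bodies, attachments, inline images."""
    msg = message_from_bytes(raw, policy=policy.default)
    out = {"text": None, "html": None, "attachments": [], "inline": {}}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        ctype = part.get_content_type()
        content_id = part.get("Content-ID")
        if part.get_content_disposition() == "attachment":
            # get_payload(decode=True) undoes base64 / quoted-printable.
            out["attachments"].append((part.get_filename(), part.get_payload(decode=True)))
        elif content_id is not None and ctype.startswith("image/"):
            out["inline"][str(content_id).strip("<>")] = part.get_payload(decode=True)
        elif ctype == "text/plain" and out["text"] is None:
            out["text"] = part.get_content()
        elif ctype == "text/html" and out["html"] is None:
            out["html"] = part.get_content()
    return out


# Build a small multipart/mixed message to exercise the function.
demo = EmailMessage()
demo["Subject"] = "Re: Your proposal"
demo.set_content("Looks great.")
demo.add_alternative("<p>Looks great.</p>", subtype="html")
demo.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf", filename="proposal.pdf")

parts = extract_parts(demo.as_bytes())
```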
Three headers control email threading. All are defined in RFC 5322.
Message-ID: A globally unique identifier for each message, enclosed in angle brackets.
Message-ID: <[email protected]>
Generate a unique Message-ID for every outbound email. Format: <unique-value@your-sending-domain>. Without this, replies cannot reference your message.
In-Reply-To: Contains the Message-ID of the message being replied to.
In-Reply-To: <[email protected]>
This is your primary thread-linking mechanism. When an inbound message has In-Reply-To, look up the original send by matching against your outbound Message-IDs.
References: Contains the Message-IDs of all messages in the thread chain, oldest first.
References: <[email protected]> <[email protected]> <[email protected]>
When building a reply, set References to the parent's References (if any) followed by the parent's Message-ID. This creates a full thread chain that any email client can reconstruct.
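The References rule above can be sketched as a small helper. `reply_headers` is a hypothetical name, and the domain is a placeholder; `make_msgid` is from the Python standard library.

```python
from email.utils import make_msgid
from typing import Optional


def reply_headers(
    parent_message_id: str, parent_references: Optional[str], domain: str
) -> dict[str, str]:
    """Build threading headers for a reply to the given parent message."""
    # Parent's References (if any), then the parent's own Message-ID.
    chain = (parent_references or "").split()
    if parent_message_id not in chain:
        chain.append(parent_message_id)
    return {
        "Message-ID": make_msgid(domain=domain),  # fresh unique ID for this reply
        "In-Reply-To": parent_message_id,
        "References": " ".join(chain),
    }


headers_first = reply_headers("<msg1@example.com>", None, "yourdomain.example")
headers_deep = reply_headers(
    "<msg3@example.com>",
    "<msg1@example.com> <msg2@example.com>",
    "yourdomain.example",
)
```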
The reliable path for thread linking:
1. Inbound message arrives with In-Reply-To header
2. Look up In-Reply-To value against your stored outbound Message-IDs
3. If found: exact match, high confidence (1.0)
4. If not found: fall back to References header, check each ID
5. If still not found: fall back to heuristic matching
Fallback heuristics (lower confidence, use with caution):
- The +tag portion of the recipient address (Postmark's MailboxHash pattern)

Assign a confidence score to each linking method. Exact In-Reply-To match gets 1.0. Heuristic matches should get 0.5 or lower. Let downstream logic (routing, auto-responses) use the confidence to decide how aggressively to act.
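The linking order and confidence scores can be sketched as follows. The 0.9 score for a References match is an assumption (the text only fixes 1.0 for In-Reply-To and 0.5 or lower for heuristics), and the two lookup dicts stand in for whatever storage you use.

```python
from typing import Optional


def link_thread(
    in_reply_to: Optional[str],
    references: list[str],
    mailbox_tag: Optional[str],
    outbound_ids: dict[str, str],  # outbound Message-ID -> thread ID
    tag_threads: dict[str, str],   # +tag value -> thread ID
) -> tuple[Optional[str], float, str]:
    """Return (thread_id, confidence, method) for an inbound message."""
    if in_reply_to and in_reply_to in outbound_ids:
        return outbound_ids[in_reply_to], 1.0, "in-reply-to"
    # Check the newest reference first; it is the closest ancestor.
    for ref in reversed(references):
        if ref in outbound_ids:
            return outbound_ids[ref], 0.9, "references"  # assumed score
    if mailbox_tag and mailbox_tag in tag_threads:
        return tag_threads[mailbox_tag], 0.5, "plus-tag"
    return None, 0.0, "unlinked"


outbound = {"<out-1@yourdomain.example>": "thread-A"}
tags = {"ref123": "thread-B"}

exact = link_thread("<out-1@yourdomain.example>", [], None, outbound, tags)
via_refs = link_thread("<unknown@example.com>", ["<out-1@yourdomain.example>"], None, outbound, tags)
via_tag = link_thread(None, [], "ref123", outbound, tags)
no_link = link_thread(None, [], None, outbound, tags)
```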
The Authentication-Results header is added by the receiving mail server and contains SPF, DKIM, and DMARC verification results.
Authentication-Results: mx.yourdomain.com;
spf=pass (sender IP is 198.51.100.1) [email protected];
dkim=pass header.d=example.com header.s=selector1;
dmarc=pass (policy=reject) header.from=example.com
Parse this to extract three values:
| Mechanism | Values | What it means |
|-----------|--------|---------------|
| SPF | pass, fail, softfail, neutral, none | Whether the sending IP is authorized |
| DKIM | pass, fail, none | Whether the cryptographic signature is valid |
| DMARC | pass, fail, none | Whether SPF/DKIM align with the From domain |
How to use auth results for inbound filtering:
Also check the Received-SPF header as a fallback for SPF results if Authentication-Results does not contain SPF.
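A rough sketch of extracting the three verdicts with a regular expression follows. It is deliberately simple: it drops parenthesized comments, takes the first verdict per mechanism, and defaults to none when a mechanism is absent. A full parser for this header handles more cases (multiple DKIM signatures, several stacked headers); `parse_auth_results` is a hypothetical helper.

```python
import re

_COMMENT_RE = re.compile(r"\([^()]*\)")
_VERDICT_RE = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def parse_auth_results(header_value: str) -> dict[str, str]:
    """Return {'spf': ..., 'dkim': ..., 'dmarc': ...} from one header value."""
    results = {"spf": "none", "dkim": "none", "dmarc": "none"}
    seen = set()
    # Remove comments such as "(policy=reject)" so they cannot be misread.
    cleaned = _COMMENT_RE.sub(" ", header_value)
    for mechanism, verdict in _VERDICT_RE.findall(cleaned):
        mechanism = mechanism.lower()
        if mechanism not in seen:  # first verdict per mechanism wins
            results[mechanism] = verdict.lower()
            seen.add(mechanism)
    return results


full = parse_auth_results(
    "mx.yourdomain.com;\n"
    " spf=pass (sender IP is 198.51.100.1) smtp.mailfrom=example.com;\n"
    " dkim=pass header.d=example.com header.s=selector1;\n"
    " dmarc=pass (policy=reject) header.from=example.com"
)
partial = parse_auth_results("mx.example.net; spf=softfail; dkim=fail")
```

Only trust Authentication-Results headers added by your own receiving server or provider, since a sender can include forged ones.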
When you receive HTML email but need plain text (for classification, search indexing, or display), do not just strip tags. That turns <p>Hello</p><p>World</p> into HelloWorld.
Proper conversion:
- Insert line breaks at block-level elements (<p>, <div>, <br>, <li>, <tr>)
- Convert <a href="url">text</a> to text (url) or just text

Libraries: html-to-text (Node.js), html2text (Python), Jsoup (Java).
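To show why block-aware conversion matters, here is a small sketch using only Python's standard `html.parser`. It is not a replacement for the libraries above; it handles only the block tags and links named in the list, and skips script and style content.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "br", "li", "tr"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []
        self._href = None
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a" and self._href:
            self.chunks.append(f" ({self._href})")  # text (url) form
            self._href = None
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces
    text = re.sub(r"\n\s*\n+", "\n", text)  # collapse blank lines
    return text.strip()


two_paragraphs = html_to_text("<p>Hello</p><p>World</p>")
with_link = html_to_text('<p>See <a href="https://example.com">our site</a></p>')
```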
When someone replies to an email, their client includes the original message below a marker line. You want the new content, not the entire quoted history.
Common quote markers:
On Mon, Mar 30, 2026, Jane Smith <[email protected]> wrote:
From: Jane Smith <[email protected]>
Sent: Monday, March 30, 2026
> This is quoted text
> from the original message
-----Original Message-----
________________________________
Stripping approaches:
- Lines starting with > are quoted. Simple but misses HTML-formatted quotes.
- Look for On .* wrote:, -----Original Message-----, or From:.*Sent:.* blocks. Everything after the marker is quoted.
- Mailgun provides stripped-text automatically. Postmark does not. SendGrid does not.
- email_reply_parser (Ruby, with ports to Python, JavaScript, Go) handles the common patterns. Mailgun's talon library (Python) uses machine learning for signature and reply detection.

Practical advice: Start with marker-line detection for the most common patterns. Fall back to > prefix detection. Accept that you will never catch 100% of cases - email client formatting is inconsistent. Log raw content alongside stripped content so you can debug false positives.
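The practical advice above can be sketched as marker-line detection with a > prefix fallback. The patterns are illustrative and cover only the markers listed earlier; a wrapped "On ... wrote:" line that spans two lines is not handled. `strip_quoted_reply` is a hypothetical helper.

```python
import re

MARKERS = [
    re.compile(r"^On .+ wrote:$"),
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}$", re.IGNORECASE),
    re.compile(r"^_{10,}$"),
]


def strip_quoted_reply(text: str) -> str:
    """Return the new content of a plain-text reply, without quoted history."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if any(marker.match(stripped) for marker in MARKERS):
            return "\n".join(lines[:i]).strip()
        # Outlook-style header block: a From: line followed by Sent:.
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if stripped.startswith("From:") and next_line.startswith("Sent:"):
            return "\n".join(lines[:i]).strip()
    # Fallback: drop lines that carry the > quote prefix.
    kept = [line for line in lines if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()


gmail_style = strip_quoted_reply(
    "Looks great.\n\nOn Mon, Mar 30, 2026, Jane Smith <jane@example.com> wrote:\n> Your proposal\n"
)
outlook_style = strip_quoted_reply(
    "Yes.\n\nFrom: Jane Smith <jane@example.com>\nSent: Monday, March 30, 2026\nSubject: Proposal\n"
)
prefix_only = strip_quoted_reply("Sounds good\n> quoted\n> more")
```

Keep the unstripped text next to the result so that an over-eager match can be diagnosed later.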
Inbound email content is untrusted input. Sanitize before storing or displaying.
Plain text sanitization:
- Strip data URIs (data:text/html;base64,...) that could embed executable content

HTML sanitization:
- Strip <script>, <iframe>, and event handler attributes (onclick, onload, etc.)
- Strip hidden elements (display:none, visibility:hidden, font-size:0) - these are commonly used to smuggle content past human readers
- Allowlist tags: p, br, a, b, i, em, strong, u, ul, ol, li, h1-h6, table, thead, tbody, tr, td, th, img, div, span, blockquote, pre, code
- Allowlist attributes: href and title on <a>, src/alt/width/height on <img>, colspan/rowspan on <td>/<th>
- Allow only https: and mailto: URL schemes. Reject javascript:, data:, vbscript:, and anything else
- ... javascript:

Size limits (reasonable defaults):
| Field | Max size | Rationale |
|-------|----------|-----------|
| Subject | 1 KB | RFC 5322 has no limit, but anything longer is spam or malformed |
| Text body | 100 KB | Sufficient for any legitimate business email |
| HTML body | 500 KB | HTML with inline styles can be larger, but 500 KB is generous |
| Single attachment | 25 MB | Gmail's limit, a reasonable default |
| Total message | 40 MB | SES's limit, most providers are similar |
Use the parsed SPF/DKIM/DMARC results to adjust spam scores:
Auth failure weights:
SPF fail or softfail: +0.3 to phishing score
DKIM fail: +0.3 to phishing score
DMARC fail: +0.4 to phishing score
All three fail: strong quarantine signal
Do not reject solely based on auth failure. Legitimate senders sometimes have misconfigured authentication, especially small businesses. Use auth results as one signal among many.
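The weights above translate directly into a small scoring function. Treating an SPF softfail as a failure for the "all three fail" check is an assumption, and `auth_phishing_score` is a hypothetical name.

```python
def auth_phishing_score(spf: str, dkim: str, dmarc: str) -> tuple[float, bool]:
    """Return (score contribution, strong quarantine signal) from auth verdicts."""
    spf_failed = spf in ("fail", "softfail")
    dkim_failed = dkim == "fail"
    dmarc_failed = dmarc == "fail"

    score = 0.0
    if spf_failed:
        score += 0.3
    if dkim_failed:
        score += 0.3
    if dmarc_failed:
        score += 0.4

    # A signal to weigh with other evidence, not a rejection on its own.
    quarantine_signal = spf_failed and dkim_failed and dmarc_failed
    return round(score, 2), quarantine_signal


clean = auth_phishing_score("pass", "pass", "pass")
mixed = auth_phishing_score("softfail", "pass", "fail")
all_fail = auth_phishing_score("fail", "fail", "fail")
```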
Pattern categories to check:
| Signal | Weight | Examples |
|--------|--------|---------|
| Spam keywords | 0.5 | "free gift", "act now", "limited time offer", "you've been selected" |
| Excessive caps | 0.3 | More than 50% uppercase letters (in messages with 20+ alpha characters) |
| Excessive links | 0.25 | More than 5 URLs in the body |
| Bulk sender patterns | 0.3 | "to unsubscribe", "view in browser", "email preferences" |
| Phishing urgency | 0.5 | "verify your account", "immediate action required", "account suspended" |
| Fake login requests | 0.4 | "enter your password", "sign in to verify", "update your payment info" |
| Executable references | 0.6 | .exe, .bat, .ps1 file extensions, "enable macros" |
| Impersonation | 0.5 | "from the CEO", "wire transfer", "purchase gift cards" |
| Domain lookalikes | 0.35 | paypa1.com, micr0soft.com, amaz0n.com |
Sum the weights of matched categories. Verdict threshold at 0.5: above it, classify as the highest-scoring threat type. Below it, classify as clean.
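A sketch of the weighted scoring follows, using a subset of the categories from the table. Treating a score of exactly 0.5 as meeting the threshold is an assumption, and the regular expressions are simple keyword matches for illustration only.

```python
import re

SIGNALS = {
    "spam_keywords": (0.5, re.compile(r"free gift|act now|limited time offer|you've been selected", re.I)),
    "bulk_sender": (0.3, re.compile(r"to unsubscribe|view in browser|email preferences", re.I)),
    "phishing_urgency": (0.5, re.compile(r"verify your account|immediate action required|account suspended", re.I)),
    "fake_login": (0.4, re.compile(r"enter your password|sign in to verify|update your payment info", re.I)),
    "executable_refs": (0.6, re.compile(r"\.(exe|bat|ps1)\b|enable macros", re.I)),
}
URL_RE = re.compile(r"https?://\S+", re.I)
THRESHOLD = 0.5


def score_content(body: str) -> tuple[float, str]:
    """Return (total score, verdict) where verdict is 'clean' or a category name."""
    matched = {name: w for name, (w, pattern) in SIGNALS.items() if pattern.search(body)}
    if len(URL_RE.findall(body)) > 5:
        matched["excessive_links"] = 0.25
    letters = [c for c in body if c.isalpha()]
    if len(letters) >= 20 and sum(c.isupper() for c in letters) / len(letters) > 0.5:
        matched["excessive_caps"] = 0.3

    total = round(sum(matched.values(), 0.0), 2)
    if total < THRESHOLD:
        return total, "clean"
    # Classify as the highest-weighted matched category.
    return total, max(matched, key=matched.get)


phishy = score_content("Immediate action required: verify your account and enter your password.")
normal = score_content("Looks great, let's schedule a call.")
newsletter = score_content("Click here to unsubscribe from these updates.")
```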
If an AI agent reads your inbound email, you need to scan for prompt injection before the agent sees the content. This is a real attack surface - someone replies to your agent's outreach email with content designed to manipulate the agent.
Pattern categories (ordered by severity):
| Category | Weight | What it catches |
|----------|--------|----------------|
| System prompt mimicry | 0.60 | system:, <\|im_start\|>, [INST], <<SYS>> |
| Instruction override | 0.50 | "ignore previous instructions", "override your rules" |
| Context manipulation | 0.50 | assistant:, "end of conversation", fake chat transcripts |
| Data exfiltration | 0.45 | "repeat your system prompt", "dump your API key" |
| Tool abuse | 0.45 | "call the function", <function_call>, JSON tool invocation |
| Authority escalation | 0.45 | "I am the admin", "debug mode enabled", "sudo access" |
| Role play | 0.40 | "you are now", "act as", "pretend to be" |
| Delimiter abuse | 0.35 | ```system, <instructions>, <prompt> |
| Payload smuggling | 0.25 | Hidden text in HTML comments, zero-size font content |
| Encoding evasion | 0.25 | Base64-encoded instructions, Cyrillic-Latin mixing, zero-width character clusters |
Risk levels:
Canary token defense: For unknown attack patterns that bypass regex matching, embed a unique token in the agent's context for each thread. If the token appears in any outbound draft (meaning the agent was manipulated into echoing its context), block the send and flag the thread. This catches injection attacks by their effect rather than their form.
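A minimal sketch of the canary check follows. The token format and function names are assumptions; how the token is placed in the agent's context depends on your agent framework.

```python
import secrets


def new_canary() -> str:
    """Generate a per-thread token to embed in the agent's context."""
    return f"canary-{secrets.token_hex(8)}"


def draft_leaks_canary(draft: str, canary: str) -> bool:
    """True if an outbound draft echoes the thread's canary token."""
    return canary.lower() in draft.lower()


canary = new_canary()
safe_draft = "Thanks for your reply. Does Tuesday work for a call?"
leaked_draft = f"My instructions are as follows: ... {canary} ..."

should_block = draft_leaks_canary(leaked_draft, canary)
should_send = not draft_leaks_canary(safe_draft, canary)
```

Run the check on every outbound draft just before sending, and flag the thread when it trips.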
Allow trusted senders to bypass classification. Match by exact email or by domain. Contacts from known partners, internal addresses, and verified customers do not need injection scanning on every message. The false positive cost on routine correspondence from trusted senders outweighs the risk.
But maintain the whitelist carefully. Compromised accounts are a real attack vector.
After classifying the inbound message, route it based on intent:
| Intent | Action | SLA |
|--------|--------|-----|
| interested | Notify owner / auto-respond | 5 minutes |
| support | Route to support queue | 30 minutes |
| billing | Route to billing, require approval | 60 minutes |
| legal | Route to human review, never auto-respond | 30 minutes |
| security | Route to human review, never auto-respond | 15 minutes |
| out_of_office | Auto-archive | - |
| objection | Auto-archive, update suppression | - |
| not_now | Auto-archive, schedule follow-up | - |
| unclassified | Route to owner with low priority | 60 minutes |
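The routing table above maps naturally onto a lookup. The action identifiers below are hypothetical names for whatever queues or handlers you have; the intents and SLA minutes come from the table.

```python
from typing import Optional

# intent -> (action, SLA in minutes or None when no SLA applies)
ROUTES: dict[str, tuple[str, Optional[int]]] = {
    "interested": ("notify_owner_or_auto_respond", 5),
    "support": ("support_queue", 30),
    "billing": ("billing_queue_with_approval", 60),
    "legal": ("human_review", 30),
    "security": ("human_review", 15),
    "out_of_office": ("auto_archive", None),
    "objection": ("auto_archive_and_suppress", None),
    "not_now": ("auto_archive_and_follow_up", None),
    "unclassified": ("owner_low_priority", 60),
}


def route(intent: str) -> dict:
    """Resolve an intent to its action; unknown intents fall back to unclassified."""
    known = intent if intent in ROUTES else "unclassified"
    action, sla = ROUTES[known]
    return {
        "intent": known,
        "action": action,
        "sla_minutes": sla,
        # Only the interested row permits an automatic reply.
        "auto_respond_allowed": known == "interested",
    }


legal = route("legal")
unknown = route("something_else")
```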
Do not let automated routing act on low-confidence classifications:
Set up routing at the domain level:
[email protected] -> support queue
[email protected] -> billing queue
[email protected] -> sales notifications
*@yourdomain.com -> catch-all inbox
Enable catch-all routing on your mailbox so that typos and unknown addresses still arrive somewhere. Without a catch-all, emails to [email protected] (typo) bounce, and you lose the message.
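Address-level routing with a catch-all can be sketched as an ordered pattern list where the first match wins. The local parts (support, billing, sales) are assumed from the queue names and are illustrative.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Ordered: specific addresses first, catch-all last.
ADDRESS_ROUTES = [
    ("support@yourdomain.com", "support queue"),
    ("billing@yourdomain.com", "billing queue"),
    ("sales@yourdomain.com", "sales notifications"),
    ("*@yourdomain.com", "catch-all inbox"),
]


def route_address(recipient: str) -> Optional[str]:
    """Return the destination for a recipient, or None for foreign domains."""
    address = recipient.strip().lower()
    for pattern, destination in ADDRESS_ROUTES:
        if fnmatchcase(address, pattern):
            return destination
    return None


exact = route_address("Support@YourDomain.com")
typo = route_address("suport@yourdomain.com")
foreign = route_address("someone@other.example")
```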
Watch for suspicious patterns in thread context:
- A thread has interested from [email protected], then a new message with objection from [email protected]. This is either a different stakeholder or a manipulation attempt. Route to human review.
- Intent flips from interested to objection (or vice versa) within 30 minutes. Unusual and worth flagging.

If multiple anomalies occur in the same thread, or an intent flip comes from a new sender, treat it as critical severity and require human approval before any automated action.
Your webhook endpoint should store the raw payload and return 200 within a few seconds. Do all processing asynchronously.
Webhook receives POST
-> Validate payload (signature, required fields)
-> Store raw message to database/queue
-> Return 200
-> [async] Parse content
-> [async] Run safety classification
-> [async] Link to thread
-> [async] Route and notify
If your webhook does parsing, classification, database writes, and third-party calls before returning, you will hit timeouts and trigger retries. Retries create duplicate processing.
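A framework-agnostic sketch of the store-then-return pattern follows. It assumes a JSON-posting provider, an in-memory queue standing in for a durable store, and a `signature_valid` flag computed elsewhere, since signature schemes are provider-specific. All names are hypothetical.

```python
import json
import queue

# Stand-in for a durable queue or database table; not safe across restarts.
inbox: "queue.Queue[dict]" = queue.Queue()


def handle_webhook(raw_body: bytes, signature_valid: bool) -> int:
    """Validate, enqueue the raw payload, and return an HTTP status code."""
    if not signature_valid:
        return 401
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    if not isinstance(payload, dict):
        return 400
    inbox.put({"raw": raw_body.decode("utf-8", "replace"), "payload": payload})
    return 200  # respond before any parsing or classification happens


def drain(process) -> int:
    """Worker side: process queued payloads outside the request path."""
    handled = 0
    while True:
        try:
            item = inbox.get_nowait()
        except queue.Empty:
            return handled
        process(item["payload"])
        handled += 1


ok_status = handle_webhook(b'{"Subject": "Hi"}', True)
bad_json_status = handle_webhook(b"not json", True)
bad_sig_status = handle_webhook(b"{}", False)

seen: list = []
handled_count = drain(seen.append)
```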
Webhook deliveries are at-least-once. You will receive duplicates. Deduplicate using:
- The provider's message ID (Postmark's MessageID, SendGrid's Message-Id header)
- The email's Message-ID header

Store processed message IDs and skip duplicates before doing any work.
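One way to make the duplicate check atomic is a uniqueness constraint. The sketch below uses an in-memory SQLite table purely for illustration; the table name and `claim_message` helper are assumptions, and a production system would use its own database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)")


def claim_message(message_id: str) -> bool:
    """Record the ID; return True only the first time it is seen."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT OR IGNORE INTO processed (message_id) VALUES (?)",
            (message_id,),
        )
    # rowcount is 1 when the row was inserted, 0 when it already existed.
    return cursor.rowcount == 1


first_delivery = claim_message("<abc@example.com>")
retry_delivery = claim_message("<abc@example.com>")
other_message = claim_message("<def@example.com>")
```

Letting the database enforce uniqueness avoids the race between a separate "check" and "insert" when two retries arrive at once.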
Count inbound messages toward your tenant's quota. Providers that charge per-message (like Resend) bill for both directions. Even if you do not get billed per inbound, rate-limit to protect against:
Processing inside the webhook handler. Do classification, routing, and notifications asynchronously. If your handler takes 30 seconds, the provider retries, and you process the same message twice.
Not deduplicating. Webhook delivery is at-least-once. If you do not check for duplicate message IDs, you will create duplicate records, send duplicate notifications, and confuse your users.
Trusting Content-Type for body format. Some emails claim text/html but contain plain text. Some claim text/plain but contain HTML tags. Check the actual content, not just the header.
Using subject-line matching for threading. Subject lines change (Re: Re: Fwd: Re: Original), get mangled by email clients, and are trivially spoofable. Use In-Reply-To and References headers. Subject matching is a last resort.
Not sanitizing inbound HTML. Email HTML is untrusted input from the internet. If you display it without sanitizing, you are vulnerable to XSS, tracking pixels, and hidden content attacks. Allowlist tags and attributes, not blocklist.
Stripping quoted replies too aggressively. There is no standard for quote markers. If your stripping logic is too aggressive, you will lose actual message content. Keep the raw message alongside the stripped version.
Ignoring Authentication-Results. The receiving server already checked SPF, DKIM, and DMARC for you. The results are in the headers. Parse them and use them as a signal for spam scoring. Ignoring them means you are throwing away free security data.
Auto-responding to everything. Auto-responses to out-of-office replies create loops. Auto-responses to mailing lists create storms. Auto-responses to spam confirm your address is active. Check intent and sender type before auto-responding. Never auto-respond to messages with the Auto-Submitted header set to anything other than no.
Blocking on auth failure alone. Legitimate senders have misconfigured SPF/DKIM/DMARC all the time, especially small businesses. Use auth results as one signal in a weighted scoring system, not as a binary gate.
Not storing the raw MIME. Even if you parse and extract everything, store the raw message. You will need it for debugging, compliance, and re-processing when your parsing logic improves.
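For the auto-response pitfall above, a header check along these lines can serve as a first gate. The Auto-Submitted rule comes from the text; the List-Id, List-Unsubscribe, and Precedence checks are added assumptions for detecting mailing-list and bulk mail.

```python
from typing import Mapping


def safe_to_auto_respond(headers: Mapping[str, str]) -> bool:
    """Return False for automated, list, or bulk mail based on headers alone."""
    h = {name.lower(): value for name, value in headers.items()}

    # Anything other than "no" (or absent) means the message is automated.
    auto_submitted = (h.get("auto-submitted") or "no").strip().lower()
    if auto_submitted != "no":
        return False
    # Assumed extra checks: mailing-list and bulk indicators.
    if h.get("list-id") or h.get("list-unsubscribe"):
        return False
    if (h.get("precedence") or "").strip().lower() in {"bulk", "list", "junk"}:
        return False
    return True


human_mail = safe_to_auto_respond({"From": "jane@example.com"})
out_of_office = safe_to_auto_respond({"Auto-Submitted": "auto-replied"})
mailing_list = safe_to_auto_respond({"List-Id": "<announce.example.com>"})
```

This gate complements, rather than replaces, the intent and sender-type checks.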