skills/normalize-format/SKILL.md
--- name: normalize-format description: Normalizes a single inbox file of any supported format (plain markdown, Claude.ai JSON export, Claude Code JSONL session, Readwise markdown/CSV highlight, transcript with timestamps or speaker labels, link capture) into a clean markdown body plus partial frontmatter (id, title, source block, word_count). Handles format-specific failure modes — JSON content-block arrays, timestamp stripping, per-highlight chunking, URL-vs-commentary separation. Use when ing
Related skills: Called by ingest-inbox-item as step 1. Upstream of tag-by-topic, score-intuition-density, dedupe-against-corpus.
| Extension | Format | Notes |
|---|---|---|
| .md, .txt | plain markdown | Default; passes through |
| .json | Claude.ai export | Conversation with messages array |
| .jsonl | Claude Code session | Content-block array per response |
| .md (Readwise-shaped) | Readwise export | Highlights + user notes |
| .csv | Readwise CSV | Per-row highlight |
| .vtt, .srt, .md (diarized) | Transcript | May include timestamps + speaker labels |
| .md with URL + commentary | Link capture | User's framing is the signal |
Normalize one file:
- [ ] Step 1: Detect format by extension + first-line sniff
- [ ] Step 2: Apply format-specific parse
- [ ] Step 3: Split long transcripts at topic boundaries (>3000 words)
- [ ] Step 4: Emit [{body, partial_frontmatter}, ...] list (usually one item)
- .jsonl with "type":"assistant" → Claude Code session.
- .json with "conversation" / "messages" top-level key → Claude.ai export.
- .md starting with # and Readwise boilerplate (**Highlights first synced by Readwise...**) → Readwise.
- .vtt / .srt, or .md with [HH:MM:SS] timestamp pattern, or lines prefixed with speaker labels like Me: → transcript.
- .md with ≤50 words and a prominent URL → link capture.

Plain markdown: pass body through unchanged. Title = first H1 or filename-derived.
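The extension-plus-sniff detection rules could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `detect_format`, the returned label strings, and the "at least two speaker-labelled lines" threshold are all assumptions made for the example.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns; the source only names the [HH:MM:SS] shape and "Me:"-style labels.
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]")
SPEAKER = re.compile(r"^[A-Z][\w ]{0,30}:\s", re.MULTILINE)
URL = re.compile(r"https?://\S+")
READWISE_MARKER = "Highlights first synced by Readwise"


def detect_format(filename: str, text: str) -> str:
    """Return a format label from the file extension plus a content sniff."""
    ext = Path(filename).suffix.lower()

    if ext == ".jsonl" and re.search(r'"type"\s*:\s*"assistant"', text):
        return "claude-code-session"

    if ext == ".json":
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return "plain-markdown"
        if isinstance(data, dict) and any(k in data for k in ("conversation", "messages")):
            return "claude-ai-export"
        return "plain-markdown"  # a .json file that is not a Claude export

    if ext == ".csv":
        return "readwise-csv"

    if ext in (".vtt", ".srt"):
        return "transcript"

    if ext in (".md", ".txt"):
        if text.lstrip().startswith("#") and READWISE_MARKER in text:
            return "readwise"
        # Assumption: require two labelled lines to avoid matching a lone "Note: ..." line.
        if TIMESTAMP.search(text) or len(SPEAKER.findall(text)) >= 2:
            return "transcript"
        if len(text.split()) <= 50 and URL.search(text):
            return "link-capture"

    return "plain-markdown"
```

The checks run from most specific to least specific so that plain markdown stays the default fallthrough.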
Claude.ai JSON: flatten content blocks to markdown. Preserve user/assistant turn labels (**Me:** / **Claude:**). Strip system-reminder blocks. provenance.author: claude, confidence: paraphrased.
Claude Code JSONL: flatten content-block array. Drop tool_use blocks unless the adjacent user message references the tool output. Strip system reminders.
Readwise: split per-book file into one seed per highlight. Body = highlight + user note. Boilerplate stripped. For bare highlights (no user note), set provenance.confidence: quoted, density capped at 3 downstream. For user-annotated highlights, confidence: owned.
Transcript: strip timestamps. Preserve speaker labels as **Speaker:** prefixes. If >3000 words, split at topic shifts — emit multiple outputs sharing parent_source. Target ~1500 words per chunk.
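As a rough sketch of the timestamp-stripping and speaker-label step, the snippet below handles only bracketed `[HH:MM:SS]` timestamps. The function name `clean_transcript` and the speaker-label regex are assumptions for illustration; VTT/SRT cue lines would need separate handling that is not shown here.

```python
import re

# Assumption: timestamps appear in the bracketed [HH:MM:SS] form named in the detection rules.
BRACKET_TS = re.compile(r"\[\d{2}:\d{2}:\d{2}\]\s*")
# Assumption: a speaker label is a short capitalised name followed by a colon.
SPEAKER_LINE = re.compile(r"^([A-Z][\w .'-]{0,30}):\s+(.*)$")


def clean_transcript(text: str) -> str:
    """Strip timestamps and rewrite 'Name: ...' lines as '**Name:** ...'."""
    cleaned = []
    for line in text.splitlines():
        line = BRACKET_TS.sub("", line).strip()
        if not line:
            continue  # drop lines that held nothing but a timestamp or whitespace
        match = SPEAKER_LINE.match(line)
        if match:
            line = f"**{match.group(1)}:** {match.group(2)}"
        cleaned.append(line)
    return "\n".join(cleaned)
```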
Link capture: separate URL from commentary. Body = user's commentary. Frontmatter adds source.linked_url. If <50 words of commentary, flag low_commentary: true so the scorer caps density.
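The URL-versus-commentary separation might look something like this. It is a minimal sketch under stated assumptions: `parse_link_capture` is a hypothetical name, only the first URL is treated as the linked one, and the returned dictionary shape simply mirrors the `{body, partial_frontmatter}` pair described in the workflow.

```python
import re

URL = re.compile(r"https?://\S+")


def parse_link_capture(text: str) -> dict:
    """Split a link capture into the user's commentary and the linked URL."""
    match = URL.search(text)
    linked_url = match.group(0) if match else None

    # Body is the user's commentary only; the first URL is moved to frontmatter.
    body = URL.sub("", text, count=1).strip()

    frontmatter = {"source": {"linked_url": linked_url}}
    if len(body.split()) < 50:
        # Fewer than 50 words of commentary: flag it so the scorer can cap density.
        frontmatter["low_commentary"] = True

    return {"body": body, "partial_frontmatter": frontmatter}
```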
Split heuristic: paragraph break + topic-vocabulary shift (measured by tag overlap drop across adjacent paragraphs). Each chunk ~1500 words. Preserve parent_source across chunks.
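One way to approximate this split heuristic is shown below. The real skill measures tag overlap, which is not specified here, so the sketch substitutes a stand-in: the set of lowercase words of five or more letters per paragraph, compared with Jaccard overlap. The function names, the 0.1 overlap threshold, and the 750-word minimum chunk size are all assumptions for illustration.

```python
import re


def keywords(paragraph: str) -> set:
    """Stand-in for topic tags: lowercase words of five or more letters."""
    return set(re.findall(r"[a-z]{5,}", paragraph.lower()))


def jaccard(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_transcript(text, target_words=1500, min_words=750,
                     overlap_threshold=0.1, split_over=3000):
    """Split at paragraph breaks where vocabulary overlap drops, aiming for ~target_words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if sum(len(p.split()) for p in paragraphs) <= split_over:
        return ["\n\n".join(paragraphs)]  # short enough: a single chunk

    chunks, current, count = [], [], 0
    for para in paragraphs:
        if current:
            shifted = jaccard(keywords(current[-1]), keywords(para)) < overlap_threshold
            # Cut on a topic shift once the chunk is big enough, or force a cut at the target.
            if (count >= min_words and shifted) or count >= target_words:
                chunks.append("\n\n".join(current))
                current, count = [], 0
        current.append(para)
        count += len(para.split())
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk would then be emitted as its own output carrying the same parent_source.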
- From: ..., Date: ..., Subject: ...) → reclassify as plain markdown or link capture.
- .json file that isn't a Claude export → treat as plain markdown and wrap in code fences.

Input (inbox/2026-04-21-claude-bnn.json):
{"conversation":{"name":"BNN variational","messages":[
{"role":"user","content":[{"type":"text","text":"help me intuit why variational inference..."}]},
{"role":"assistant","content":[{"type":"text","text":"Think of it as fitting a simple distribution..."}]}
]}}
Output:
# BNN variational
**Me:** help me intuit why variational inference...
**Claude:** Think of it as fitting a simple distribution...
With partial_frontmatter = {id: 2026-04-21-bnn-variational, title: "BNN variational", source: {type: claude-conversation, ...}, provenance: {author: claude, confidence: paraphrased}}.
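The flattening shown in this example could be sketched as below. This is an illustrative assumption rather than the skill's own code: `flatten_claude_export` is a hypothetical name, turns are joined with blank lines, only `text` blocks are kept, and both id derivation and system-reminder stripping are left out.

```python
import json

ROLE_LABELS = {"user": "**Me:**", "assistant": "**Claude:**"}


def flatten_claude_export(raw: str) -> dict:
    """Flatten a Claude.ai export's content blocks into labelled markdown turns."""
    data = json.loads(raw)
    convo = data.get("conversation", data)  # tolerate a top-level "messages" key
    title = convo.get("name", "Untitled")

    parts = [f"# {title}"]
    for message in convo.get("messages", []):
        label = ROLE_LABELS.get(message.get("role"))
        if label is None:
            continue  # skip roles other than user/assistant
        content = message.get("content", [])
        if isinstance(content, str):
            text = content
        else:
            text = "\n\n".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        if text.strip():
            parts.append(f"{label} {text.strip()}")

    return {
        "body": "\n\n".join(parts),
        "partial_frontmatter": {
            "title": title,
            "provenance": {"author": "claude", "confidence": "paraphrased"},
        },
    }
```

Wrapping the result in a one-element list would match the `[{body, partial_frontmatter}, ...]` shape the workflow emits.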
- WARN | malformed CSV row in <file> line N to changelog.
- messages key missing, fall back to recursive text extraction; mark confidence: paraphrased regardless.
- [image: awaiting user annotation] and status: dead with reason image-only.
- .processed/ only on success).
- [{body, partial_frontmatter}, ...]: always a list, usually of length 1.
- parent_source.