skills/latex-semantic-linebreaks/SKILL.md
Use when editing LaTeX paper sources where prose paragraphs are written as long single lines, OR when starting a new LaTeX paper. Reflows .tex prose to one-sentence-per-line ("ventilated text" / "semantic linebreaks") so Edit() invocations are sentence-precise and git diffs review-friendly. Renders byte-identical PDF. Do NOT recommend latexindent or tex-fmt for this — both fail predictably on math-heavy LaTeX. Activates on phrases like "format the paper", "sentence-per-line", "semantic linebreaks", "ventilated text", "reflow latex", or any time you're editing a .tex file with multi-sentence single-line paragraphs.
npx skillsauth add AMindToThink/claude-code-settings latex-semantic-linebreaks
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
LaTeX collapses single newlines in source to spaces in the rendered PDF, so reformatting prose to one-sentence-per-line is a pure-source change with zero rendering impact. The convention is variously called "semantic linebreaks", "ventilated text", or "one sentence per line".
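The claim above can be illustrated with a toy model. This is only a rough approximation of how LaTeX reads prose (it ignores comments, verbatim material, and anything that changes newline handling), and the function name `tex_paragraphs` is hypothetical, not part of the skill.

```python
import re

def tex_paragraphs(source: str) -> list[str]:
    """Toy model: blank lines separate paragraphs, and any run of
    spaces or single newlines inside a paragraph acts as one space."""
    paragraphs = re.split(r"\n[ \t]*\n+", source.strip())
    return [" ".join(p.split()) for p in paragraphs if p.strip()]

# The same prose, once as a long single line and once one-sentence-per-line.
one_line = "First sentence. Second sentence.\n\nNew paragraph."
ventilated = "First sentence.\nSecond sentence.\n\nNew paragraph."

print(tex_paragraphs(one_line) == tex_paragraphs(ventilated))
```

Under this model both sources reduce to the same two paragraphs, which is why the reflow only touches the source.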
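Apply when:
- You are editing a .tex file whose prose paragraphs are long single lines (which makes Edit() brittle and git diff paint whole paragraphs red).
- You are starting a new LaTeX paper.

Do NOT apply mid-paragraph during a single small edit, because it inflates that edit's diff. Run it as a dedicated formatting commit, separate from semantic changes.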
Three off-the-shelf tools were evaluated and rejected. Do not retry them.
| Tool | Verdict |
|---|---|
| latexindent --m oneSentencePerLine (TeX Live) | Produces ~14 false breaks per math-heavy paper. Categories: tikz/factorial ! mistaken for sentence end (orange!70!black, 1/n!); quote-attached ?'' split off; \texttt{.} content broken; escaped-space vs.\ broken; abbrev etc.) split before comma; footnote .1 ambiguous. Tuning knobs (betterFullStop, sentencesBeginWith, sentencesDoNotContain) shift which categories fire but never eliminate them. Matches multiple closed-but-not-fully-fixed bugs in the latexindent.pl tracker. |
| tex-fmt (Rust) | Semantic-linebreaks support is an open feature request (issue #80) as of early 2025. Not implemented. |
| andykuszyk/texfmt (Go) | Archived September 2024. Only does fixed-width reflow, not sentence-per-line. |
If you find yourself trying to install latexindent or tune its config, stop. Use the project-local Python script described below.
A reference implementation lives in references/format_sentences.py of this skill. It is ~150 LoC and:
- Skips math spans (`$...$`, `$$...$$`, `\(...\)`, `\[...\]`).
- Skips the environments `verbatim`, `lstlisting`, `tikzpicture`, `equation`, `equation*`, `align`, `align*`, `gather`, `gather*`, `eqnarray`, `eqnarray*`, `multline`, `multline*`, `array`, `matrix`, `pmatrix`, `bmatrix`, `vmatrix`, `cases`, `pgfplots`, `axis`, `tabular`, `tabular*`, `tabularx`.
- Skips the arguments of `\texttt`, `\textcolor`, `\cite`, `\citep`, `\citet`, `\citeyear`, `\ref`, `\label`, `\url`, `\href`, `\verb`.
- Skips `%`-comments (with the escaped form `\%` preserved).
- Does not break after the abbreviations `e.g.`, `i.e.`, `i.i.d.`, `cf.`, `vs.`, `et al.`, `etc.`, `Fig.`, `Eq.`, `Sec.`, `Tab.`, `Ref.`, `Refs.`, `Eqs.`, `Figs.`, `Tabs.`, `Secs.`, `App.`, `Apps.`, `Alg.`, `No.`. Add to `ABBREVS` if your paper uses others.
- Consumes `\\` before any single-char inspection, so `\\[1em]` (LaTeX linebreak + optional spacing) is not mistaken for the `\[` display math opener. This was a real bug; see the regression test below.
- Inserts `\n` only after `[.?!]`, optionally past attached `''` or `"`, then whitespace, then a capital letter, provided no abbreviation matches the preceding ~12 chars.

The companion test file references/test_format_sentences.py has 24 cases covering every edge case the script handles, including the runaway-skip-region regression. Run the tests before trusting the script in a new project.
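The break rule described above (terminator, optional closing quote, whitespace, capital letter, abbreviation guard) can be sketched roughly as follows. This is not the reference script: it is a minimal illustration that does no math, environment, or comment skipping, the names `split_sentences` and `BREAK_RE` are hypothetical, and `ABBREVS` here is a shortened subset.

```python
import re

# Illustrative subset only; the reference script's list is longer.
ABBREVS = ("e.g.", "i.e.", "cf.", "vs.", "et al.", "etc.", "Fig.", "Eq.", "Sec.")

# Terminator, optional attached '' or ", whitespace, then a capital letter.
BREAK_RE = re.compile(r"([.?!](?:''|\")?)\s+(?=[A-Z])")

def _ends_with_abbrev(window: str) -> bool:
    """True if the window ends with a known abbreviation at a word boundary."""
    for abbrev in ABBREVS:
        if window.endswith(abbrev):
            before = window[: -len(abbrev)]
            if not before or not before[-1].isalpha():
                return True
    return False

def split_sentences(prose: str, lookback: int = 12) -> str:
    """Insert a newline at each sentence boundary that passes the abbreviation guard."""
    pieces = []
    last = 0
    for match in BREAK_RE.finditer(prose):
        terminator_end = match.start(1) + 1  # just past the . ? or !
        window = prose[max(0, terminator_end - lookback):terminator_end]
        if _ends_with_abbrev(window):
            continue  # e.g. "Fig. Two" or "e.g. Top-K": not a sentence end
        pieces.append(prose[last:match.end(1)])  # keep any attached quote
        last = match.end()                       # drop the separating whitespace
    pieces.append(prose[last:])
    return "\n".join(pieces)

text = "We use SAEs, e.g. Top-K ones. They work well. See Fig. 2 for details."
print(split_sentences(text))
```

The guard is deliberately conservative: a sentence that genuinely ends in "etc." is left unbroken, which costs a missed break but never changes the rendered text.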
# 1. Stage the script in the project (one-time setup).
mkdir -p scripts tests
cp ~/.claude/skills/latex-semantic-linebreaks/references/format_sentences.py \
scripts/format_paper_sentences.py
cp ~/.claude/skills/latex-semantic-linebreaks/references/test_format_sentences.py \
tests/test_format_paper_sentences.py
uv run pytest tests/test_format_paper_sentences.py -v # expect 24 green
# 2. Commit any pending semantic .tex changes FIRST — keep formatting
# in its own commit so reviewers can tell prose changes from reflow.
git add paper/your_paper.tex && git commit -m "..."
# 3. Run the formatter to a sibling file and review the diff.
python scripts/format_paper_sentences.py \
paper/your_paper.tex /tmp/formatted.tex
diff paper/your_paper.tex /tmp/formatted.tex | less # newlines only, no word changes
# 4. Apply, rebuild, and verify byte-identical rendered text.
cp /tmp/formatted.tex paper/your_paper.tex
(cd paper && rm -f your_paper.{aux,bbl,blg,fdb_latexmk,fls,log,out} && \
latexmk -pdf your_paper.tex) # subshell: step 5 still runs from the repo root
# 5. Verification — both byte counts MUST match exactly.
pdftotext -nopgbrk paper/your_paper.pdf - | tr -s ' \n\t' ' ' | wc -c
# Compare against pre-format byte count (record this BEFORE step 4).
# 6. Commit format change separately.
git commit -am "format: sentence-per-line reflow (no semantic changes)"
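Two checks, both required. Rule 1: the whitespace-collapsed text extracted from the PDF has the same byte count before and after the reflow.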
pdftotext -nopgbrk before.pdf - | tr -s ' \n\t' ' ' | wc -c
pdftotext -nopgbrk after.pdf - | tr -s ' \n\t' ' ' | wc -c
These two byte counts must match exactly. A pdftotext -layout line-by-line diff is misleading — column-position differences in extracted text produce tens of false-positive diff lines for purely cosmetic source reflow. The whitespace-collapsed -nopgbrk byte-count comparison is the right test.
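For scripted use, the same comparison can be sketched in Python. This assumes poppler's `pdftotext` is on PATH; the function names are hypothetical, and `collapsed_byte_count` is meant to mirror `tr -s ' \n\t' ' ' | wc -c` on the extracted bytes.

```python
import re
import subprocess

def pdf_text(pdf_path: str) -> bytes:
    """Extract text via `pdftotext -nopgbrk <pdf> -` (writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-nopgbrk", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout

def collapsed_byte_count(extracted: bytes) -> int:
    """Turn every run of spaces, newlines, and tabs into one space, then count bytes."""
    return len(re.sub(rb"[ \n\t]+", b" ", extracted))

def same_rendered_text(before_pdf: str, after_pdf: str) -> bool:
    """True when both PDFs yield the same whitespace-collapsed byte count."""
    return collapsed_byte_count(pdf_text(before_pdf)) == collapsed_byte_count(pdf_text(after_pdf))
```

A call such as `same_rendered_text("before.pdf", "after.pdf")` returning `False` means the reflow changed the rendered text and should not be committed.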
Rule 2: visually scan the .tex source. Byte-identical rendering does not guarantee the source got reflowed where you intended. A formatter bug (e.g., a runaway skip region from misparsing \\[) can leave whole paragraphs unformatted while still producing identical PDF output. After running, visually scan a few paragraphs in the .tex; every prose paragraph should have one sentence per line. If any paragraph is still a single long line, the skip-region detection is broken; debug it.
This is not theoretical: the original implementation of this script left ~25 lines unformatted (the abstract and §Motivation opening), and the Rule 1 byte-count check did not catch it. The user's visual check did.
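The runaway skip region can be reproduced with a small scanner. The sketch below is an illustration of the failure mode, not the reference script's code; `display_math_spans` and its flag are hypothetical names. With the flag off, the second backslash of `\\[1em]` is read as a `\[` opener that never closes.

```python
def display_math_spans(src: str, consume_double_backslash: bool = True) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of \\[ ... \\] regions found in src."""
    spans = []
    open_at = None
    i = 0
    while i < len(src):
        if consume_double_backslash and src.startswith("\\\\", i):
            i += 2  # a LaTeX line break: skip both chars before looking at '['
            continue
        if open_at is None and src.startswith("\\[", i):
            open_at = i
            i += 2
            continue
        if open_at is not None and src.startswith("\\]", i):
            spans.append((open_at, i + 2))
            open_at = None
            i += 2
            continue
        i += 1
    if open_at is not None:
        spans.append((open_at, len(src)))  # unterminated: region runs to end of input
    return spans

title = r"\title{A Paper\\[1em] Subtitle} Prose one. Prose two."
print(display_math_spans(title))                                   # no math regions
print(display_math_spans(title, consume_double_backslash=False))   # one runaway region
```

Because the runaway region extends to the end of the input, everything after the title would be skipped by a formatter built on the naive scan, while the PDF would still render identically.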
If \\[1em] or similar shows up in \title, \author, or \date, ensure the formatter handles \\ escapes; the reference script does.
Is the .tex source one-sentence-per-line in the paragraph you're editing?
├── Yes → just edit; don't reformat.
└── No → if you're making >2 prose edits, run the workflow above
first as a separate commit. If you're making one quick edit,
either skip reformatting (acceptable) or reformat just the
paragraph by hand inside the Edit() call.
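The decision tree above can be written down as a small helper. This is a hypothetical sketch: the boundary heuristic is cruder than the reference script (it will flag abbreviations such as "e.g. Top-K" as boundaries), and treating exactly two edits like the quick-edit branch is an assumption, since the tree only names "one" and ">2".

```python
import re

# Apparent sentence boundary inside a single source line.
BOUNDARY_RE = re.compile(r"[.?!](?:''|\")?[ \t]+(?=[A-Z])")

def is_sentence_per_line(paragraph: str) -> bool:
    """Heuristic: no line of the paragraph contains a mid-line sentence boundary."""
    return not any(BOUNDARY_RE.search(line) for line in paragraph.splitlines())

def next_action(paragraph: str, planned_prose_edits: int) -> str:
    """Map the decision tree onto three outcomes."""
    if is_sentence_per_line(paragraph):
        return "edit"  # already ventilated: just edit, don't reformat
    if planned_prose_edits > 2:
        return "reformat-first"  # run the workflow as its own commit
    return "edit-or-hand-reflow"  # skip reformatting, or reflow this paragraph by hand

print(next_action("One sentence. Another one.", planned_prose_edits=3))
```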