skills/make-anonymous-branch/SKILL.md
--- name: make-anonymous-branch description: Use when preparing a research repo for double-blind submission via anonymous.4open.science (ICML/NeurIPS/ICLR/workshop). Builds a single `anon-submission` branch with code+data+paper, scrubs identity leaks (author names, home paths, emails, wandb metadata, PDF author fields), patches LaTeX for pdf.js compatibility, and leaves `main` untouched. Triggers: "make an anonymous branch", "anonymize my repo for X submission", "set up anonymous.4open.science",
Core idea: one branch (anon-submission) carries the entire blinded artefact — code, data, paper, derived results — in-tree. anonymous.4open.science mirrors that branch with .git/ stripped. main stays untouched throughout; all destructive edits live on the anon branch.
This skill is flexible. The phases and gotchas transfer; the exact file paths depend on the project. Adapt, don't blindly follow.
Run these BEFORE proposing edits; they shape every later decision:
```bash
# 1. Total payload size (excluding things that won't ship)
du -sh results figures investigations data 2>/dev/null
# 2. Largest tracked files (sorted descending)
git ls-files | xargs -I{} ls -l {} 2>/dev/null | awk '$5 > 20000000 {print $5, $9}' | sort -rn | head -20
# 3. Confirm main is clean and you know what you're branching from
git status
git log main --oneline -5
```
anonymous.4open.science mirrors a regular GitHub branch, so GitHub's git-blob limits apply: GitHub warns on any file larger than 50 MiB and rejects any file larger than 100 MiB.
Decision tree:
| Situation | Hosting decision |
|---|---|
| Largest single file < 50 MB and total payload < ~250 MB | In-repo. Simplest path. No LFS, no external host, no second anonymization step. |
| Single files in 50–100 MB or total 250 MB–1 GB | In-repo with caveats. Acceptable but: warn user about clone times; consider whether some intermediate artefacts can be regenerated by scripts and dropped from the branch. |
| Any single file > 100 MB or total > ~1 GB | External or LFS. git-lfs adds complexity; HF/Zenodo adds a second account to anonymize. Pick the one with less identity surface. |
| Sensitive / non-redistributable data | Don't ship. Describe how to obtain it; have reviewers email an anonymized contact (some venues provide one). |
The default to push for: in-repo if it fits. The reason is concrete — every external host (HuggingFace, Zenodo, S3 bucket, Google Drive) adds another account/identity that needs separate anonymization, doubling the leakage surface. If the data is in the repo, anonymizing the repo anonymizes the data.
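The size rows of the decision tree can be sketched as a small function. This is an illustrative helper, not part of the skill: the name `hosting_decision` and its return labels are assumptions, the thresholds are read straight off the table, and the "sensitive data" row is left out because it is a judgment call rather than a size check.

```python
def hosting_decision(largest_file_mb: float, total_mb: float) -> str:
    """Map payload sizes (in MB) onto the size rows of the decision table."""
    # Any single file > 100 MB or total > ~1 GB: external host or git-lfs.
    if largest_file_mb > 100 or total_mb > 1000:
        return "external-or-lfs"
    # Single files of 50-100 MB or total 250 MB-1 GB: in-repo, with caveats.
    if largest_file_mb >= 50 or total_mb >= 250:
        return "in-repo-with-caveats"
    # Everything smaller fits comfortably in the branch itself.
    return "in-repo"


print(hosting_decision(largest_file_mb=12, total_mb=180))
```

Because the table's totals are approximate ("~250 MB", "~1 GB"), treat a result near a boundary as a prompt to discuss with the user rather than a verdict.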
Run before git checkout -b:
```bash
# Find anything that would push you over 50/100 MB on the anon branch
git ls-files | xargs -I{} ls -l {} 2>/dev/null | awk '$5 > 50000000 {print $5, $9}'
# Total tracked-file size (rough; doesn't include things you'll add later)
# Note: `stat -c %s` is GNU stat; on macOS/BSD use `stat -f %z` instead
git ls-files | xargs -I{} stat -c %s {} 2>/dev/null | awk '{s+=$1} END {printf "tracked: %.1f MB\n", s/1024/1024}'
# If a wandb/ or build-artefact dir is currently tracked, surface it now:
du -sh $(git ls-files | xargs -I{} dirname {} | sort -u | head -50) 2>/dev/null | sort -h | tail -10
```
Any single result > 50 MB → revisit the hosting decision; > 100 MB → must use LFS or external.
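The shell pipelines above split on whitespace, so tracked paths containing spaces can be misreported. A portable alternative could look like the sketch below; the function names (`tracked_files`, `over_limit`, `report`) are hypothetical, and it assumes `git` is on the PATH and that it is called from the repo root.

```python
import os
import subprocess


def tracked_files() -> list[str]:
    """Return tracked paths; -z gives NUL-separated names, so spaces are safe."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], check=True, capture_output=True
    ).stdout
    return [os.fsdecode(p) for p in out.split(b"\0") if p]


def over_limit(sizes: dict[str, int], limit_mb: float) -> list[tuple[str, int]]:
    """Return (path, bytes) pairs above limit_mb (decimal MB), largest first."""
    limit = limit_mb * 1_000_000
    big = [(path, size) for path, size in sizes.items() if size > limit]
    return sorted(big, key=lambda item: item[1], reverse=True)


def report(limit_mb: float = 50) -> None:
    """Print tracked files above the limit; run from the repo root."""
    sizes = {p: os.path.getsize(p) for p in tracked_files() if os.path.isfile(p)}
    for path, size in over_limit(sizes, limit_mb):
        print(f"{size / 1_000_000:8.1f} MB  {path}")
```

Calling `report(50)` and then `report(100)` mirrors the two thresholds used in the hosting decision.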
- Branch from `main`; never modify `main` until the anon branch is shipped.
- anonymous.4open.science strips `.git/`, so reviewers don't see commit history via the proxy. But if the public GitHub repo will also be visible (typical), past commits authored by your real GitHub account will leak. Decide whether to squash to a single anon-author commit (defense in depth).

The dominant failure mode is missing an identity leak. Dispatch 3 Haiku Explore subagents in parallel; they're cheap, thorough, and find things you'd miss by eye:
- Search `investigations/`, `docs/`, `notes/`, and all `.md`/`.py` files for: author names, GitHub usernames, institution names, advisor names, home-directory paths (`/home/<user>`, `/Users/<user>`), email addresses (`@gmail`, `@<institution>`), hostnames, wandb run URLs.
- For each file slated for deletion, check whether anything in `tests/`, `scripts/`, `src/`, or the LaTeX `\input{}` chain depends on it. The classic gotcha: a paper-macros test that hardcodes a path to the named-authors `.tex`.
- Review top-level files (`README.md`, `pyproject.toml`, `CLAUDE.md`, `.gitignore`), audit a sample of large JSON files for `email`/`username`/`hostname`/`wandb_run_id` keys, and list figure PDFs that may carry matplotlib Author metadata.

See superpowers:dispatching-parallel-agents for how to launch these concurrently.
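As a rough complement to the subagent audit, the identity strings listed above can also be matched mechanically. The sketch below is an assumption-laden illustration: the regexes are deliberately loose, the function names are hypothetical, and the name tokens must be supplied by the user. It will produce false positives, so treat hits as leads to review.

```python
import re


def build_leak_patterns(name_tokens: list[str]) -> dict[str, re.Pattern]:
    """Compile loose patterns for the leak classes named in the audit list."""
    patterns = {
        "home_path": re.compile(r"/(?:home|Users)/[A-Za-z0-9._-]+"),
        "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        "wandb_url": re.compile(r"https?://wandb\.ai/\S+"),
    }
    if name_tokens:
        # Author, advisor, institution, and handle tokens, matched case-insensitively.
        joined = "|".join(re.escape(token) for token in name_tokens)
        patterns["name"] = re.compile(joined, re.IGNORECASE)
    return patterns


def scan_text(text: str, patterns: dict[str, re.Pattern]) -> list[tuple[int, str, str]]:
    """Return (line number, leak class, matched string) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in patterns.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits


sample = 'File "/home/alice/proj/run.py", line 3\ncontact: alice@example.org\nclean line'
for hit in scan_text(sample, build_leak_patterns(["alice"])):
    print(hit)
```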
From the audit, build three lists:
A. git rm — identity-bearing files (typical patterns; project-specific paths):
- Archived paper drafts (`paper/archive/`)
- Historical papers and sections (`paper/historical_papers/`, `paper/historical_sections/`)
- `*.log` containing `/home/<user>/...` stack traces
- `paper/CLAUDE.md`, root `CLAUDE.md` if they contain personal paths
- `wandb/` directory if tracked: `wandb-metadata.json` contains your email
- `.DS_Store`, `__MACOSX/`

B. git rm (internal-only docs; not identity leaks, but reduce review surface):

- `original_vs_workshop_audit.md`-style internal comparison docs

C. In-place scrubs (files that stay but contain identifying lines):
- `USER_AGENT = "...(<your-email>)"` in build scripts → drop the email

```bash
# Branch from main. DO NOT switch back to main with uncommitted changes
git checkout main && git pull
git checkout -b anon-submission
```
Pause and verify before any bulk delete. Run git branch --show-current and git status to confirm you're on anon-submission with main's tree intact. Real-session pattern: users sometimes ask to "check the branch" before the first git rm — that's healthy. A bulk delete on the wrong branch is recoverable (the files are still in main's history) but disorienting; better to confirm than rush.
```bash
# Apply the deletions (group by category for reviewable diffs)
git rm -r paper/archive paper/historical_papers paper/historical_sections
git rm paper/<named-author-wrapper>.{tex,pdf}
git rm paper/CLAUDE.md paper/<internal-investigation>.md
git rm scripts/<upload-to-hf>.py
git rm -r wandb
git rm <log-with-home-paths>
# ... etc per audit findings
git rm --ignore-unmatch .DS_Store paper/.DS_Store
# Apply in-place edits (use Edit tool on each file)
# - drop email from USER_AGENT
# - rename "<name>" mentions to "the lead author" / "the investigator"
```
anonymous.4open.science renders PDFs via pdf.js (Mozilla), which handles default LaTeX URW Times Type 1 fonts poorly — visible as smushed-together characters / broken kerning. The fix is two lines in the LaTeX preamble, before microtype and the conference style file:
```latex
\usepackage{cmap}          % must come BEFORE fontenc
\usepackage[T1]{fontenc}   % T1 encoding has glyph-width tables pdf.js handles
```
cmap is sometimes missing from minimal TeX distributions:
```bash
tlmgr install cmap   # if you get "File `cmap.sty' not found"
```
After adding these, page count typically does not change (verify; if it does shift across the venue's main-paper page limit, back out and pick a different fix like \usepackage{newtxtext,newtxmath}).
Visual confirmation is mandatory — page count tells you nothing about kerning. After the anon URL is live, ask the user to open the PDF preview on anonymous.4open.science (or open it yourself if you have a browser tool) and confirm characters aren't smushed. A passing build with bad rendering is the failure mode this fix addresses.
Even with anonymous LaTeX (\icmlauthor{Anonymous Authors}{anon}), hyperref and pdfTeX bake the title, anonymized author string, subject, and creator into PDF metadata. This survives the LaTeX-side anonymization. Scrub via pypdf:
```bash
uv run --with pypdf python <<'EOF'
from pathlib import Path

from pypdf import PdfReader, PdfWriter

# Every figure PDF plus the main paper PDF
pdfs = sorted(set(Path("figures").rglob("*.pdf")) | {Path("paper/<main>.pdf")})
for p in pdfs:
    if not p.exists():
        continue
    r = PdfReader(str(p))
    w = PdfWriter(clone_from=r)
    # Blank the identifying fields of the PDF Info dictionary
    w.add_metadata({
        "/Author": "", "/Creator": "", "/Title": "", "/Subject": "",
        "/Keywords": "", "/Producer": "",
    })
    # Write to a temp file, then swap it in, so a failed write never truncates the original
    tmp = p.with_suffix(".pdf.scrubtmp")
    with open(tmp, "wb") as f:
        w.write(f)
    tmp.replace(p)
EOF
```
Re-run this after every LaTeX rebuild — latexmk writes fresh metadata each pass.
Critical and non-negotiable. The whole point of the branch is that these checks pass.
```bash
# 1. Build still works
cd paper && latexmk -pdf -interaction=nonstopmode <main>.tex && cd ..
# 2. Tests still pass (skip slow GPU tests if needed)
uv run pytest
# 3. Final identity-string sweep: all should return ZERO tracked-text hits
git grep -i -n -I -E "<your-name-tokens>|<github-handle>|<institution>|<advisor>|@gmail|@<institution>"
git grep -n -I -E "/home/<user>|/Users/<user>"
# 4. PDF metadata
# (in bash, run `shopt -s globstar` first so that ** recurses into subdirectories)
uv run --with pypdf python -c "
from pypdf import PdfReader
import sys
for p in sys.argv[1:]:
    md = PdfReader(p).metadata or {}
    bad = {k: v for k, v in md.items() if v and any(s in str(v).lower() for s in ['<your-name-lowercase>','<institution-lowercase>'])}
    print(p, '→', 'CLEAN' if not bad else f'LEAKS: {bad}')
" paper/<main>.pdf figures/**/*.pdf
```
Triage the grep hits:
- Hits in binary files (PNG/PDF): verify with `pdftotext` / `strings` extraction; if no actual rendered text contains the name, ignore.

```bash
# Use an explicitly anonymous identity (defense in depth: safe even if
# repo is later shared without anonymous.4open.science in front)
git add -A
git -c user.name="Anonymous" -c user.email="[email protected]" \
    commit -m "<venue> submission"
# Push to remote (regular GitHub; anonymous.4open.science fetches from there)
git push -u origin anon-submission
# OPTIONAL: full single-commit orphan branch (drops inherited git history
# from main entirely; only matters if reviewers may clone the public repo
# directly, bypassing anonymous.4open.science). Destructive of the branch
# ref; ASK USER before doing this.
```
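The optional orphan-branch step is described above without commands. One plausible sequence is sketched below as a dry run that only prints the git commands and executes nothing. The scratch branch name `anon-orphan-tmp` and the address `anonymous@example.invalid` are hypothetical placeholders, and the force-push is an assumption about how the existing remote branch would be replaced. Confirm with the user before running any of it.

```python
import shlex


def orphan_branch_commands(
    branch: str = "anon-submission",
    message: str = "<venue> submission",
    name: str = "Anonymous",
    email: str = "anonymous@example.invalid",  # hypothetical placeholder address
    scratch: str = "anon-orphan-tmp",          # hypothetical scratch branch name
) -> list[list[str]]:
    """Build (but do not run) the git commands for a single-commit orphan branch."""
    identity = ["-c", f"user.name={name}", "-c", f"user.email={email}"]
    return [
        # New branch with no parent commits; the working tree is kept as-is.
        ["git", "checkout", "--orphan", scratch],
        ["git", "add", "-A"],
        # One commit authored and committed under the anonymous identity.
        ["git", *identity, "commit", "-m", message],
        # Rename the scratch branch over the old branch ref (destructive).
        ["git", "branch", "-M", branch],
        # Replace the remote branch; rewrites published history.
        ["git", "push", "--force", "-u", "origin", branch],
    ]


for command in orphan_branch_commands():
    print(shlex.join(command))
```

Printing rather than executing keeps the destructive steps reviewable, which matches the "ASK USER" instruction.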
1. Push `anon-submission` to GitHub first with placeholder URLs intact (Phase 7).
2. Submit the branch URL (`github.com/<user>/<repo>/tree/anon-submission`) to anonymous.4open.science.
3. It returns an anon URL of the form `https://anonymous.4open.science/r/<repo>-67E6/` (random 4-character hash suffix).
4. Update the `\projectGithubUrl` macro to the anon URL. Rebuild, re-scrub metadata, commit, push.
5. Update `main` with the real URL in the same `\projectGithubUrl` slot.

The chicken-and-egg trap: you can't put the anon URL into the paper before submitting, and submitting requires the paper. The pattern above breaks the cycle by accepting one round-trip: push placeholder, submit, push real URL.
Define \projectGithubUrl once per branch in the LaTeX preamble (different value per branch). Every footnote, prose mention, and "Released artefacts" paragraph references the macro, never the literal URL. Result:
- The same `\input{}`'d section files (`01_motivation.tex` etc.) work on both `main` and `anon-submission`; the URL changes by virtue of the per-branch macro definition.
- You can apply the same Edit calls to both branches when changing footnote prose.
- Do not cherry-pick the `main` URL-update commit onto `anon-submission` (or vice versa). Cherry-pick brings the `\newcommand{\projectGithubUrl}{...}` line, which has the wrong URL for the destination branch. Re-do the prose Edits on the second branch instead; the macro definition stays branch-local.

On anon-submission: replace placeholder `\projectGithubUrl` macro / footnote text with the anon URL. If data ships in-repo, also update the prose where the dataset is described to point at its in-repo path (e.g. "released as a public dataset under results/<dataset>/ in the project repository"). Place the URL in footnote 1, usually on the first paragraph of the body section that mentions code release; reviewers shouldn't have to flip pages to find it. Rebuild → re-scrub PDF metadata → commit (anonymous identity) → push.
On main: same prose updates, but with the real repo URL (https://github.com/<user>/<repo>). Bundle the cmap/T1 fontenc fix into this commit too, so main's PDF benefits from the same pdf.js compatibility. Commit (regular identity), and ask before pushing main — pushing to a public repo is a shared-state action; the user owns that decision.
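Since the URL lives in a single macro definition per branch, the per-branch update can be reduced to rewriting one line. The sketch below assumes the definition is written exactly as `\newcommand{\projectGithubUrl}{...}` with a plain URL and no nested braces. The function name `set_project_url` is hypothetical, and the placeholder URL in the example is made up.

```python
import re

# Matches the macro definition and captures the text on either side of the URL.
MACRO_RE = re.compile(r"(\\newcommand\{\\projectGithubUrl\}\{)[^}]*(\})")


def set_project_url(preamble: str, url: str) -> str:
    """Return the preamble with the \\projectGithubUrl value replaced by url."""
    # A function replacement avoids backslash-escape surprises in the URL.
    new_text, count = MACRO_RE.subn(
        lambda m: m.group(1) + url + m.group(2), preamble
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one \\projectGithubUrl definition, found {count}"
        )
    return new_text


preamble = (
    "\\usepackage{hyperref}\n"
    "\\newcommand{\\projectGithubUrl}{https://example.invalid/placeholder}\n"
)
print(set_project_url(preamble, "https://anonymous.4open.science/r/<repo>-67E6/"))
```

Refusing to proceed unless exactly one definition is found guards against a section file that quietly redefines the macro.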
- `tests/test_paper_macros.py` (or equivalent) often reads the named-authors paper to verify macros resolve. Switch the test to read the anonymous wrapper. Easy to miss because tests pass on `main` but break on `anon-submission` after deletion.
- `wandb/` is tracked. Sometimes a developer didn't add `wandb/` to `.gitignore`. The directory contains `wandb-metadata.json` with your literal email address. Always grep for `wandb/` in the audit; `git rm -r wandb/` if tracked.
- Run `git grep` for personal paths AFTER all `git rm`s, AFTER all edits, BEFORE pushing.
- `git grep` matches in binary files. Compressed image streams in PNGs/PDFs occasionally contain byte sequences that match a name. Verify with `strings` (PNG) or `pdftotext` (PDF). If the rendered text doesn't contain the string, it's a false positive.
- `latexmk` reporting "All targets up-to-date" after edits. Sometimes a stale `.aux` makes `latexmk` skip rebuilding. Use the `-g` flag, or delete `*.aux *.bbl *.fdb_latexmk *.fls`, or use the project's `rebuild-latex.py`-style helper if one exists.
- `latexmk` writes hyperref-derived metadata on every successful build. Always scrub after the final rebuild, not before, or you'll ship un-scrubbed metadata.
- Grep for `huggingface`, `link withheld`, `anonymous mirror` and update.
- Put `\footnote{Code and data: \projectGithubUrl}` on the first paragraph of the body that mentions code release, not buried in the appendix.
- If there is a `\projectHfDatasetUrl` macro (or similar) defined for a HuggingFace/Zenodo dataset link that you're no longer using because the data ships in-repo, delete the macro definition rather than leave it pointing at a placeholder. An unused macro implies an external dataset that doesn't exist; reviewers may waste time looking for it. Also grep for stale prose: `huggingface`, `link withheld`, `anonymous mirror is in supplementary materials`, `available at HF`. Replace with the in-repo path or delete.
- On `anon-submission`, all commits use `git -c user.name="Anonymous" -c user.email="[email protected]"`. On `main`, the user's regular git identity. Don't forget to switch back: a single Anonymous-authored commit slipping into `main` is awkward but not catastrophic; the reverse (your real identity in an `anon-submission` commit) is a leak via `git log` if the public repo is browsed directly.

When triaging files, these regex/glob patterns surface most identity leaks:
| Pattern | Typical leak |
|---|---|
| paper/archive/**, paper/historical_*/** | author names in old wrappers |
| paper/CLAUDE.md, root CLAUDE.md | home paths in build instructions |
| wandb/** | email in metadata, your handle in run URLs |
| **/logs/*.log containing stack traces | full home-directory paths |
| scripts/**/upload_to_hf*.py, **/publish_to_*.py | account names in repo IDs, author names in docstrings |
| paper/citation_*verification*.md, *audit*.md, *workshop_submission*.md | internal review artefacts; not for reviewers |
| figures/**/*.pdf, paper/*.pdf | matplotlib /Author, hyperref /Title |
| **/refs.bib | check, but advisor self-cites are usually fine |
| tests/test_paper*.py | hardcoded paths to named-author wrapper |
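A subset of the table's glob patterns can be applied to a file list mechanically. This is a minimal sketch: the `triage` function and `LEAK_PATTERNS` list are illustrative names, only some rows are included, and the `**` globs are simplified to `*` because `fnmatchcase` already lets `*` match across `/`.

```python
from fnmatch import fnmatchcase

# (pattern, typical leak) pairs taken from a subset of the table rows.
LEAK_PATTERNS = [
    ("paper/archive/*", "author names in old wrappers"),
    ("paper/historical_*/*", "author names in old wrappers"),
    ("wandb/*", "email in metadata, handle in run URLs"),
    ("tests/test_paper*.py", "hardcoded paths to named-author wrapper"),
    ("figures/*.pdf", "matplotlib /Author, hyperref /Title"),
    ("paper/*.pdf", "matplotlib /Author, hyperref /Title"),
]


def triage(paths: list[str]) -> dict[str, list[str]]:
    """Map each flagged path to the leak types its matching patterns suggest."""
    flagged = {}
    for path in paths:
        reasons = [why for pattern, why in LEAK_PATTERNS if fnmatchcase(path, pattern)]
        if reasons:
            flagged[path] = reasons
    return flagged


candidates = [
    "paper/archive/v1/main.tex",
    "src/model.py",
    "wandb/run-1/files/wandb-metadata.json",
    "figures/sub/plot.pdf",
    "tests/test_paper_macros.py",
]
for path, reasons in triage(candidates).items():
    print(path, "->", "; ".join(reasons))
```

Feeding it the output of `git ls-files` gives a first-pass review list; the content-based rows (logs with stack traces, `refs.bib`) still need a human or subagent read.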
To avoid over-zealous edits that break the paper:
- Don't scrub `refs.bib` for papers your advisor co-authored. Third-person cites of advisor's prior work are allowed under standard double-blind rules.
- Don't edit style files (`fancyhdr.sty`, `icml*.sty`) that contain `\f@nch@hf`-style internal macro names with substring "hf". These are LaTeX internals, not Hugging Face references.

Encountered in real sessions:

- `tlmgr install cmap`: for pdf.js fontenc fix
- `tlmgr install <whatever-the-log-says-is-missing>`: TinyTeX is minimal
- `pypdf` (via `uv run --with pypdf`): for metadata scrub; preferred over `exiftool` which may not be installed

```
anon-submission @ <your-real-main>
└── one commit (or N small commits): "ICML 2026 workshop submission"
    Author: Anonymous <[email protected]>
```
After the anon URL is in hand, expect ~2 more commits on top updating the URL macro, footnote text, and "Released artefacts" prose.