Make Anonymous Branch for Double-Blind Submission

Core idea: one branch (anon-submission) carries the entire blinded artefact — code, data, paper, derived results — in-tree. anonymous.4open.science mirrors that branch with .git/ stripped. main stays untouched throughout; all destructive edits live on the anon branch.

This skill is flexible. The phases and gotchas transfer; the exact file paths depend on the project. Adapt, don't blindly follow.

When this skill is wrong

Repo has > ~1 GB of data, or single files > 100 MB → use git-lfs, or host data externally and link from the paper. anonymous.4open.science inherits GitHub's 50 MB warn / 100 MB hard limits.
Submission venue forbids supplementary code/data → don't blind the repo and don't submit it.
Authors are not yet sure whether to submit → don't preemptively delete the named-authors paper version on main.

Phase 0: Pre-flight

Run these BEFORE proposing edits — they shape every later decision:

# 1. Total payload size (excluding things that won't ship)
du -sh results figures investigations data 2>/dev/null

# 2. Largest tracked files (sorted descending)
git ls-files | xargs -I{} ls -l {} 2>/dev/null | awk '$5 > 20000000 {print $5, $9}' | sort -rn | head -20

# 3. Confirm main is clean and you know what you're branching from
git status
git log main --oneline -5

Can the data ship in the repo? (Decide first — this gates everything else.)

anonymous.4open.science mirrors a regular GitHub branch, so GitHub's git-blob limits apply:

50 MB: GitHub warns the user on push (still accepted)
100 MB: GitHub hard-blocks the push without git-lfs
~1 GB: practical repo-size ceiling for reviewer clone times; above this, anonymous.4open.science responsiveness also degrades

Decision tree:

| Situation | Hosting decision |
|---|---|
| Largest single file < 50 MB and total payload < ~250 MB | In-repo. Simplest path. No LFS, no external host, no second anonymization step. |
| Single files in 50–100 MB or total 250 MB–1 GB | In-repo with caveats. Acceptable but: warn user about clone times; consider whether some intermediate artefacts can be regenerated by scripts and dropped from the branch. |
| Any single file > 100 MB or total > ~1 GB | External or LFS. git-lfs adds complexity; HF/Zenodo adds a second account to anonymize. Pick the one with less identity surface. |
| Sensitive / non-redistributable data | Don't ship. Describe how to obtain it; have reviewers email an anonymized contact (some venues provide one). |

The default to push for: in-repo if it fits. The reason is concrete — every external host (HuggingFace, Zenodo, S3 bucket, Google Drive) adds another account/identity that needs separate anonymization, doubling the leakage surface. If the data is in the repo, anonymizing the repo anonymizes the data.

Verify the in-repo decision before committing to it

Run before git checkout -b:

# Find anything that would push you over 50/100 MB on the anon branch
git ls-files | xargs -I{} ls -l {} 2>/dev/null | awk '$5 > 50000000 {print $5, $9}'

# Total tracked-file size (rough; excludes anything you add later). GNU stat syntax; on macOS/BSD use stat -f %z
git ls-files | xargs -I{} stat -c %s {} 2>/dev/null | awk '{s+=$1} END {printf "tracked: %.1f MB\n", s/1024/1024}'

# If a wandb/ or build-artefact dir is currently tracked, surface it now:
du -sh $(git ls-files | xargs -I{} dirname {} | sort -u | head -50) 2>/dev/null | sort -h | tail -10

Any single result > 50 MB → revisit the hosting decision; > 100 MB → must use LFS or external.
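The decision table and the size checks above can be combined into one small helper. This is a minimal sketch, not part of the original skill: the function names and the returned labels are hypothetical, and "MB" is treated as MiB, which is slightly stricter than the decimal thresholds in the awk one-liners.

```python
import os
import subprocess
from typing import Iterable, List

MB = 1024 * 1024  # assumption: binary megabytes


def hosting_decision(file_sizes: Iterable[int]) -> str:
    """Map file sizes (in bytes) onto the rows of the decision table."""
    sizes = list(file_sizes)
    largest = max(sizes, default=0)
    total = sum(sizes)
    if largest > 100 * MB or total > 1024 * MB:
        return "external-or-lfs"
    if largest >= 50 * MB or total >= 250 * MB:
        return "in-repo-with-caveats"
    return "in-repo"


def tracked_file_sizes(repo: str = ".") -> List[int]:
    """Sizes of all tracked files, read from the working tree."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],
        check=True, capture_output=True,
    ).stdout
    # -z separates paths with NUL bytes, so names with spaces survive.
    paths = [os.fsdecode(p) for p in out.split(b"\0") if p]
    full = [os.path.join(repo, p) for p in paths]
    return [os.path.getsize(p) for p in full if os.path.isfile(p)]
```

Calling hosting_decision(tracked_file_sizes()) from the repo root gives a first answer; the sensitive-data row of the table still needs a human decision.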

Other upfront decisions

Branch lineage: branch from main, never modify main until the anon branch is shipped.
Squash needed?: anonymous.4open.science strips .git/, so reviewers don't see commit history via the proxy. But if the public GitHub repo will also be visible (typical), past commits authored by your real GitHub account will leak. Decide whether to squash to a single anon-author commit (defense in depth).

Phase 1: Audit (parallel Haiku Explore subagents)

The dominant failure mode is missing an identity leak. Dispatch 3 Haiku Explore subagents in parallel — they're cheap, thorough, and find things you'd miss by eye:

Identity sweep agent: scan investigations/, docs/, notes/, all .md/.py files for: author names, GitHub usernames, institution names, advisor names, home-directory paths (/home/<user>, /Users/<user>), email addresses (@gmail, @<institution>), hostnames, wandb run URLs.
Build-dependency agent: list every file slated for deletion, then verify nothing in tests/, scripts/, src/, or LaTeX \input{} chain depends on it. The classic gotcha: a paper-macros test that hardcodes a path to the named-authors .tex.
README/JSON/PDF agent: skim root files (README.md, pyproject.toml, CLAUDE.md, .gitignore), audit a sample of large JSON files for email/username/hostname/wandb_run_id keys, list figure PDFs that may carry matplotlib Author metadata.

See superpowers:dispatching-parallel-agents for how to launch these concurrently.
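For a deterministic cross-check on what the identity sweep agent reports, a plain regex pass over file contents catches the mechanical leaks (home paths, email addresses, known name tokens). The sketch below is illustrative only: the function name and the two regexes are assumptions, and the email pattern is deliberately loose.

```python
import re
from typing import List, Sequence, Tuple

# Home-directory prefixes such as /home/<user> or /Users/<user>.
HOME_PATH = re.compile(r"/(?:home|Users)/[A-Za-z0-9._-]+")
# Loose email pattern; good enough for a sweep, not for validation.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_identity_hits(text: str, tokens: Sequence[str] = ()) -> List[Tuple[int, str]]:
    """Return (line_number, matched_string) pairs for likely identity leaks.

    `tokens` are project-specific strings (names, handles, institutions),
    matched case-insensitively.
    """
    patterns = [HOME_PATH, EMAIL]
    if tokens:
        alternation = "|".join(re.escape(t) for t in tokens)
        patterns.append(re.compile(alternation, re.IGNORECASE))
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in patterns:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits
```

Running it per file (for example over the output of git ls-files) gives a list the agents' findings can be compared against.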

Phase 2: Plan the deletions and edits

From the audit, build three lists:

A. git rm — identity-bearing files (typical patterns; project-specific paths):

Named-authors paper wrapper(s) and their compiled PDFs (often archived, e.g. paper/archive/)
Historical paper versions (often paper/historical_papers/, paper/historical_sections/)
Personal-path files: any *.log containing /home/<user>/... stack traces
Internal working docs: citation audits, paper checklists, todo files
paper/CLAUDE.md, root CLAUDE.md if they contain personal paths
wandb/ directory if tracked — wandb-metadata.json contains your email
Upload-to-HF scripts (often have your HF org name + author names in docstrings)
macOS/IDE noise: .DS_Store, __MACOSX/

B. git rm — internal-only docs (not identity leaks, but reduce review surface):

Citation verification reports, paper-writing checklists, workshop-submission TODOs
original_vs_workshop_audit.md-style internal comparison docs
Personal todo files

C. In-place scrubs — files that stay but contain identifying lines:

USER_AGENT = "...(<your-email>)" in build scripts → drop the email
"Investigator: Claude (under <name>)" in research notes → "under the lead author"
"Raise blockers against <name>" → "with the lead investigator"
Anywhere prose names you/your collaborators (rare in research notes)
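The list C scrubs are simple enough to express as a line-level rewrite. A minimal sketch, assuming the email sits in parentheses as in the USER_AGENT example and that names are replaced by a fixed neutral phrase; scrub_line and its defaults are hypothetical, and every changed line should still be reviewed by eye.

```python
import re
from typing import Iterable

# A parenthesised token containing "@", plus the space before it,
# e.g. ' (someone@example.org)'. Parentheses with spaces inside are left alone.
EMAIL_IN_PARENS = re.compile(r"\s*\([^()\s]+@[^()\s]+\)")


def scrub_line(line: str, names: Iterable[str], replacement: str = "the lead author") -> str:
    """Drop a parenthesised email and replace each name with a neutral phrase."""
    line = EMAIL_IN_PARENS.sub("", line)
    for name in names:
        # A callable replacement avoids backslash surprises in re.sub.
        line = re.sub(re.escape(name), lambda _m: replacement, line, flags=re.IGNORECASE)
    return line
```

For example, scrub_line("Investigator: Claude (under Jane Doe)", ["Jane Doe"]) yields "Investigator: Claude (under the lead author)".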

Phase 3: Branch creation and surgery

# Branch from main — DO NOT switch back to main with uncommitted changes
git checkout main && git pull
git checkout -b anon-submission

Pause and verify before any bulk delete. Run git branch --show-current and git status to confirm you're on anon-submission with main's tree intact. Real-session pattern: users sometimes ask to "check the branch" before the first git rm — that's healthy. A bulk delete on the wrong branch is recoverable (the files are still in main's history) but disorienting; better to confirm than rush.

# Apply the deletions (group by category for reviewable diffs)
git rm -r paper/archive paper/historical_papers paper/historical_sections
git rm paper/<named-author-wrapper>.{tex,pdf}
git rm paper/CLAUDE.md paper/<internal-investigation>.md
git rm scripts/<upload-to-hf>.py
git rm -r wandb
git rm <log-with-home-paths>
# ... etc per audit findings

git rm --ignore-unmatch .DS_Store paper/.DS_Store

# Apply in-place edits (use Edit tool on each file)
# - drop email from USER_AGENT
# - rename "<name>" mentions to "the lead author" / "the investigator"

Phase 4: pdf.js compatibility (LaTeX papers)

anonymous.4open.science renders PDFs via pdf.js (Mozilla), which handles default LaTeX URW Times Type 1 fonts poorly — visible as smushed-together characters / broken kerning. The fix is two lines in the LaTeX preamble, before microtype and the conference style file:

\usepackage{cmap}          % must come BEFORE fontenc
\usepackage[T1]{fontenc}   % T1 encoding has glyph-width tables pdf.js handles

cmap is sometimes missing from minimal TeX distributions:

tlmgr install cmap   # if you get "File `cmap.sty' not found"

After adding these, page count typically does not change (verify; if it does shift across the venue's main-paper page limit, back out and pick a different fix like \usepackage{newtxtext,newtxmath}).

Visual confirmation is mandatory — page count tells you nothing about kerning. After the anon URL is live, ask the user to open the PDF preview on anonymous.4open.science (or open it yourself if you have a browser tool) and confirm characters aren't smushed. A passing build with bad rendering is the failure mode this fix addresses.
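The ordering constraint (cmap before fontenc, both before microtype) is easy to break when preambles get reshuffled, so it can be worth a mechanical check alongside the visual one. The sketch below is an assumption-laden helper, not part of the original skill: it only looks at \usepackage lines, ignores commented-out lines, and does not check the T1 option or the conference style file.

```python
import re
from typing import List

USEPACKAGE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}")
COMMENT = re.compile(r"(?<!\\)%.*")  # unescaped % to end of line


def package_order(tex_source: str) -> List[str]:
    """Package names in load order, ignoring commented-out text."""
    names = []
    for line in tex_source.splitlines():
        code = COMMENT.sub("", line)
        for match in USEPACKAGE.finditer(code):
            names.extend(n.strip() for n in match.group(1).split(","))
    return names


def cmap_fix_in_order(tex_source: str) -> bool:
    """True if cmap and fontenc are loaded, in that order, before microtype."""
    order = package_order(tex_source)
    if "cmap" not in order or "fontenc" not in order:
        return False
    if order.index("cmap") > order.index("fontenc"):
        return False
    if "microtype" in order and order.index("fontenc") > order.index("microtype"):
        return False
    return True
```

A True result says nothing about rendering; the pdf.js preview check above still decides whether the fix worked.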

Phase 5: PDF metadata scrub

Even with anonymous LaTeX (\icmlauthor{Anonymous Authors}{anon}), hyperref and pdfTeX bake the title, anonymized author string, subject, and creator into PDF metadata. This survives the LaTeX-side anonymization. Scrub via pypdf:

uv run --with pypdf python <<'EOF'
from pypdf import PdfReader, PdfWriter
from pathlib import Path

pdfs = sorted(set(Path("figures").rglob("*.pdf")) | {Path("paper/<main>.pdf")})
for p in pdfs:
    if not p.exists(): continue
    r = PdfReader(str(p))
    w = PdfWriter(clone_from=r)
    w.add_metadata({
        "/Author": "", "/Creator": "", "/Title": "", "/Subject": "",
        "/Keywords": "", "/Producer": "",
    })
    tmp = p.with_suffix(".pdf.scrubtmp")
    with open(tmp, "wb") as f: w.write(f)
    tmp.replace(p)
EOF

Re-run this after every LaTeX rebuild — latexmk writes fresh metadata each pass.

Phase 6: Verify

Critical and non-negotiable. The whole point of the branch is that these checks pass.

# 1. Build still works
cd paper && latexmk -pdf -interaction=nonstopmode <main>.tex && cd ..

# 2. Tests still pass (skip slow GPU tests if needed)
uv run pytest

# 3. Final identity-string sweep — all should return ZERO tracked-text hits
git grep -i -n -I -E "<your-name-tokens>|<github-handle>|<institution>|<advisor>|@gmail|@<institution>"
git grep -n -I -E "/home/<user>|/Users/<user>"

# 4. PDF metadata (in bash, run `shopt -s globstar` first so figures/**/*.pdf recurses into subdirectories)
uv run --with pypdf python -c "
from pypdf import PdfReader
import sys
for p in sys.argv[1:]:
    md = PdfReader(p).metadata or {}
    bad = {k: v for k, v in md.items() if v and any(s in str(v).lower() for s in ['<your-name-lowercase>','<institution-lowercase>'])}
    print(p, '→', 'CLEAN' if not bad else f'LEAKS: {bad}')
" paper/<main>.pdf figures/**/*.pdf

Triage the grep hits:

Identity hits in source files / docs → real leaks, fix.
Identity hits in binary files (PNG, PDF stream bytes) → almost always false positives in compressed image data. Verify via pdftotext / strings extraction; if no actual rendered text contains the name, ignore.
Identity hits in dataset CSVs/JSONLs as part of LLM-generated text (e.g. a Tevet completion mentions "Matthew" as a Bible character) → not a leak, common name in eval data.
Bibliography entries citing advisor's papers → not a leak under most double-blind rules; third-person citations of one's own / advisor's prior work are explicitly allowed.

Phase 7: Commit + push

# Use an explicitly anonymous identity (defense in depth — safe even if
# repo is later shared without anonymous.4open.science in front)
git add -A
git -c user.name="Anonymous" -c user.email="[email protected]" \
    commit -m "<venue> submission"

# Push to remote (regular GitHub; anonymous.4open.science fetches from there)
git push -u origin anon-submission

# OPTIONAL: full single-commit orphan branch (drops inherited git history
# from main entirely — only matters if reviewers may clone the public repo
# directly, bypassing anonymous.4open.science). Destructive of the branch
# ref; ASK USER before doing this.
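The optional orphan-branch step is described above without commands. One possible sequence, sketched as a dry-run-by-default Python helper: the temporary branch name, the remote name, and the function names are assumptions, and the force-push rewrites the remote branch, so the ask-the-user rule applies before running it with dry_run=False.

```python
import subprocess
from typing import List


def orphan_squash_commands(
    branch: str,
    message: str,
    name: str,
    email: str,
    tmp_branch: str = "anon-squash-tmp",  # hypothetical scratch name
    remote: str = "origin",
) -> List[List[str]]:
    """Git commands that replace `branch` with a single parentless commit."""
    return [
        # New branch with no history; index and working tree are kept.
        ["git", "checkout", "--orphan", tmp_branch],
        ["git", "add", "-A"],
        ["git", "-c", f"user.name={name}", "-c", f"user.email={email}",
         "commit", "-m", message],
        # Rename the scratch branch over the old one (-M forces the overwrite).
        ["git", "branch", "-M", branch],
        ["git", "push", "--force", "-u", remote, branch],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Run it from a clean anon-submission checkout after the Phase 6 checks pass; main is not touched by any of these commands.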

Phase 8: After receiving the anonymous URL

Order of operations (subtle but matters)

Push anon-submission to GitHub first with placeholder URLs intact (Phase 7).
User submits the GitHub URL (e.g. github.com/<user>/<repo>/tree/anon-submission) to anonymous.4open.science.
anonymous.4open.science returns a stable mirror URL like https://anonymous.4open.science/r/<repo>-67E6/ (random 4-character hash suffix).
NOW you can update the \projectGithubUrl macro to the anon URL. Rebuild, re-scrub metadata, commit, push.
Mirror to main with the real URL in the same \projectGithubUrl slot.

The chicken-and-egg trap: you can't put the anon URL into the paper before submitting, and submitting requires the paper. The pattern above breaks the cycle by accepting one round-trip — push placeholder, submit, push real URL.

The macro-as-unifying-device pattern

Define \projectGithubUrl once per branch in the LaTeX preamble (different value per branch). Every footnote, prose mention, and "Released artefacts" paragraph references the macro, never the literal URL. Result:

The same \input{}'d section files (01_motivation.tex etc.) work on both main and anon-submission — the URL changes by virtue of the per-branch macro definition.
You can re-apply identical Edit calls to both branches when changing footnote prose.
Do not cherry-pick the main URL-update commit onto anon-submission (or vice versa). Cherry-pick brings the \newcommand{\projectGithubUrl}{...} line, which has the wrong URL for the destination branch. Re-do the prose Edits on the second branch instead; the macro definition stays branch-local.
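Because only the macro definition differs between branches, the per-branch URL swap can be a single targeted rewrite of that one line. A minimal sketch, assuming the definition has the form \newcommand{\projectGithubUrl}{<bare URL>} with no nested braces and appears exactly once; set_project_url is a hypothetical helper name.

```python
import re

# Matches \newcommand{\projectGithubUrl}{...} with a brace-free value.
MACRO_DEF = re.compile(r"\\newcommand\{\\projectGithubUrl\}\{[^{}]*\}")


def set_project_url(preamble: str, url: str) -> str:
    """Return `preamble` with the \\projectGithubUrl definition pointing at `url`."""
    new_def = "\\newcommand{\\projectGithubUrl}{" + url + "}"
    # A callable replacement keeps backslashes in new_def literal.
    updated, count = MACRO_DEF.subn(lambda _m: new_def, preamble)
    if count != 1:
        raise ValueError(f"expected exactly one \\projectGithubUrl definition, found {count}")
    return updated
```

Uses of \projectGithubUrl in footnotes and prose are left untouched, which is the point of routing every mention through the macro.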

What to update

On anon-submission: replace placeholder \projectGithubUrl macro / footnote text with the anon URL. If data ships in-repo, also update the prose where the dataset is described to point at its in-repo path (e.g. "released as a public dataset under results/<dataset>/ in the project repository"). Place the URL in footnote 1 — usually the first paragraph of the body section that mentions code release; reviewers shouldn't have to flip pages to find it. Rebuild → re-scrub PDF metadata → commit (anonymous identity) → push.
On main: same prose updates, but with the real repo URL (https://github.com/<user>/<repo>). Bundle the cmap/T1 fontenc fix into this commit too, so main's PDF benefits from the same pdf.js compatibility. Commit (regular identity), and ask before pushing main — pushing to a public repo is a shared-state action; the user owns that decision.

Common gotchas (caught in real sessions)

Test files hardcoding the named-authors .tex path. tests/test_paper_macros.py (or equivalent) often reads the named-authors paper to verify macros resolve. Switch the test to read the anonymous wrapper. Easy to miss because tests pass on main but break on anon-submission once the named-authors file is deleted.
wandb/ is tracked. Sometimes a developer didn't add wandb/ to .gitignore. The directory contains wandb-metadata.json with your literal email address. Always grep for wandb/ in the audit; git rm -r wandb/ if tracked.
Untracked-then-tracked scripts. A file that was untracked at session start can become tracked between audit and commit. Re-run git grep for personal paths AFTER all git rms, AFTER all edits, BEFORE pushing.
git grep matches in binary files. Compressed image streams in PNGs/PDFs occasionally contain byte sequences that match a name. Verify with strings (PNG) or pdftotext (PDF). If the rendered text doesn't contain the string, it's a false positive.
latexmk reporting "All targets up-to-date" after edits. Sometimes a stale .aux makes latexmk skip rebuilding. Use -g flag, or delete *.aux *.bbl *.fdb_latexmk *.fls, or use the project's rebuild-latex.py-style helper if one exists.
PDF metadata after rebuild. latexmk writes hyperref-derived metadata on every successful build. Always scrub after the final rebuild, not before, or you'll ship un-scrubbed metadata.
Dataset path in paper prose. If you decided to ship data in-repo (instead of HF/Zenodo), the paper's "released artefacts" or "data availability" section probably still says "available at HF / link withheld for double-blind". Search for huggingface, link withheld, anonymous mirror and update.
Footnote 1 placement. Reviewers want to find the code link without flipping pages. Put \footnote{Code and data: \projectGithubUrl} on the first paragraph of the body that mentions code release, not buried in the appendix.
Unused dataset macros after going in-repo. If the paper has a \projectHfDatasetUrl macro (or similar) defined for a HuggingFace/Zenodo dataset link that you're no longer using because the data ships in-repo, delete the macro definition rather than leave it pointing at a placeholder. An unused macro implies an external dataset that doesn't exist; reviewers may waste time looking for it. Also grep for stale prose: huggingface, link withheld, anonymous mirror is in supplementary materials, available at HF — replace with the in-repo path or delete.
Two-identity workflow. On anon-submission, all commits use git -c user.name="Anonymous" -c user.email="[email protected]". On main, the user's regular git identity. Don't forget to switch back: a single Anonymous-authored commit slipping into main is awkward but not catastrophic; the reverse — your real identity in an anon-submission commit — is a leak via git log if the public repo is browsed directly.
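The two-identity rule can be checked mechanically before pushing. The sketch below assumes you feed it the output of git log --format='%an <%ae>%n%cn <%ce>' main..anon-submission (author and committer of each commit added on the branch); the function name is hypothetical. Commits inherited from main are outside that range and are only removed by the optional squash.

```python
from typing import Iterable, List


def unexpected_identities(log_output: str, allowed: Iterable[str]) -> List[str]:
    """Identities in the log output that are not in the allowed set.

    Each non-empty line is expected to look like 'Name <email>'.
    """
    seen = {line.strip() for line in log_output.splitlines() if line.strip()}
    return sorted(seen - set(allowed))
```

An empty list means every new commit on the branch carries only the anonymous identity; anything else names the identity that leaked.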

File-pattern checklist (project-agnostic)

When triaging files, these regex/glob patterns surface most identity leaks:

| Pattern | Typical leak |
|---|---|
| paper/archive/**, paper/historical_*/** | author names in old wrappers |
| paper/CLAUDE.md, root CLAUDE.md | home paths in build instructions |
| wandb/** | email in metadata, your handle in run URLs |
| **/logs/*.log containing stack traces | full home-directory paths |
| scripts/**/upload_to_hf*.py, **/publish_to_*.py | account names in repo IDs, author names in docstrings |
| paper/citation_*verification*.md, *audit*.md, *workshop_submission*.md | internal review artefacts; not for reviewers |
| figures/**/*.pdf, paper/*.pdf | matplotlib /Author, hyperref /Title |
| **/refs.bib | check, but advisor self-cites are usually fine |
| tests/test_paper*.py | hardcoded paths to named-author wrapper |

Reference: what to NOT scrub

To avoid over-zealous edits that break the paper:

Citation entries in refs.bib for papers your advisor co-authored. Third-person cites of advisor's prior work are allowed under standard double-blind rules.
Names of cited authors in prose ("Tevet and Berant's benchmark", "Kirk et al."). These are normal references, not identity leaks.
LLM-generated text in eval datasets that happens to contain a common name matching yours. The reviewer doesn't connect "the LLM generated 'Matthew' as a story character" to "the author's name is Matthew".
Style file internals (fancyhdr.sty, icml*.sty) that contain \f@nch@hf-style internal macro names with substring "hf". These are LaTeX internals, not Hugging Face references.

Reference: useful tlmgr / system packages

Encountered in real sessions:

tlmgr install cmap — for pdf.js fontenc fix
tlmgr install <whatever-the-log-says-is-missing> — TinyTeX is minimal
pypdf (via uv run --with pypdf) — for metadata scrub; preferred over exiftool which may not be installed

Final shape of the branch

anon-submission @ <your-real-main>
└── one commit (or N small commits): "ICML 2026 workshop submission"
    Author: Anonymous <[email protected]>

After the anon URL is in hand, expect ~2 more commits on top updating the URL macro, footnote text, and "Released artefacts" prose.

Make Anonymous Branch for Double-Blind Submission

This skill is flexible. The phases and gotchas transfer; the exact file paths depend on the project. Adapt, don't blindly follow.

When this skill is wrong

Repo has > ~1 GB of data, or single files > 100 MB → use git-lfs, or host data externally and link from the paper. anonymous.4open.science inherits GitHub's 50 MB warn / 100 MB hard limits.
Submission venue forbids supplementary code/data → don't blind, don't submit.
Authors are not yet sure whether to submit → don't preemptively delete the named-authors paper version on main.

Phase 0: Pre-flight

Run these BEFORE proposing edits — they shape every later decision:

# 1. Total payload size (excluding things that won't ship)
du -sh results figures investigations data 2>/dev/null

# 2. Largest tracked files (sorted descending)
git ls-files | xargs -I{} ls -l {} 2>/dev/null | awk '$5 > 20000000 {print $5, $9}' | sort -rn | head -20

# 3. Confirm main is clean and you know what you're branching from
git status
git log main --oneline -5

Can the data ship in the repo? (Decide first — this gates everything else.)

anonymous.4open.science mirrors a regular GitHub branch, so GitHub's git-blob limits apply:

50 MB: GitHub warns the user on push (still accepted)
100 MB: GitHub hard-blocks the push without git-lfs
~1 GB: practical repo-size ceiling for reviewer clone times; above this, anonymous.4open.science responsiveness also degrades

Decision tree:

Verify the in-repo decision before committing to it

Run before git checkout -b:

# Find anything that would push you over 50/100 MB on the anon branch
git ls-files | xargs -I{} ls -l {} 2>/dev/null | awk '$5 > 50000000 {print $5, $9}'

# Total tracked-file size (rough — doesn't include things you'll add later)
git ls-files | xargs -I{} stat -c %s {} 2>/dev/null | awk '{s+=$1} END {printf "tracked: %.1f MB\n", s/1024/1024}'

# If a wandb/ or build-artefact dir is currently tracked, surface it now:
du -sh $(git ls-files | xargs -I{} dirname {} | sort -u | head -50) 2>/dev/null | sort -h | tail -10

Any single result > 50 MB → revisit the hosting decision; > 100 MB → must use LFS or external.

Other upfront decisions

Branch lineage: branch from main, never modify main until the anon branch is shipped.
Squash needed?: anonymous.4open.science strips .git/, so reviewers don't see commit history via the proxy. But if the public GitHub repo will also be visible (typical), past commits authored by your real GitHub account will leak. Decide whether to squash to a single anon-author commit (defense in depth).

Phase 1: Audit (parallel Haiku Explore subagents)

The dominant failure mode is missing an identity leak. Dispatch 3 Haiku Explore subagents in parallel — they're cheap, thorough, and find things you'd miss by eye:

Identity sweep agent: scan investigations/, docs/, notes/, all .md/.py files for: author names, GitHub usernames, institution names, advisor names, home-directory paths (/home/<user>, /Users/<user>), email addresses (@gmail, @<institution>), hostnames, wandb run URLs.
Build-dependency agent: list every file slated for deletion, then verify nothing in tests/, scripts/, src/, or LaTeX \input{} chain depends on it. The classic gotcha: a paper-macros test that hardcodes a path to the named-authors .tex.
README/JSON/PDF agent: skim root files (README.md, pyproject.toml, CLAUDE.md, .gitignore), audit a sample of large JSON files for email/username/hostname/wandb_run_id keys, list figure PDFs that may carry matplotlib Author metadata.

See superpowers:dispatching-parallel-agents for how to launch these concurrently.

Phase 2: Plan the deletions and edits

From the audit, build three lists:

A. git rm — identity-bearing files (typical patterns; project-specific paths):

Named-authors paper wrapper(s) and their compiled PDFs (often archived, e.g. paper/archive/)
Historical paper versions (often paper/historical_papers/, paper/historical_sections/)
Personal-path files: any *.log containing /home/<user>/... stack traces
Internal working docs: citation audits, paper checklists, todo files
paper/CLAUDE.md, root CLAUDE.md if they contain personal paths
wandb/ directory if tracked — wandb-metadata.json contains your email
Upload-to-HF scripts (often have your HF org name + author names in docstrings)
macOS/IDE noise: .DS_Store, __MACOSX/

B. git rm — internal-only docs (not identity leaks, but reduce review surface):

Citation verification reports, paper-writing checklists, workshop-submission TODOs
original_vs_workshop_audit.md-style internal comparison docs
Personal todo files

C. In-place scrubs — files that stay but contain identifying lines:

USER_AGENT = "...(<your-email>)" in build scripts → drop the email
"Investigator: Claude (under <name>)" in research notes → "under the lead author"
"Raise blockers against <name>" → "with the lead investigator"
Anywhere prose names you/your collaborators (rare in research notes)

Phase 3: Branch creation and surgery

# Branch from main — DO NOT switch back to main with uncommitted changes
git checkout main && git pull
git checkout -b anon-submission

# Apply the deletions (group by category for reviewable diffs)
git rm -r paper/archive paper/historical_papers paper/historical_sections
git rm paper/<named-author-wrapper>.{tex,pdf}
git rm paper/CLAUDE.md paper/<internal-investigation>.md
git rm scripts/<upload-to-hf>.py
git rm -r wandb
git rm <log-with-home-paths>
# ... etc per audit findings

git rm --ignore-unmatch .DS_Store paper/.DS_Store

# Apply in-place edits (use Edit tool on each file)
# - drop email from USER_AGENT
# - rename "<name>" mentions to "the lead author" / "the investigator"

Phase 4: pdf.js compatibility (LaTeX papers)

\usepackage{cmap}          % must come BEFORE fontenc
\usepackage[T1]{fontenc}   % T1 encoding has glyph-width tables pdf.js handles

cmap is sometimes missing from minimal TeX distributions:

tlmgr install cmap   # if you get "File `cmap.sty' not found"

Phase 5: PDF metadata scrub

# uv run --with pypdf python <<'EOF'
from pypdf import PdfReader, PdfWriter
from pathlib import Path

pdfs = sorted(set(Path("figures").rglob("*.pdf")) | {Path("paper/<main>.pdf")})
for p in pdfs:
    if not p.exists(): continue
    r = PdfReader(str(p))
    w = PdfWriter(clone_from=r)
    w.add_metadata({
        "/Author": "", "/Creator": "", "/Title": "", "/Subject": "",
        "/Keywords": "", "/Producer": "",
    })
    tmp = p.with_suffix(".pdf.scrubtmp")
    with open(tmp, "wb") as f: w.write(f)
    tmp.replace(p)
EOF

Re-run this after every LaTeX rebuild — latexmk writes fresh metadata each pass.

Phase 6: Verify

Critical and non-negotiable. The whole point of the branch is that these checks pass.

# 1. Build still works
cd paper && latexmk -pdf -interaction=nonstopmode <main>.tex && cd ..

# 2. Tests still pass (skip slow GPU tests if needed)
uv run pytest

# 3. Final identity-string sweep — all should return ZERO tracked-text hits
git grep -i -n -I -E "<your-name-tokens>|<github-handle>|<institution>|<advisor>|@gmail|@<institution>"
git grep -n -I -E "/home/<user>|/Users/<user>"

# 4. PDF metadata
uv run --with pypdf python -c "
from pypdf import PdfReader
import sys
for p in sys.argv[1:]:
    md = PdfReader(p).metadata or {}
    bad = {k: v for k, v in md.items() if v and any(s in str(v).lower() for s in ['<your-name-lowercase>','<institution-lowercase>'])}
    print(p, '→', 'CLEAN' if not bad else f'LEAKS: {bad}')
" paper/<main>.pdf figures/**/*.pdf

Triage the grep hits:

Identity hits in source files / docs → real leaks, fix.
Identity hits in binary files (PNG, PDF stream bytes) → almost always false positives in compressed image data. Verify via pdftotext / strings extraction; if no actual rendered text contains the name, ignore.
Identity hits in dataset CSVs/JSONLs as part of LLM-generated text (e.g. a Tevet completion mentions "Matthew" as a Bible character) → not a leak, common name in eval data.
Bibliography entries citing advisor's papers → not a leak under most double-blind rules; third-person citations of one's own / advisor's prior work are explicitly allowed.

Phase 7: Commit + push

# Use an explicitly anonymous identity (defense in depth — safe even if
# repo is later shared without anonymous.4open.science in front)
git add -A
git -c user.name="Anonymous" -c user.email="[email protected]" \
    commit -m "<venue> submission"

# Push to remote (regular GitHub; anonymous.4open.science fetches from there)
git push -u origin anon-submission

# OPTIONAL: full single-commit orphan branch (drops inherited git history
# from main entirely — only matters if reviewers may clone the public repo
# directly, bypassing anonymous.4open.science). Destructive of the branch
# ref; ASK USER before doing this.

Phase 8: After receiving the anonymous URL

Order of operations (subtle but matters)

Push anon-submission to GitHub first with placeholder URLs intact (Phase 7).
User submits the GitHub URL (e.g. github.com/<user>/<repo>/tree/anon-submission) to anonymous.4open.science.
anonymous.4open.science returns a stable mirror URL like https://anonymous.4open.science/r/<repo>-67E6/ (random 4-character hash suffix).
NOW you can update the \projectGithubUrl macro to the anon URL. Rebuild, re-scrub metadata, commit, push.
Mirror to main with the real URL in the same \projectGithubUrl slot.

The macro-as-unifying-device pattern

The same \input{}'d section files (01_motivation.tex etc.) work on both main and anon-submission — the URL changes by virtue of the per-branch macro definition.
You can re-apply identical Edit calls to both branches when changing footnote prose.
Do not cherry-pick the main URL-update commit onto anon-submission (or vice versa). Cherry-pick brings the \newcommand{\projectGithubUrl}{...} line, which has the wrong URL for the destination branch. Re-do the prose Edits on the second branch instead; the macro definition stays branch-local.

What to update

On anon-submission: replace placeholder \projectGithubUrl macro / footnote text with the anon URL. If data ships in-repo, also update the prose where the dataset is described to point at its in-repo path (e.g. "released as a public dataset under results/<dataset>/ in the project repository"). Place the URL in footnote 1 — usually the first paragraph of the body section that mentions code release; reviewers shouldn't have to flip pages to find it. Rebuild → re-scrub PDF metadata → commit (anonymous identity) → push.
On main: same prose updates, but with the real repo URL (https://github.com/<user>/<repo>). Bundle the cmap/T1 fontenc fix into this commit too, so main's PDF benefits from the same pdf.js compatibility. Commit (regular identity), and ask before pushing main — pushing to a public repo is a shared-state action; the user owns that decision.

Common gotchas (caught in real sessions)

Test files hardcoding the anonymized .tex path. tests/test_paper_macros.py (or equivalent) often reads the named-authors paper to verify macros resolve. Switch the test to read the anonymous wrapper. Easy to miss because tests pass on main but break on anon-submission after deletion.
wandb/ is tracked. Sometimes a developer didn't add wandb/ to .gitignore. The directory contains wandb-metadata.json with your literal email address. Always grep for wandb/ in the audit; git rm -r wandb/ if tracked.
Untracked-then-tracked scripts. A file that was untracked at session start can become tracked between audit and commit. Re-run git grep for personal paths AFTER all git rms, AFTER all edits, BEFORE pushing.
git grep matches in binary files. Compressed image streams in PNGs/PDFs occasionally contain byte sequences that match a name. Verify with strings (PNG) or pdftotext (PDF). If the rendered text doesn't contain the string, it's a false positive.
latexmk reporting "All targets up-to-date" after edits. Sometimes a stale .aux makes latexmk skip rebuilding. Use -g flag, or delete *.aux *.bbl *.fdb_latexmk *.fls, or use the project's rebuild-latex.py-style helper if one exists.
PDF metadata after rebuild. latexmk writes hyperref-derived metadata on every successful build. Always scrub after the final rebuild, not before, or you'll ship un-scrubbed metadata.
Dataset path in paper prose. If you decided to ship data in-repo (instead of HF/Zenodo), the paper's "released artefacts" or "data availability" section probably still says "available at HF / link withheld for double-blind". Search for huggingface, link withheld, anonymous mirror and update.
Footnote 1 placement. Reviewers want to find the code link without flipping pages. Put \footnote{Code and data: \projectGithubUrl} on the first paragraph of the body that mentions code release, not buried in the appendix.
Unused dataset macros after going in-repo. If the paper has a \projectHfDatasetUrl macro (or similar) defined for a HuggingFace/Zenodo dataset link that you're no longer using because the data ships in-repo, delete the macro definition rather than leave it pointing at a placeholder. An unused macro implies an external dataset that doesn't exist; reviewers may waste time looking for it. Also grep for stale prose: huggingface, link withheld, anonymous mirror is in supplementary materials, available at HF — replace with the in-repo path or delete.
Two-identity workflow. On anon-submission, all commits use git -c user.name="Anonymous" -c user.email="<anonymous-email>", where <anonymous-email> is a non-identifying address. On main, use the user's regular git identity. Don't forget to switch back. A single Anonymous-authored commit slipping into main is awkward but not catastrophic; the reverse, your real identity on an anon-submission commit, is a leak via git log if the public repo is browsed directly.
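
The two-identity gotcha can be checked mechanically before pushing. What follows is a minimal sketch, assuming the branch names main and anon-submission used in this document. The function names anon_branch_identities and check_anon_identities are hypothetical helpers, not part of git.

```shell
# Print every distinct author and committer identity on commits that exist on
# the anonymous branch but not on the base branch. Defaults follow this
# document's branch names; pass other names as arguments if the layout differs.
anon_branch_identities() {
    base="${1:-main}"
    anon="${2:-anon-submission}"
    git log "${base}..${anon}" --format='%an <%ae>%n%cn <%ce>' | sort -u
}

# Succeed only if every identity found above uses the expected name
# (third argument, default "Anonymous"). Offending identities go to stderr.
check_anon_identities() {
    expected_name="${3:-Anonymous}"
    leaks="$(anon_branch_identities "${1:-main}" "${2:-anon-submission}" \
        | grep -v "^${expected_name} <" || true)"
    if [ -n "$leaks" ]; then
        printf 'Non-anonymous identity on anon branch:\n%s\n' "$leaks" >&2
        return 1
    fi
}
```

One way to use it is to gate the push: check_anon_identities main anon-submission && git push origin anon-submission. The check covers both author and committer fields, since either one appears in git log output.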

File-pattern checklist (project-agnostic)

When triaging files, these regex/glob patterns surface most identity leaks:
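
A minimal sketch of such patterns, assuming the leak sources named elsewhere in this document (personal home-directory paths, email addresses, Hugging Face and wandb links, tracked wandb/ directories). LEAK_REGEX, leak_files and leak_paths are hypothetical names introduced for illustration; extend the regex with the authors' own names, handles and institution.

```shell
# Illustrative patterns only, not an exhaustive checklist:
#   /home/<user> and /Users/<user> paths, email-like strings,
#   and links to Hugging Face or wandb.
LEAK_REGEX='/home/[^/ ]+|/Users/[^/ ]+|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}|huggingface\.co/|wandb\.ai/'

# List tracked files whose text content matches a leak pattern.
# -I skips binary files, -l prints file names only, -E enables extended regex.
# git grep exits non-zero when nothing matches, hence the "|| true".
leak_files() {
    git grep -I -l -E "$LEAK_REGEX" -- . || true
}

# List tracked files that live under a wandb/ directory at any depth.
leak_paths() {
    git ls-files -- 'wandb/' '*/wandb/*'
}
```

Run both from the repository root on the anon-submission branch. Binary files are skipped on purpose here; they need the separate strings or pdftotext check described in the gotchas.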

Reference: what to NOT scrub

To avoid over-zealous edits that break the paper:

Citation entries in refs.bib for papers your advisor co-authored. Third-person cites of advisor's prior work are allowed under standard double-blind rules.
Names of cited authors in prose ("Tevet and Berant's benchmark", "Kirk et al."). These are normal references, not identity leaks.
LLM-generated text in eval datasets that happens to contain a common name matching yours. The reviewer doesn't connect "the LLM generated 'Matthew' as a story character" to "the author's name is Matthew".
Style file internals (fancyhdr.sty, icml*.sty) that contain \f@nch@hf-style internal macro names with substring "hf". These are LaTeX internals, not Hugging Face references.
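
The style-file false positive can be avoided at search time rather than triaged afterwards. This is a sketch, assuming the style files use the .sty and .cls extensions; grep_outside_style_files is a hypothetical helper name. It relies on git's exclude pathspec magic.

```shell
# Search tracked text files for an extended regex (first argument),
# case-insensitively, while skipping LaTeX style and class files whose
# internal macro names produce false positives such as "hf".
grep_outside_style_files() {
    git grep -n -i -I -E "$1" -- . ':(exclude)*.sty' ':(exclude)*.cls' || true
}
```

For example, grep_outside_style_files 'huggingface|hf' run from the repository root reports matches in the paper sources but not in fancyhdr.sty. Bibliography entries and cited author names still show up and are judged by hand, as described above.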

Reference: useful tlmgr / system packages

Encountered in real sessions:

tlmgr install cmap — for pdf.js fontenc fix
tlmgr install <whatever-the-log-says-is-missing> — TinyTeX is minimal
pypdf (via uv run --with pypdf) — for metadata scrub; preferred over exiftool which may not be installed

Final shape of the branch

anon-submission @ <your-real-main>
└── one commit (or N small commits): "ICML 2026 workshop submission"
    Author: Anonymous <anonymous-email>

After the anon URL is in hand, expect ~2 more commits on top updating the URL macro, footnote text, and "Released artefacts" prose.
