skills/audit-script/SKILL.md
Systematic audit of data analysis scripts for bugs, analytical reasoning, data handling, style, and reproducibility. Includes domain verification phase that researches tools, file formats, and methods to catch domain-specific errors (not just code bugs). Use when auditing a script, reviewing code for correctness, checking for bugs, preparing a script for publication, or when the user says "audit this script", "review this code", "check this for bugs", or "is this script correct". Three modes: thorough (collaborative section-by-section), fast (Claude-driven with discussion), and report-only. Do NOT load for quick one-off questions about a single line or function.
This skill systematically evaluates data analysis scripts for correctness, analytical soundness, and quality. It surfaces bugs, questionable analytical choices, data handling problems, style issues, and reproducibility gaps — producing a structured audit report with severity levels and action items.
Unlike /learn-code (which teaches students to understand code), this skill is for critical
evaluation — finding what's wrong, fragile, or misleading. The user is a collaborator, not a
student. The tone is direct and analytical.
The goal of an audit is NOT to make scripts handle every possible edge case. Data science scripts should be simple, clean, easy to read, and well-annotated. Adding defensive code for hypothetical problems makes scripts harder to read, which is the opposite of what we want.
The audit should:
"get(load(f)) is fragile if files contain multiple objects" is useful context; "rewrite to use new.env()" is over-engineering a one-time script.

Context matters. A one-time conversion script that processes a known, fixed dataset needs different treatment than a reusable pipeline that will see unknown inputs. The audit must calibrate its recommendations to the script's actual role.
When the skill is invoked (via /audit-script or auto-loaded from context):
Check for:
If none, ask: "Which script would you like to audit?"
Read the full script before proceeding.
Use AskUserQuestion:
"Is there anything in particular you're worried about or want me to focus on?"
This lets the user flag known weak points, steer attention to a specific category, or provide context about what the script is supposed to do.
This phase runs in all modes (thorough, fast, report-only) before the code audit begins. Its purpose is to close the gap between code-level review and domain-specific correctness by researching the actual tools, file formats, and analytical methods the script uses — then auditing the code against that verified knowledge rather than relying on background familiarity.
The most dangerous bugs in bioinformatics and data science aren't code bugs — they're misunderstandings of what the tools and data actually do. A script can be syntactically correct, logically clean, and still produce wrong results because the author (or reviewer) didn't know that:
- inner_join silently drops unmatched rows

These are domain assumptions: facts about tools, formats, and methods that the code depends on but doesn't state. The domain verification phase makes them explicit and checks them.
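To illustrate how a join assumption can be made explicit, here is a minimal sketch in plain Python. The helpers `inner_join` and `anti_join` are hypothetical stand-ins for the dplyr verbs of the same name, and the toy tables are invented for illustration; none of this is part of the skill itself.

```python
def inner_join(left, right, key):
    """Inner join two lists of dicts on `key`; unmatched rows vanish silently."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right_by_key.get(l_row[key], [])
    ]


def anti_join(left, right, key):
    """Rows of `left` with no match in `right`: exactly what the inner join drops."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


# Hypothetical toy tables, for illustration only
counts = [
    {"gene": "g1", "count": 10},
    {"gene": "g2", "count": 0},
    {"gene": "g3", "count": 7},
]
annotations = [
    {"gene": "g1", "biotype": "tRNA"},
    {"gene": "g3", "biotype": "rRNA"},
]

joined = inner_join(counts, annotations, "gene")
dropped = anti_join(counts, annotations, "gene")
print(f"{len(counts)} rows in, {len(joined)} rows out, {len(dropped)} dropped")
```

The join raises no error and gives no warning; only the before/after row counts and the anti-join reveal that `g2` disappeared.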
After reading the script, identify every external dependency the code relies on:
For each tool/format/method, use WebSearch and WebFetch to pull the relevant documentation and identify the critical behaviors the code must handle correctly. Focus on:
Produce a Domain Assumptions Checklist — a concrete list of facts that the code depends on, each verified against documentation. Format:
DOMAIN ASSUMPTIONS CHECKLIST
─────────────────────────────
Tool/Format: pysam + BAM
✓ Each multimapped read appears as multiple records (primary + secondary)
✓ Iterating bam.fetch() yields alignments, not reads — must deduplicate by query_name
✓ MAPQ 255 = uniquely mapped (Cell Ranger convention; standard SAM caps at 60)
✓ is_secondary (flag 0x100) vs is_supplementary (flag 0x800) are different categories
? PCR duplicate marking in Cell Ranger BAMs — need to verify
Tool/Format: BED
✓ 0-based, half-open coordinates (start inclusive, end exclusive)
✓ Converting from GFF3 (1-based inclusive): subtract 1 from start, keep end as-is
Method: minimap2 cross-species mapping
✓ -k 10 appropriate for short ncRNAs (default k=15 misses tRNAs)
✓ --secondary=yes needed for multi-copy genes (rRNA arrays)
? Alignment quality thresholds for cross-species mapping — worth checking
Mark each assumption: ✓ (verified against docs), ✗ (contradicted by docs — potential BUG), ? (couldn't verify — flag for manual review).
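As a worked instance of one checklist entry, the BED/GFF3 coordinate assumption can be written as a tiny conversion. This is only a sketch; the function name `gff3_to_bed` is hypothetical.

```python
def gff3_to_bed(start, end):
    """Convert a GFF3 interval (1-based, inclusive) to BED (0-based, half-open)."""
    # Subtract 1 from start; end stays as-is because the exclusive end
    # of a half-open interval equals the inclusive 1-based end.
    return start - 1, end


gff3_start, gff3_end = 1, 100  # first 100 bases of a sequence
bed_start, bed_end = gff3_to_bed(gff3_start, gff3_end)

# Feature length must be identical in both coordinate systems
gff3_length = gff3_end - gff3_start + 1
bed_length = bed_end - bed_start
print(bed_start, bed_end, gff3_length, bed_length)
```

Checking that the feature length is preserved is a quick way to catch an off-by-one in either direction.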
With the checklist in hand, trace through the code and verify that each assumption is handled correctly. This is where domain verification feeds into the standard audit:
After the audit, recommend that the script include an explicit ASSUMPTIONS block documenting the critical domain assumptions the code depends on. This makes future audits faster and helps students understand what the code takes for granted:
# ASSUMPTIONS (verified against Cell Ranger 9.0 docs, SAM spec v1.6):
# - BAM iteration yields alignments, not reads; we deduplicate via seen_reads set
# - MAPQ 255 = uniquely mapped (Cell Ranger convention, not standard SAM)
# - PCR duplicates are NOT marked in possorted_genome_bam.bam
# - is_secondary and is_supplementary alignments are skipped (primary only)
# - GFF3 coordinates are 1-based inclusive; converted to 0-based for pysam fetch
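The "alignments, not reads" assumption in the block above can be sketched without pysam, using plain `(query_name, flag)` tuples as hypothetical stand-ins for BAM records. The flag bits 0x100 and 0x800 come from the checklist; the helper names and the example records are assumptions for illustration.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def is_primary(flag):
    """True if neither the secondary nor the supplementary bit is set."""
    return not (flag & (SECONDARY | SUPPLEMENTARY))


def count_reads(alignments):
    """Count distinct reads among (query_name, flag) alignment records."""
    seen_reads = set()
    for query_name, flag in alignments:
        if is_primary(flag):
            seen_reads.add(query_name)
    return len(seen_reads)


# Hypothetical records: five alignments that come from three reads
alignments = [
    ("read1", 0),             # primary
    ("read1", 0x100),         # secondary alignment of the same read
    ("read2", 0x10),          # primary, reverse strand
    ("read2", 0x800 | 0x10),  # supplementary alignment of the same read
    ("read3", 0),             # primary
]

n_alignments = len(alignments)
n_reads = count_reads(alignments)
print(f"{n_alignments} alignments, {n_reads} reads")
```

Counting records instead of read names here would overstate the number of reads by two, which is the kind of error the ASSUMPTIONS block is meant to prevent.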
The depth of domain verification scales with audit mode and script complexity:
Every section of the script is evaluated against these categories. Each finding is tagged with its category and severity.
Before assigning severity, consider:
The user is a co-auditor. Claude does NOT pre-digest the script — both work through it together. The process of finding issues is as valuable as the findings themselves.
Read the script and present:
Ask: "Does this match your understanding of what this script should do?" Mismatches between intent and implementation are a finding category.
Run the full Domain Verification Phase (see above). In thorough mode:
For each logical section:
a. Present the code chunk (~15-20 lines max at a time)
b. User reads and runs it. Encourage the user to:
(str(), dim(), head(), summary() in R; .info(), .head(), .shape, .describe() in Python)

c. Claude probes and suggests checks, targeted to what's most likely to go wrong with this specific type of code. Ask questions, suggest diagnostics, and raise concerns adapted to the operation at hand. The user adds domain context and responds. Findings are documented as they emerge.
For data loading/input:
For joins/merges:
- anti_join() to see what gets dropped and whether that's acceptable
- nrow() before and after

For filtering/subsetting:
For statistical tests/modeling:
For normalization/transformation:
For plotting/output:
The user adds their own observations and domain knowledge throughout. Their context may resolve concerns ("this threshold was chosen because of the experimental design") or raise new ones.
d. Run diagnostics together when something is suspicious:
- anti_join() on these two tables and see how many rows don't match
- summary() on this column: is the distribution what you'd expect?

e. Document findings: tag with category, severity, lines, and recommendation.
After all individual sections:
Compile findings documented throughout into the structured report format (see below).
Save the report to .claude/audit_reports/ (see "Audit Report Format" for details).
Every 2-3 sections, briefly check in: "How's the depth? Want to go faster or deeper?"
Claude works through the script independently, then discusses findings with the user.
Read the entire script.
Run the full Domain Verification Phase (see above). In fast mode:
Apply the 5-category checklist across all sections:
Full structured report with all findings, including Domain Assumptions Checklist.
Save the report to .claude/audit_reports/ (see "Audit Report Format" for details).
Present findings to the user, ordered by severity (BUG first):
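One way to produce that ordering is sketched below. The source only states that BUG comes first; the rest of the order is an assumption taken from the order in which severities are listed in the report summary, and the findings shown are hypothetical.

```python
# Assumed order, following the report summary line (BUG first)
SEVERITY_ORDER = ["BUG", "CONCERN", "WARNING", "NOTE", "FYI"]


def sort_findings(findings):
    """Order findings by severity; ties keep their original (discovery) order."""
    rank = {severity: i for i, severity in enumerate(SEVERITY_ORDER)}
    # sorted() is stable, so findings of equal severity are not reshuffled
    return sorted(findings, key=lambda finding: rank[finding["severity"]])


# Hypothetical findings in the order they were discovered
findings = [
    {"id": "NOTE-1", "severity": "NOTE"},
    {"id": "BUG-1", "severity": "BUG"},
    {"id": "WARNING-1", "severity": "WARNING"},
    {"id": "BUG-2", "severity": "BUG"},
    {"id": "FYI-1", "severity": "FYI"},
]

ordered_ids = [finding["id"] for finding in sort_findings(findings)]
print(ordered_ids)
```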
Same as fast mode steps 1-4. No collaborative review. Produces the report and saves it
to .claude/audit_reports/. Findings are marked as "Unreviewed" in the status column.
In report-only mode, domain verification uses background knowledge + targeted web searches. Unknown assumptions are flagged as ? in the checklist for the user to review independently.
Best for: batch auditing multiple scripts, quick quality snapshots, or when the user will review the report in a separate session.
When auditing, Claude should actively run diagnostics (in Claude-driven modes) or suggest them (in collaborative mode):
- dim(), nrow() before and after key operations
- anti_join() to check unmatched rows on both sides
- summary(), hist() for key variables, especially before statistical tests

# Script Audit Report: {script_name}
**Date:** {date}
**Script:** {path/to/script}
**Auditor:** Claude Code {+ user name, if collaborative}
**Mode:** {Thorough / Fast / Report only}
## Summary
- **Total findings:** {N}
- **By severity:** {N} BUG, {N} CONCERN, {N} WARNING, {N} NOTE, {N} FYI
- **By category:** {N} Correctness, {N} Analytical, {N} Data Handling, {N} Style, {N} Reproducibility
- **Overall assessment:** {1-2 sentence summary of script quality and most critical issues}
- **Outputs trustworthy?** {Yes / Yes with caveats / No — must rerun after fixes}
{1-2 sentences explaining why. For "Yes with caveats", state which outputs are affected
and whether the impact is minor (e.g., cosmetic label error) or could change
interpretation. For "No", identify which bugs invalidate the outputs.
For scripts that don't produce analytical outputs (utilities, migration scripts, etc.),
state "N/A — script does not produce analytical outputs."}
## Domain Assumptions Checklist
| Tool/Format | Assumption | Verified? | Code Handles? | Finding |
|-------------|-----------|:---------:|:-------------:|---------|
| {tool} | {assumption} | ✓ / ✗ / ? | Yes / No / N/A | {ref or "OK"} |
## Findings
### BUG-1: {Short description}
- **Category:** {Correctness / Analytical / Data Handling / Style / Reproducibility}
- **Section:** {section name}
- **Lines:** {line range}
- **Description:** {What the issue is}
- **Impact:** {What goes wrong because of this}
- **Recommendation:** {How to fix it}
- **Status:** {Open / Discussed — {outcome} / Fixed / Unreviewed}
### CONCERN-1: {Short description}
...
### WARNING-1: {Short description}
...
### NOTE-1: {Short description}
...
### FYI-1: {Short description}
- **Category:** {category}
- **Lines:** {line range}
- **Description:** {What the pattern is and why it's worth knowing about}
- **Why not an action item:** {Why this doesn't need to change in this script's context}
## Sections Reviewed
| Section | Lines | Issues Found | Notes |
|---------|-------|-------------|-------|
| {name} | {range} | BUG-1, WARN-2 | {brief note} |
| {name} | {range} | None | Clean |
## Analytical Decisions Inventory
| Section | Decision | Current Choice | Justification | Alternatives | Risk Level |
|---------|----------|---------------|---------------|-------------|------------|
| ... | ... | ... | ... | ... | ... |
## Action Items
| Priority | Finding | Action | Owner |
|----------|---------|--------|-------|
| 1 | BUG-1 | Fix immediately | {name} |
| 2 | CONCERN-1 | Investigate | {name} |
Always save the report to .claude/audit_reports/{script_name}_audit_report.md in the
project root. Create the .claude/audit_reports/ directory if it doesn't exist. Every audit
must produce a saved report file — this is not optional.
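The save step can be sketched as follows. The function name `audit_report_path` is hypothetical, and treating `{script_name}` as the file name without its extension is an assumption; the source does not say which form is intended. A temporary directory stands in for the project root.

```python
import tempfile
from pathlib import Path


def audit_report_path(project_root, script_path):
    """Return the report path for a script, creating the directory if needed."""
    reports_dir = Path(project_root) / ".claude" / "audit_reports"
    reports_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: {script_name} is the file name without its extension
    script_name = Path(script_path).stem
    return reports_dir / f"{script_name}_audit_report.md"


# A temporary directory stands in for the project root in this demo;
# the script path is a hypothetical example
with tempfile.TemporaryDirectory() as project_root:
    path = audit_report_path(project_root, "scripts/03_normalize_counts.R")
    path.write_text("# Script Audit Report: 03_normalize_counts.R\n")
    saved = path.exists()

print(path.name)
```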
Just because method = "BH" is common doesn't mean it's right for this data. Every default is a choice.

When this skill is active: