Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

pedrohcgs/audit-reproducibility

Name: audit-reproducibility
Author: pedrohcgs

.claude/skills/audit-reproducibility/SKILL.md

npx skillsauth add pedrohcgs/claude-code-my-workflow audit-reproducibility

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Audit Reproducibility

Compare numeric claims in a manuscript (point estimates, standard errors, p-values, counts) against the actual outputs produced by the analysis pipeline. Report PASS / FAIL per claim against the tolerance thresholds defined in .claude/rules/replication-protocol.md.

Core principle: If the paper says ATT = -1.632 (0.584) and the code produces -1.628 (0.591), we verify — numerically — that the difference is within the documented tolerance. No more "looks close enough" eyeballing.

When to use

Before submission. Catches the "I updated the analysis but forgot to update Table 2" bug.
Before releasing a replication package. Verifies the code actually reproduces the paper.
After a major revision. Ensures the paper still matches the latest code.
Quality-gate in /commit. Pair with a pre-commit invocation on manuscript + analysis changes.

Inputs

$0 — path to the manuscript (.tex, .qmd, .md, .pdf). Required.
$1 — path to the outputs directory. Defaults to scripts/R/_outputs/. Can be _targets/objects/, a Stata .do-file log directory, etc.

Workflow

Phase 0: Pre-flight

Read replication-protocol.md for the tolerance thresholds currently in effect.
Verify the outputs directory exists and is non-empty. If empty or stale (older than the manuscript), prompt the user to re-run their pipeline (e.g., Rscript scripts/R/00_run_all.R) before auditing.
Ensure a sessionInfo.txt or equivalent environment capture exists in the outputs dir.

Phase 1: Extract claims from the manuscript

Parse the manuscript for numeric claims. Patterns to match:

Point-estimate + SE: ATT = -1.632 (0.584), $\beta = 0.342$ (0.091), hat{\tau} = 1.28** with starred significance
Table cells: & -1.632$^{***}$ & 0.584 & in LaTeX table environments
Counts: our sample of 2,847 firms, $N = 2{,}847$
Summary stats: mean = 0.423, SD = 0.087
P-values: p < 0.01, $p = 0.003$

Record each claim as a tuple:

{
  claim_id: "Table2_col3_ATT",
  location: "Table 2, Column 3, row 'Treatment'",
  kind: "point_estimate" | "standard_error" | "p_value" | "count" | "percentage",
  reported_value: -1.632,
  uncertainty: 0.584,              # only for point estimates
  significance_stars: 3,            # 0-3 or None
  raw_context: "the ATT estimate of -1.632 (0.584) indicates..."
}

Write the extracted claims to quality_reports/reproducibility_claims_[manuscript-name].json so the user can review the extraction before audit.

Phase 2: Extract results from outputs

Scan $1 for corresponding values. Priority order:

.rds files — readRDS(path)$coef[["treatment"]] style lookups. Can use Rscript -e "saveRDS(summary(readRDS(...)), '/tmp/audit.rds')" to extract.
.tex tables — parse LaTeX table cells directly; match on column headers + row labels.
.csv summary files — pandas/readr parse, key-value lookup.
.out / .log files (Stata, regress output) — regex extraction.
.json — direct key lookup.

Record each extracted result:

{
  source: "scripts/R/_outputs/results.rds",
  lookup_key: "fit_main$coefficients['treated']",
  value: -1.628,
  uncertainty: 0.591,
  p_value: 0.005
}

Phase 3: Match claims to results

Use fuzzy heuristics when exact labels don't match:

Name similarity ("treatment effect" ~ "ATT" ~ "treated")
Magnitude similarity (if two candidates have values within 10% of the reported, prefer the one with closer SE)
Context hints from the claim's raw_context field (table number, row label, description)

For every claim, produce a match candidate with a confidence score. Claims below 0.7 confidence get flagged as "UNMATCHED — manual review needed" rather than silently passing.

Phase 4: Tolerance check

For each matched claim, apply the thresholds from replication-protocol.md:

| Kind | Tolerance | Example | |---|---|---| | Integers (N, counts) | Exact | 2,847 must equal 2,847 | | Point estimates | abs(reported - computed) < 0.01 | -1.632 vs -1.628 → diff = 0.004 → PASS | | Standard errors | abs(reported - computed) < 0.05 | 0.584 vs 0.591 → diff = 0.007 → PASS | | P-values | Same significance level | p<0.01 and p<0.01 → PASS; p<0.01 and p=0.03 → FAIL | | Percentages | ±0.1pp | 42.3% vs 42.35% → PASS |

Respect any tolerance overrides the user has written into their replication-protocol.md fork (they may loosen for MC noise or tighten for administrative data).

Phase 5: Report

Write quality_reports/reproducibility_audit_[manuscript-name].md:

# Reproducibility Audit: [Manuscript Title]

**Date:** [YYYY-MM-DD]
**Manuscript:** [path]
**Outputs directory:** [path]
**Tolerance source:** .claude/rules/replication-protocol.md

## Summary

| Status | Count |
|---|---|
| PASS | N |
| FAIL (diff > tolerance) | M |
| UNMATCHED (manual review) | K |
| **Overall verdict** | **PASS / FAIL** |

## PASS (all within tolerance)
| Claim | Reported | Computed | Diff | Tolerance |
|---|---|---|---|---|
| Table2_col3_ATT | -1.632 (0.584) | -1.628 (0.591) | 0.004 / 0.007 | 0.01 / 0.05 |

## FAIL (outside tolerance — BLOCKER)
| Claim | Reported | Computed | Diff | Tolerance | Location in paper |
|---|---|---|---|---|---|

## UNMATCHED (manual review)
| Claim | Raw context | Candidate sources |
|---|---|---|

## Environment
[sessionInfo excerpt]

## Next steps
1. Fix any FAIL rows — either update the manuscript or rerun analysis.
2. Review UNMATCHED rows — add explicit lookup keys or widen the search scope.
3. After zero FAILs, the paper is replication-ready.

Exit behavior

All PASS: exit 0, summary printed.
Any FAIL: exit 1, summary printed to stderr. This makes the skill usable as a /commit pre-commit gate — see replication-protocol.md for the enforcement pattern.
UNMATCHED > 0 (with 0 FAIL): exit 0 with warning — user must manually review.

Cross-references

.claude/rules/replication-protocol.md — the tolerance contract.
.claude/skills/review-r/SKILL.md — catches code-style issues; this skill catches NUMERICAL reproducibility.
.claude/skills/review-paper/SKILL.md — content review; pair with this skill for a full pre-submission audit.

What this skill does NOT do

Re-run your analysis. The skill compares CURRENT outputs against manuscript claims. If the outputs are stale, re-run your pipeline first (the pre-flight phase will warn).
Catch wrong specifications. A regression that compiles cleanly and produces a reproducible -1.632 is reproducible. Whether -1.632 is the RIGHT estimand is a review-paper / domain-reviewer question.
Check external package versions. The sessionInfo.txt capture lets a reviewer see the env; pinning versions is on the user (via renv.lock or a DESCRIPTION file).

pedrohcgs/audit-reproducibility

.claude/skills/audit-reproducibility/SKILL.md

Enforce the replication-protocol.md rule by cross-checking numeric claims in a manuscript against the actual R / Stata / Python outputs. Report PASS/FAIL per claim against tolerance thresholds. Use before submission and before releasing a replication package.

920 stars

development

Updated Apr 15, 2026

$ install --global

skillsauth

npx skillsauth add pedrohcgs/claude-code-my-workflow audit-reproducibility

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 15, 2026, 10:08 AM8.2s1 file scanned

SKILL.md

name:: audit-reproducibility
description:: Enforce the replication-protocol.md rule by cross-checking numeric claims in a manuscript against the actual R / Stata / Python outputs. Report PASS/FAIL per claim against tolerance thresholds. Use before submission and before releasing a replication package.
argument-hint:: [manuscript path] [outputs-dir] (outputs-dir defaults to scripts/R/_outputs/)
allowed-tools:: ["Read", "Grep", "Glob", "Write", "Bash", "Task"]
effort:: high

Audit Reproducibility

When to use

Before submission. Catches the "I updated the analysis but forgot to update Table 2" bug.
Before releasing a replication package. Verifies the code actually reproduces the paper.
After a major revision. Ensures the paper still matches the latest code.
Quality-gate in /commit. Pair with a pre-commit invocation on manuscript + analysis changes.

Inputs

$0 — path to the manuscript (.tex, .qmd, .md, .pdf). Required.
$1 — path to the outputs directory. Defaults to scripts/R/_outputs/. Can be _targets/objects/, a Stata .do-file log directory, etc.

Workflow

Phase 0: Pre-flight

Read replication-protocol.md for the tolerance thresholds currently in effect.
Verify the outputs directory exists and is non-empty. If empty or stale (older than the manuscript), prompt the user to re-run their pipeline (e.g., Rscript scripts/R/00_run_all.R) before auditing.
Ensure a sessionInfo.txt or equivalent environment capture exists in the outputs dir.

Phase 1: Extract claims from the manuscript

Parse the manuscript for numeric claims. Patterns to match:

Point-estimate + SE: ATT = -1.632 (0.584), $\beta = 0.342$ (0.091), hat{\tau} = 1.28** with starred significance
Table cells: & -1.632$^{***}$ & 0.584 & in LaTeX table environments
Counts: our sample of 2,847 firms, $N = 2{,}847$
Summary stats: mean = 0.423, SD = 0.087
P-values: p < 0.01, $p = 0.003$

Record each claim as a tuple:

{
  claim_id: "Table2_col3_ATT",
  location: "Table 2, Column 3, row 'Treatment'",
  kind: "point_estimate" | "standard_error" | "p_value" | "count" | "percentage",
  reported_value: -1.632,
  uncertainty: 0.584,              # only for point estimates
  significance_stars: 3,            # 0-3 or None
  raw_context: "the ATT estimate of -1.632 (0.584) indicates..."
}

Write the extracted claims to quality_reports/reproducibility_claims_[manuscript-name].json so the user can review the extraction before audit.

Phase 2: Extract results from outputs

Scan $1 for corresponding values. Priority order:

.rds files — readRDS(path)$coef[["treatment"]] style lookups. Can use Rscript -e "saveRDS(summary(readRDS(...)), '/tmp/audit.rds')" to extract.
.tex tables — parse LaTeX table cells directly; match on column headers + row labels.
.csv summary files — pandas/readr parse, key-value lookup.
.out / .log files (Stata, regress output) — regex extraction.
.json — direct key lookup.

Record each extracted result:

{
  source: "scripts/R/_outputs/results.rds",
  lookup_key: "fit_main$coefficients['treated']",
  value: -1.628,
  uncertainty: 0.591,
  p_value: 0.005
}

Phase 3: Match claims to results

Use fuzzy heuristics when exact labels don't match:

Name similarity ("treatment effect" ~ "ATT" ~ "treated")
Magnitude similarity (if two candidates have values within 10% of the reported, prefer the one with closer SE)
Context hints from the claim's raw_context field (table number, row label, description)

For every claim, produce a match candidate with a confidence score. Claims below 0.7 confidence get flagged as "UNMATCHED — manual review needed" rather than silently passing.

Phase 4: Tolerance check

For each matched claim, apply the thresholds from replication-protocol.md:

Respect any tolerance overrides the user has written into their replication-protocol.md fork (they may loosen for MC noise or tighten for administrative data).

Phase 5: Report

Write quality_reports/reproducibility_audit_[manuscript-name].md:

# Reproducibility Audit: [Manuscript Title]

**Date:** [YYYY-MM-DD]
**Manuscript:** [path]
**Outputs directory:** [path]
**Tolerance source:** .claude/rules/replication-protocol.md

## Summary

| Status | Count |
|---|---|
| PASS | N |
| FAIL (diff > tolerance) | M |
| UNMATCHED (manual review) | K |
| **Overall verdict** | **PASS / FAIL** |

## PASS (all within tolerance)
| Claim | Reported | Computed | Diff | Tolerance |
|---|---|---|---|---|
| Table2_col3_ATT | -1.632 (0.584) | -1.628 (0.591) | 0.004 / 0.007 | 0.01 / 0.05 |

## FAIL (outside tolerance — BLOCKER)
| Claim | Reported | Computed | Diff | Tolerance | Location in paper |
|---|---|---|---|---|---|

## UNMATCHED (manual review)
| Claim | Raw context | Candidate sources |
|---|---|---|

## Environment
[sessionInfo excerpt]

## Next steps
1. Fix any FAIL rows — either update the manuscript or rerun analysis.
2. Review UNMATCHED rows — add explicit lookup keys or widen the search scope.
3. After zero FAILs, the paper is replication-ready.

Exit behavior

All PASS: exit 0, summary printed.
Any FAIL: exit 1, summary printed to stderr. This makes the skill usable as a /commit pre-commit gate — see replication-protocol.md for the enforcement pattern.
UNMATCHED > 0 (with 0 FAIL): exit 0 with warning — user must manually review.

Cross-references

.claude/rules/replication-protocol.md — the tolerance contract.
.claude/skills/review-r/SKILL.md — catches code-style issues; this skill catches NUMERICAL reproducibility.
.claude/skills/review-paper/SKILL.md — content review; pair with this skill for a full pre-submission audit.

What this skill does NOT do

Re-run your analysis. The skill compares CURRENT outputs against manuscript claims. If the outputs are stale, re-run your pipeline first (the pre-flight phase will warn).
Catch wrong specifications. A regression that compiles cleanly and produces a reproducible -1.632 is reproducible. Whether -1.632 is the RIGHT estimand is a review-paper / domain-reviewer question.
Check external package versions. The sessionInfo.txt capture lets a reviewer see the env; pinning versions is on the user (via renv.lock or a DESCRIPTION file).

Related Skills

pedrohcgs/commit

testing

VerifiedTrustedCommunity

Stage, commit, push, open a PR, and merge to main. Use ONLY on explicit commit intent — user says "commit", "ship it", "push this", "open a PR", "merge to main", "let's commit this", or prefixes with `/commit`. Do NOT auto-invoke on vague end-of-task phrases ("we're done", "wrap up") — those require explicit confirmation first. Runs the standard commit-PR-merge cycle; never force-pushes or skips hooks.

929SKILL.mdUpdated Apr 15, 2026

pedrohcgs/visual-audit

testing

VerifiedTrustedCommunity

Perform adversarial visual audit of Quarto or Beamer slides checking for overflow, font consistency, box fatigue, and layout issues.

920SKILL.mdUpdated Apr 15, 2026

pedrohcgs/visual-audit

pedrohcgs/validate-bib

testing

VerifiedTrustedCommunity

Validate bibliography entries against citations in all lecture files. Structural checks (missing/unused entries, malformed fields) by default; `--semantic` adds citation-drift detection, DOI verification, and style-consistency checks.

920SKILL.mdUpdated Apr 15, 2026

pedrohcgs/validate-bib

pedrohcgs/translate-to-quarto

testing

VerifiedTrustedCommunity

Translate Beamer LaTeX to Quarto RevealJS. Multi-phase workflow with TikZ extraction and QA.

920SKILL.mdUpdated Apr 15, 2026

pedrohcgs/translate-to-quarto

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/pedrohcgs/claude-code-my-workflow.git

# Copy into Claude Code skills folder (global)
cp -r claude-code-my-workflow/.claude/skills/audit-reproducibility ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

pedrohcgs/claude-code-my-workflow

920 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT