skills/source-verify/SKILL.md
This skill should be used when the user asks to 'verify sources', 'check citations', 'fact-check footnotes', 'verify quotes', 'source check', 'are my cites real', 'check for hallucinated citations', or needs to verify that citations in a legal/academic manuscript actually exist and are accurate. Also use when the user wants to check that quoted text actually appears in the cited source.
Verify that citations in a manuscript are real, accurate, and that quoted text actually appears in the source. Operates as a domain-specific audit-fix-loop: extract citations, run checks, score, fix, re-check. (See the audit-fix-loop skill for the canonical doctrine — auditor≠fixer, substrate-gate, anti-grind; this skill's substrate is citation/quote resolution.)
Announce: "Using source-verify to check citations against Paperpile and source documents."
┌──────────────────────────────────────────────────────────┐
│ CHECK 1: EXISTENCE (mechanical)                          │
│ Does this cited work exist in paperpile.bib?             │
│ → grep paperpile.bib for author + title + year           │
│ → If not found: flag as UNVERIFIED (may still exist)     │
└──────────────────────────────────────────────────────────┘
                             │
                       Exists in bib?
                             ▼
┌──────────────────────────────────────────────────────────┐
│ CHECK 2: ACCURACY (mechanical)                           │
│ Are the citation fields correct?                         │
│ → Compare volume, issue, pages, year against bibtex      │
│ → Flag any mismatches as FIELD_ERROR                     │
└──────────────────────────────────────────────────────────┘
                             │
                      Fields correct?
                             ▼
┌──────────────────────────────────────────────────────────┐
│ CHECK 3: QUOTE VERIFICATION (outsourced to RAG)          │
│ Does the quoted text appear in the source?               │
│ → readwise chat: "Verify this exact quote from [Author]  │
│ → rga against downloaded PDF (fallback)                  │
│ → NLM generate-chat against notebook (fallback)          │
│ → Flag QUOTE_NOT_FOUND or QUOTE_MISMATCH                 │
└──────────────────────────────────────────────────────────┘
                             │
                      Quotes verified?
                             ▼
┌──────────────────────────────────────────────────────────┐
│ CHECK 4: CLAIM GROUNDING (outsourced to RAG)             │
│ Does the cited source actually support the claim?        │
│ → readwise chat or NLM generate-chat                     │
│ → These systems answer ONLY from source text             │
│ → Flag UNSUPPORTED or CONTRADICTED                       │
└──────────────────────────────────────────────────────────┘
Checks 1-2 run on every invocation (fast, mechanical). Check 3 runs when the footnote contains a direct quote. Check 4 runs only when the user explicitly requests claim grounding.
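The gating rule above can be sketched as a small dispatcher. This is a minimal illustration, not the skill's actual implementation: the function name `checks_to_run`, the quote-detection regex, and the 15-character minimum are all assumptions made for the example.

```python
import re

# Straight or curly double quotes around a run of text.
# The 15-character minimum is an arbitrary illustrative threshold.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]{15,})["\u201d]')


def checks_to_run(footnote_text: str, claim_grounding_requested: bool = False) -> list[int]:
    """Return the check numbers that apply to one footnote."""
    checks = [1, 2]  # existence and accuracy run on every invocation
    if QUOTE_RE.search(footnote_text):
        checks.append(3)  # quote verification only when a direct quote is present
    if claim_grounding_requested:
        checks.append(4)  # claim grounding only on explicit request
    return checks
```

A footnote with no quotation marks yields `[1, 2]`; the expensive checks are opt-in by construction.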
The key insight: verification must use external ground truth, never the agent's own memory. The verifiers below provide it: a three-tier hierarchy for quotes and a single tool for claim grounding.
┌──────────────────────────────────────────────────────────┐
│ VERIFIER HIERARCHY: QUOTES (try in order)                │
│                                                          │
│ 1. Readwise highlights: user highlighted the passage.    │
│    readwise-custom highlights --search "quote fragment"  │
│    If found: verified against actual source text.        │
│    Fastest: no download, no LLM.                         │
│                                                          │
│ 2. rga (local): download PDF from Drive, search.         │
│    Deterministic text search inside PDFs.                │
│    No pdftotext needed: rga extracts text internally.    │
│                                                          │
│ 3. NLM chat: add paper to a verification                 │
│    notebook, then query.                                 │
│    "Find this passage in [source]: '…'"                  │
│    Grounded in NLM's ingested sources.                   │
│    Best for: OCR issues, paraphrased quotes.             │
│                                                          │
│ NEVER: Agent's own memory/training data.                 │
│        That is the hallucination source, not a verifier. │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ VERIFIER: CLAIM GROUNDING                                │
│                                                          │
│ NLM chat only: requires semantic understanding.          │
│ Readwise highlights are too fragmentary for claims.      │
│ rga is too literal for "does source support claim?"      │
└──────────────────────────────────────────────────────────┘
Why Readwise first for quotes: The user highlights important passages while reading. If a quote in the manuscript matches a Readwise highlight, it was captured directly from the source — strongest possible verification, zero cost.
Why NLM for claims: Claim grounding requires understanding what the source argues, not just matching strings. NLM chat answers from ingested full text with semantic comprehension.
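The "try in order" rule for quotes amounts to a short-circuiting loop over tiers. The sketch below assumes each tier is wrapped as a Python callable that returns a truthy value when it finds the quote; `first_verifying_tier` and the stand-in tiers are hypothetical names for illustration, and real tiers would shell out to `readwise-custom`, `rga`, and `nlm`.

```python
from typing import Callable, Optional


def first_verifying_tier(
    quote: str, tiers: list[tuple[str, Callable[[str], bool]]]
) -> Optional[str]:
    """Return the name of the first tier that finds the quote, else None."""
    for name, search in tiers:
        if search(quote):
            return name  # stop early: cheaper tiers are listed first
    return None


# Hypothetical stand-ins for the three tiers (placeholder data, not real sources).
highlights = ["placeholder highlight text"]
pdf_text = "Placeholder for text extracted from the downloaded PDF."

tiers = [
    ("readwise", lambda q: any(q in h for h in highlights)),
    ("rga", lambda q: q in pdf_text),
    ("nlm", lambda q: False),  # placeholder: would query a notebook
]
```

Because the loop returns on the first hit, a quote found in Readwise never triggers a PDF download or an NLM call.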
| Source Type | Checks Available | Ground Truth |
|-------------|-----------------|-------------|
| Journal articles | 1, 2, 3, 4 | paperpile.bib + Drive PDFs + Readwise highlights |
| Books / book chapters | 1, 2, 3, 4 | paperpile.bib + Drive PDFs |
| Working papers | 1, 2, 3, 4 | paperpile.bib + Drive PDFs |
| SEC releases / regulations | 1 (partial) | May be in bib as MISC entries |
| Federal case citations | 1 (existence only) | WRDS fjc_litigation.civil (13.5M cases) + audit_corp_legal |
| Statutes | NOT COVERED | No ground truth database wired up |
The FJC Integrated Database (fjc_litigation.civil) contains all federal civil cases with plaintiff/defendant names, docket numbers, filing dates, districts, and nature-of-suit codes. Use it to verify that cited cases actually exist:
# Connect via SSH tunnel to WRDS
ssh wrds "echo \"SELECT plaintiff, defendant, docket, district, filedate
FROM fjc_litigation.civil
WHERE plaintiff ILIKE '%smith%' AND defendant ILIKE '%jones%'
AND filedate BETWEEN '2018-01-01' AND '2020-12-31'
LIMIT 10;\" | psql -h wrds-pgdata.wharton.upenn.edu -p 9737 -d wrds"
Matching strategy for case cites:
1. Search fjc_litigation.civil with ILIKE on both fields (plaintiff and defendant).
2. Narrow the results by filedate.
3. For securities cases, filter on nos = 850 (nature of suit: securities/commodities).

Audit Analytics (audit_corp_legal.f14_lit_legal_case) provides corporate legal cases with settlement amounts, exposure dates, and docket numbers, which is useful for verifying securities litigation specifically.
Limitations:
- The casename field is sparsely populated; match on plaintiff/defendant instead.

Statutes are not yet verifiable because no structured database is wired up. Flag them as SKIPPED_NO_GROUND_TRUTH.
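The case-matching strategy above (party names, a filing-date window, and an optional nature-of-suit filter) could be assembled programmatically. This is a sketch only: `build_case_query` is a hypothetical helper, and the `%(name)s` placeholders assume a psycopg-style driver rather than the `ssh` + `psql` pipeline shown earlier. The column names are the ones used in the query above.

```python
def build_case_query(plaintiff, defendant, start, end, nos=None, limit=10):
    """Return (sql, params) for a parameterized lookup in fjc_litigation.civil."""
    sql = (
        "SELECT plaintiff, defendant, docket, district, filedate "
        "FROM fjc_litigation.civil "
        "WHERE plaintiff ILIKE %(plaintiff)s AND defendant ILIKE %(defendant)s "
        "AND filedate BETWEEN %(start)s AND %(end)s"
    )
    params = {
        "plaintiff": f"%{plaintiff}%",  # substring match on party names
        "defendant": f"%{defendant}%",
        "start": start,
        "end": end,
        "limit": limit,
    }
    if nos is not None:
        # Optional nature-of-suit filter, e.g. 850 for securities/commodities.
        sql += " AND nos = %(nos)s"
        params["nos"] = nos
    sql += " LIMIT %(limit)s"
    return sql, params
```

Passing values as parameters rather than interpolating them keeps party names containing quotes or percent signs from breaking the query.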
<EXTREMELY-IMPORTANT>
NEVER skip Checks 1-2 to jump straight to LLM-based verification.
Mechanical checks against paperpile.bib are deterministic and free. They catch the most common hallucinations (invented papers, wrong volume/pages) without any LLM judgment. Running LLM checks before the mechanical checks wastes expensive calls on problems a grep would catch.
Skipping the paperpile.bib check is NOT HELPFUL: the user publishes with unverified citations that may be hallucinated.
</EXTREMELY-IMPORTANT>
The bibtex file lives on Google Drive and must be downloaded fresh each run:
# Download paperpile.bib from Drive
gws drive files get --account [email protected] \
--params '{"fileId": "1yxibJLr1-kF_gcf3UA6QlU5ulXgR50kX", "alt": "media"}' \
-o /tmp/paperpile.bib
Always download fresh — the user may have added new papers since the last run.
If the manuscript is DOCX, extract footnotes to structured text first. Use the bluebook-audit extraction infrastructure or python-docx:
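# Quick extraction via python-docx
uv run --with python-docx python3 -c "
import docx, json, sys
from lxml import etree  # installed as a python-docx dependency

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
doc = docx.Document(sys.argv[1])
# Footnotes live in their own part (word/footnotes.xml), not in the document body,
# so look the part up through the document relationships and parse its XML.
part = next((r.target_part for r in doc.part.rels.values() if r.reltype.endswith('/footnotes')), None)
root = etree.fromstring(part.blob) if part is not None else etree.Element(W + 'footnotes')
fns = []
for fn in root.iter(W + 'footnote'):
    fn_id = fn.get(W + 'id')
    # ids <= 0 are separator footnotes, not real notes
    if fn_id and int(fn_id) > 0:
        text = ' '.join(t.text or '' for t in fn.iter(W + 't'))
        fns.append({'id': int(fn_id), 'text': text.strip()})
json.dump(fns, sys.stdout, indent=2)
" manuscript.docx > /tmp/footnotes.json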
For plain text or markdown manuscripts, extract lines that look like footnotes (numbered references at the bottom or inline citations).
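For Markdown manuscripts, one possible approach is a regex over footnote definitions. The sketch below assumes the common `[^12]: text` syntax with each footnote on a single line; `extract_markdown_footnotes` is a hypothetical helper that emits the same `{'id', 'text'}` shape as the DOCX extractor so later steps can treat both inputs alike.

```python
import re

# Matches Markdown footnote definitions such as "[^12]: See Egan ...".
# Assumption: one footnote per line; continuation lines are not handled.
FOOTNOTE_DEF = re.compile(r"^\[\^(\d+)\]:[ \t]*(.+)$", re.MULTILINE)


def extract_markdown_footnotes(text: str) -> list[dict]:
    """Return footnotes as a list of {'id': int, 'text': str} dicts."""
    return [
        {"id": int(num), "text": body.strip()}
        for num, body in FOOTNOTE_DEF.findall(text)
    ]
```

Manuscripts that use inline author-year citations instead of footnote definitions would need a different pattern.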
For each footnote, extract the cited author(s) and title, then search paperpile.bib:
# Search by author surname
rg -i "author.*Egan" /tmp/paperpile.bib
# Search by title fragment
rg -i "conflicting interests" /tmp/paperpile.bib
# Search by bibtex key pattern
rg "Egan2022" /tmp/paperpile.bib
Footnotes use Bluebook-abbreviated journal names; bibtex uses different abbreviations. Match on author surname + year first, then confirm with title keywords.
| Footnote Says | Bibtex Has | Match On |
|--------------|-----------|----------|
| Egan, Ge & Tang | author = {Egan, Mark and Ge, Shan and Tang, Johnny} | First author surname + year |
| 35 Rev. Fin. Stud. 5334 | journaltitle = {Rev. Fin. Stud.}, volume = {35}, pages = {5334--5386} | Volume + first page |
| (2022) | date = {2022-08-24} | Year extracted from date |
| Result | Classification | Severity |
|--------|---------------|----------|
| Found in bib, fields match | VERIFIED | — |
| Found in bib, fields mismatch | FIELD_ERROR | HIGH |
| Not found in bib, but is a federal case | Check WRDS fjc_litigation.civil | — |
| Not found in bib, but is a statute | SKIPPED_NO_GROUND_TRUTH | INFO |
| Not found in bib, is a paper/article | UNVERIFIED | CRITICAL |
For each VERIFIED citation, compare footnote fields against bibtex:
| Footnote Field | Bibtex Field | Common Errors |
|---------------|-------------|---------------|
| Volume number | volume | Transposed digits |
| Issue/number | issue | Often omitted in footnote (OK) |
| Starting page | pages (before --) | Wrong page, off by one |
| Year | date (extract year) | Wrong year |
| Journal name | journaltitle | Abbreviation mismatch (check, don't auto-flag) |
Journal abbreviation mismatches are tricky — Bluebook and bibtex use different abbreviation conventions. Flag only when the journal name is clearly wrong (e.g., completely different journal), not when it's a different valid abbreviation of the same journal.
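The field comparison can be illustrated with a small function. This sketch assumes the footnote has already been parsed into a dict with `volume`, `first_page`, and `year` keys (hypothetical names) and that the bibtex entry is a dict of raw field strings. Journal names are deliberately left out because, as noted above, abbreviation differences should be reviewed rather than auto-flagged.

```python
import re


def compare_fields(footnote: dict, bib: dict) -> list[str]:
    """Return FIELD_ERROR descriptions; an empty list means the fields agree."""
    expected = {
        "volume": bib.get("volume", "").strip(),
        # Starting page is whatever precedes the page-range separator.
        "first_page": re.split(r"[-\u2013]+", bib.get("pages", ""))[0].strip(),
        # Year is the first four characters of an ISO-style date.
        "year": bib.get("date", "")[:4],
    }
    errors = []
    for field, bib_value in expected.items():
        cited = str(footnote.get(field, "")).strip()
        # A field missing on either side gives nothing to compare, so skip it.
        if cited and bib_value and cited != bib_value:
            errors.append(f"{field}: footnote has {cited}, bibtex has {bib_value}")
    return errors
```

Skipping absent fields mirrors the table: an omitted issue number is acceptable, while a present but different volume is a finding.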
For footnotes that contain direct quotes (text in quotation marks attributed to a source):
If the user highlighted the quoted passage in Readwise, it's already verified against the actual source text:
# Search for the quote text in highlights
readwise-custom highlights --search "quoted text fragment" --limit 10
# Or vector search with author filter
readwise readwise-search-highlights --vector-search-term "quoted text fragment" \
--full-text-queries '[{"field_name": "document_author", "search_term": "Egan"}]'
A matching highlight confirms the quote exists in the source — Readwise captured it directly from the original document. Check that the highlight comes from the correct source (match author/title).
If the quote isn't in Readwise highlights, download the PDF from Drive and search:
# Find the PDF using the bibtex `file` field
# e.g., file = {All Papers/E/Egan et al. 2022 - Conflicting Interests...pdf}
gws drive files list --account [email protected] \
--params '{"q": "name contains \"Egan et al. 2022 - Conflicting\" and mimeType = \"application/pdf\"", "fields": "files(id,name)", "pageSize": 1}'
# Download by file ID
gws drive files get --account [email protected] \
--params '{"fileId": "<ID>", "alt": "media"}' -o /tmp/source.pdf
# Search for the quote (rga extracts PDF text internally — no pdftotext needed)
rga "the exact quoted text" /tmp/source.pdf
If rga finds it, the quote is QUOTE_VERIFIED — deterministic, no LLM needed.
If rga doesn't find an exact match (OCR issues, scanned PDF, or the quote is slightly paraphrased), add the paper to a verification NLM notebook and ask:
# Create a verification notebook (once per project)
nlm create "Source Verification"
# Add the paper
nlm add <notebook-id> /tmp/source.pdf
# Ask NLM to find the quote
nlm generate-chat <notebook-id> "Find this exact passage in the source: '[quoted text]'. Does it appear verbatim? If the wording differs, show the actual text from the source."
NLM chat is grounded in the ingested PDF — it can handle OCR artifacts and minor wording differences that trip up exact string matching.
| Result | Classification | Severity |
|--------|---------------|----------|
| Found in Readwise highlights | QUOTE_VERIFIED | — |
| Exact match found via rga | QUOTE_VERIFIED | — |
| NLM confirms match (minor OCR/wording diffs) | QUOTE_VERIFIED | — |
| NLM finds similar but different text | QUOTE_MISMATCH | MEDIUM |
| Not found in any tier | QUOTE_NOT_FOUND | CRITICAL |
| Source PDF not on Drive | QUOTE_UNCHECKED | INFO |
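The classification table can be read as a decision function. In the sketch below, `classify_quote` and its arguments are hypothetical: `nlm_verdict` is assumed to be `"match"`, `"different"`, or `None` once the NLM answer has been interpreted, and a severity of `None` stands for the dash in the table.

```python
def classify_quote(pdf_available, readwise_hit=False, rga_hit=False, nlm_verdict=None):
    """Map tier outcomes to (classification, severity) per the table above."""
    if readwise_hit or rga_hit or nlm_verdict == "match":
        return "QUOTE_VERIFIED", None
    if nlm_verdict == "different":
        return "QUOTE_MISMATCH", "MEDIUM"
    if not pdf_available:
        # Nothing to search against: unresolved, not a failure.
        return "QUOTE_UNCHECKED", "INFO"
    return "QUOTE_NOT_FOUND", "CRITICAL"
```

The order of the branches matters: a Readwise hit verifies the quote even when the PDF is missing from Drive.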
Only run when user explicitly requests deep verification. This checks whether the cited source actually supports the claim being made in the text (not just that the citation exists).
Claim grounding requires semantic understanding — NLM chat is the right tool because it answers only from ingested sources.
nlm generate-chat <notebook-id> "The manuscript claims: '[claim from text]' and cites [Author (Year)]. Does the source support this claim? Answer: SUPPORTED, PARTIALLY_SUPPORTED, UNSUPPORTED, or CONTRADICTED. Provide the relevant passage from the source."
Requires: An NLM notebook with the cited sources loaded. If not already set up:
1. nlm create "Source Verification — [Project]"
2. nlm add <notebook-id> /tmp/source.pdf
3. nlm research "[topic]" --notebook <notebook-id> --source drive

Cross-source claims: When a claim synthesizes multiple papers, add all cited sources to the same notebook. NLM can then cross-reference them in a single query. For synthesized claims that draw on web-accessible sources (news, reports, public data), the deep-research skill is available as a supplementary check; use it after NLM grounding to verify claims against broader web evidence. NLM remains the primary tool for source-specific grounding.
| Result | Classification | Severity |
|--------|---------------|----------|
| Source supports claim | CLAIM_SUPPORTED | — |
| Source partially supports | CLAIM_PARTIAL | MEDIUM |
| Source doesn't address claim | CLAIM_UNSUPPORTED | HIGH |
| Source contradicts claim | CLAIM_CONTRADICTED | CRITICAL |
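Since the prompt asks NLM to answer with one of four labels, the reply can be mapped onto the table mechanically. The parser below is a sketch: `classify_claim` is a hypothetical helper, and it assumes the label appears in upper case somewhere in the reply. Word boundaries keep `SUPPORTED` from matching inside `UNSUPPORTED` or `PARTIALLY_SUPPORTED`.

```python
import re

VERDICTS = {
    "SUPPORTED": ("CLAIM_SUPPORTED", None),
    "PARTIALLY_SUPPORTED": ("CLAIM_PARTIAL", "MEDIUM"),
    "UNSUPPORTED": ("CLAIM_UNSUPPORTED", "HIGH"),
    "CONTRADICTED": ("CLAIM_CONTRADICTED", "CRITICAL"),
}
# "_" and letters are word characters, so \b prevents partial-label matches.
VERDICT_RE = re.compile(r"\b(PARTIALLY_SUPPORTED|UNSUPPORTED|CONTRADICTED|SUPPORTED)\b")


def classify_claim(nlm_answer: str):
    """Return (classification, severity) for the first label found, else None."""
    match = VERDICT_RE.search(nlm_answer)
    if match is None:
        return None  # no recognizable label: leave for manual review
    return VERDICTS[match.group(1)]
```

Returning `None` for an unparseable reply avoids silently defaulting to a pass.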
Score = verified items / total checkable items, scaled to 0-10.
checkable = total_footnotes - SKIPPED_NO_GROUND_TRUTH
verified = VERIFIED + QUOTE_VERIFIED + CLAIM_SUPPORTED
score = (verified / checkable) * 10
Items with any finding (FIELD_ERROR, QUOTE_NOT_FOUND, UNVERIFIED, etc.) count against the score. QUOTE_UNCHECKED (source PDF unavailable) counts as checkable but not verified — it's an unresolved question, not a pass.
Default threshold: 9.5/10 (95% of checkable citations verified).
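As a worked instance of the formula, a run with 85 footnotes, 13 of them skipped for lack of ground truth, and 68 verified gives 68 / 72 × 10 ≈ 9.4, just under the default threshold. The helper below is an illustrative sketch; `verification_score` is not a name from the skill.

```python
def verification_score(total_footnotes: int, skipped_no_ground_truth: int, verified: int):
    """Score = verified / checkable, scaled to 0-10 and rounded to one decimal."""
    checkable = total_footnotes - skipped_no_ground_truth
    if checkable <= 0:
        return None  # nothing checkable, so no meaningful score
    return round(verified / checkable * 10, 1)


score = verification_score(85, 13, 68)  # 9.4
passes_default_threshold = score >= 9.5  # False
```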
This skill is a domain-specific scorer for the audit-fix pattern. Its substrate is deterministic and binary — every citation either resolves against the bib or it doesn't; every quote either verifies against its source or it doesn't. So gate on the substrate directly, not a 0-10 composite (which here is just a noisy restatement of the same counts). The evaluator reads the finding counts from the transcript and refires until they hit zero.
/goal Source-verify [manuscript] is complete when .planning/SCORES.md shows ALL citations
resolved against paperpile.bib, ALL quotes verified via Readwise/rga/NLM, and ZERO UNVERIFIED
or QUOTE_NOT_FOUND findings remaining. (The 0-10 score is advisory — the zero-count substrate is
the gate.) Stop after 5 turns.
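The zero-count gate in the goal text can be expressed as a check over the classifications recorded for the current iteration. This is a sketch: `gate_status` is a hypothetical helper, and the default blocking set contains only the two classifications the goal names explicitly; a stricter gate could pass a larger set.

```python
from collections import Counter


def gate_status(classifications, blocking=("UNVERIFIED", "QUOTE_NOT_FOUND")):
    """Return (gate_met, remaining) where remaining maps blocking labels to counts."""
    counts = Counter(classifications)
    remaining = {name: counts[name] for name in blocking if counts[name]}
    return (not remaining), remaining
```

Gating on counts rather than on the composite score means a single remaining UNVERIFIED citation keeps the loop running even when the score rounds above the threshold.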
Each turn under the active goal:
- Goal condition satisfied? → the /goal evaluator will mark the condition met.
- Score < 9.5? → fix and end the turn so the next iteration fires.
  - FIELD_ERROR → correct the volume/pages/year in the manuscript
  - QUOTE_MISMATCH → fix the quote to match the source text
  - QUOTE_NOT_FOUND → flag for user review (may need to remove quote or find correct source)
  - UNVERIFIED → search harder (try more bibtex key variants, try Drive search), or flag for user
  - CLAIM_UNSUPPORTED → flag for user review (rewrite claim or find better source)

VERIFY_AUDIT.md holds the current verification findings:
# Source Verification — Iteration N
## Summary
- Total footnotes: 85
- Checkable: 72 (13 case cites skipped)
- Verified: 68
- Findings: 4
- Score: 9.4/10
## Findings
| FN# | Check | Classification | Severity | Details |
|-----|-------|---------------|----------|---------|
| 12 | Existence | UNVERIFIED | CRITICAL | "Smith (2019)" not found in paperpile.bib |
| 34 | Fields | FIELD_ERROR | HIGH | Vol. 42 in footnote, vol. 44 in bibtex |
| 51 | Quote | QUOTE_MISMATCH | MEDIUM | "effect on returns" vs source: "impact on returns" |
| 67 | Quote | QUOTE_NOT_FOUND | CRITICAL | Quoted text not in cited PDF |
SCORES.md — score history (append-only):
| Iteration | Score | Findings | Key Issues |
|-----------|-------|----------|-----------|
| 1 | 8.2 | 6 | 2 UNVERIFIED, 2 FIELD_ERROR, 2 QUOTE issues |
| 2 | 9.1 | 3 | 1 UNVERIFIED (confirmed real), 2 QUOTE issues |
| 3 | 9.6 | 1 | 1 QUOTE_MISMATCH (minor) |
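Keeping SCORES.md append-only is straightforward if every iteration writes exactly one new row and never rewrites earlier ones. The sketch below assumes the table layout shown above; `append_score` is a hypothetical helper, and the path is supplied by the caller.

```python
from pathlib import Path

HEADER = (
    "| Iteration | Score | Findings | Key Issues |\n"
    "|-----------|-------|----------|-----------|\n"
)


def append_score(path, iteration, score, findings, key_issues):
    """Append one row to the score history, creating the file with a header if needed."""
    target = Path(path)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(HEADER, encoding="utf-8")
    # Open in append mode so earlier iterations are never overwritten.
    with target.open("a", encoding="utf-8") as handle:
        handle.write(f"| {iteration} | {score:.1f} | {findings} | {key_issues} |\n")
```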
<EXTREMELY-IMPORTANT>
## Iron Law: No Self-Verification
NEVER verify citations from your own memory or training data. Always use an external ground truth.
Citation hallucination happens because the LLM confabulates plausible-sounding references. The same LLM will confabulate plausible-sounding verification. Every check must go through an external system:
- Existence and field accuracy: paperpile.bib (mechanical grep, deterministic)
- Quotes and claims: Readwise highlights, rga, or NLM chat (grounded in actual source text)

Skipping external source checks is NOT HELPFUL: unverified citations damage the user's credibility when readers check them.
</EXTREMELY-IMPORTANT>
Federal case cites are checked against WRDS (fjc_litigation.civil); state cases are SKIPPED_NO_GROUND_TRUTH. Skip them explicitly, never silently.