.claude/skills/process-pdf/SKILL.md
npx skillsauth add rockandrolla13/quant-research-workflow .claude/skills/process-pdf
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Process a PDF through the full extraction → synthesis → spec → review pipeline.
$ARGUMENTS is the strategy_id. The PDF must already be at strategies/$ARGUMENTS/input/source.pdf
Run:
./tools/run-extract.sh python tools/ingest.py strategies/$ARGUMENTS/input/source.pdf --strategy-id $ARGUMENTS
This produces:
- `extract/raw.md`: raw OCR/layout extraction (pass 1)
- `artifacts/01-extraction.md`: tagged structured extraction (pass 2)

Verify both files exist. If artifacts/01-extraction.md is missing (tagging failed), read extract/raw.md instead and proceed, but warn the user that tags are missing.
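
The fallback rule above can be sketched in Python. This is a minimal illustration, not part of the pipeline's tooling: the helper name `pick_extraction` is hypothetical, and it assumes the two files live under the strategy directory at the relative paths named above.

```python
from pathlib import Path


def pick_extraction(strategy_dir: Path) -> tuple[Path, bool]:
    """Return (file to read, tags_available) for one strategy directory.

    Hypothetical helper: prefers the tagged pass-2 artifact and falls
    back to the raw pass-1 output, printing a warning when it does.
    """
    tagged = strategy_dir / "artifacts" / "01-extraction.md"
    raw = strategy_dir / "extract" / "raw.md"
    if tagged.is_file():
        return tagged, True
    if raw.is_file():
        # Tagging (pass 2) failed: proceed with pass 1 and warn.
        print(f"WARNING: {tagged} is missing; reading {raw} without tags")
        return raw, False
    raise FileNotFoundError(f"neither {tagged} nor {raw} exists")
```

Calling `pick_extraction(Path("strategies") / strategy_id)` would then give the file to read in the next step, plus a flag telling you whether to relay the missing-tags warning to the user.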
Read artifacts/01-extraction.md (or extract/raw.md as fallback).
Verify it contains tagged elements: grep for [SIG:, [EQ:, [PARAM:.
If fewer than 5 tags total, the extraction quality is low — warn the user.
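
As a rough sketch of the tag check, the following counts the three tag prefixes and applies the five-tag threshold. The function names are illustrative, and the pattern assumes each tag opens with a literal `[SIG:`, `[EQ:`, or `[PARAM:` as described above.

```python
import re

# Matches the opening of a tag such as "[SIG:", "[EQ:", or "[PARAM:".
TAG_PATTERN = re.compile(r"\[(SIG|EQ|PARAM):")
MIN_TAGS = 5  # threshold stated in the text


def count_tags(text: str) -> dict[str, int]:
    """Count occurrences of each tag kind in the extraction text."""
    counts = {"SIG": 0, "EQ": 0, "PARAM": 0}
    for match in TAG_PATTERN.finditer(text):
        counts[match.group(1)] += 1
    return counts


def is_low_quality(text: str) -> bool:
    """True when the total tag count falls below the warning threshold."""
    return sum(count_tags(text).values()) < MIN_TAGS
```

Reporting the per-kind counts alongside the warning makes it easier to see whether signals, equations, or parameters are the ones that went untagged.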
Update: .pipeline_state.yaml → stage: extract, status: complete
Read the templates:
- templates/synthesis_template.md: for the required sections and format

Read artifacts/01-extraction.md in full.
Write synth/strategy.md following the synthesis template. Required sections:
Write synth/formula.md following the synthesis template. Required sections:
Cross-reference: every [EQ:n] tag from extraction must appear in formula.md. Every [SIG:n] tag must map to a component in strategy.md Signal Decomposition.
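
One way to automate this cross-reference is sketched below. It rests on an assumption the text does not confirm: that a tag counts as covered when its literal form (for example `[EQ:2]`) appears somewhere in the target file. The helper name `missing_tags` is hypothetical.

```python
import re


def missing_tags(extraction: str, target: str, kind: str) -> list[str]:
    """List tags of one kind found in the extraction but absent from target.

    Assumption: coverage means the literal tag text, such as "[EQ:2]",
    appears in the target document.
    """
    pattern = re.compile(rf"\[{kind}:[^\]]*\]")
    wanted = set(pattern.findall(extraction))
    return sorted(tag for tag in wanted if tag not in target)
```

Running it once with `kind="EQ"` against formula.md and once with `kind="SIG"` against strategy.md yields two lists that should both be empty before moving on.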
Also write artifacts/02-synthesis.md as a combined artifact:
# Synthesis: $ARGUMENTS
## Core Claim
<from strategy.md>
## Method Decomposition
<from strategy.md Signal Decomposition>
## Dependency Graph
<from strategy.md>
## Implementation Decisions
<from strategy.md>
## What's Missing
<from strategy.md>
Update: .pipeline_state.yaml → stage: synth, status: complete
Read the templates:
- templates/spec_template.md: for spec.yaml structure and rules

Read synth/strategy.md and synth/formula.md.
Write spec/SPEC.md (human-readable implementation spec).
Write spec/spec.yaml (machine-readable, Codex builds from this).
Key rules:
- provenance field: paper, inferred, or design_choice
- formula_latex must match an equation in formula.md
- holdout_touched: false

Also write artifacts/03-spec.md:
# Spec: $ARGUMENTS
## Modules
<module table from spec.yaml>
## Data Flow
<mermaid diagram: input → signals → portfolio → backtest → validation>
## Open Questions
<any design_choice items that need user confirmation>
## Provenance Summary
- From paper: N functions
- Inferred: N functions
- Design choice: N functions
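
The Provenance Summary counts could be derived mechanically, as in this sketch. The spec layout is an assumption: it supposes spec.yaml has already been parsed into a dict (for instance with PyYAML's `yaml.safe_load`) and that each entry in `modules` carries a `functions` list whose items have a `provenance` field. Neither function name comes from the pipeline.

```python
from collections import Counter


def provenance_summary(spec: dict) -> Counter:
    """Count functions per provenance value (assumed spec layout)."""
    counts: Counter = Counter()
    for module in spec.get("modules", []):
        for function in module.get("functions", []):
            # Functions without the field are counted under "missing".
            counts[function.get("provenance", "missing")] += 1
    return counts


def render_summary(counts: Counter) -> str:
    """Format the counts as the three summary lines of the artifact."""
    return "\n".join([
        f"- From paper: {counts['paper']} functions",
        f"- Inferred: {counts['inferred']} functions",
        f"- Design choice: {counts['design_choice']} functions",
    ])
```

A non-zero `missing` count is worth surfacing, since the key rules require every function to declare one of the three provenance values.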
Update: .pipeline_state.yaml → stage: spec, status: complete
Run:
./tools/run-extract.sh python tools/validate_spec.py strategies/$ARGUMENTS/spec/spec.yaml
If it fails, read the error list, fix spec.yaml, re-run. Loop until exit 0. Do NOT present the spec to the user until the gate passes.
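
The gate loop can be pictured as follows. This is a sketch under stated assumptions: `fix` is a hypothetical callback standing in for "read the error list and fix spec.yaml", and the `max_attempts` cap is an addition for safety, since the text itself says to loop until exit 0.

```python
import subprocess


def run_gate(command: list[str]) -> tuple[bool, str]:
    """Run the validator once; the gate passes only on exit code 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def gate_loop(command: list[str], fix, max_attempts: int = 5) -> int:
    """Re-run the gate after each fix; return the attempt that passed."""
    for attempt in range(1, max_attempts + 1):
        ok, output = run_gate(command)
        if ok:
            return attempt
        fix(output)  # hypothetical: repair spec.yaml from the error list
    raise RuntimeError(f"gate still failing after {max_attempts} attempts")
```

Here `command` would be the validator invocation shown above, split into a list of arguments. Only once `gate_loop` returns should the spec be presented to the user.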
Run:
./tools/run-extract.sh python tools/call_gemini.py --mode review \
--spec strategies/$ARGUMENTS/spec/spec.yaml \
--formula strategies/$ARGUMENTS/synth/formula.md \
--output strategies/$ARGUMENTS/spec/review.md
Read spec/review.md.
If any FAIL items: fix spec.yaml, re-run validate_spec.py, re-run Gemini review. Loop.
If PASS WITH WARNINGS: fix straightforward warnings, or note them for the user.
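
The branching on the review result might look like this sketch. The format of spec/review.md is not shown here, so the code assumes the verdict strings `FAIL` and `PASS WITH WARNINGS` appear literally in the file. The function name and return labels are illustrative only.

```python
import re


def next_action(review_text: str) -> str:
    """Map review.md contents to the next pipeline step (assumed format)."""
    if re.search(r"\bFAIL\b", review_text):
        # Any FAIL item: fix spec.yaml, re-validate, re-run the review.
        return "fix_and_rerun"
    if "PASS WITH WARNINGS" in review_text:
        # Fix the straightforward warnings or note them for the user.
        return "fix_or_note_warnings"
    return "present_to_user"
```

Checking for FAIL first matters: a review that passes overall with warnings but still lists a failing item should go back through the loop.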
Show the user:
Ask: "Ready to lock the spec? After lock, Codex builds from this."
Stop here. User must approve. Then run:
git tag spec-$ARGUMENTS-v1.0