skills/index-manager/SKILL.md
Manages MPEP index lifecycle including downloads, building, maintenance, and optimization.
npx skillsauth add robthepcguy/claude-patent-creator index-manager
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Expert system for managing MPEP search index lifecycle: PDF downloads, index building, maintenance, updates, optimization.
FOR CLAUDE: All dependencies installed, system operational.
Use for: building or rebuilding the MPEP index, recovering from corrupted or missing index files, optimization, adding content, and troubleshooting.
PDFs Not Present -> Download (2-5 min, 500MB)
-> Extract & Parse (500MB data)
-> Generate Embeddings (5-10 min GPU, 35-65 min CPU)
-> Build FAISS + BM25 Indexes
-> Index Ready (mcp_server/index/)
-> Maintenance (Verify -> Optimize -> Update)
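The flow above can be read as a small state check: which step comes next depends on what is already on disk. The following is a minimal sketch under stated assumptions, not part of the patent-creator tooling. The function name `next_pipeline_step` is hypothetical, and the index file names are taken from the expected listing later in this document.

```python
from pathlib import Path

# File names taken from the expected index listing in this document.
INDEX_FILES = ("mpep_index.faiss", "mpep_metadata.json", "mpep_bm25.pkl")


def next_pipeline_step(pdf_dir: Path, index_dir: Path) -> str:
    """Return the next lifecycle step: 'download', 'build', or 'maintain'.

    Hypothetical helper: it only inspects the filesystem and does not
    validate the contents of any file.
    """
    # No PDFs at all means the pipeline has not started yet.
    if not pdf_dir.is_dir() or not any(pdf_dir.glob("*.pdf")):
        return "download"
    # PDFs exist, but at least one index file is missing: build or rebuild.
    if any(not (index_dir / name).is_file() for name in INDEX_FILES):
        return "build"
    # Everything is present, so only maintenance remains.
    return "maintain"


if __name__ == "__main__":
    print(next_pipeline_step(Path("pdfs"), Path("mcp_server/index")))
```

A result of `download` maps to `patent-creator download-mpep`, and `build` maps to `patent-creator rebuild-index`.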
Check Status:
ls pdfs/ # Should show mpep-*.pdf, consolidated_laws.pdf, consolidated_rules.pdf
Download PDFs:
patent-creator download-mpep
# Or: python install.py (Select "Download MPEP PDFs")
Verify Integrity:
python -c "
import fitz
from pathlib import Path
for pdf in Path('pdfs').glob('*.pdf'):
    try:
        doc = fitz.open(pdf)
        print(f'[OK] {pdf.name}: {len(doc)} pages')
        doc.close()
    except Exception as e:
        print(f'[X] {pdf.name}: ERROR - {e}')
"
Build Index:
patent-creator rebuild-index
# Or: python mcp_server/server.py --rebuild-index
Timeline:
Total: 5-15 min (GPU) or 35-65 min (CPU)
Custom Build:
from mcp_server.mpep_search import MPEPIndex
index = MPEPIndex(use_hyde=False)
index.build_index(
    chunk_size=500,
    overlap=50,
    batch_size=32  # Reduce to 16/8 if OOM
)
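To make the `chunk_size` and `overlap` parameters concrete, here is a minimal sliding-window sketch. It is an illustration only and is not taken from `MPEPIndex.build_index`: the function name `chunk_tokens` is hypothetical, and it assumes both values are counted in tokens, which the source does not state.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens.

    Hypothetical illustration; the real index builder may chunk differently.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    demo = [str(i) for i in range(1000)]
    for chunk in chunk_tokens(demo):
        print(chunk[0], chunk[-1], len(chunk))
```

With the defaults, each window starts 450 tokens after the previous one, so consecutive chunks share 50 tokens of context.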
# Check files
ls -lh mcp_server/index/
# Expected: mpep_index.faiss (~150MB), mpep_metadata.json (~80MB), mpep_bm25.pkl (~60MB)
# Verify health
patent-creator health
# Should show: [OK] MPEP Index: Ready (12,543 chunks)
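The file listing check can also be scripted. The sketch below reports each expected index file as missing, empty, or present with its size. The function name `check_index_files` is hypothetical, and the sizes are the approximate figures quoted above, so they are printed for comparison rather than enforced.

```python
from pathlib import Path

# Approximate sizes in MB from the expected listing; rough guides only.
EXPECTED_MB = {
    "mpep_index.faiss": 150,
    "mpep_metadata.json": 80,
    "mpep_bm25.pkl": 60,
}


def check_index_files(index_dir: Path) -> dict[str, str]:
    """Map each expected index file to 'missing', 'empty', or an 'ok (...)' note."""
    report: dict[str, str] = {}
    for name, expected_mb in EXPECTED_MB.items():
        path = index_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif path.stat().st_size == 0:
            report[name] = "empty"
        else:
            size_mb = path.stat().st_size / 1_000_000
            report[name] = f"ok ({size_mb:.1f} MB, expected ~{expected_mb} MB)"
    return report


if __name__ == "__main__":
    for name, status in check_index_files(Path("mcp_server/index")).items():
        print(f"{name}: {status}")
```

A `missing` or `empty` entry suggests a rebuild; a size far from the expected figure is only a hint, since the figures are approximate.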
# Manual test
python -c "
from mcp_server.mpep_search import MPEPIndex
index = MPEPIndex()
print(f'Chunks: {len(index.chunks)}')
results = index.search('claim definiteness', top_k=3)
print(f'Search results: {len(results)}')
"
When to Rebuild:
Rebuild Process:
# Backup (optional)
cp -r mcp_server/index mcp_server/index_backup_$(date +%Y%m%d)
# Rebuild
patent-creator rebuild-index
# Verify
patent-creator health
# Remove backup if successful
rm -rf mcp_server/index_backup_*
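The backup, rebuild, verify, and cleanup steps can be combined so that a failed rebuild restores the previous index automatically. This is a sketch under stated assumptions: `rebuild_with_backup` is a hypothetical helper, and the rebuild itself is passed in as a callable so the sketch does not assume anything about how the CLI behaves.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Callable


def rebuild_with_backup(index_dir: Path, rebuild: Callable[[], None]) -> None:
    """Run `rebuild`, restoring the previous index if it raises.

    Hypothetical helper mirroring the manual steps: copy the index to a
    dated backup, rebuild, then drop the backup only on success.
    """
    backup = index_dir.with_name(f"{index_dir.name}_backup_{date.today():%Y%m%d}")
    has_backup = index_dir.is_dir()
    if has_backup:
        shutil.rmtree(backup, ignore_errors=True)  # replace a same-day backup
        shutil.copytree(index_dir, backup)
    try:
        rebuild()
    except Exception:
        if has_backup:
            # Discard the partial rebuild and put the old index back.
            shutil.rmtree(index_dir, ignore_errors=True)
            shutil.move(str(backup), str(index_dir))
        raise
    else:
        if has_backup:
            shutil.rmtree(backup)
```

One way to use it, assuming the CLI exits non-zero on failure, is `rebuild_with_backup(Path("mcp_server/index"), lambda: subprocess.run(["patent-creator", "rebuild-index"], check=True))` after `import subprocess`.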
# Download new PDF
wget https://www.uspto.gov/web/offices/pac/mpep/mpep-2900.pdf -O pdfs/mpep-2900.pdf
# Rebuild (includes new section)
patent-creator rebuild-index
Note: Incremental updates not supported. Full rebuild required.
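Because every change requires a full rebuild, it can help to check first whether a rebuild is needed at all. A minimal sketch, assuming file modification times are a reasonable proxy for "a PDF changed after the index was built"; the function name `index_is_stale` is hypothetical and not part of the patent-creator CLI.

```python
from pathlib import Path


def index_is_stale(pdf_dir: Path, index_file: Path) -> bool:
    """Return True when the index is missing or any PDF is newer than it.

    Hypothetical helper: compares modification times only, so a PDF that
    was copied with an old timestamp would not be detected.
    """
    if not index_file.is_file():
        return True
    built_at = index_file.stat().st_mtime
    return any(pdf.stat().st_mtime > built_at for pdf in pdf_dir.glob("*.pdf"))


if __name__ == "__main__":
    stale = index_is_stale(Path("pdfs"), Path("mcp_server/index/mpep_index.faiss"))
    print("rebuild needed" if stale else "index is up to date")
```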
| Command | Purpose |
|---------|---------|
| patent-creator download-mpep | Download MPEP PDFs |
| patent-creator rebuild-index | Build/rebuild search index |
| patent-creator health | Check index health |
| ls -lh mcp_server/index/ | View index files |
Best Practices: