ocr-super-surya/SKILL.md
GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract.
npx skillsauth add aktsmm/agent-skills ocr-super-suryaInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
GPU-optimized OCR using Surya.
| Feature | Description | | ------------- | --------------------------------------- | | Accuracy | 2x better than Tesseract (0.97 vs 0.88) | | GPU | PyTorch-based, CUDA optimized | | Languages | 90+ including CJK | | Layout | Document layout, table recognition |
# 1. Check GPU
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
# 2. Install (with CUDA if GPU available)
pip install surya-ocr
# If CUDA=False but you have GPU, reinstall PyTorch:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
OneDrive 配下のフォルダでは uv のハードリンクが失敗するため、以下の手順を使う:
# キャッシュをOneDrive外に設定
$env:UV_CACHE_DIR = "C:\Temp\uv_cache"
# 仮想環境をOneDrive外に作成
uv venv C:\Users\<USERNAME>\ocr_env --python 3.12
# surya-ocrをインストール(link-mode=copy でハードリンクを回避)
uv pip install surya-ocr --python C:\Users\<USERNAME>\ocr_env\Scripts\python.exe --link-mode=copy
# transformers 5.x は非互換 → 4.x を強制
uv pip install "transformers<5.0" --python C:\Users\<USERNAME>\ocr_env\Scripts\python.exe --link-mode=copy
# CLI
python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt
# Or use surya directly
surya_ocr image.png --output_dir ./results
import sys, io
# Windows CP932エンコードエラー対策
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor
image = Image.open("document.png").convert("RGB")
found_pred = FoundationPredictor()
rec_pred = RecognitionPredictor(found_pred) # v0.13+ : FoundationPredictor必須
det_pred = DetectionPredictor()
# v0.17.x以降: langs引数は廃止 → 渡さないこと
for page in rec_pred([image], det_predictor=det_pred):
for line in page.text_lines:
if line.text.strip():
print(line.text)
API変更履歴 (v0.17.x):
RecognitionPredictor(foundation_predictor)-FoundationPredictorが必須引数に変更__call__()からlangs引数が削除(自動検出に変更)
| Variable | Default | Description |
| ------------------------ | ------- | --------------------- |
| RECOGNITION_BATCH_SIZE | 512 | Reduce for lower VRAM |
| DETECTOR_BATCH_SIZE | 36 | Reduce if OOM |
export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png
| Script | Description |
| ----------------------- | ----------------------------------------- |
| scripts/ocr_helper.py | Helper with OOM auto-retry, batch support |
| エラー | 原因 | 対処 |
| ------ | ---- | ---- |
| RecognitionPredictor.__init__() missing 1 required positional argument: 'foundation_predictor' | v0.13+ でAPIが変更 | found_pred = FoundationPredictor() を作成して引数に渡す |
| TypeError: __call__() got an unexpected keyword argument 'langs' | v0.17.x で langs 引数廃止 | langs 引数を削除する |
| AttributeError: 'SuryaDecoderConfig' object has no attribute 'pad_token_id' | transformers 5.x との非互換 | pip install "transformers<5.0" でダウングレード |
| failed to hardlink file ... OneDrive (uv, os error 396) | OneDrive のハードリンク制限 | --link-mode=copy を付けてインストール+UV_CACHE_DIR をOneDrive外に設定 |
| UnicodeEncodeError: 'cp932' codec can't encode character | Windows のCP932デフォルトエンコード | sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8') を先頭に追加 |
development
Generate draw.io editable diagrams (.drawio, .drawio.svg) from text, images, or Excel. Orchestrates 3-agent workflow (Analysis → Manifest → SVG generation) with quality gates. Use when creating architecture diagrams, flowcharts, sequence diagrams, or converting existing images to editable format. Supports Azure/AWS cloud icons.
data-ai
Set up a reusable book-writing workspace with AI agents, instructions, prompts, and scripts. Use when creating a new book or technical writing project, bootstrapping a manuscript repository, or preparing a Markdown + Re:VIEW + PDF workflow. Triggers on "book writing workspace", "technical book project", "執筆ワークスペース", "book manuscript repo", and "Re:VIEW workspace".
documentation
Create, review, and update Prompt and agents and workflows. Covers 5 workflow patterns, agent delegation, Handoffs, Context Engineering. Use for any .agent.md file work or multi-agent system design. Triggers on 'agent workflow', 'create agent', 'ワークフロー設計'.
tools
Guide for creating VS Code extensions from scratch to Marketplace publication. Use when: (1) Creating a new VS Code extension, (2) Adding commands, keybindings, or settings to an extension, (3) Publishing to VS Code Marketplace, (4) Troubleshooting extension activation or packaging issues, (5) Building TreeView or Webview UI, (6) Setting up extension tests.