ppocrv5/SKILL.md
Use this skill when users need to extract text from images, PDFs, or documents. Supports URLs and local files, with adaptive quality modes. Returns structured JSON containing recognized text, confidence scores, and quality metrics.
npx skillsauth add atxinsky/skills ppocrv5
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Invoke this skill in the following situations:
Do not use this skill in the following situations:
Identify the input source:
- URL provided by the user: --file-url parameter
- Local file: --file-path parameter
- --file-path

Execute OCR:
python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
Or for local files:
python scripts/ocr_caller.py --file-path "file path" --pretty
Parse JSON response:
- ok field: true means success, false means error
- result.full_text contains all recognized text
- quality.quality_score indicates recognition confidence (0.0-1.0)
- If ok is false, display error.message

Present results to user:
- Use result.pages[].items[] to get line-by-line data

Always use --mode auto (default) unless the user explicitly requests otherwise:
| User Request | Use Mode | Command Flag |
|--------------|----------|--------------|
| Default/unspecified | Auto (adaptive) | --mode auto (or omit) |
| "Quick recognition" / "fast" | Fast | --mode fast |
| "High precision" / "accurate" | Quality | --mode quality |
Auto mode (recommended): Makes 1-3 attempts, increasing the correction level on each retry, and returns the best result.
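The mode table above can be sketched as a small helper that builds the argument list for scripts/ocr_caller.py. This is only an illustration: the names `pick_mode` and `build_ocr_command` and the keyword matching are assumptions, and only the flags themselves come from this document.

```python
import sys

# Phrases from the mode table mapped to --mode values.
MODE_KEYWORDS = {
    "fast": ("quick", "fast"),
    "quality": ("high precision", "accurate"),
}


def pick_mode(user_request: str) -> str:
    """Return 'fast', 'quality', or 'auto' based on the user's wording."""
    text = user_request.lower()
    for mode, keywords in MODE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return mode
    return "auto"  # default when the user specifies nothing


def build_ocr_command(source: str, user_request: str = "") -> list[str]:
    """Build the argv for scripts/ocr_caller.py (hypothetical helper)."""
    is_url = source.startswith(("http://", "https://"))
    source_flag = "--file-url" if is_url else "--file-path"
    return [
        sys.executable, "scripts/ocr_caller.py",
        source_flag, source,
        "--mode", pick_mode(user_request),
        "--pretty",
    ]
```

The resulting list can be passed to `subprocess.run(..., capture_output=True, text=True)` from the skill's root directory.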
Mode 1: Simple URL OCR
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
Mode 2: Local File OCR
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
Mode 3: Fast Mode for Clear Images
python scripts/ocr_caller.py --file-url "URL" --mode fast --pretty
The script outputs JSON with the following structure:
{
  "ok": true,
  "result": {
    "full_text": "All recognized text here...",
    "pages": [...]
  },
  "quality": {
    "quality_score": 0.85,
    "text_items": 42
  }
}
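A minimal sketch of consuming this structure in Python, assuming the script's stdout has already been captured as a string. The function name `summarize_response` is illustrative. The field names are the ones shown in this document, and `error.message` is assumed to be a nested object as that path suggests.

```python
import json


def summarize_response(raw: str) -> dict:
    """Parse ocr_caller.py output and pull out the fields an agent needs."""
    data = json.loads(raw)
    if not data.get("ok"):
        # On failure, surface error.message to the user.
        message = (data.get("error") or {}).get("message", "unknown error")
        return {"ok": False, "message": message}
    quality = data.get("quality", {})
    return {
        "ok": True,
        "text": data.get("result", {}).get("full_text", ""),
        "quality_score": quality.get("quality_score"),
        "text_items": quality.get("text_items"),
    }
```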
Key fields to extract:
- result.full_text: Complete text for the user
- quality.quality_score: 0.72+ is good, <0.5 is poor
- error.message: If ok is false, provides error description

If the user has not configured API credentials, run:
python scripts/configure.py
This will prompt for:
- API_URL: Paddle AI Studio endpoint
- PADDLE_OCR_TOKEN: User's access token

Configuration is saved to the .env file and only needs to be done once.
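Before calling the OCR script, an agent could check whether both values are present. The sketch below assumes the .env file uses plain `KEY=VALUE` lines, which this document does not confirm, and `missing_config` is a hypothetical helper rather than part of the skill.

```python
REQUIRED_KEYS = ("API_URL", "PADDLE_OCR_TOKEN")


def missing_config(env_text: str) -> list[str]:
    """Return the required keys that are absent or empty in .env content."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

If the returned list is non-empty, running python scripts/configure.py is the documented fix.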
Configuration missing:
Error: API_URL not configured
→ Run python scripts/configure.py
Authentication failed (403):
error_code: PROVIDER_AUTH_ERROR
→ Token is invalid, reconfigure with correct credentials
Quota exceeded (429):
error_code: PROVIDER_QUOTA_EXCEEDED
→ Daily API quota exhausted, inform user to wait or upgrade
No text detected:
quality_score: 0.0, text_items: 0
→ Image may be blank, corrupted, or contain no text
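The cases above can be sketched as a lookup from error code to the guidance given to the user. The function names are illustrative, and the error code and message are passed in directly because this document does not show where `error_code` sits in the JSON.

```python
# Guidance text taken from the error cases listed above.
ERROR_ADVICE = {
    "PROVIDER_AUTH_ERROR": "Token is invalid; reconfigure with correct credentials.",
    "PROVIDER_QUOTA_EXCEEDED": "Daily API quota exhausted; wait or upgrade.",
}


def advice_for(error_code: str = "", message: str = "") -> str:
    """Map an error code or message to a next step for the user."""
    if error_code in ERROR_ADVICE:
        return ERROR_ADVICE[error_code]
    if "not configured" in message:
        return "Run python scripts/configure.py"
    return message or "Unknown error"


def no_text_detected(quality: dict) -> bool:
    """True when the result matches the 'No text detected' case."""
    return quality.get("quality_score") == 0.0 and quality.get("text_items") == 0
```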
When presenting results to users, consider the quality score:
| Quality Score | Explanation to User |
|---------------|---------------------|
| 0.90 - 1.00 | Excellent recognition quality |
| 0.72 - 0.89 | Good recognition quality (default target) |
| 0.50 - 0.71 | Fair recognition quality, may have some errors |
| 0.00 - 0.49 | Poor recognition quality or no text detected |
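As a sketch, the bands in this table translate into a simple threshold check. The name `describe_quality` is illustrative. The table leaves small gaps between bands (for example 0.895), so the code treats each lower bound as inclusive, which is an assumption.

```python
def describe_quality(score: float) -> str:
    """Return the user-facing explanation for a quality score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"quality score out of range: {score}")
    if score >= 0.90:
        return "Excellent recognition quality"
    if score >= 0.72:
        return "Good recognition quality"
    if score >= 0.50:
        return "Fair recognition quality, may have some errors"
    return "Poor recognition quality or no text detected"
```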
If quality is below 0.5, mention it to the user and suggest:
- --mode quality for better accuracy

Use only when explicitly requested by the user:
Include raw provider response (for debugging):
python scripts/ocr_caller.py --file-url "URL" --return-raw-provider
Request visualization (show detection regions):
python scripts/ocr_caller.py --file-url "URL" --visualize
Adjust auto mode parameters:
python scripts/ocr_caller.py --file-url "URL" \
--max-attempts 2 \
--quality-target 0.80 \
--budget-ms 20000
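To make the three parameters concrete, here is a rough sketch of how an adaptive loop of this kind might use them. It is not the skill's actual implementation (references/agent_policy.md is the authority). The `attempt` callable is hypothetical, the 0.72 target comes from the quality table, and the 20000 ms budget is borrowed from the example command rather than a documented default.

```python
import time
from typing import Callable, Optional


def run_auto(
    attempt: Callable[[int], dict],
    max_attempts: int = 3,
    quality_target: float = 0.72,
    budget_ms: int = 20000,  # assumption: taken from the example, not a known default
) -> Optional[dict]:
    """Retry with rising correction levels and keep the best-scoring result."""
    start = time.monotonic()
    best, best_score = None, -1.0
    for level in range(max_attempts):
        result = attempt(level)  # hypothetical: one OCR call at this level
        score = result.get("quality", {}).get("quality_score", 0.0)
        if score > best_score:
            best, best_score = result, score
        elapsed_ms = (time.monotonic() - start) * 1000
        # Stop once the target is met or the time budget is spent.
        if best_score >= quality_target or elapsed_ms >= budget_ms:
            break
    return best
```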
For in-depth understanding of the OCR system, refer to:
- references/agent_policy.md - Auto mode strategy and quality scoring
- references/normalized_schema.md - Complete output schema specification
- references/provider_api.md - Provider API contract details

Load these reference documents into context when:
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and API connectivity.