skills/casedev/ocr/SKILL.md
Processes documents through case.dev OCR for text and table extraction. Supports PDF and image files up to 500MB with page-level and word-level output. Use when the user mentions "OCR", "text extraction", "scan document", "digitize", "extract text from PDF", or needs word-level positional data from documents.
npx skillsauth add casemark/skills ocr
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Production-grade document OCR with table extraction and word-level positional data. Processes PDFs and images (PNG, JPG, TIFF, BMP, WEBP) up to 500MB.
Requires the casedev CLI. See the setup skill for installation and authentication.
casedev ocr process --document-url "https://example.com/contract.pdf" --json
Flags: --document-url (required), --document-id (optional tag), --engine (override).
Returns a job ID and initial status.
casedev ocr status JOB_ID --json
Statuses: queued -> processing -> completed or failed.
casedev ocr watch JOB_ID --json
Flags: --interval (default: 3s), --timeout (default: 900s).
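The status lifecycle above (queued -> processing -> completed or failed) can be encoded in a small helper for scripts that poll `casedev ocr status` themselves. This is a minimal sketch, not part of the casedev CLI: `is_terminal_status` is a hypothetical name, and extracting the status string from the `--json` output is left out because the response schema is not documented here.

```shell
# Hypothetical helper (not part of the casedev CLI): succeeds when an OCR job
# status is terminal, meaning the job will not change state again.
is_terminal_status() {
  case "$1" in
    completed|failed) return 0 ;;  # terminal states listed in this document
    *) return 1 ;;                 # queued, processing, or anything unrecognized
  esac
}
```

A custom loop would stop polling once `is_terminal_status` succeeds. For most cases `casedev ocr watch` with `--interval` and `--timeout` should make such a loop unnecessary.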
casedev ocr words --vault VAULT_ID --object OBJECT_ID --json
Requires the document to be in a vault with completed OCR ingestion.
Flags: --page (specific page), --word-start, --word-end (index range).
Returns per-page word arrays with text, word index, and confidence scores.
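To illustrate how the range flags might combine, the sketch below prints a `casedev ocr words` command for a word range on a single page rather than running it. `ocr_words_cmd` is a hypothetical dry-run helper, and two things are assumptions: that `--page`, `--word-start`, and `--word-end` take plain integers, and that they can be used together in one call.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
# Assumes --page, --word-start, and --word-end accept plain integers and
# may be combined; check the CLI help before relying on this.
ocr_words_cmd() {
  vault_id=$1
  object_id=$2
  page=$3
  word_start=$4
  word_end=$5
  printf 'casedev ocr words --vault %s --object %s --page %s --word-start %s --word-end %s --json\n' \
    "$vault_id" "$object_id" "$page" "$word_start" "$word_end"
}

# Example: words 0 through 49 on page 2 (placeholder IDs).
ocr_words_cmd VAULT_ID OBJECT_ID 2 0 49
```

Printing the command first makes it easy to review the flags before sending a request against a real vault.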
# 1. Upload (triggers automatic ingestion + OCR)
casedev vault object upload ./scanned-contract.pdf --vault VAULT_ID --json
# 2. Check ingestion status
casedev vault object list --vault VAULT_ID --json
# 3. Get word-level data
casedev ocr words --vault VAULT_ID --object OBJECT_ID --json
casedev ocr process --document-url "https://storage.example.com/doc.pdf" --json
casedev ocr watch JOB_ID --json
"Invalid file type for OCR": Only PDFs and images are supported. Check the object's content type with casedev vault object list.
Job stuck in "processing": Increase timeout with --timeout 1800. Large documents (100+ pages) take longer.
"OCR job failed": Document may be corrupted or unsupported. Re-upload and retry.