/SKILL.md
PDF/PPT/Image -> Markdown OCR with Zhipu GLM-OCR, strict verification, and failure-safe fallback.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

`npx skillsauth add sunrisever/glm-ocr glm-ocr`
Use this skill when the task involves:
- `input/`: source files waiting for OCR
- `output/`: Markdown output, extracted images, and any `_failed_segments/*.failed.json`
- `_cache/ppt_pdf/`: cached PDFs converted from PPT/PPTX
- `ocr.py`: main OCR pipeline
- `verify_ocr.py`: acceptance check
- `audit_ocr_integrity.py`: deep integrity audit
- `reference_book_metadata.py`: textbook directory page, page offset, and QR resource generation
- `backfill_reference_book_directory_pages.py`: batch refresh of textbook metadata
- `rerun_pdf_segments.py`: rerun only failed or suspicious page ranges
- `duplicate_image_reviewer.py`: local UI for duplicate/similar image review
- `clean_junk_images.py`: duplicate audit, similarity search, purge, and legacy size-clean fallback
- `markdown_cleanup.py` / `repair_math_delimiters.py`: OCR-side Markdown and LaTeX delimiter cleanup
- `KNOWLEDGE_PIPELINE.md`: source-library -> OCR intermediate -> Obsidian note workflow

- `input/`.
- `python ocr.py`.
- `python verify_ocr.py` and `python audit_ocr_integrity.py`.
- `目录页.md`, page offsets, and QR metadata via `reference_book_metadata.py` or `backfill_reference_book_directory_pages.py`.
- `_failed_segments/*.failed.json`.

- `segment PDF upload -> per-page image OCR -> native PDF text fallback`.
- `_failed_segments/*.failed.json`.
- `1301 contentFilter`: split the segment first, rerun the blocked page separately, then use a secondary OCR / vision path only for the blocked page if needed.
- `segment_*.md`: do not delete until ranged `.md` coverage and content have been compared.
- `input/`, `output/`, and downstream library names in sync.
- `$ 2x+1 $`: clean them at OCR output time, not later in the note-writing stage.
- `images/`.
- `AI visual supplementation (non-GLM-OCR output)` or equivalent instead of pretending it came from the main OCR pipeline.
- `OCR_AUDIT_POLICY.md`.
- `_cache/ppt_pdf/`, not in `output/`.
- `verify_ocr.py` is the minimum acceptance gate.
- `audit_ocr_integrity.py` should be used whenever you need confidence that there is no legacy mixed output or silent corruption.
- `KNOWLEDGE_PIPELINE.md`.
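
The `segment PDF upload -> per-page image OCR -> native PDF text fallback` chain and the `_failed_segments/*.failed.json` records mentioned above could be sketched roughly as follows. This is a minimal illustration, not the code in `ocr.py`: the function name `ocr_with_fallback`, the `Stage` callable type, and the JSON fields (`segment`, `attempts`, `stage`, `error`) are all assumptions made for the example. Each stage is passed in as a plain callable, so no GLM-OCR API call is shown or implied.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stage signature: takes a segment PDF path, returns Markdown,
# and raises an exception when the stage fails.
Stage = Callable[[Path], str]


def ocr_with_fallback(
    segment: Path,
    stages: Sequence[Tuple[str, Stage]],
    failed_dir: Path,
) -> Optional[str]:
    """Try each stage in order; write a failure record if none succeeds."""
    attempts = []
    for name, stage in stages:
        try:
            text = stage(segment)
        except Exception as exc:  # sketch only: real code should catch narrower errors
            attempts.append({"stage": name, "error": str(exc)})
            continue
        if text.strip():
            return text  # first non-empty result wins
        attempts.append({"stage": name, "error": "empty output"})

    # Every stage failed: record it instead of emitting partial Markdown.
    # The record layout here is an assumption, not the skill's real schema.
    failed_dir.mkdir(parents=True, exist_ok=True)
    record = failed_dir / f"{segment.stem}.failed.json"
    record.write_text(
        json.dumps({"segment": segment.name, "attempts": attempts}, indent=2),
        encoding="utf-8",
    )
    return None
```

A caller would pass the three stages in priority order, for example `[("segment_upload", upload_fn), ("page_image_ocr", image_fn), ("native_pdf_text", text_fn)]`, where the three functions are placeholders for whatever the pipeline actually uses. A `None` return signals that the segment needs a targeted rerun rather than silent acceptance.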