Convert PDF, DOCX, XLSX, and text files to clean, structured Markdown. CJK-friendly, table-friendly, privacy-first.
npx skillsauth add notoriouslab/doc-cleaner doc-cleaner
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Convert documents (PDF, DOCX, XLSX, TXT) to clean, structured Markdown.
python3 {baseDir}/cleaner.py --input "{{file_path}}" --ai none
python3 {baseDir}/cleaner.py --input "{{file_path}}" --ai gemini
python3 {baseDir}/cleaner.py --input "{{file_path}}" --ai groq
python3 {baseDir}/cleaner.py --input "{{directory}}" --ai none --output-dir "{{output_dir}}"
python3 {baseDir}/cleaner.py --input "{{file_path}}" --dry-run --verbose
python3 {baseDir}/cleaner.py --input "{{file_path}}" --ai none --summary
The --summary flag prints a JSON summary to stdout after processing:
{"version":"1.0.0","total":3,"success":2,"failed":1,"files":[{"file":"report.pdf","output":"./output/report.md","status":"ok"},{"file":"scan.pdf","output":null,"status":"no_content"},{"file":"data.xlsx","output":"./output/data.md","status":"ok"}]}
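Because the summary is a single JSON object, a calling script can pick out the files that did not convert. The following is a minimal sketch, assuming the summary is the only thing written to stdout and that the field names match the example above. `failed_files` is a hypothetical helper, not part of doc-cleaner, and treating every status other than "ok" as a failure is an assumption, since only "ok" and "no_content" appear in the example.

```python
import json

# Sample copied from the example summary above; in practice this string would
# be the captured stdout of a run with --summary.
SAMPLE_SUMMARY = (
    '{"version":"1.0.0","total":3,"success":2,"failed":1,"files":['
    '{"file":"report.pdf","output":"./output/report.md","status":"ok"},'
    '{"file":"scan.pdf","output":null,"status":"no_content"},'
    '{"file":"data.xlsx","output":"./output/data.md","status":"ok"}]}'
)


def failed_files(summary_text):
    """Return (file, status) pairs for every entry whose status is not "ok".

    Hypothetical helper: assumes any status other than "ok" means the file
    was not converted.
    """
    summary = json.loads(summary_text)
    return [
        (entry["file"], entry["status"])
        for entry in summary["files"]
        if entry["status"] != "ok"
    ]


if __name__ == "__main__":
    for name, status in failed_files(SAMPLE_SUMMARY):
        print(f"{name}: {status}")
```

Run against the sample, this reports scan.pdf with status no_content, which matches the single failure counted in the "failed" field.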
| Flag | Description |
|---|---|
| --input, -i | File or directory to process (required, non-recursive) |
| --output-dir, -o | Output directory (default: ./output) |
| --ai | gemini, groq, ollama, or none (default: from config or gemini) |
| --password | PDF decryption password |
| --config | Path to config JSON |
| --summary | Print JSON summary to stdout after processing |
| --dry-run | Preview without writing files |
| --verbose | Enable debug logging |
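When the CLI is called from another Python program, building the argument list directly avoids shell quoting problems with paths that contain spaces. The sketch below uses only the flags listed in the table. `build_command` and its parameter names are hypothetical, and `cleaner_path` stands in for wherever {baseDir}/cleaner.py lives on your machine.

```python
VALID_AI = ("gemini", "groq", "ollama", "none")


def build_command(cleaner_path, input_path, ai="none", output_dir=None,
                  password=None, config=None, summary=False,
                  dry_run=False, verbose=False):
    """Assemble an argv list for cleaner.py from the documented flags.

    Hypothetical helper: it only maps keyword arguments onto the flags in
    the table above and does not run anything.
    """
    if ai not in VALID_AI:
        raise ValueError(f"unsupported --ai value: {ai!r}")

    cmd = ["python3", cleaner_path, "--input", input_path, "--ai", ai]

    # Optional flags that take a value.
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if password is not None:
        cmd += ["--password", password]
    if config is not None:
        cmd += ["--config", config]

    # Boolean switches.
    if summary:
        cmd.append("--summary")
    if dry_run:
        cmd.append("--dry-run")
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    print(build_command("cleaner.py", "my report.pdf", summary=True))
```

The returned list can be passed straight to `subprocess.run`, which hands each element to the process as a separate argument without involving a shell.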
Supported input formats: PDF (native, scanned, encrypted), DOCX, XLSX, XLS, CSV, TXT, MD
| Code | Meaning |
|---|---|
| 0 | All files processed successfully |
| 1 | Some files failed (partial success) |
| 2 | No processable files found or config error |
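A wrapper script can branch on the exit codes above instead of parsing log output. This is a rough sketch: `describe_exit` and `run_and_describe` are hypothetical names, and any code outside the documented 0 to 2 range is simply reported as undocumented.

```python
import subprocess

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {
    0: "all files processed successfully",
    1: "some files failed (partial success)",
    2: "no processable files found or config error",
}


def describe_exit(code):
    """Translate a doc-cleaner exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_and_describe(cmd):
    """Run cmd (an argv list) and return (exit_code, message).

    Hypothetical wrapper: cmd is expected to be a cleaner.py invocation,
    but nothing here checks that.
    """
    result = subprocess.run(cmd)
    return result.returncode, describe_exit(result.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, describe_exit(code))
```

Treating code 1 separately from code 2 lets a batch job keep the files that did convert while still flagging the run for review.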
- Output is written to ./output/ relative to the current directory by default.
- Using an AI backend (gemini, groq, or ollama) gives much better results.
- --ai none requires zero API keys and zero network access.