skills/pdf-text-extractor/SKILL.md
Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
npx skillsauth add pr-e/openclaw-master-skills pdf-text-extractor
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean. The remaining scanners were skipped, did not run, or reported a non-clean status.
Vernox Utility Skill - Perfect for document digitization.
PDF-Text-Extractor is a zero-dependency tool for extracting text content from PDF files. Supports both embedded text extraction (for text-based PDFs) and OCR (for scanned documents).
clawhub install pdf-text-extractor
const result = await extractText({
  pdfPath: './document.pdf',
  options: {
    outputFormat: 'text',
    ocr: true,
    language: 'eng'
  }
});
console.log(result.text);
console.log(`Pages: ${result.pages}`);
console.log(`Words: ${result.wordCount}`);
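The word count printed above is governed by the countWords options listed in the API reference (minWordLength, excludeNumbers). The following is a minimal sketch of how such options could behave, not the skill's actual implementation. The function name `countWordsSketch`, the whitespace tokenization, and the numeric-token pattern are all assumptions made for illustration.

```javascript
// Hypothetical re-implementation for illustration; not the skill's source code.
function countWordsSketch(text, options = {}) {
  // Defaults: minWordLength of 3 follows the documented default;
  // excludeNumbers defaulting to false is an assumption.
  const { minWordLength = 3, excludeNumbers = false } = options;

  // Assumption: words are whitespace-separated tokens (punctuation is kept).
  const tokens = text.split(/\s+/).filter(Boolean);

  const words = tokens.filter((token) => {
    if (token.length < minWordLength) return false;
    // Assumption: a "number" is digits with optional . or , separators.
    if (excludeNumbers && /^\d+([.,]\d+)*$/.test(token)) return false;
    return true;
  });

  return { wordCount: words.length, charCount: text.length };
}

const sample = 'INVOICE 12345 is due on 2026 for the amount of 99';
console.log(countWordsSketch(sample).wordCount);                           // 7
console.log(countWordsSketch(sample, { excludeNumbers: true }).wordCount); // 5
```

With the default minimum length of 3, short tokens such as "is", "on", "of", and "99" are dropped before the number filter is applied.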
const batch = await extractBatch({
  pdfFiles: [
    './document1.pdf',
    './document2.pdf',
    './document3.pdf'
  ],
  options: {
    outputFormat: 'json',
    ocr: true
  }
});
// extractBatch returns a summary object; per-file results are in batch.results
console.log(`Extracted ${batch.successCount} PDFs`);
const result = await extractText({
  pdfPath: './scanned-document.pdf',
  options: {
    ocr: true,
    language: 'eng',
    ocrQuality: 'high'
  }
});
// OCR will be used (scanned document detected)
extractText
Extract text content from a single PDF file.
Parameters:
- pdfPath (string, required): Path to PDF file
- options (object, optional): Extraction options
  - outputFormat (string): 'text' | 'json' | 'markdown' | 'html'
  - ocr (boolean): Enable OCR for scanned docs
  - language (string): OCR language code ('eng', 'spa', 'fra', 'deu')
  - preserveFormatting (boolean): Keep headings/structure
  - minConfidence (number): Minimum OCR confidence score (0-100)
Returns:
- text (string): Extracted text content
- pages (number): Number of pages processed
- wordCount (number): Total word count
- charCount (number): Total character count
- language (string): Detected language
- metadata (object): PDF metadata (title, author, creation date)
- method (string): 'text' or 'ocr' (extraction method)

extractBatch
Extract text from multiple PDF files at once.
Parameters:
- pdfFiles (array, required): Array of PDF file paths
- options (object, optional): Same as extractText
Returns:
- results (array): Array of extraction results
- totalPages (number): Total pages across all PDFs
- successCount (number): Successfully extracted
- failureCount (number): Failed extractions
- errors (array): Error details for failures

countWords
Count words in extracted text.
Parameters:
- text (string, required): Text to count
- options (object, optional):
  - minWordLength (number): Minimum characters per word (default: 3)
  - excludeNumbers (boolean): Don't count numbers as words
  - countByPage (boolean): Return word count per page
Returns:
- wordCount (number): Total word count
- charCount (number): Total character count
- pageCounts (array): Word count per page
- averageWordsPerPage (number): Average words per page

detectLanguage
Detect the language of extracted text.
Parameters:
- text (string, required): Text to analyze
- minConfidence (number): Minimum confidence for detection
Returns:
- language (string): Detected language code
- languageName (string): Full language name
- confidence (number): Confidence score (0-100)

config.json:
{
  "ocr": {
    "enabled": true,
    "defaultLanguage": "eng",
    "quality": "medium",
    "languages": ["eng", "spa", "fra", "deu"]
  },
  "output": {
    "defaultFormat": "text",
    "preserveFormatting": true,
    "includeMetadata": true
  },
  "batch": {
    "maxConcurrent": 3,
    "timeoutSeconds": 30
  }
}
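The batch settings in this config (maxConcurrent and timeoutSeconds) describe a common pattern: a bounded worker pool with a per-task timeout. The sketch below shows one way such limits could be enforced in plain JavaScript. It is not taken from the skill's source; `withTimeout`, `runLimited`, and `fakeExtract` are hypothetical names, and `fakeExtract` merely stands in for a real extraction call.

```javascript
// Values copied from the "batch" section of config.json.
const batchConfig = { maxConcurrent: 3, timeoutSeconds: 30 };

// Reject if `promise` does not settle within `seconds`.
function withTimeout(promise, seconds) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${seconds}s`)),
      seconds * 1000
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `task` over `items`, keeping at most `maxConcurrent` tasks in flight.
// Returns one { status, value | reason } entry per item, in input order.
async function runLimited(items, task, { maxConcurrent, timeoutSeconds }) {
  const settled = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const index = next++;
      try {
        const value = await withTimeout(task(items[index]), timeoutSeconds);
        settled[index] = { status: 'fulfilled', value };
      } catch (reason) {
        settled[index] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.min(maxConcurrent, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return settled;
}

// Stand-in for a real extraction call (hypothetical); resolves after 10 ms.
const fakeExtract = (path) =>
  new Promise((resolve) => setTimeout(() => resolve({ path, pages: 1 }), 10));

runLimited(['./a.pdf', './b.pdf', './c.pdf', './d.pdf'], fakeExtract, batchConfig)
  .then((settled) => console.log(settled.map((entry) => entry.status)));
```

Each worker pulls the next index only after its current task settles, so a slow or timed-out file delays one slot rather than the whole batch.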
const invoice = await extractText({ pdfPath: './invoice.pdf' });
console.log(invoice.text);
// "INVOICE #12345 Date: 2026-02-04..."

const contract = await extractText({
  pdfPath: './scanned-contract.pdf',
  options: {
    ocr: true,
    language: 'eng',
    ocrQuality: 'high'
  }
});
console.log(contract.text);
// "AGREEMENT This contract between..."

const docs = await extractBatch({
  pdfFiles: [
    './doc1.pdf',
    './doc2.pdf',
    './doc3.pdf',
    './doc4.pdf'
  ]
});
console.log(`Processed ${docs.successCount}/${docs.results.length} documents`);
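Beyond the success ratio, the documented batch result also carries failureCount, totalPages, and errors. The sketch below shows one way a caller might turn those fields into a short report. It runs against a hand-written mock object rather than a real extractBatch call, and the shape of each errors entry (`file`, `message`) is an assumption, since the source does not specify it.

```javascript
// Hypothetical helper: formats a batch result using the documented fields.
function reportBatch(batch) {
  const total = batch.successCount + batch.failureCount;
  const lines = [
    `Processed ${batch.successCount}/${total} documents, ${batch.totalPages} pages`
  ];
  // Assumed error entry shape: { file, message }.
  for (const error of batch.errors) {
    lines.push(`Failed: ${error.file}: ${error.message}`);
  }
  return lines;
}

// Mock result in the documented shape; the values are made up for illustration.
const mockBatch = {
  results: [
    { text: 'AGREEMENT ...', pages: 4 },
    { text: 'INVOICE ...', pages: 1 }
  ],
  totalPages: 5,
  successCount: 2,
  failureCount: 1,
  errors: [{ file: './doc3.pdf', message: 'File not found' }]
};

console.log(reportBatch(mockBatch).join('\n'));
// Processed 2/3 documents, 5 pages
// Failed: ./doc3.pdf: File not found
```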
MIT
Extract text from PDFs. Fast, accurate, zero dependencies. 🔮