
tusosos/pdf-analyzer

Name: pdf-analyzer
Author: tusosos

skills/pdf-analyzer/SKILL.md

npx skillsauth add tusosos/manus-knowledge-base pdf-analyzer

PDF Analyzer

Overview

Extract text, tables, and structured data from PDF files and convert them into usable formats. This skill handles text extraction, table detection, metadata reading, and output formatting for single or multi-page PDFs.

Instructions

When a user asks you to analyze, read, parse, or extract data from a PDF file, follow these steps:

Step 1: Identify the PDF and goal

Determine the file path and what the user wants extracted:

Full text: All readable text from every page
Tables: Structured tabular data
Metadata: Title, author, creation date, page count
Specific sections: Targeted content from certain pages
Summary: A condensed version of the document contents
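For the "Specific sections" goal, the user usually names pages in a form such as "1-3,7". The sketch below shows one way to turn that into page indices usable with `pdf.pages`. The function name `parse_page_spec` and the spec syntax are assumptions made for illustration; they are not part of the skill.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a spec such as "1-3,7" into sorted, de-duplicated 0-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and whitespace
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Users count pages from 1; reject anything outside the document.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"Page range {part!r} is outside 1-{page_count}")
        indices.update(range(start - 1, end))
    return sorted(indices)
```

The returned indices are 0-based so they can index `pdf.pages` directly; add 1 again when reporting page numbers back to the user.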

Step 2: Choose the extraction method

Write a Python script using one of these libraries (prefer pdfplumber for tables, PyMuPDF for speed):

For text extraction:

import pdfplumber

def extract_text(pdf_path):
    text_by_page = []
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
            text = page.extract_text()
            if text:
                text_by_page.append({"page": i + 1, "text": text.strip()})
    return text_by_page

For table extraction:

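import csv

import pdfplumber

def extract_tables(pdf_path, output_csv=None):
    """Collect every table in the PDF; optionally write them all to one CSV."""
    all_tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
            for table in page.extract_tables():
                if not table:
                    continue  # skip empty detections so table[0] cannot fail
                all_tables.append({
                    "page": i + 1,        # 1-based, matching the PDF's page numbers
                    "headers": table[0],  # first row is treated as the header
                    "rows": table[1:]
                })
    if output_csv and all_tables:
        with open(output_csv, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            previous_headers = None
            for table in all_tables:
                # Repeat the header row whenever it changes, so tables with
                # different columns are not merged under the first header.
                if table["headers"] != previous_headers:
                    writer.writerow(table["headers"])
                    previous_headers = table["headers"]
                writer.writerows(table["rows"])
    return all_tables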

For metadata:

import pdfplumber

def extract_metadata(pdf_path):
    with pdfplumber.open(pdf_path) as pdf:
        return {
            "pages": len(pdf.pages),
            "metadata": pdf.metadata
        }

Step 3: Run the script and format output

Execute the script, then present results in the format the user needs (plain text, JSON, CSV, markdown table, or summary).
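As a rough illustration of the formatting step, the sketch below renders one extracted table as a markdown table and serializes results as JSON. It assumes the `headers`/`rows` shape produced by `extract_tables` in Step 2; the helper names `table_to_markdown` and `tables_to_json` are hypothetical.

```python
import json

def table_to_markdown(headers, rows):
    """Render a header row and data rows as a markdown table string."""
    def cell(value):
        # Empty cells may arrive as None; newlines and pipes would break the table.
        text = "" if value is None else str(value)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(c) for c in row) + " |")
    return "\n".join(lines)

def tables_to_json(tables):
    """Serialize the list of table dicts; keep non-ASCII text readable."""
    return json.dumps(tables, indent=2, ensure_ascii=False)
```

Markdown suits a quick preview in chat, while JSON or CSV is usually the better choice when the user plans to process the data further.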

Step 4: Handle errors gracefully

If extraction fails, try these fallback approaches:

Switch from pdfplumber to PyMuPDF (fitz)
For scanned PDFs, suggest OCR with pytesseract
For encrypted PDFs, inform the user a password is needed
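The fallback order above can be sketched as a small chain that tries each extractor in turn and records why earlier ones failed. This is a minimal sketch, not the skill's own implementation: the function names and the returned dictionary shape are assumptions, and both libraries are imported lazily so that a missing package only fails its own step.

```python
def extract_with_pdfplumber(pdf_path):
    import pdfplumber  # lazy import: only required if this extractor runs
    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]

def extract_with_pymupdf(pdf_path):
    import fitz  # PyMuPDF
    with fitz.open(pdf_path) as doc:
        if doc.needs_pass:
            raise PermissionError("PDF is encrypted; a password is needed")
        return [page.get_text() for page in doc]

def extract_with_fallback(pdf_path,
                          extractors=(extract_with_pdfplumber, extract_with_pymupdf)):
    """Try each extractor in order; return the first result that contains text."""
    errors = []
    for extractor in extractors:
        try:
            pages = extractor(pdf_path)
        except Exception as exc:  # record the failure and move to the next method
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if any(text.strip() for text in pages):
            return {"pages": pages, "method": extractor.__name__, "errors": errors}
        errors.append(f"{extractor.__name__}: no text layer found")
    # Nothing produced text: the file is probably scanned, so OCR is the next step.
    return {"pages": [], "method": None, "errors": errors,
            "hint": "No extractable text; the PDF may be scanned and need OCR."}
```

When `method` comes back as `None`, the collected `errors` show the user which methods were tried before OCR or a password is suggested.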

Examples

Example 1: Extract all tables from an invoice PDF

User request: "Pull the line items table from invoice.pdf and save as CSV"

Actions taken:

Open invoice.pdf with pdfplumber
Detect tables on each page
Extract headers and rows
Write to invoice_items.csv

Output:

Found 1 table on page 1 with 5 columns: Item, Quantity, Unit Price, Tax, Total
Extracted 12 rows. Saved to invoice_items.csv.

Preview:
| Item            | Quantity | Unit Price | Tax   | Total   |
|-----------------|----------|------------|-------|---------|
| Widget A        | 100      | $2.50      | $25.00| $275.00 |
| Widget B        | 50       | $4.00      | $20.00| $220.00 |

Example 2: Summarize a research paper

User request: "Summarize the key findings from paper.pdf"

Actions taken:

Extract full text from all pages
Identify abstract, introduction, results, and conclusion sections
Produce a structured summary

Output:

Document: paper.pdf (14 pages)

Title: Effects of Sleep Duration on Cognitive Performance
Authors: Smith et al., 2024

Key Findings:
- Participants sleeping < 6 hours showed 23% slower reaction times
- Optimal cognitive performance observed at 7-8 hours of sleep
- No significant benefit observed beyond 9 hours

Methods: Randomized controlled trial, n=240, over 12 weeks
Conclusion: 7-8 hours of sleep optimizes cognitive performance in adults

Guidelines

Always check if the PDF is readable before attempting extraction. Some PDFs are image-only and require OCR.
For large PDFs (100+ pages), process in batches and show progress.
When extracting tables, validate that column counts are consistent across rows. Merged cells often cause misalignment.
Preserve the original page numbers in output so the user can cross-reference.
If a PDF has both text and scanned pages, extract text where available and flag scanned pages for OCR.
Never invent table headers. Treat the first row as the header unless the user specifies otherwise.
For multi-column layouts (academic papers), extract text in reading order, not left-to-right across columns.
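Two of the guidelines above, checking column counts and batching large files, lend themselves to small helpers. The sketch below is one possible shape for them; the names `find_misaligned_rows` and `page_batches` and the default batch size of 25 are assumptions, not values taken from the skill.

```python
from collections import Counter

def find_misaligned_rows(rows):
    """Return (row_index, column_count) for rows whose width differs from the most common width."""
    if not rows:
        return []
    expected = Counter(len(row) for row in rows).most_common(1)[0][0]
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

def page_batches(page_count, batch_size=25):
    """Yield (start, end) 0-based, half-open page ranges covering the document."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, page_count, batch_size):
        yield start, min(start + batch_size, page_count)
```

A caller could loop over `page_batches(len(pdf.pages))`, extract `pdf.pages[start:end]` for each range, and print a progress line after every batch; any rows reported by `find_misaligned_rows` are worth showing to the user rather than silently padding.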

Extract text, tables, metadata, and structured data from PDF files. Use when a user asks to read a PDF, parse a PDF, extract data from a PDF, summarize a PDF document, pull tables from a PDF, or convert PDF content to structured formats like JSON or CSV. Handles single and multi-page documents, scanned PDFs, and PDFs with complex table layouts.

development

Updated Apr 21, 2026

npx skillsauth add tusosos/manus-knowledge-base pdf-analyzer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%.
Semgrep: Static code analysis for vulnerabilities. Clean, 95%.
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%.
Snyk (dep): Open source security scanning. Skipped, 50%.
Socket.dev: Supply chain security analysis. Skipped, 50%.
VirusTotal: Multi-engine malware detection. Skipped, 50%.
CrowdStrike: Advanced threat intelligence. Skipped, 50%.
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%.
OWASP Dep-Check: Skipped, 50%.

Last scanned: Apr 22, 2026, 2:39 AM (120.1s, 1 file scanned)

SKILL.md

name: pdf-analyzer
description: >-
license: Apache-2.0
compatibility: Requires Python 3.9+ with pdfplumber or PyMuPDF (fitz) installed
author: terminal-skills
version: 1.0.0
category: documents
tags: ["pdf", "document", "extraction", "tables", "parsing"]

Install manually

# Clone the repo
git clone https://github.com/tusosos/manus-knowledge-base.git

# Copy into Claude Code skills folder (global)
cp -r manus-knowledge-base/skills/pdf-analyzer ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

tusosos/manus-knowledge-base

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT