df1ef9f0-3138-4b76-8be9-a0e40bc4ccef/claude-plugin/skills/data-aggregation/document-parser/SKILL.md
Parse and extract structured data from PDF, XLSX, DOCX, and CSV files in the workspace. Handles financial statements, LP reports, fund documents, data rooms, and other PE/VC materials. Outputs structured facts with full provenance metadata.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add ganoro/equiforte-workspaces-local-2 df1ef9f0-3138-4b76-8be9-a0e40bc4ccef/claude-plugin/skills/data-aggregation/document-parser
Extract structured data from documents in the workspace. This skill is called during the AGGREGATE phase of the ACSR lifecycle.
| Format | Python Library | Extraction Targets |
|--------|----------------|--------------------|
| PDF | pdfplumber or PyMuPDF | Tables, text blocks, page-level extraction |
| XLSX | openpyxl | Named ranges, multiple sheets, formulas |
| DOCX | python-docx | Paragraphs, tables, headers |
| CSV | pandas | Tabular data |
find /workspace -type f \( -name "*.pdf" -o -name "*.xlsx" -o -name "*.docx" -o -name "*.csv" \) | head -100
List all parseable files. Present the inventory to guide prioritization.
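If the inventory needs to be grouped by format before prioritizing, the same listing can be sketched in Python. This is a minimal sketch using only the standard library; the function name `build_inventory` and the `limit` parameter are illustrative assumptions, and unlike the `find` command it matches extensions case-insensitively.

```python
from collections import defaultdict
from pathlib import Path

# Extensions taken from the supported-formats table.
PARSEABLE = {".pdf", ".xlsx", ".docx", ".csv"}


def build_inventory(root, limit=100):
    """Return {extension: [paths]} for parseable files under root.

    `limit` caps the total number of files, mirroring `head -100`.
    Assumption: extensions are compared in lowercase, so "report.PDF" counts.
    """
    inventory = defaultdict(list)
    count = 0
    for path in sorted(Path(root).rglob("*")):
        if count >= limit:
            break
        if path.is_file() and path.suffix.lower() in PARSEABLE:
            inventory[path.suffix.lower()].append(path)
            count += 1
    return dict(inventory)
```

Calling `build_inventory("/workspace")` returns a dictionary that can be printed as a per-format count before choosing which files to parse first.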
Before parsing, classify the document type:
Write a Python extraction script tailored to the document type. Example for PDF:
import pdfplumber
import json

def extract_pdf(filepath):
    results = []
    with pdfplumber.open(filepath) as pdf:
        for i, page in enumerate(pdf.pages):
            # Extract tables
            tables = page.extract_tables()
            for table in tables:
                results.append({
                    "type": "table",
                    "page": i + 1,
                    "data": table
                })
            # Extract text blocks
            text = page.extract_text()
            if text:
                results.append({
                    "type": "text",
                    "page": i + 1,
                    "content": text
                })
    return results
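The `"data"` field of each table entry is a list of rows, which is awkward to query by column name. As a hedged follow-up step, the sketch below converts one such table into a list of dictionaries. It assumes the first row is the header row, which will not hold for every document, and it treats cells that come back as `None` as empty strings. The name `table_to_records` is illustrative, not part of pdfplumber.

```python
def table_to_records(table):
    """Turn one extracted table (a list of rows) into dicts keyed by header."""
    if not table or len(table) < 2:
        return []
    # Assumption: the first row is the header. Blank header cells get a
    # positional placeholder name such as "col_1".
    header = [
        (cell or "").strip() or f"col_{i}"
        for i, cell in enumerate(table[0])
    ]
    records = []
    for row in table[1:]:
        # Normalize missing cells (None) to empty strings and trim whitespace.
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip rows that are entirely empty
        records.append(dict(zip(header, cells)))
    return records
```

Applied to the output of `extract_pdf`, this would be called as `table_to_records(item["data"])` for each item whose `"type"` is `"table"`.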
For each extracted fact, create a provenance record:
## Extracted: [description]
- **Value**: [value]
- **Source**: [filename], page [N] / sheet [name] / section [heading]
- **As-of Date**: [date if identifiable]
- **Confidence**: VERIFIED if from audited source, REPORTED otherwise
- **Raw Text**: "[exact text from document]"
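To keep records consistent across many extracted facts, the template above can be filled programmatically. The following is a minimal sketch, not a prescribed implementation: the `Fact` dataclass, its field names, the `audited` flag, and the "not identified" fallback for a missing date are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    description: str
    value: str
    filename: str
    location: str                      # e.g. "page 4", "sheet Summary", "section Fees"
    raw_text: str
    as_of_date: Optional[str] = None   # left empty when no date is identifiable
    audited: bool = False              # hypothetical flag: True for audited sources


def render_provenance(fact):
    """Render one fact as a markdown provenance record."""
    # VERIFIED only for audited sources, REPORTED otherwise (per the template).
    confidence = "VERIFIED" if fact.audited else "REPORTED"
    lines = [
        f"## Extracted: {fact.description}",
        f"- **Value**: {fact.value}",
        f"- **Source**: {fact.filename}, {fact.location}",
        f"- **As-of Date**: {fact.as_of_date or 'not identified'}",
        f"- **Confidence**: {confidence}",
        f'- **Raw Text**: "{fact.raw_text}"',
    ]
    return "\n".join(lines)
```

Keeping the raw text alongside the parsed value means a reviewer can check the extraction against the source without reopening the document.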
Save extracted data to _research/sources/source-NNN.md with:
For capital account statements, look for: beginning balance, contributions, distributions, management fees, carried interest, ending balance, IRR, TVPI, DPI, RVPI.
For LP reports, look for: NAV, portfolio company updates, cash flow summary, performance metrics, market commentary.
For financial statements, look for: revenue, EBITDA, net income, total assets, total debt, cash, working capital. Note the reporting period and currency.
For valuation documents, look for: methodology (DCF, comparables, precedent transactions), key assumptions (discount rate, exit multiple, growth rate), fair value conclusion, ASC 820 level classification.
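The field lists above can double as a rough signal for classifying a document before parsing it. The sketch below counts whole-word keyword hits in the extracted text and picks the best-scoring type. The type labels (`capital_account`, `lp_report`, `financial_statement`, `valuation`) are hypothetical names chosen for this example, the keywords are a subset drawn from the lists, and a keyword count is only a heuristic that should be confirmed by inspection.

```python
import re

# Keywords drawn from the "look for" lists; the type labels are hypothetical.
SIGNALS = {
    "capital_account": ["beginning balance", "contributions", "distributions",
                        "management fees", "carried interest", "ending balance",
                        "irr", "tvpi", "dpi", "rvpi"],
    "lp_report": ["nav", "portfolio company", "cash flow summary",
                  "performance metrics", "market commentary"],
    "financial_statement": ["revenue", "ebitda", "net income", "total assets",
                            "total debt", "working capital"],
    "valuation": ["dcf", "comparables", "precedent transactions",
                  "discount rate", "exit multiple", "fair value", "asc 820"],
}


def classify(text):
    """Return (label, hits) for the best-matching type, or ("unknown", 0)."""
    lowered = text.lower()
    scores = {}
    for label, keywords in SIGNALS.items():
        # Word boundaries keep short keywords like "nav" from matching
        # inside longer words such as "navigation".
        scores[label] = sum(
            1 for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
        )
    # On a tie, the type listed first in SIGNALS wins.
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else ("unknown", 0)
```

A low hit count or a near tie between two types is a reasonable cue to classify the document manually instead.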