.ai-rulez/skills/mime-detection-routing/SKILL.md
mime detection routing
Extension → EXT_TO_MIME map → validate → Registry lookup → Extractor
| Function | Location | Purpose |
|----------|----------|---------|
| detect_mime_type(path, inspect) | core/mime.rs | Extension + optional content inspection |
| detect_mime_type_from_bytes(bytes) | core/mime.rs | Magic number detection (infer crate) |
| validate_mime_type(mime) | core/mime.rs | Check if any extractor supports it |
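To illustrate what magic number detection means for `detect_mime_type_from_bytes`, here is a minimal hand-rolled sketch. It is not the Kreuzberg implementation, which per the table relies on the infer crate. The function name `sniff_mime` is hypothetical, and only three well-known file signatures are checked.

```rust
/// Hypothetical stand-in for content-based detection.
/// Compares the leading bytes of a buffer against a few well-known signatures.
fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"%PDF-") {
        // PDF files begin with the ASCII marker "%PDF-".
        Some("application/pdf")
    } else if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        // PNG files begin with a fixed 8-byte signature.
        Some("image/png")
    } else if bytes.starts_with(b"\xFF\xD8\xFF") {
        // JPEG files begin with the start-of-image marker FF D8 followed by FF.
        Some("image/jpeg")
    } else {
        // Unknown content: the caller would fall back to the extension.
        None
    }
}
```

A signature check like this is useful when the extension is missing or wrong, which is presumably why `detect_mime_type` takes an optional inspection flag.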
118+ extensions mapped in EXT_TO_MIME (core/mime.rs). Case-insensitive.
Key mappings: .pdf → application/pdf, .docx → application/vnd.openxmlformats-officedocument.wordprocessingml.document, .xlsx → application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, .png → image/png, .jpg → image/jpeg
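The case-insensitive extension lookup could look roughly like the sketch below. The table here is a tiny hypothetical stand-in for `EXT_TO_MIME`, and `mime_from_extension` is an illustrative name, not a function from `core/mime.rs`.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical, trimmed-down stand-in for EXT_TO_MIME (the real map has 118+ entries).
fn ext_to_mime() -> HashMap<&'static str, &'static str> {
    let mut m = HashMap::new();
    m.insert("pdf", "application/pdf");
    m.insert(
        "docx",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    );
    m.insert("png", "image/png");
    m.insert("jpg", "image/jpeg");
    m
}

/// Resolve a MIME type from a path's extension, ignoring case.
fn mime_from_extension(path: &str) -> Option<&'static str> {
    // Lowercase the extension so "PDF", "Pdf", and "pdf" hit the same key.
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    let table = ext_to_mime();
    table.get(ext.as_str()).copied()
}
```

Paths with no extension or an unmapped extension yield `None`, at which point a caller might fall back to content inspection.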
```rust
// In core/extractor/bytes.rs
fn select_extractor_for_mime(mime_type: &str) -> Result<Arc<dyn DocumentExtractor>> {
    let registry = get_document_extractor_registry();
    let registry_guard = registry.read()?;
    registry_guard.get_for_mime_type(mime_type)
        .ok_or_else(|| KreuzbergError::UnsupportedFormat(mime_type.into()))
}
```
Selects the highest-priority extractor registered for that MIME type.
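As a rough sketch of priority-based selection, assume a registry that keeps a list of candidates per MIME type. The `Registry` and `Entry` types and the integer `priority` field are hypothetical; Kreuzberg's actual registry stores `Arc<dyn DocumentExtractor>` values and may rank them differently.

```rust
use std::collections::HashMap;

/// Hypothetical registry entry: an extractor name plus its priority.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    name: &'static str,
    priority: i32,
}

/// Hypothetical registry keyed by exact MIME type.
#[derive(Default)]
struct Registry {
    by_mime: HashMap<String, Vec<Entry>>,
}

impl Registry {
    /// Add a candidate extractor for a MIME type.
    fn register(&mut self, mime: &str, name: &'static str, priority: i32) {
        self.by_mime
            .entry(mime.to_string())
            .or_default()
            .push(Entry { name, priority });
    }

    /// Return the candidate with the largest priority, or None if nothing is registered.
    fn get_for_mime_type(&self, mime: &str) -> Option<&Entry> {
        self.by_mime.get(mime)?.iter().max_by_key(|e| e.priority)
    }
}
```

Returning `Option` here mirrors how the function above converts a missing entry into `KreuzbergError::UnsupportedFormat` with `ok_or_else`.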
Adding a new format:
1. Add the extension mapping with m.insert("ext", "application/x-new"); in core/mime.rs.
2. Implement DocumentExtractor with supported_mime_types() returning the MIME type.
3. Register the extractor in register_default_extractors().

Extractors can register for MIME type families: "image/*" matches image/png, image/jpeg, etc.
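Family matching of the `"image/*"` kind can be sketched as a small predicate. `mime_matches` is a hypothetical helper written for illustration; how the real registry resolves wildcards is not shown in this skill.

```rust
/// Hypothetical helper: does a registered pattern cover a concrete MIME type?
/// A pattern ending in "/*" matches every subtype of that top-level type;
/// any other pattern must match the whole MIME string (ignoring ASCII case).
fn mime_matches(pattern: &str, mime: &str) -> bool {
    match pattern.strip_suffix("/*") {
        // Wildcard pattern: compare only the part before the slash.
        Some(family) => mime
            .split_once('/')
            .map_or(false, |(top, _)| top.eq_ignore_ascii_case(family)),
        // Exact pattern: compare the full string.
        None => pattern.eq_ignore_ascii_case(mime),
    }
}
```

With this predicate, a single registration under `"image/*"` would cover image/png and image/jpeg without listing each subtype.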
Call validate_mime_type() before extraction.