
batch extract files
plugin lifecycle management
plugin testing framework
priority selection system
plugin discovery
extract file implementation
ocr backend plugin
document extractor plugin
psmmode tuning
performance profiling plugins
mock plugin creation
extraction pipeline patterns
format specific extraction
ocr backend management
Extract text, tables, metadata, and images from 91+ document formats (PDF, Office, images, HTML, email, archives, academic) using Kreuzberg. Use when writing code that calls Kreuzberg APIs in Python, Node.js/TypeScript, Rust, or CLI. Covers installation, extraction (sync/async), configuration (OCR, chunking, output format), batch processing, error handling, and plugins.
cache integration
error handling and recovery
extraction quality testing
mime detection implementation
post processor integration
ocr error recovery
fallback chain implementation
python plugin wrapper
api server mcp
chunking embeddings
ocr quality assessment
hocr parsing and conversion
configuration management
edge case handling
extractor selection and fallback
performance benchmarking
batch ocr processing
ocr language configuration
python ocr backend integration
table extraction and reconstruction
tesseract backend usage
gil management patterns
plugin configuration
plugin error handling
plugin trait implementation
registry implementation
mime detection routing
test execution patterns
image preprocessing pipeline
image ocr processing
ocr caching strategy
security limits dos protection
plugin architecture patterns
config loading precedence
wasm constraints
Convert HTML to Markdown, Djot, or plain text with structured extraction. Use when writing code that calls html-to-markdown APIs in Rust, Python, TypeScript, Go, Ruby, PHP, Java, C#, Elixir, R, C, or WASM. Covers installation, conversion, configuration, metadata extraction, document structure, and CLI usage.
Developer quick start guide with prerequisites, setup, and workflow commands
Common task runner commands for build, test, lint, and format workflows
______________________________________________________________________ ## priority: high # Repository Structure & Commands **Core**: crates/kreuzberg (Rust 2024, standalone library, Edition 2024) **Bindings**: Python (PyO3/maturin), TypeScript (NAPI-RS/pnpm), Ruby (Magnus/rake), Java (FFM API/Maven), Go (cgo/go), Elixir (Rustler/mix), C# (FFI/dotnet), PHP (FFI/composer) **Task Commands**: task setup, task build, task test, task lint, task format **All E2E generated**: cargo run -p kreuzberg-e
______________________________________________________________________ ## priority: high # Ruby Bindings Patterns **Role**: Ruby bindings for Rust core. Work on Magnus bridge and Ruby gem packages. **Scope**: Magnus FFI, Ruby-idiomatic API, RSpec tests. **Commands**: bundle install, bundle exec rake compile/rubocop/rspec. **Critical**: Core logic lives in Rust. Ruby only for bindings/wrappers. If core logic needed, coordinate with Rust team.
______________________________________________________________________ ## priority: medium # Rust Core Architecture ## Core Library Crate - **Library crate**: crates/core-library/ implements the core domain logic - **Domain focus**: Clear separation of concerns with focused business logic - **Error handling**: thiserror for ergonomic custom errors - **Processing pipeline**: Parse input → Transform data → Validate result → Format output - **Performance**: zero-copy where possible, streaming f
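The processing pipeline named above (Parse input → Transform data → Validate result → Format output) can be sketched as four small functions chained with `?`. This is a minimal illustration using only the standard library: `PipelineError`, `run_pipeline`, and the comma-separated integer input are assumptions made up for the example, and the hand-written `Display`/`Error` impls stand in for what `thiserror` would derive.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; with `thiserror` the
/// `Display` and `Error` impls below could be derived instead.
#[derive(Debug, PartialEq)]
pub enum PipelineError {
    Parse(String),
    Validation(String),
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineError::Parse(msg) => write!(f, "parse error: {msg}"),
            PipelineError::Validation(msg) => write!(f, "validation error: {msg}"),
        }
    }
}

impl std::error::Error for PipelineError {}

/// Parse: split comma-separated integers (illustrative input format).
fn parse(input: &str) -> Result<Vec<i64>, PipelineError> {
    input
        .split(',')
        .map(|field| {
            field
                .trim()
                .parse::<i64>()
                .map_err(|e| PipelineError::Parse(format!("{field:?}: {e}")))
        })
        .collect()
}

/// Transform: double every value.
fn transform(values: Vec<i64>) -> Vec<i64> {
    values.into_iter().map(|v| v * 2).collect()
}

/// Validate: reject negative results.
fn validate(values: Vec<i64>) -> Result<Vec<i64>, PipelineError> {
    if let Some(v) = values.iter().find(|v| **v < 0) {
        return Err(PipelineError::Validation(format!("negative value {v}")));
    }
    Ok(values)
}

/// Format: join the values with single spaces.
fn format_output(values: &[i64]) -> String {
    values
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Run all four stages; the first failing stage short-circuits via `?`.
pub fn run_pipeline(input: &str) -> Result<String, PipelineError> {
    let parsed = parse(input)?;
    let transformed = transform(parsed);
    let validated = validate(transformed)?;
    Ok(format_output(&validated))
}
```

Calling `run_pipeline("1, 2, 3")` yields `Ok("2 4 6")`, while a malformed field or a negative result surfaces as a typed error rather than a panic.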
______________________________________________________________________ ## priority: high # Taskfile Best Practices & Guidelines **Modular Design Principles**: - Each language gets its own task file in `.task/languages/` - Workflows (build, test, lint, e2e) are orchestrated internally in `.task/workflows/` - Configuration separated into `.task/config/` (vars.yml, platforms.yml) - Tool tasks in `.task/tools/` (version-sync, pdfium, pre-commit, docs, smoke) - Main Taskfile.yml is minimal - just
______________________________________________________________________ ## priority: critical # TypeScript Bindings Patterns **Role**: TypeScript bindings for Rust core. Work on NAPI-RS bridge and TypeScript SDK packages. **Scope**: NAPI-RS FFI, TypeScript-idiomatic API, type definitions, JSDoc for all exports with @param/@returns/@example. **Commands**: pnpm install/build/test/lint. **Critical**: Core logic lives in Rust. TypeScript only for bindings/wrappers. If core logic needed, coordin
______________________________________________________________________ ## priority: high # Workspace Structure & Project Organization **Rust workspace** (Cargo.toml): crates/{kreuzberg,kreuzberg-py,kreuzberg-node,kreuzberg-ffi,kreuzberg-cli}, packages/ruby/ext/kreuzberg_rb/native, tools/{benchmark-harness,e2e-generator}, e2e/{rust,go}. **Language packages**: packages/{python,typescript,ruby,java,go} - thin wrappers around Rust core. **E2E tests**: Auto-generated from fixtures/ via tools/e2e
______________________________________________________________________ ## priority: critical # Universal Anti-Patterns **Never use:** - Any type (Python, TypeScript) - use Unknown/generics - Class-based tests (Python) - function-based only - Mocking internal services (any language) - use real objects - Manual dependency management - use lock files - Blocking I/O in async code (Python/TypeScript) - fully async paths - Bare exception handlers - catch specific types only - Magic numbers - extra
Html5 Spec Compliance
Html Parser Selection
Attribute Access And Validation
Element Classification
Parser Error Handling
Parser Performance Optimization
Safe Dom Traversal
Whitespace Mode Configuration
Ammonia Sanitization
Binary Data Detection
Character Encoding Validation
Form Element Neutralization
Metadata Handling
Sanitization Verification
Security Logging And Audit
Style Attribute Sanitization
Svg Security Hardening
Xss Payload Detection
Xss Test Implementation
HTML Parsing Strategies for html-to-markdown
Metadata Extraction for html-to-markdown
Visitor Pattern Usage for html-to-markdown
CI/CD matrix strategies for multi-platform and multi-language builds
Edge Case Handling in html-to-markdown
Modular Taskfile organization with namespaced language and workflow tasks
Patterns for creating and maintaining language bindings around a Rust core
Common coding anti-patterns to avoid across all languages
Whitespace Handling in html-to-markdown
______________________________________________________________________ ## priority: medium # Universal Anti-Patterns **Cross-language patterns to NEVER use:** - Any type (Python, TypeScript, Rust unknown) without exhaustive matching - Class-based tests (Python) – use function-based with pytest fixtures - Unwrap/panic in production code (Rust) – use Result\<T, E> - Mocking internal services – use real objects/fixtures - Manual dependency management – use lock files (Cargo.lock, pnpm-lock.yaml
______________________________________________________________________ ## priority: critical # Build Profiles **BUILD_PROFILE environment variable controls build mode** (default: "release"). **Available Profiles**: 1. **dev**: Fast iteration, debug symbols, no optimizations. Use for development/testing. - Cargo profile: debug - Optimization: -C opt-level=0 - Usage: `BUILD_PROFILE=dev task rust:build` or `task rust:build:dev` 1. **release**: Optimized production builds, minimal de
______________________________________________________________________ ## name: code-reviewer description: Specialized agent for code review and quality assurance priority: high # Code Reviewer You are a senior code reviewer focused on ensuring quality and maintainability. ## Responsibilities - Review code for quality, security, and performance - Check for adherence to coding standards - Identify potential bugs and edge cases - Suggest improvements and best practices ## Guidelines - Provi
______________________________________________________________________ ## priority: critical # Go 1.25+ Standards **Go 1.25+ · Table-driven tests · golangci-lint · Error wrapping · Black-box testing** - Go 1.25+; error wrapping with fmt.Errorf("%w", err), errors.Is/As for checking - Testing: \*\_test.go with \_test package suffix (black-box), table-driven with t.Run() - golangci-lint: errcheck, govet, staticcheck, gosec, gocyclo (complexity ≤25) - Coverage 80%+ on business logic; go test -ra
______________________________________________________________________ ## priority: critical # FFI and Language Interop Standards ## C API Design Principles FFI code is a **contract** between languages. Breaking this contract causes crashes. Take extra care. ### 1. Explicit Ownership & Lifetime Every pointer must have clear ownership semantics. Use SAFETY comments. ```rust /// Allocate string owned by caller /// # Safety /// Caller must call `htm2md_free_string()` to release memory #[no_m
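As a rough illustration of the explicit-ownership rule above, the sketch below pairs an allocating function with the `htm2md_free_string()` release function that the rule mentions. The allocator name `htm2md_make_greeting` and its behavior are hypothetical, and the export attribute is left out so the block compiles under any edition; a real export would also carry `#[no_mangle]` (written `#[unsafe(no_mangle)]` in edition 2024).

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Allocate a NUL-terminated string owned by the caller.
/// (`htm2md_make_greeting` is a hypothetical function for this sketch.)
///
/// # Safety
/// `name` must be null or point to a valid NUL-terminated C string.
/// The caller must release the result with `htm2md_free_string()`.
pub unsafe extern "C" fn htm2md_make_greeting(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller guarantees `name` is a valid C string.
    let name = unsafe { CStr::from_ptr(name) };
    let greeting = format!("hello, {}", name.to_string_lossy());
    match CString::new(greeting) {
        // `into_raw` hands ownership of the allocation to the caller.
        Ok(s) => s.into_raw(),
        // Interior NUL bytes cannot be represented; report failure as null.
        Err(_) => std::ptr::null_mut(),
    }
}

/// Release a string previously returned by this library.
///
/// # Safety
/// `ptr` must be null or a pointer obtained from `htm2md_make_greeting`
/// that has not been freed yet.
pub unsafe extern "C" fn htm2md_free_string(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    // SAFETY: the pointer came from `CString::into_raw`, so rebuilding the
    // `CString` and dropping it frees the allocation exactly once.
    drop(unsafe { CString::from_raw(ptr) });
}
```

The design point is that allocation and release live on the same side of the boundary: the host language never calls its own `free` on memory that Rust allocated.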
______________________________________________________________________ ## priority: medium # Git Commit Standards **Conventional Commits 1.0.0 · Pre-commit hooks enforce quality** - Commit message format: feat/fix/docs/refactor/test/chore(scope): description - Scopes: rust-core, py-binding, ts-binding, rb-binding, php-binding, build, ci, docs - Example: fix(rust-core): handle nested lists in markdown output - Example: feat(py-binding): expose sanitization options to Python API - Example: tes
______________________________________________________________________ ## priority: high # Java Bindings Patterns **Role**: Java bindings for Rust core. Work on C FFI bridge and Java wrapper packages. **Scope**: Java 25 Foreign Function & Memory API (FFM/Panama), Java-idiomatic API, JUnit 5 tests, Javadoc. **Architecture**: Java FFM API → C FFI library → Rust core. No JNI, modern Foreign Function API. **Commands**: cd packages/java && mvn clean compile test, mvn checkstyle:check, mvn spotl
______________________________________________________________________ ## priority: high # Performance & Benchmarking Standards **Criterion.rs · Flamegraph profiling · CI regression detection · Zero-copy optimization** ## Benchmarking Framework - **Criterion.rs**: Primary benchmark framework for Rust; use for all performance-critical paths - Benchmarks in `benches/` directory with naming: `<module>_bench.rs` - Track latency (mean, std dev), throughput, memory allocations; save baseline resu
______________________________________________________________________ ## priority: high # Java 25 with FFM API **Java 25 · FFM API · Checkstyle · PMD · JUnit 5 · Maven/Gradle** - Java 25 exclusively; FFM API for native interop, sealed classes, records, pattern matching - Build: Maven (pom.xml) or Gradle (build.gradle.kts); compiler release=25 - JUnit 5: @Nested classes, @ParameterizedTest, AssertJ fluent assertions, 80%+ coverage - Checkstyle: 4-space indent, line ≤120 chars, Javadoc on pub
______________________________________________________________________ ## priority: medium # Model Routing & AI Guidance ## When to Use Sonnet 4.5 vs Haiku 4.5 **Use Sonnet 4.5 (claude-sonnet-4-5-20250929) for:** - Rust core architecture & design decisions - Complex algorithm optimization - Safety-critical code changes - Cross-cutting refactoring - Error handling strategy & thiserror design **Use Haiku 4.5 (claude-haiku-4-5-20251001) for:** - Python/TypeScript/Ruby/PHP binding engineering
______________________________________________________________________ ## priority: high # Memory Safety & Optimization Patterns **Zero-copy APIs · RAII principle · Lifetime optimization · Automatic memory management** ## Zero-Copy Patterns - **References over ownership**: Use `&T` and `&mut T` to avoid transfers; let Rust's borrow checker enforce safety - **Borrowed data**: Prefer `&str` over `String`, `&[T]` over `Vec<T>` in function signatures - **Cow<T>** (Copy-on-Write): Use for condit
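The zero-copy patterns above (borrowed `&str` and `&[T]` parameters, `Cow` for conditional ownership) might look like the following sketch. The function names `collapse_spaces` and `total_len` are invented for illustration and are not taken from the codebase.

```rust
use std::borrow::Cow;

/// Collapse runs of spaces into one space.
/// Returns the input borrowed when nothing needs to change, and allocates
/// a new `String` only when a run of spaces is actually present.
pub fn collapse_spaces(input: &str) -> Cow<'_, str> {
    if !input.contains("  ") {
        // Fast path: no allocation, the caller gets the original slice back.
        return Cow::Borrowed(input);
    }
    let mut out = String::with_capacity(input.len());
    let mut prev_space = false;
    for ch in input.chars() {
        if ch == ' ' {
            if !prev_space {
                out.push(' ');
            }
            prev_space = true;
        } else {
            out.push(ch);
            prev_space = false;
        }
    }
    Cow::Owned(out)
}

/// Borrowed slices in the signature: callers can pass a `Vec<&str>`,
/// an array, or a sub-slice without transferring ownership.
pub fn total_len(parts: &[&str]) -> usize {
    parts.iter().map(|p| p.len()).sum()
}
```

A caller that only reads the result can treat the returned `Cow` like a `&str`; `into_owned()` is needed only when an owned `String` is required.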
______________________________________________________________________ ## priority: high # Polyglot API Documentation Examples **CRITICAL: Complete language parity for API documentation.** ALL public APIs MUST be documented with examples in all 10 supported languages: Rust, Python, TypeScript, Ruby, PHP, Java, Go, C#, Elixir, and WebAssembly. ## Documentation Tools by Language | Language | Tool | Output Format | File Extension | Key Strengths | |----------|------|---------------|-----------
______________________________________________________________________ ## priority: critical # Polyglot Error Handling Standardization **FFI error conversion · Language-specific exceptions · Error context preservation · Safe boundaries** ## Error Conversion at FFI Boundaries Error handling is language-specific; FFI boundaries MUST convert between error models: ### Rust → Host Language **Rust errors MUST be converted to language-appropriate types at FFI boundary:** - **Rust `Result<T, E>`
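One common way to convert a Rust `Result` at a boundary is a numeric status code plus a thread-local "last error" message that the host binding can turn into a native exception. The sketch below shows that pattern only as an assumption about how such a conversion could be done; `ConvertError`, `StatusCode`, `convert_status`, and `last_error_message` are hypothetical names, and the rule above does not prescribe this particular design.

```rust
use std::cell::RefCell;

thread_local! {
    // Per-thread storage for the most recent error message.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

/// Hypothetical Rust-side error type.
#[derive(Debug)]
pub enum ConvertError {
    EmptyInput,
    TooLarge { len: usize, max: usize },
}

/// Stable numeric codes that a host language can map to its own exceptions.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusCode {
    Ok = 0,
    EmptyInput = 1,
    TooLarge = 2,
}

/// Core logic stays idiomatic Rust and returns `Result`.
fn convert(input: &str) -> Result<usize, ConvertError> {
    const MAX_LEN: usize = 16; // illustrative limit
    if input.is_empty() {
        return Err(ConvertError::EmptyInput);
    }
    if input.len() > MAX_LEN {
        return Err(ConvertError::TooLarge { len: input.len(), max: MAX_LEN });
    }
    Ok(input.len())
}

/// Boundary function: no `Result` crosses it, only a code and an out value.
pub fn convert_status(input: &str, out_len: &mut usize) -> StatusCode {
    match convert(input) {
        Ok(len) => {
            *out_len = len;
            LAST_ERROR.with(|slot| *slot.borrow_mut() = None);
            StatusCode::Ok
        }
        Err(err) => {
            let code = match &err {
                ConvertError::EmptyInput => StatusCode::EmptyInput,
                ConvertError::TooLarge { .. } => StatusCode::TooLarge,
            };
            // Preserve context so the binding can build a descriptive exception.
            LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(format!("{err:?}")));
            code
        }
    }
}

/// Retrieve the message recorded by the last failing call on this thread.
pub fn last_error_message() -> Option<String> {
    LAST_ERROR.with(|slot| slot.borrow().clone())
}
```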
______________________________________________________________________ ## priority: high # PyO3 Performance Patterns Use `pyo3_async_runtimes` for async Python callbacks (~28x faster than spawn_blocking for fast ops). Pattern: Check `__await__` attribute, use `pyo3_async_runtimes::tokio::into_future()` for async, fallback to spawn_blocking for sync. Release GIL before awaiting. Use Python::attach() not with_gil(). spawn_blocking for long ops (OCR), block_in_place for quick ops (PostProcesso
______________________________________________________________________ ## priority: critical # Python Modern & Performance Standards **Python 3.10+ · Functional-first · msgspec · Fully async · Strongest typing** - Target Python 3.10+; match/case, union types (X | Y), structural pattern matching - msgspec ONLY (NEVER pydantic); msgspec.Struct with slots=True, kw_only=True, frozen=True - Full type hints: ParamSpec for decorators, TypeVar/Generic[T], Protocol for structural typing - Enable mypy
______________________________________________________________________ ## priority: critical # Release & Deployment Processes ## Semantic Versioning Strategy - **Format**: MAJOR.MINOR.PATCH (e.g., 1.2.3) - MAJOR: Breaking changes to public API - MINOR: New features, backward-compatible - PATCH: Bug fixes, no new features - **Pre-releases**: Use -alpha, -beta, -rc suffixes (e.g., 1.0.0-beta.1) - **Build metadata**: +build.123 for CI build info (not part of precedence) - **Commit tags**:
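To make the versioning rules above concrete, here is a small sketch of MAJOR.MINOR.PATCH parsing with optional pre-release and build metadata, where build metadata is kept but ignored for precedence. It follows the SemVer 2.0.0 ordering rules in simplified form (identifier validation such as rejecting leading zeros is omitted), and `Version`, `parse_version`, and `precedence` are names invented for this example.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
    /// Dot-separated pre-release identifiers, e.g. ["beta", "1"].
    pub pre: Vec<String>,
    /// Build metadata after '+', e.g. "build.123"; never used for ordering.
    pub build: Option<String>,
}

/// Parse "MAJOR.MINOR.PATCH[-pre][+build]"; returns None on malformed input.
pub fn parse_version(text: &str) -> Option<Version> {
    let (rest, build) = match text.split_once('+') {
        Some((rest, build)) => (rest, Some(build.to_string())),
        None => (text, None),
    };
    let (core, pre): (&str, Vec<String>) = match rest.split_once('-') {
        Some((core, pre)) => (core, pre.split('.').map(str::to_string).collect()),
        None => (rest, Vec::new()),
    };
    let mut nums = core.split('.').map(|n| n.parse::<u64>().ok());
    let version = Version {
        major: nums.next()??,
        minor: nums.next()??,
        patch: nums.next()??,
        pre,
        build,
    };
    // Reject extra components such as "1.2.3.4".
    if nums.next().is_some() {
        return None;
    }
    Some(version)
}

/// Compare pre-release identifier lists.
fn cmp_pre(a: &[String], b: &[String]) -> Ordering {
    match (a.is_empty(), b.is_empty()) {
        (true, true) => return Ordering::Equal,
        (true, false) => return Ordering::Greater, // a release outranks its pre-releases
        (false, true) => return Ordering::Less,
        (false, false) => {}
    }
    for (x, y) in a.iter().zip(b) {
        let ord = match (x.parse::<u64>(), y.parse::<u64>()) {
            (Ok(x), Ok(y)) => x.cmp(&y),
            (Ok(_), Err(_)) => Ordering::Less, // numeric sorts below alphanumeric
            (Err(_), Ok(_)) => Ordering::Greater,
            (Err(_), Err(_)) => x.cmp(y),
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // All shared identifiers equal: the longer list has higher precedence.
    a.len().cmp(&b.len())
}

/// Precedence comparison; build metadata is deliberately not consulted.
pub fn precedence(a: &Version, b: &Version) -> Ordering {
    (a.major, a.minor, a.patch)
        .cmp(&(b.major, b.minor, b.patch))
        .then_with(|| cmp_pre(&a.pre, &b.pre))
}
```

Under this sketch `1.0.0-beta.1` sorts below `1.0.0`, and `1.2.3+build.123` compares equal to `1.2.3+build.999`.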
______________________________________________________________________ ## priority: high # Ruby 3.2+ with RBS & Steep **Ruby 3.2+ · RBS type definitions · Steep · rbenv · RSpec · Rubocop** - Ruby 3.2+ with .ruby-version file; rbenv for version management - RBS files in sig/ directory parallel to source: lib/foo.rb → sig/foo.rbs - Steep for type checking; avoid Any types, use union and optional types explicitly - RSpec for testing: describe/context/it blocks, 80%+ coverage, function-like test
______________________________________________________________________ ## priority: medium # Rust Async/Await & Module Patterns ## Tokio Runtime Patterns **Single runtime instance** should be created once and shared across the application. ### Pattern 1: Global Runtime (for library bindings) Use `tokio::runtime::Runtime` to manage async code in synchronous contexts (required for FFI bindings). ```rust // src/async_runtime.rs use tokio::runtime::Runtime; use std::sync::OnceLock; static RU
______________________________________________________________________ ## priority: critical # Testing Philosophy & Coverage **Three-tier**: unit (80-95%), integration (real DB/services), E2E (smoke/full) **Real infrastructure in tests**: PostgreSQL, Redis, not mocks **Rust**: cargo test, #[tokio::test], 95% coverage (tarpaulin), test error paths, edge cases, no panics **Python**: Function-based tests only (\*_test.py), pytest fixtures, 95% coverage. Structure: tests/{core,features,integra
______________________________________________________________________ ## priority: critical # Rust Latest Edition Standards **Rust 2024 edition · High strictness · clippy -D warnings · 95% coverage · Zero unwrap** - Rust edition 2024; cargo fmt, clippy with -D warnings (zero tolerance) - Result\<T, E> for errors; thiserror for custom errors; NEVER .unwrap() in production - Testing: 95% minimum coverage (tarpaulin), unit/integration/doc tests - Documentation: rustdoc on ALL public items with
______________________________________________________________________ ## priority: high # Rust Module Organization & Public API Design ## Module Structure Principles **Modules are organizational units, not visibility boundaries**. Use `pub` / `pub(crate)` / `pub(super)` to control visibility explicitly. 1. **Root crate module**: `src/lib.rs` or `src/main.rs` re-exports public items 1. **Feature-specific modules**: `src/parsing/`, `src/conversion/`, `src/utils/` organize by domain 1. **Inte
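The visibility levels mentioned above (`pub`, `pub(crate)`, `pub(super)`) can be seen side by side in a single-file sketch. The module contents here (`to_markdown_heading`, `escape`, `internal`) are hypothetical and only reuse the `conversion` domain name from the list; in a real crate each module would normally live in its own file.

```rust
pub mod conversion {
    /// `pub`: part of the crate's public API.
    pub fn to_markdown_heading(text: &str, level: u8) -> String {
        let level = internal::clamp_level(level);
        format!(
            "{} {}",
            "#".repeat(level as usize),
            escape::escape_hashes(text)
        )
    }

    /// `pub(crate)`: usable anywhere inside this crate, hidden from users.
    pub(crate) mod escape {
        pub(crate) fn escape_hashes(text: &str) -> String {
            text.replace('#', "\\#")
        }
    }

    /// Private module: an implementation detail of `conversion`.
    mod internal {
        /// `pub(super)`: visible to the parent module `conversion` only.
        pub(super) fn clamp_level(level: u8) -> u8 {
            level.clamp(1, 6)
        }
    }
}

// Root-level re-export, as `src/lib.rs` would do for public items.
pub use self::conversion::to_markdown_heading;
```

Downstream code can call `to_markdown_heading("Title", 2)` through the re-export, while `escape` and `internal` stay free to change without affecting the public API.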
______________________________________________________________________ ## priority: critical # Rust Polyglot Conventions **Edition 2024**: let-chains, gen blocks, if/match guards. **Naming**: PascalCase (types), snake_case (fns/vars/modules), SCREAMING_SNAKE_CASE (consts). **Error handling**: Result\<T, Error>, never .unwrap() in production, use `?`, IO errors bubble up properly, SAFETY comments for unsafe code, handle lock poisoning. **Async**: Tokio throughout, #[tokio::main]/#[tokio::tes
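The error-handling points above (return `Result`, use `?`, never `.unwrap()` in production, handle lock poisoning) could be combined as in this sketch. `Cache` and `CacheError` are hypothetical types; the two methods show two different policies for a poisoned `Mutex`, and which one is appropriate depends on whether the protected data can be trusted after a panic.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, PoisonError};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CacheError {
    /// Another thread panicked while holding the lock.
    Poisoned,
}

/// Hypothetical shared cache guarded by a mutex.
#[derive(Default)]
pub struct Cache {
    entries: Mutex<HashMap<String, String>>,
}

impl Cache {
    /// Policy 1: surface poisoning to the caller as a typed error.
    pub fn insert(&self, key: &str, value: &str) -> Result<(), CacheError> {
        // `map_err` + `?` replaces `.lock().unwrap()`.
        let mut guard = self.entries.lock().map_err(|_| CacheError::Poisoned)?;
        guard.insert(key.to_string(), value.to_string());
        Ok(())
    }

    /// Policy 2: recover the guard and keep serving reads. This assumes
    /// stale or partially updated entries are acceptable for a cache.
    pub fn get_recovering(&self, key: &str) -> Option<String> {
        let guard = self
            .entries
            .lock()
            .unwrap_or_else(PoisonError::into_inner);
        guard.get(key).cloned()
    }
}
```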
______________________________________________________________________ ## priority: critical # Task Automation & Build - Taskfile.yaml for all workflows: setup, dev, lint, format, test, build - Lock files committed: uv.lock, pnpm-lock.yaml, go.sum, Cargo.lock, composer.lock - Dependency minimization: justify all external deps, audit regularly - Version sync across runtimes: Cargo.toml, package.json, pyproject.toml
______________________________________________________________________ ## priority: high # Test Apps - Published Package Validation **Purpose**: tests/test_apps validates PUBLISHED/RELEASED packages from npm, PyPI, RubyGems, Maven Central, Docker Hub, and Homebrew. NOT for local development testing. **Location**: tests/test_apps/{python,node,wasm,ruby,go,java,csharp,rust,docker,homebrew,browser-vite-svelte} **Version sync**: Included in `task sync-versions` - automatically updates all test
______________________________________________________________________ ## priority: critical # TypeScript Strictest Standards __TypeScript 5.x · Strictest typing · No any/object · Generics required · Tests next to source__ - Enable ALL strict flags: strict, noUncheckedIndexedAccess, exactOptionalPropertyTypes - Ban any and object types; use unknown with guards, Record\<string, unknown> - Generics with constraints: <T extends BaseType>, satisfies operator, const assertions - Tests: .spec.ts n
______________________________________________________________________ ## priority: critical # Elixir 1.14+ with ExDoc & Credo **Elixir 1.14+ · Functional-first · ExDoc · ExUnit · Rustler NIF** - Target Elixir 1.14+ with .tool-versions file; use ASDF for version management - Module structure: CamelCase for modules (e.g., Kreuzberg.Native), snake_case for functions/variables - ExDoc for documentation: @moduledoc and @doc tags on all public APIs with examples - ExUnit for testing: describe/tes
______________________________________________________________________ ## priority: critical # Workspace Dependency Management ## Cargo Workspace Fundamentals A workspace coordinates multiple crates under unified configuration. This is **critical** for polyglot projects with core library + language bindings. **workspace/Cargo.toml**: ```toml [workspace] members = [ "crates/html-to-markdown", # Core library "crates/html-to-markdown-py", # PyO3 bindings "crates/h
______________________________________________________________________ ## priority: medium # Containerization & Docker Standards ## Multi-Stage Docker Builds for Rust - **Builder stage**: Compile with full toolchain + dependencies - **Runtime stage**: Minimal base with only compiled binary - **Cache optimization**: Leverage layer caching by copying dependencies before source code - **Size reduction**: 1GB+ builder → 50-100MB runtime image Example Dockerfile for Rust projects: ```dockerfile
______________________________________________________________________ ## priority: critical # Security & Vulnerability Management **CVE disclosure · cargo-audit · cargo-deny · Fuzzing · Unsafe code review · Security testing** ## Vulnerability Management - **cargo-audit**: Run on every commit and in CI; fail build on known vulnerabilities ```bash cargo audit cargo audit --deny warnings # Fail on advisories ``` - **cargo-deny**: Detect vulnerable, unmaintained, and unlicensed depe
Blockquote Conversion
Code Block Conversion
Emphasis And Formatting Conversion
Form Element Handling
Heading Conversion
Html Entity Decoding
Image Conversion
Inline Code Conversion
Link Conversion
List Conversion And Nesting
Markdown Text Escaping
Markdown Validation
Paragraph Conversion
Semantic Html5 Element Conversion
Table Conversion
Task List Detection And Conversion
Url Sanitization
Whitespace Normalization
Character Encoding Detection
Input Binary Detection
Text Extraction And Preservation
Event Handler Removal
Security Configuration Design
Size Limit Enforcement
Url Scheme Validation
Conversion Mapping Rules: HTML Elements to Markdown
Coordinated multi-platform release and deployment processes with semantic versioning
______________________________________________________________________ ## priority: critical # Agent Selection & Usage Guidelines **When to use agents**: Use the Task tool to spawn specialized agents for focused, language-specific work. Agents provide domain expertise and follow language-specific conventions. **Agent selection rules**: - **Rust core work** → rust-core-engineer (core library, extraction logic, plugins) - **Python bindings** → python-bindings-engineer (PyO3 FFI, Python wrappe
______________________________________________________________________ ## priority: high # Binding Crate Architecture Patterns ## Overview Binding crates expose Rust libraries to host languages. Each binding framework has distinct patterns, but shared principles apply across all: 1. **Minimal wrapper layer**: Call Rust → host language is glue only 1. **Type translation**: Convert host language ↔ Rust with clear mapping 1. **Error conversion**: Rust errors → native exceptions/error types 1.
______________________________________________________________________ ## priority: critical # CI/CD Multi-Platform Matrix ## GitHub Actions Matrix Strategy Use matrix strategy to test across multiple OS, architectures, and language versions with minimal duplication: ```yaml name: CI on: push: branches: [main, develop] pull_request: jobs: test-matrix: runs-on: ${{ matrix.os }} strategy: fail-fast: false matrix: include: # Linux: x86_64 and A
______________________________________________________________________ ## priority: critical # CI/CD Pipeline Standards **Architecture**: Stages: Validate (lint/format) → Build → Test (unit/integration) → Deploy. Quality gates: zero warnings, tests pass, coverage thresholds met. **Multi-platform Testing**: linux/amd64, linux/arm64, macOS (Intel/ARM). Docker: multi-stage builds, minimal base images (alpine/distroless). **Artifact Management**: Cache dependencies (Cargo, npm, Maven, Go module
______________________________________________________________________ ## priority: critical # Common Task Commands **Setup & Installation**: - `task setup`: Install all dependencies (Rust, Python, Node, Go, Java, Ruby, etc.) - `task setup-pre-commit`: Configure pre-commit hooks **Build Commands**: - `task build`: Build all (respects BUILD_PROFILE) - `task build:all`: Build all languages - `task build:all:dev`: Build all in debug mode - `task build:all:release`: Build all in release mode -
______________________________________________________________________ ## priority: critical # Core Principles **Do only what's asked. Never create files unnecessarily. Prefer editing. No proactive docs/READMEs.** **Python**: Builtin imports at top, dataclasses frozen/hashable/slots, function-based tests only. **Rust**: Never .unwrap() in production, SAFETY comments for unsafe, handle lock poisoning. **Architecture**: ALL extraction logic lives in Rust core. Bindings provide language-idiom
______________________________________________________________________ ## priority: critical # Documentation Standards & Language Parity **CRITICAL: Full language parity required**. ALL documentation, guides, and code examples MUST include snippets for ALL 7 supported languages: Rust, Python, TypeScript, Ruby, Java, Go, C#. **Code snippet structure**: - Location: docs/snippets/{language}/{category}/{filename}.{ext} - Categories: getting-started, config, advanced, ocr, metadata, plugins, api
______________________________________________________________________ ## priority: high # Elixir Bindings Patterns (Rustler NIF) **Role**: Elixir bindings for Rust core using Rustler NIF bridge pattern. **Scope**: - Rust NIF bridge: packages/elixir/native/<project>\_rustler/ (Rust crate with cdylib output) - Elixir wrapper: packages/elixir/lib/<project>/ (OTP application with public API) - ExUnit tests: packages/elixir/test/ **Architecture**: Elixir OTP application → Rustler NIF (.so/.dyl
______________________________________________________________________ ## priority: critical # Error Handling Strategy **CRITICAL: OSError/RuntimeError must ALWAYS bubble up** (Python + Rust). SystemExit, KeyboardInterrupt, MemoryError too. **Python**: Exception-based, inherit from KreuzbergError. OSError patterns: 1) Library misuse→bubble up, 2) Subprocess→analyze stderr for parsing keywords, 3) Cache→ignore, 4) Dependencies→MissingDependencyError or bubble up. Always add ~keep comments. *
______________________________________________________________________ ## priority: high # Fixture-Driven Testing Strategy ## Shared Test Fixtures Across Language Bindings - **Single source of truth**: Fixtures defined once in Rust, exposed to all language bindings - **API parity validation**: Identical behavior across languages guaranteed by fixtures - **Fixture generation**: Rust generates JSON/YAML fixtures from canonical implementation - **Language-specific consumption**: Each binding co
______________________________________________________________________ ## priority: critical # Git & Commit Standards - Conventional Commits 1.0.0: feat/fix/docs/refactor/test/chore with scope - NEVER include AI signatures ("Generated with Claude") in commits - Pre-commit hooks with prek/lefthook/husky: linting, formatting, tests - Branch protection: main/development with required status checks
______________________________________________________________________ ## priority: high # Go Bindings Patterns **Role**: Go bindings for Rust core. Work on CGO bridge and Go SDK/E2E suite. **Scope**: Go 1.25 module, CGO wrappers around C FFI, Go-idiomatic config/result structs, linting setup, benchmark harness scripts. **Commands**: cd packages/go/v4 && go test ./..., golangci-lint run ./..., ensure `LD_LIBRARY_PATH`/`DYLD_LIBRARY_PATH` includes target/release when running tests. **FFI**:
______________________________________________________________________ ## priority: critical # Modular Taskfile Structure **Root**: Taskfile.yml (version 3) includes all modular task files from .task/ directory. **Configuration Files**: - `.task/config/vars.yml`: Global variables (BUILD_PROFILE, VERSION, PDFIUM_VERSION, ORT_VERSION, GOLANGCI_LINT_VERSION, paths, OS/ARCH detection, CARGO_PROFILE_DIR mapping) - `.task/config/platforms.yml`: Platform-specific variables (EXE_EXT, LIB_EXT, NUM_C
______________________________________________________________________ ## priority: medium # Monitoring & Observability Standards ## Structured Logging with Tracing Crate - **Use `tracing` crate**: Unified logging, tracing, and metrics in Rust - **Structured events**: key=value pairs instead of f-string format - **Span context**: Automatic propagation of request IDs, user info, etc. - **Multiple subscribers**: Layer logs, metrics, and distributed traces together Basic setup in Rust: ```rus
______________________________________________________________________ ## priority: critical # Platform Support & Cross-Platform Compatibility **Supported Platforms**: Windows (via Git Bash/MSYS2/MinGW/PowerShell), Linux (all major distributions), macOS (Intel & ARM/M-series). **Platform Detection in Taskfiles**: - `{{.OS}}`: Detects operating system (windows, linux, darwin) - Windows detection: PowerShell, GOOS env, WINDIR/SYSTEMROOT, MSYSTEM (Git Bash/MSYS2/MinGW) - Linux/macOS: uname
______________________________________________________________________ ## priority: medium # Polyglot Binding Architecture ## Language Binding Pattern Each binding crate (Python, TypeScript, Ruby, PHP, Go, Java, C#) follows: 1. **Minimal wrapper layer**: Call Rust functions directly, no business logic 1. **Type translation**: Convert host language types ↔ Rust types 1. **Error mapping**: Rust errors → language-native exceptions 1. **Documentation**: Link bindings to Rust docs, add language-
______________________________________________________________________ ## priority: medium # Project Structure & Conventions ```text html-to-markdown/ ├── crates/ │ ├── html-to-markdown/ # Core Rust library (main logic) │ ├── html-to-markdown-py/ # PyO3 bindings for Python │ ├── html-to-markdown-node/ # NAPI-RS bindings for Node.js │ ├── html-to-markdown-rb/ # Magnus bindings for Ruby │ ├── html-to-markdown-php/ # ext-php-rs extension for PHP │ ├
______________________________________________________________________ ## priority: critical # Python Bindings Patterns **Role**: Python bindings for Rust core. Work on PyO3 bridge and Python wrapper packages. **Scope**: PyO3 FFI, Python-idiomatic API, Python-specific extensions, postprocessors. **Commands**: maturin develop, pytest, ruff format/check. **Critical**: Core logic lives in Rust. Only Python code for bindings, Python-specific extensions, or API wrappers. If core logic needed, c
______________________________________________________________________ ## priority: medium # Developer Quick Start ## Prerequisites - Rust 1.75+ (stable or nightly for WASM) - Python 3.10+ with uv package manager - Node.js 18+ with pnpm ≥10.17 - Ruby 3.2+ with rbenv - PHP 8.2+ with Composer - Go 1.25+ (optional, for Go binding) - Java JDK 22+ (optional, for Java binding) - .NET 8.0+ SDK (optional, for C# binding) - Task (task runner) - prek (pre-commit hook manager) ## Quick Setup ```bash