## priority: critical

# Workspace Dependency Management

## Cargo Workspace Fundamentals
A workspace coordinates multiple crates under unified configuration. This is **critical** for polyglot projects that pair a core library with language bindings.

**workspace/Cargo.toml**:

```toml
[workspace]
members = [
    "crates/html-to-markdown",       # Core library
    "crates/html-to-markdown-py",    # PyO3 bindings
    "crates/html-to-markdown-node",  # NAPI-RS bindings
    "crates/html-to-markdown-rb",    # Magnus bindings
    "crates/html-to-markdown-php",   # PHP extension
    "crates/html-to-markdown-wasm",  # WebAssembly
    "crates/html-to-markdown-ffi",   # C FFI library
    "crates/html-to-markdown-cli",   # CLI binary
]
resolver = "2"  # Set explicitly: a virtual workspace has no edition, so it would otherwise default to resolver "1"

[workspace.package]
version = "0.5.0"  # Single source of truth
authors = ["Team"]
edition = "2021"
rust-version = "1.70"
```
**Golden Rule**: The core library and all bindings must share the same version number.

**Problem**: Manually updating versions across 8+ Cargo.toml files leads to inconsistency.

**Solution**: Use `workspace.package` version inheritance plus a sync script.
In each crate's Cargo.toml:

```toml
[package]
name = "html-to-markdown-py"
version.workspace = true  # Inherit from workspace
authors.workspace = true
edition.workspace = true
rust-version.workspace = true
```
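**Version sync script** (`scripts/sync_versions.py`):

```python
#!/usr/bin/env python3
"""Sync the release version across the workspace and its member crates.

Uses tomlkit (pip install tomlkit), which preserves comments and formatting
when rewriting Cargo.toml files.
"""
import sys
from collections.abc import Mapping
from pathlib import Path

import tomlkit

CORE_CRATE = "html-to-markdown"
DEP_TABLES = ("dependencies", "dev-dependencies", "build-dependencies")


def sync_versions(workspace_root: Path, new_version: str) -> None:
    """Sync version across all crates in the workspace."""
    # Update the single source of truth: [workspace.package].version
    workspace_toml = workspace_root / "Cargo.toml"
    doc = tomlkit.parse(workspace_toml.read_text())
    doc["workspace"]["package"]["version"] = new_version
    workspace_toml.write_text(tomlkit.dumps(doc))

    # Update member crates under crates/
    for crate_toml in sorted((workspace_root / "crates").glob("*/Cargo.toml")):
        doc = tomlkit.parse(crate_toml.read_text())
        package = doc.get("package", {})
        # Only overwrite literal versions. `version.workspace = true` parses as a
        # table and must be left alone, or inheritance silently stops working.
        if isinstance(package.get("version"), str):
            package["version"] = new_version
        # Keep the version constraint on path dependencies to the core in step.
        for table in DEP_TABLES:
            dep = doc.get(table, {}).get(CORE_CRATE)
            if isinstance(dep, Mapping) and "version" in dep:
                dep["version"] = new_version
        crate_toml.write_text(tomlkit.dumps(doc))

    print(f"Synced all crates to version {new_version}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: sync_versions.py <new_version>", file=sys.stderr)
        sys.exit(1)
    sync_versions(Path.cwd(), sys.argv[1])
```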
Usage:

```bash
./scripts/sync_versions.py 0.6.0
cargo update -w  # Refresh the workspace members' entries in Cargo.lock
git add -A && git commit -m "chore: bump version to 0.6.0"
```
Binding crates depend on the core via a path dependency:

```toml
# crates/html-to-markdown-py/Cargo.toml
[dependencies]
html-to-markdown = { path = "../html-to-markdown", version = "0.5.0" }
pyo3 = { version = "0.20", features = ["extension-module"] }
```
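**Why both a version constraint and a path?** Local builds use the `path`. When the crate is published, Cargo drops the `path` and crates.io resolves the `version` requirement instead, so `cargo publish` rejects path dependencies that lack a `version`.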
Define the minimum supported Rust version (MSRV) once in `workspace.package`:

```toml
[workspace.package]
rust-version = "1.70"
```
Update workflow:

1. Change `rust-version` in the workspace Cargo.toml.
2. Run `cargo +1.70 test` to verify.

CI workflow for MSRV:
```yaml
- name: Test MSRV
  run: |
    rustup install 1.70
    cargo +1.70 test --all-features
```
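Be explicit with version requirements. Cargo treats a bare requirement such as `"1.35"` as a caret requirement (`^1.35`), which allows every semver-compatible release:

```toml
# BAD: Too permissive
pyo3 = "*"   # Any version; crates.io also rejects wildcard requirements on publish
tokio = "1"  # Any 1.x, including releases older than the APIs you use

# GOOD: Explicit ranges
pyo3 = "0.20"      # >=0.20.0, <0.21.0 (for 0.x crates, patch updates only)
tokio = "1.35"     # >=1.35.0, <2.0.0 (minor and patch updates)
thiserror = "1.0"  # >=1.0.0, <2.0.0
# For patch updates only, use a tilde requirement:
# tokio = "~1.35"  # >=1.35.0, <1.36.0

# Exact versions for unstable features
napi = "=2.13.0"
```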
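The ranges above follow Cargo's default caret rules: the leftmost non-zero component may not change, and omitted components widen the range. The helper below is a hypothetical sketch that computes those bounds for bare requirements like `"1.35"` or `"0.20"`; it deliberately ignores operators (`=`, `~`, `>=`) and pre-release tags.

```python
def caret_bounds(req: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) bounds of a bare Cargo requirement."""
    parts = [int(p) for p in req.split(".")]
    major, minor, patch = (parts + [0, 0])[:3]
    lower = f"{major}.{minor}.{patch}"
    if major > 0 or len(parts) == 1:
        upper = f"{major + 1}.0.0"  # "1.35" -> <2.0.0, "0" -> <1.0.0
    elif minor > 0 or len(parts) == 2:
        upper = f"0.{minor + 1}.0"  # "0.20" -> <0.21.0, "0.0" -> <0.1.0
    else:
        upper = f"0.0.{patch + 1}"  # "0.0.3" -> <0.0.4
    return lower, upper


for req in ("1", "1.35", "0.20", "0.0.3"):
    low, high = caret_bounds(req)
    print(f'"{req}": >={low}, <{high}')
```

Note that `tokio = "1"` and `tokio = "1.35"` share the same upper bound; the second only raises the minimum.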
Prevent members from drifting onto different versions of the same dependency by centralizing versions in the workspace root:

```toml
[workspace.dependencies]
tokio = { version = "1.35", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
thiserror = "1.0"
tracing = "0.1"
```
Each crate inherits from the workspace:

```toml
[dependencies]
tokio = { workspace = true, features = ["rt-multi-thread"] }  # Added to the workspace's features
serde = { workspace = true }
```
Commit Cargo.lock for reproducible builds:

```bash
git add Cargo.lock
git commit -m "chore: update lockfile"
```
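This ensures that every developer machine and every CI run builds with exactly the same resolved dependency versions.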
**Common pitfall**: Forgetting to add a crate to the workspace `members`.

Verify workspace integrity:

```bash
cargo metadata --format-version 1 | jq '.workspace_members'
```

The output should list every crate. If one is missing, add it:

```toml
[workspace]
members = [
    "crates/html-to-markdown",
    "crates/html-to-markdown-py",
    # ... add missing member here
]
```
**Problem**: Circular dependencies between crates in the workspace.

**Solution**: A clearly defined, one-directional dependency graph:

```text
html-to-markdown (core library)
├── html-to-markdown-py (depends on core)
├── html-to-markdown-node (depends on core)
├── html-to-markdown-rb (depends on core)
├── html-to-markdown-ffi (depends on core)
└── html-to-markdown-cli (depends on core)
```

Bad structure (avoid):

```text
- html-to-markdown depends on html-to-markdown-py
- html-to-markdown-py depends on html-to-markdown
# Circular! Cargo refuses to build this ("cyclic package dependency")
```
```bash
# Build a single crate
cargo build -p html-to-markdown-py

# Build all members (--workspace replaces the deprecated --all)
cargo build --workspace

# Build all members except certain platform bindings
cargo build --workspace --exclude html-to-markdown-wasm

# Test a single member
cargo test -p html-to-markdown
```
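Features for conditional compilation are declared per crate. `[workspace.dependencies]` only centralizes versions and base features, and it cannot mark a dependency `optional`:

```toml
# Workspace root Cargo.toml
[workspace.dependencies]
tokio = { version = "1.35", features = ["full"] }

# Core library Cargo.toml (crates/html-to-markdown)
[dependencies]
tokio = { workspace = true, optional = true }

[features]
default = ["sync"]
sync = []
async-runtime = ["dep:tokio"]  # `dep:` enables the optional dependency without creating a `tokio` feature
ffi = []
```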
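How a requested feature set expands can be modeled with a small worklist. This is a simplified, hypothetical sketch (`expand_features` is not Cargo's resolver): it assumes a core crate with `default`, `sync`, `async-runtime` and `ffi` features where `async-runtime` enables only `dep:tokio`, handles plain feature names and the `dep:` prefix, and does not handle `crate/feature` or `crate?/feature` entries or the implicit feature Cargo creates for optional dependencies never referenced with `dep:`.

```python
# Assumed feature table for the core crate.
CORE_FEATURES = {
    "default": ["sync"],
    "sync": [],
    "async-runtime": ["dep:tokio"],
    "ffi": [],
}


def expand_features(
    features: dict[str, list[str]], requested: list[str]
) -> tuple[set[str], set[str]]:
    """Return (enabled features, enabled optional dependencies)."""
    enabled: set[str] = set()
    optional_deps: set[str] = set()
    pending = list(requested)
    while pending:
        item = pending.pop()
        if item.startswith("dep:"):
            optional_deps.add(item.removeprefix("dep:"))
        elif item not in enabled:
            if item not in features:
                # Once `dep:tokio` is used, there is no implicit `tokio` feature.
                raise KeyError(f"unknown feature: {item}")
            enabled.add(item)
            pending.extend(features[item])
    return enabled, optional_deps


print(expand_features(CORE_FEATURES, ["default", "async-runtime"]))
```

This is also why a feature list such as `["tokio", "dep:tokio"]` fails: the `dep:` entry suppresses the implicit `tokio` feature that the first entry refers to.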
Binding crates enable the features they need:

```toml
# crates/html-to-markdown-py/Cargo.toml
[dependencies]
html-to-markdown = { path = "../html-to-markdown", version = "0.5.0", features = ["sync"] }
```
Initial setup:

```bash
cargo new --lib crates/html-to-markdown
cargo new --lib crates/html-to-markdown-py
cargo new --lib crates/html-to-markdown-node
# ... etc

# Create the workspace manifest at the root
cat > Cargo.toml <<'EOF'
[workspace]
members = ["crates/*"]
resolver = "2"

[workspace.package]
version = "0.5.0"
EOF
```
Sync versions for the 0.6.0 release:

```bash
./scripts/sync_versions.py 0.6.0
cargo test --workspace  # Verify all members still work
cargo build --release --workspace
git add -A && git commit -m "chore: bump version to 0.6.0"
```
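After the bump, the golden rule can be verified from Cargo's own view of the workspace. `cargo metadata --format-version 1 --no-deps` emits a `packages` array whose entries carry `name` and `version`; the sketch below (both function names are hypothetical) reports any member that disagrees with the expected release version.

```python
import json
import subprocess


def load_metadata() -> dict:
    """Run `cargo metadata` for workspace members only (requires cargo on PATH)."""
    result = subprocess.run(
        ["cargo", "metadata", "--format-version", "1", "--no-deps"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def version_mismatches(metadata: dict, expected: str) -> dict[str, str]:
    """Map each member whose version differs from `expected` to its version."""
    return {
        pkg["name"]: pkg["version"]
        for pkg in metadata["packages"]
        if pkg["version"] != expected
    }


sample = {
    "packages": [
        {"name": "html-to-markdown", "version": "0.6.0"},
        {"name": "html-to-markdown-py", "version": "0.5.0"},
    ]
}
print(version_mismatches(sample, "0.6.0"))  # {'html-to-markdown-py': '0.5.0'}
```

In CI, `version_mismatches(load_metadata(), "0.6.0")` should return an empty dict right after the sync script runs.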
**Mismatched versions across crates**:

```toml
# BAD: Different versions
# html-to-markdown/Cargo.toml: version = "0.5.0"
# html-to-markdown-py/Cargo.toml: version = "0.4.9"

# GOOD: Use workspace.package inheritance
version.workspace = true
```
**Circular dependencies**:

```toml
# BAD: Core depends on binding
# html-to-markdown/Cargo.toml:
html-to-markdown-py = { path = "../html-to-markdown-py" }

# GOOD: Only bindings depend on core
```
**Uncommitted Cargo.lock**:

```bash
# BAD: Cargo.lock in .gitignore

# GOOD: Commit for reproducibility
git add Cargo.lock
```
**Too many nested workspaces**:

```toml
# BAD: Nested workspaces confuse resolution
# /Cargo.toml: workspace with members
# /crates/sub/Cargo.toml: another workspace!

# GOOD: Single root workspace
```