.remote-cache/kreuzberg-shared-rules/.ai-rulez/skills/performance-and-benchmarking-standards/SKILL.md
______________________________________________________________________ ## priority: high # Performance & Benchmarking Standards **Criterion.rs · Flamegraph profiling · CI regression detection · Zero-copy optimization** ## Benchmarking Framework - **Criterion.rs**: Primary benchmark framework for Rust; use for all performance-critical paths - Benchmarks in `benches/` directory with naming: `<module>_bench.rs` - Track latency (mean, std dev), throughput, memory allocations; save baseline resu
npx skillsauth add kreuzberg-dev/html-to-markdown .remote-cache/kreuzberg-shared-rules/.ai-rulez/skills/performance-and-benchmarking-standardsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Criterion.rs · Flamegraph profiling · CI regression detection · Zero-copy optimization
benches/ directory with naming: <module>_bench.rscargo flamegraph --bin myapp -- <args> for CPU profiling; analyze hotspotsperf record and perf report; use with debug = true in release buildsMIRIFLAGS=-Zmiri-preemption-rate=0 cargo +nightly miri test for UB detectionCow<T>, Arc<T> for shared data; avoid cloning#[bench] and measure heap allocations&str over String; use String::with_capacity() when growingvalgrind --tool=massif (Linux) or Instruments (macOS)use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn bench_parsing(c: &mut Criterion) {
c.bench_function("parse_json_1kb", |b| {
b.iter(|| {
let data = black_box(r#"{"key": "value"}"#);
serde_json::from_str::<Value>(data)
})
});
}
criterion_group!(benches, bench_parsing);
criterion_main!(benches);
// GOOD: Use Cow for conditional ownership
fn process_data(input: &[u8]) -> Cow<'_, [u8]> {
if needs_modification(input) {
Cow::Owned(modify(input.to_vec()))
} else {
Cow::Borrowed(input)
}
}
// GOOD: Share large data with Arc
use std::sync::Arc;
let data = Arc::new(expensive_computation());
let clone1 = Arc::clone(&data); // Cheap, reference counted
let clone2 = Arc::clone(&data);
struct Parser {
buffer: Vec<u8>,
}
impl Parser {
fn new() -> Self {
Parser {
buffer: Vec::with_capacity(4096), // Pre-allocate
}
}
fn parse(&mut self, input: &[u8]) -> Result<()> {
self.buffer.clear(); // Reuse, don't reallocate
self.buffer.extend_from_slice(input);
// Process buffer
Ok(())
}
}
for x in items { let copy = x.clone(); } is wasteful; use referencess += &other is O(n²); use Vec<String> + join() insteadcargo bench --no-run to PR checks to catch compilation errors earlycargo-criterion for stable baselines across CI runs.criterion/ directory in repotools
Convert HTML to Markdown, Djot, or plain text with structured extraction. Use when writing code that calls html-to-markdown APIs in Rust, Python, TypeScript, Go, Ruby, PHP, Java, C#, Elixir, R, C, or WASM. Covers installation, conversion, configuration, metadata extraction, document structure, and CLI usage.
development
Developer quick start guide with prerequisites, setup, and workflow commands
development
Common task runner commands for build, test, lint, and format workflows
tools
______________________________________________________________________ ## priority: high # Workspace Structure & Project Organization **Rust workspace** (Cargo.toml): crates/{kreuzberg,kreuzberg-py,kreuzberg-node,kreuzberg-ffi,kreuzberg-cli}, packages/ruby/ext/kreuzberg_rb/native, tools/{benchmark-harness,e2e-generator}, e2e/{rust,go}. **Language packages**: packages/{python,typescript,ruby,java,go} - thin wrappers around Rust core. **E2E tests**: Auto-generated from fixtures/ via tools/e2e