.remote-cache/kreuzberg-shared-rules/.ai-rulez/skills/monitoring-observability-standards/SKILL.md
---
priority: medium
---

# Monitoring & Observability Standards

## Structured Logging with Tracing Crate

- **Use `tracing` crate**: Unified logging, tracing, and metrics in Rust
- **Structured events**: key=value pairs instead of interpolated format strings
- **Span context**: Automatic propagation of request IDs, user info, etc.
- **Multiple subscribers**: Layer logs, metrics, and distributed traces together
Basic setup in Rust:
```rust
use tracing::{debug, error, info, instrument};
use tracing_subscriber::{fmt, prelude::*, EnvFilter};

#[tokio::main]
async fn main() {
    // Initialize tracing with multiple layers
    tracing_subscriber::registry()
        .with(
            fmt::layer()
                .with_writer(std::io::stdout)
                .with_target(true)
                .with_thread_ids(true)
                .with_level(true)
                .json(), // Structured JSON output for log aggregation (`json` feature)
        )
        .with(
            // Honor RUST_LOG when it is set; fall back to `info` otherwise
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();

    // Fields must come before the message in tracing macros
    info!(version = env!("CARGO_PKG_VERSION"), "Application starting");

    // Run application
    run().await;
}

// `run`, `HttpClient`, and `Result` are application-defined placeholders.
#[instrument(skip(client), fields(user_id = %user_id))]
async fn process_request(user_id: i64, client: &HttpClient) -> Result<String> {
    debug!("Processing user request");

    let response = client
        .get(&format!("/users/{}", user_id))
        .await
        .map_err(|e| {
            error!(error = ?e, "Failed to fetch user");
            e
        })?;

    info!(status = response.status().as_u16(), "Request successful");
    Ok(response.text().await?)
}
```
Use the `#[instrument]` macro on functions:

```rust
use tracing::{debug, debug_span, info, instrument, Instrument};

// `Database`, `Cache`, `User`, and `Result` are application-defined placeholders.
#[instrument(skip(db, cache))]
async fn fetch_user_data(
    user_id: i64,
    db: &Database,
    cache: &Cache,
) -> Result<User> {
    // Span created automatically from function name
    // Fields: user_id added to span context

    // Do not hold a `Span::enter()` guard across an `.await`: the guard would
    // stay entered while other tasks run on the same thread. Attach the child
    // span to the future with `.instrument(...)` instead.
    let cached = cache
        .get(user_id)
        .instrument(debug_span!("cache_lookup"))
        .await;
    if let Some(user) = cached {
        debug!("Cache hit");
        return Ok(user);
    }

    // Switch to database lookup
    let user = db
        .query_user(user_id)
        .instrument(debug_span!("db_lookup"))
        .await?;
    cache.set(user_id, user.clone()).await;
    info!("User loaded from database");
    Ok(user)
}
```
Best practices:
```rust
// ERROR: User made invalid request or system failure
error!(error = ?e, "Failed to process payment");

// WARN: Degraded service or unusual state
warn!(deprecated_field = true, "Using deprecated API");

// INFO: Progress milestones
info!(user_count = count, "Database migration completed");

// DEBUG: Entry/exit and important values
debug!("Entering user validation");
debug!(validation_result = ?result, "Validation complete");

// TRACE: Detailed loops and assignments (usually disabled)
trace!(index = i, value = ?item, "Processing item");
```
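To make the level ordering above concrete, here is a minimal std-only sketch of how a maximum-verbosity filter decides whether an event is emitted. The names `LogLevel`, `max_level_from`, and `is_enabled` are hypothetical and only illustrate the idea; in practice `tracing_subscriber::EnvFilter` handles this, with a much richer directive syntax.

```rust
use std::str::FromStr;

/// Severity levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LogLevel {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl FromStr for LogLevel {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "error" => Ok(LogLevel::Error),
            "warn" => Ok(LogLevel::Warn),
            "info" => Ok(LogLevel::Info),
            "debug" => Ok(LogLevel::Debug),
            "trace" => Ok(LogLevel::Trace),
            other => Err(format!("unknown log level: {other}")),
        }
    }
}

/// Resolve the maximum enabled level from an optional configuration value
/// (for example the contents of an environment variable), defaulting to INFO
/// when the value is missing or unparseable.
pub fn max_level_from(config: Option<&str>) -> LogLevel {
    config
        .and_then(|raw| raw.parse::<LogLevel>().ok())
        .unwrap_or(LogLevel::Info)
}

/// An event is emitted only if it is no more verbose than the configured maximum.
pub fn is_enabled(event: LogLevel, max: LogLevel) -> bool {
    event <= max
}
```

With a maximum of `Info`, `Error`, `Warn`, and `Info` events pass while `Debug` and `Trace` are dropped, which is why TRACE output is usually invisible in production.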
Use structlog for structured logging in Python:
```python
import structlog
from typing import Any

# Configure structlog
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        # Render the traceback captured by logger.exception() into the event
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),
    ],
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
    cache_logger_on_first_use=True,
)

logger = structlog.get_logger()


def process_request(user_id: int) -> dict[str, Any]:
    logger.info("Processing user request", user_id=user_id)
    try:
        # fetch_user_data is an application-defined placeholder
        result = fetch_user_data(user_id)
        logger.info("Request successful", user_id=user_id, status=200)
        return result
    except Exception as e:
        logger.exception("Request failed", user_id=user_id, error=str(e))
        raise
```
Rust with prometheus crate:
```rust
use once_cell::sync::Lazy;
use prometheus::{
    register_counter_vec, register_gauge, register_histogram_vec, CounterVec, Gauge, HistogramVec,
};
use tracing::instrument;

// Define metrics. The `register_*!` macros also add each metric to the default
// registry, so `prometheus::gather()` can expose them on the scrape endpoint.
pub static REQUEST_COUNTER: Lazy<CounterVec> = Lazy::new(|| {
    register_counter_vec!(
        "http_requests_total",
        "Total HTTP requests",
        &["method", "status"]
    )
    .unwrap()
});

pub static REQUEST_DURATION: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "http_request_duration_seconds",
        "HTTP request duration in seconds",
        &["method", "path"]
    )
    .unwrap()
});

pub static ACTIVE_CONNECTIONS: Lazy<Gauge> = Lazy::new(|| {
    register_gauge!("connections_active", "Active connections").unwrap()
});

// Use metrics in handler.
// `Database`, `Response`, `Result`, and `process_request` are application-defined.
#[instrument(skip(db))]
async fn handle_request(method: &str, path: &str, db: &Database) -> Result<Response> {
    // Pass a route template as `path`, not the raw URL, to keep cardinality bounded
    let timer = REQUEST_DURATION.with_label_values(&[method, path]).start_timer();
    ACTIVE_CONNECTIONS.inc();

    let result = process_request(db).await;

    ACTIVE_CONNECTIONS.dec();
    let status = if result.is_ok() { "200" } else { "500" };
    REQUEST_COUNTER.with_label_values(&[method, status]).inc();
    timer.observe_duration();

    result
}
```
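The handler above labels metrics with `method`, `path`, and `status`, so those values need to come from a small, fixed set. The following is a minimal sketch of two helpers that bound them; `status_class` and `route_label` are hypothetical names, and replacing purely numeric path segments with `:id` is an assumed normalization rule, not something the `prometheus` crate provides.

```rust
/// Collapse an HTTP status code into one of six fixed label values.
pub fn status_class(status: u16) -> &'static str {
    match status {
        100..=199 => "1xx",
        200..=299 => "2xx",
        300..=399 => "3xx",
        400..=499 => "4xx",
        500..=599 => "5xx",
        _ => "other",
    }
}

/// Turn a raw request path into a route-like label by replacing purely
/// numeric segments (assumed to be identifiers) with `:id`.
pub fn route_label(path: &str) -> String {
    path.split('/')
        .map(|segment| {
            let numeric = !segment.is_empty() && segment.bytes().all(|b| b.is_ascii_digit());
            if numeric { ":id" } else { segment }
        })
        .collect::<Vec<_>>()
        .join("/")
}
```

When the web framework exposes the matched route template, using it directly is preferable to this heuristic, since identifiers are not always numeric.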
Prometheus scrape config (prometheus.yml):
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
```
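For context on what the scrape target has to return, here is a minimal std-only sketch that renders a counter in the Prometheus text exposition format (`# HELP`, `# TYPE`, then one line per label set). The function names are hypothetical; a real service would normally use the `prometheus` crate's `TextEncoder` rather than hand-rolling this.

```rust
/// Escape a label value for the text exposition format
/// (backslash, double quote, and newline must be escaped).
fn escape_label_value(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Render one counter, with its samples, as exposition-format text.
/// Each sample is a list of (label name, label value) pairs plus a count.
pub fn render_counter(name: &str, help: &str, samples: &[(Vec<(&str, &str)>, u64)]) -> String {
    let mut out = String::new();
    out.push_str(&format!("# HELP {name} {help}\n"));
    out.push_str(&format!("# TYPE {name} counter\n"));

    for (labels, value) in samples {
        let rendered: Vec<String> = labels
            .iter()
            .map(|(key, val)| format!("{key}=\"{}\"", escape_label_value(val)))
            .collect();

        if rendered.is_empty() {
            out.push_str(&format!("{name} {value}\n"));
        } else {
            out.push_str(&format!("{name}{{{}}} {value}\n", rendered.join(",")));
        }
    }
    out
}
```

Serving this text from the path Prometheus scrapes (by default `/metrics`) on the configured target is all the scrape job needs.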
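Datadog (illustrative fragment; verify names against the `ddtrace` crate version in use):

```rust
use ddtrace::Tracer;

// Runs once during application startup
let tracer = Tracer::new();

#[instrument(skip(db))]
async fn fetch_data(db: &Database) {
    // Automatically traced and sent to Datadog
}
```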
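New Relic (illustrative fragment; verify names against the `newrelic` crate version in use):

```rust
use newrelic::{Config, App};

let config = Config::new("My App", "license-key");
let app = App::new(config)?;

let txn = app.start_transaction("fetch_user");
// Work happens here
txn.notice_error(error);
```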
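Google Cloud Trace via OpenTelemetry (illustrative fragment; exporter and provider construction vary by crate version):

```rust
use opentelemetry::trace::TracerProvider as _; // brings `.tracer(...)` into scope
use opentelemetry_gcp::trace::CloudTraceExporter;
use opentelemetry_sdk::trace::TracerProvider;

let exporter = CloudTraceExporter::new();
let provider = TracerProvider::builder()
    .with_batch_exporter(exporter)
    .build();

let tracer = provider.tracer("myapp");
```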
Always expose a /health endpoint for orchestrators:
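```rust
// Rocket-style handler; `Database`, `HealthStatus`, and `uptime` are application-defined.
use rocket::{get, http::Status, serde::json::Json, State};
use serde_json::json;

#[get("/health")]
async fn health(db: &State<Database>) -> (Status, Json<HealthStatus>) {
    let db_healthy = db.ping().await.is_ok();

    // HTTP probes decide on the status code, so report failure as 503, not 200
    let (code, status) = if db_healthy {
        (Status::Ok, "healthy")
    } else {
        (Status::ServiceUnavailable, "unhealthy")
    };

    let body = Json(HealthStatus {
        status,
        checks: json!({
            "database": db_healthy,
            "uptime_seconds": uptime(),
        }),
    });
    (code, body)
}
```

Kubernetes liveness probe:

```yaml
# pod.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 10
```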
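Because an `httpGet` probe only looks at the response status code, the mapping from individual checks to an overall result is worth spelling out. The sketch below is framework-independent; `CheckResult`, `overall_status`, and `failing_checks` are hypothetical names, and treating any single failed check as unhealthy is an assumed policy.

```rust
/// Outcome of one dependency check (database, cache, queue, ...).
pub struct CheckResult {
    pub name: &'static str,
    pub healthy: bool,
}

/// Combine individual checks into a status string and an HTTP status code.
/// Any failing check yields 503 so that probes treat the instance as unhealthy.
/// An empty list counts as healthy.
pub fn overall_status(checks: &[CheckResult]) -> (&'static str, u16) {
    if checks.iter().all(|check| check.healthy) {
        ("healthy", 200)
    } else {
        ("unhealthy", 503)
    }
}

/// Names of the checks that failed, useful for the response body and logs.
pub fn failing_checks(checks: &[CheckResult]) -> Vec<&'static str> {
    checks
        .iter()
        .filter(|check| !check.healthy)
        .map(|check| check.name)
        .collect()
}
```

A stricter or looser policy (for example, ignoring optional dependencies) only changes the predicate inside `overall_status`.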
Propagate trace IDs across services using W3C Trace Context:
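```rust
// Crate paths follow recent OpenTelemetry releases and may differ between versions.
// `HttpClient`, `Response`, and `Result` are application-defined placeholders.
use http::HeaderMap;
use opentelemetry::propagation::TextMapPropagator;
use opentelemetry_http::{HeaderExtractor, HeaderInjector};
use opentelemetry_sdk::propagation::TraceContextPropagator;
use tracing::{debug_span, Instrument};
use tracing_opentelemetry::OpenTelemetrySpanExt;

async fn call_downstream_service(
    headers: &HeaderMap,
    client: &HttpClient,
) -> Result<Response> {
    // Extract parent trace context from incoming request
    let propagator = TraceContextPropagator::new();
    let parent_context = propagator.extract(&HeaderExtractor(headers));

    // A tracing span cannot take an OpenTelemetry context as `parent:`;
    // link the two through the tracing-opentelemetry extension trait instead.
    let span = debug_span!("downstream_call", service = "user-service");
    let _ = span.set_parent(parent_context);

    // Inject this span's context (not the parent's) into the outgoing request.
    // Requires the tracing-opentelemetry layer to be installed.
    let mut outgoing_headers = HeaderMap::new();
    propagator.inject_context(&span.context(), &mut HeaderInjector(&mut outgoing_headers));

    client
        .get("/api/users")
        .headers(outgoing_headers)
        .send()
        .instrument(span) // Avoid holding an `enter()` guard across `.await`
        .await
}
```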
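The header that carries this context is `traceparent`, formatted as `version-traceid-parentid-flags` in lowercase hex. As a minimal std-only sketch of what the propagator reads and writes, the code below parses and formats version `00` headers only; `TraceParent`, `parse_traceparent`, and `format_traceparent` are hypothetical names, and production code should rely on the OpenTelemetry propagator instead.

```rust
/// Parsed contents of a version-00 `traceparent` header.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub sampled: bool,     // lowest bit of the trace-flags field
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

fn is_all_zero(s: &str) -> bool {
    s.bytes().all(|b| b == b'0')
}

/// Parse a `traceparent` value; returns `None` for anything malformed.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);

    // All-zero IDs are explicitly invalid in the specification.
    if !is_lower_hex(trace_id, 32) || is_all_zero(trace_id) {
        return None;
    }
    if !is_lower_hex(parent_id, 16) || is_all_zero(parent_id) {
        return None;
    }
    if !is_lower_hex(flags, 2) {
        return None;
    }

    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}

/// Format the header value to send to a downstream service.
pub fn format_traceparent(tp: &TraceParent) -> String {
    format!("00-{}-{}-{:02x}", tp.trace_id, tp.parent_id, u8::from(tp.sampled))
}
```

A downstream call keeps the same `trace_id` but substitutes its own span ID as `parent_id`, which is what ties the spans of different services into one trace.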
Anti-patterns to avoid:

- **Format-string logging**: Use structured key=value fields instead

```rust
// BAD
info!("User {} processed with status {}", user_id, status);
// GOOD: fields first, message last
info!(user_id = user_id, status = status, "User processed");
```
- **No span context**: Always wrap request handling in spans that carry a request ID
- **Synchronous logging in hot path**: Use a non-blocking writer (e.g. `tracing_appender::non_blocking`) in high-throughput services
- **Hardcoded log levels**: Respect environment variable configuration (e.g. `RUST_LOG`)
- **Logging sensitive data**: Never log passwords, tokens, or PII without redaction
- **No metrics**: Always instrument critical paths (requests, errors, latency)
- **High cardinality labels**: Avoid unbounded label values (e.g. `user_id` as a label)

```rust
// BAD: Unbounded cardinality
REQUEST_COUNTER.with_label_values(&[method, &user_id.to_string()]).inc();
// GOOD: Fixed cardinality
REQUEST_COUNTER.with_label_values(&[method, "success"]).inc();
```
- **No health checks**: Without them, orchestrators can't detect unhealthy instances
- **Sampling off**: Use tail-based sampling in production to control tracing cost
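
Tail-based sampling decides whether to keep a trace after it has finished, so errors and slow requests can always be retained while routine traffic is thinned out. The following is a minimal sketch of such a decision rule under assumed policies (keep all errors, keep anything at or above a latency threshold, keep a deterministic 1-in-N of the rest); `TraceSummary` and `TailSampler` are hypothetical types, and real deployments usually configure this in a collector rather than in application code.

```rust
use std::time::Duration;

/// What is known about a trace once all of its spans have completed.
pub struct TraceSummary {
    pub trace_id: u128,
    pub had_error: bool,
    pub duration: Duration,
}

/// Assumed policy knobs for the sampling decision.
pub struct TailSampler {
    /// Traces at least this slow are always kept.
    pub slow_threshold: Duration,
    /// Keep one in this many of the remaining traces (0 keeps none of them).
    pub keep_one_in: u128,
}

impl TailSampler {
    /// Decide whether a finished trace should be exported.
    pub fn should_keep(&self, trace: &TraceSummary) -> bool {
        // Interesting traces are always retained.
        if trace.had_error || trace.duration >= self.slow_threshold {
            return true;
        }
        // Deterministic sampling on the trace ID, so every service that sees
        // the same trace reaches the same decision.
        self.keep_one_in != 0 && trace.trace_id % self.keep_one_in == 0
    }
}
```

The trade-off is that spans must be buffered until the trace completes, which costs memory in exchange for keeping the traces that matter.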