.remote-cache/kreuzberg-shared-rules/.ai-rulez/skills/containerization-docker-standards/SKILL.md
---
priority: medium
---

# Containerization & Docker Standards

## Multi-Stage Docker Builds for Rust

- **Builder stage**: Compile with full toolchain + dependencies
- **Runtime stage**: Minimal base with only compiled binary
- **Cache optimization**: Leverage layer caching by copying dependencies before source code
- **Size reduction**: 1GB+ builder → 50-100MB runtime image
Example Dockerfile for Rust projects:

```dockerfile
# ============================================================================
# Stage 1: Builder
# ============================================================================
FROM rust:1.75-alpine AS builder

WORKDIR /build

# Install build dependencies
RUN apk add --no-cache \
    musl-dev \
    pkgconfig \
    openssl-dev \
    libssl3

# Copy manifests, lock file, and workspace crates
COPY Cargo.toml Cargo.lock ./
COPY crates/ ./crates/

# Build with optimizations (single monolithic binary).
# LTO and codegen-units are set through Cargo profile variables instead of
# RUSTFLAGS, and target-cpu=native is avoided because it ties the binary to
# the build host's CPU.
RUN CARGO_PROFILE_RELEASE_LTO=fat CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 \
    cargo build --release --target x86_64-unknown-linux-musl

# ============================================================================
# Stage 2: Runtime
# ============================================================================
FROM alpine:3.18

# Install minimal runtime dependencies (no build tools)
RUN apk add --no-cache \
    libssl3 \
    ca-certificates \
    tini

# Create non-root user
RUN addgroup -g 1000 appgroup && \
    adduser -D -u 1000 -G appgroup appuser

WORKDIR /app

# Copy binary from builder
COPY --from=builder --chown=appuser:appgroup \
    /build/target/x86_64-unknown-linux-musl/release/html-to-markdown \
    /app/html-to-markdown

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD /app/html-to-markdown --version || exit 1

USER appuser

ENTRYPOINT ["/sbin/tini", "--"]
CMD ["/app/html-to-markdown"]
```
Integration in CI:

```yaml
name: Container Security
on: [push, pull_request]

permissions:
  contents: read
  security-events: write  # needed to upload SARIF results

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t myapp:latest .

      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myapp:latest'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Fail on high severity
        # --exit-code 1 makes Trivy return non-zero when findings match
        run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest
```
**Alpine Linux (5MB base)**:

- `alpine:3.18` with clear version pin

**Distroless (Google; ~20MB)**:

**Scratch (0 bytes)**:

- `--target x86_64-unknown-linux-musl`

Example using distroless:
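```dockerfile
FROM rust:1.75 AS builder
WORKDIR /build
# The Debian-based rust image does not include the musl target by default
RUN rustup target add x86_64-unknown-linux-musl
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl

FROM gcr.io/distroless/base-debian12
COPY --from=builder /build/target/x86_64-unknown-linux-musl/release/myapp /app
ENTRYPOINT ["/app"]
```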
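Example using scratch:

```dockerfile
FROM rust:1.75 AS builder
WORKDIR /build
# The Debian-based rust image does not include the musl target by default
RUN rustup target add x86_64-unknown-linux-musl
COPY . .
# scratch has no libc, so the binary must be fully static
RUN RUSTFLAGS="-C target-feature=+crt-static" \
    cargo build --release --target x86_64-unknown-linux-musl

FROM scratch
COPY --from=builder /build/target/x86_64-unknown-linux-musl/release/myapp /
ENTRYPOINT ["/myapp"]
```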
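**Order matters**: Put stable layers first

```dockerfile
# GOOD: Dependency layer cached until Cargo.toml or Cargo.lock changes
FROM rust:1.75 AS builder
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
# cargo needs a build target to parse the manifest, so fetch with a stub
# src/main.rs and remove it afterwards. Downloads all dependencies.
RUN mkdir src && echo 'fn main() {}' > src/main.rs && \
    cargo fetch --locked && rm -rf src
COPY src/ ./src/
RUN cargo build --release
```

```dockerfile
# BAD: Rebuilds dependencies on every source change
COPY . .
RUN cargo build --release
```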
**BuildKit registry cache**: Stores cache layers in the registry so that machines without a local layer cache (for example fresh CI runners) can reuse them

```bash
docker buildx build --push \
  --cache-from=type=registry,ref=myregistry/myapp:cache \
  --cache-to=type=registry,ref=myregistry/myapp:cache,mode=max \
  -t myregistry/myapp:latest .
```
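**Mount cache volumes (BuildKit)**:

```dockerfile
# Use cached cargo registry and target directory across builds.
# Cache mounts are not stored in the image layer, so copy the binary out
# in the same RUN step (assumes WORKDIR /build and a binary named myapp).
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/build/target \
    cargo build --release && \
    cp target/release/myapp /usr/local/bin/myapp
```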
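**GitHub Actions cache**:

```yaml
# The gha cache backend requires a Buildx builder
- uses: docker/setup-buildx-action@v3

- name: Build and push
  uses: docker/build-push-action@v5
  with:
    push: ${{ github.event_name == 'push' }}
    tags: myregistry/myapp:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
```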
- `myregistry/myapp:1.2.3`
- `myregistry/myapp:latest` (use sparingly, tag after release)
- `myregistry/myapp:main`, `myregistry/myapp:develop`
- `myregistry/myapp:abc1234f` (for traceability)

```bash
# Tag one build with multiple tags
docker build -t myapp:1.2.3 -t myapp:latest -t myapp:main .
docker push myapp:1.2.3 && docker push myapp:latest && docker push myapp:main
```
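The tag list above can be derived from build metadata rather than typed by hand. The following is a minimal sketch, not part of the original rules: the function name `image_tags` and its four arguments (repository, version, branch, commit SHA) are hypothetical, and it assumes `latest` is only applied when a release version is supplied.

```shell
#!/bin/sh
set -eu

# Print one tag per line for a single build.
# Arguments (all illustrative): repo, release version (may be empty),
# branch name, full commit SHA.
image_tags() {
  repo=$1
  version=$2
  branch=$3
  sha=$4

  # Short commit SHA (first 8 characters) for traceability.
  short_sha=$(printf '%s' "$sha" | cut -c1-8)
  # Slashes are not valid in image tags, so map them to dashes.
  safe_branch=$(printf '%s' "$branch" | tr '/' '-')

  printf '%s:%s\n' "$repo" "$short_sha"
  printf '%s:%s\n' "$repo" "$safe_branch"

  # Version and latest tags are only emitted for releases.
  if [ -n "$version" ]; then
    printf '%s:%s\n' "$repo" "$version"
    printf '%s:latest\n' "$repo"
  fi
}

# Usage sketch (commented out so the snippet has no side effects):
#   set --
#   for tag in $(image_tags myregistry/myapp 1.2.3 main "$(git rev-parse HEAD)"); do
#     set -- "$@" -t "$tag"
#   done
#   docker build "$@" .
```

Passing an empty version, as on an ordinary branch build, yields only the SHA and branch tags, which keeps `latest` tied to releases.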
Example docker-compose.yml:

```yaml
version: '3.9'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile.dev
    environment:
      RUST_LOG: debug
      DATABASE_URL: postgresql://user:pass@postgres:5432/db
    ports:
      - "3000:3000"
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - .:/app
    command: cargo run

  postgres:
    image: postgres:15-alpine
    environment:
      # Credentials must match DATABASE_URL in the app service
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: db
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d db"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
```
- `tini` or `dumb-init` as PID 1
- `--read-only` when possible
- `docker build --secret` for build-time secrets (e.g. `--secret id=mytoken`)
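As a rough illustration of the first two points, the sketch below assembles `docker run` arguments for a read-only container with an init process. The function name `hardened_run_args` is hypothetical. It assumes Docker's built-in `--init` flag is an acceptable stand-in when the image does not already use `tini` as its entrypoint, and that the application only needs a writable `/tmp`.

```shell
#!/bin/sh
set -eu

# Print the docker arguments for a hardened run, one per line.
# The image name is supplied by the caller.
hardened_run_args() {
  image=$1
  # --init      run an init process as PID 1 (reaps zombies, forwards signals)
  # --read-only mount the container root filesystem read-only
  # --tmpfs     provide a writable scratch directory despite --read-only
  printf '%s\n' \
    run --rm \
    --init \
    --read-only \
    --tmpfs /tmp \
    "$image"
}

# Usage sketch (commented out so the snippet has no side effects):
#   docker $(hardened_run_args myapp:latest)
```

If the image already sets `tini` as its `ENTRYPOINT`, `--init` is redundant and can be dropped.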