Python Library Usage Guidelines

Overview

Standardized library choices for all Python projects. Ensures consistency, avoids reinventing the wheel, and enforces preferred tools over alternatives.

General Rule: Always prefer existing library functions over custom implementations.

Library Selection Policy:

The first entry in each list is the preferred option.
Alternatives may be used only if they offer specific advantages or required functionality.
Propose new functions to our internal libraries (lib_*) if needed — we can change them.

Quick Reference Table

| Category | Preferred Library |----------------------|---------------------- | CLI exit/traceback | lib_cli_exit_tools | Logging (apps) | lib_log_rich | Logging (libs) | logging stdlib | Terminal output | rich | TUI | textual | Domain models | dataclasses | Boundary validation | pydantic | Enums | IntEnum | TOML | rtoml | JSON | orjson | YAML | ruamel.yaml | HTTP | httpx2 | Compression (stream) | isal | Compression (store) | libdeflate | Paths | pathlib.Path | .env files | python-dotenv | Config management | lib_layered_config | Database (ODBC) | pyodbc | Database (MySQL) | mysql-connector | ORM / query builder | SQLAlchemy | Testing | pytest | Date/time | datetime | Subprocess | subprocess.run | Type checking | mypy | Do NOT Use | -------------------|--------------------------------------| | manual sys.exit | | print(), bare logging | | lib_log_rich | | colorama (fallback only) | | curses | | dict, Pydantic | | manual parsing | / StrEnum | plain Enum, magic strings | | tomllib, tomli | | json stdlib | | PyYAML | | requests, httpx | | gzip stdlib | | gzip stdlib | | os.path | | manual parsing | | manual argparse + env | | raw ODBC bindings | -python / SQLAlchemy | PyMySQL, mysqlclient | | custom ORM, raw SQL for complex apps | | unittest | + zoneinfo | pytz | | os.system | | no type checking |

CLI Tools

lib_cli_exit_tools — for handling CLI exit codes, tracebacks, and related utilities. Propose new functions if needed.

Logging

| Context | Library | Configuration | |------------------------|--------------------|-------------------------------------------------| | Applications | lib_log_rich | Full structured logging with rich output | | Libraries/Packages | logging (stdlib) | logging.getLogger(__name__) + NullHandler() |

Why the distinction:

Libraries must not configure logging (leave that to the consuming application)
Libraries install NullHandler to prevent "no handler found" warnings
Applications control output format, level, and destination via lib_log_rich

Library pattern:

# src/<pkg>/__init__.py
import logging
logging.getLogger(__name__).addHandler(logging.NullHandler())

Application pattern (basic):

import lib_log_rich
from lib_log_rich.runtime import attach_std_logging

# Initialize with RuntimeConfig
lib_log_rich.init(lib_log_rich.RuntimeConfig(
    service="my-app",
    environment="dev",
    console_level="INFO",      # INFO and above to console
    backend_level="WARNING",   # WARNING and above to journald/EventLog
))

# Bridge stdlib logging so existing logging.* calls work
attach_std_logging()

# Use lib_log_rich's LoggerProxy for new code
logger = lib_log_rich.getLogger(__name__)
logger.info("Application started", extra={"port": 8080})

# Or use stdlib logging (bridged automatically)
import logging
logging.info("This also works!")

# Clean shutdown (flushes queues, closes adapters)
lib_log_rich.shutdown()

Application pattern (with lib_layered_config):

# logging_setup.py - centralized logging initialization
from lib_layered_config import Config
import lib_log_rich.config
import lib_log_rich.runtime

def init_logging(config: Config) -> None:
    """Initialize logging from layered config [lib_log_rich] section."""
    if not lib_log_rich.runtime.is_initialised():
        lib_log_rich.config.enable_dotenv()  # optional: load LOG_* from .env
        log_config = dict(config.get("lib_log_rich", default={}))
        log_config.setdefault("service", "my-app")
        log_config.setdefault("environment", "prod")
        runtime_config = lib_log_rich.runtime.RuntimeConfig(**log_config)
        lib_log_rich.runtime.init(runtime_config)
        lib_log_rich.runtime.attach_std_logging()

Key RuntimeConfig options:

service (required): Logical service name
environment (required): Deployment label (dev/stage/prod)
console_level: Threshold for Rich console (default: INFO)
backend_level: Threshold for journald/EventLog (default: WARNING)
queue_enabled: Background queue for async delivery (default: True)
enable_journald/enable_eventlog/enable_graylog: Backend adapters

Shutdown (required for clean exit):

# Synchronous shutdown - flushes queues, drains adapters, clears state
lib_log_rich.shutdown()

# Async shutdown - for asyncio applications
await lib_log_rich.shutdown_async()

Important: Always call shutdown() before process exit to ensure:

Background queue is drained (pending log events are delivered)
Graylog/journald/EventLog adapters flush their buffers
Global runtime state is cleared

Safe to call multiple times after initialization.

Propose new functions to lib_log_rich if needed — we can change that library.

Colored Terminal Output

rich — preferred for beautiful, formatted console output.
colorama — may be used if rich is unavailable or not suitable.

Text-based User Interfaces (TUI)

textual — for building rich, interactive TUIs in the terminal.

Data Models: Dataclasses vs Pydantic

Rule: Prefer structured types (dataclasses or Pydantic) over dicts. Data flowing between modules must use typed models, not raw dicts.

Decision Matrix

| Use Case | Recommended | Why | |---------------------------------|-------------------------|--------------------------------| | Domain entities & value objects | dataclasses | Keep domain dependency-free | | Internal DTOs (trusted data) | dataclasses | Lightweight, no parsing needed | | External input validation | pydantic.BaseModel | Type coercion, sanitization | | JSON/dict serialization | pydantic.BaseModel | Built-in .model_dump() | | Configuration from env/files | pydantic.BaseSettings | Validation + env parsing | | Performance-critical paths | dataclasses | Lower overhead |

Preference Order

dataclasses (stdlib) — default for domain/application layer.
- Always use @dataclass(frozen=True, slots=True) for immutability and efficiency.
- Use for internal data that doesn't need validation or format conversion.
pydantic.BaseModel — use at boundaries where data enters/exits the system.
- Parsing untrusted input (HTTP requests, CLI args, config files).
- Type coercion needed (string -> int, string -> datetime).
- Serialization to JSON, dict, or other formats required.
- Use model_config = ConfigDict(frozen=True) for immutability.
pydantic.dataclasses — when you want dataclass syntax with Pydantic validation.
- Useful for gradual migration from stdlib dataclasses.

Dict Usage Policy

Default: Use typed structures (dataclass, Pydantic) for all data.

Dicts are acceptable only when ALL of these apply:

Truly dynamic/schema-less data (e.g., arbitrary JSON passthrough with no business logic)
Fewer than 3 keys
No cross-module flow (contained within a single function/helper)
No business logic operates on the dict

Never acceptable:

Passing dict[str, Any] between modules or layers
Using dict keys in conditionals or business logic
Dict as function return type for structured data

When in doubt: Convert to a typed structure at the boundary.

Anti-patterns

Passing dict[str, Any] between modules
Using Pydantic in the domain layer
Using dataclasses for untrusted external input
Mixing dict-based and model-based data in the same flow

Data Flow Pattern

External Input -> Pydantic (validate) -> Dataclass (domain logic) -> Pydantic (serialize) -> External Output

Enums: Prefer IntEnum

Rule: Replace string literals representing categories, statuses, modes, event types, or command names with Enum classes. Prefer IntEnum over Enum where possible.

Why IntEnum?

| Feature | Enum | IntEnum | |------------------------|---------------------------------|------------------------------------| | JSON serialization | Requires custom encoder | Works natively (serializes as int) | | Database storage | Needs explicit conversion | Stores as integer directly | | Comparison with int | Status.ACTIVE == 1 -> False | Status.ACTIVE == 1 -> True | | Arithmetic operations | Not supported | Supported (inherits from int) | | Pydantic compatibility | Works | Works + auto-coerces from int |

When to use each

| Use Case | Recommended | |-------------------------------------------------|---------------------| | Status codes, error codes, log levels | IntEnum | | Database-backed enumerations | IntEnum | | API response codes | IntEnum | | Purely semantic categories (no numeric meaning) | Enum or StrEnum | | String-based identifiers (e.g., HTTP methods) | StrEnum |

Example

from enum import IntEnum, StrEnum, auto

# Preferred for numeric categories
class TaskStatus(IntEnum):
    PENDING = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3

# Works seamlessly with JSON, databases, comparisons
assert TaskStatus.RUNNING == 1  # True
assert TaskStatus.RUNNING > TaskStatus.PENDING  # True

# Use StrEnum for string-based identifiers
class HttpMethod(StrEnum):
    GET = auto()
    POST = auto()
    PUT = auto()
    DELETE = auto()

Anti-patterns

Using plain Enum when values need JSON serialization
Magic strings: if status == "pending" instead of if status == Status.PENDING
Magic numbers: if code == 1 instead of if code == ErrorCode.INVALID_INPUT

Serialization Formats

TOML

Use rtoml for all TOML parsing operations.

Preferred over stdlib tomllib for consistency across Python versions
Preferred over tomli (third-party read-only)
Provides both read and write capabilities

import rtoml

# Reading from file
config = rtoml.load(path)          # path as str or Path

# Writing to file
rtoml.dump(data, path)

# Reading from string
config = rtoml.loads(toml_string)

# Writing to string
toml_string = rtoml.dumps(data)

JSON

Use orjson for all JSON serialization/deserialization.

Significantly faster than stdlib json (10-50x for large payloads)
Native support for dataclasses, datetime, UUID, numpy arrays
Returns bytes (not str) — more efficient for I/O
Strict by default (no NaN/Infinity, sorted keys optional)

import orjson

# Reading
data = orjson.loads(json_bytes)  # accepts bytes or str

# Writing
json_bytes = orjson.dumps(data)

# With options (pretty print, sorted keys, datetime as ISO)
json_bytes = orjson.dumps(
    data,
    option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS
)

# Dataclass serialization (native support)
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class User:
    name: str
    created: datetime

orjson.dumps(User("Alice", datetime.now(timezone.utc)))  # works directly

YAML

Use ruamel.yaml for all YAML operations.

Preserves comments and formatting on round-trip (read -> modify -> write)
Full YAML 1.2 support (unlike PyYAML which only supports YAML 1.1)
Preferred over PyYAML

from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True

# Reading
with open("config.yaml") as f:
    data = yaml.load(f)

# Writing (preserves comments and formatting)
with open("config.yaml", "w") as f:
    yaml.dump(data, f)

HTTP/REST Calls

Use httpx2 for all HTTP requests. Do not use requests or the original httpx.

httpx2 is the actively maintained, Pydantic-stewarded continuation of httpx (drop-in API; only the import name changes)
Modern async-first design with sync support
HTTP/1.1 and HTTP/2 support
Connection pooling and keep-alive by default
Type-annotated API
Timeout configuration required (no silent hangs)

import httpx2

# Synchronous usage
response = httpx2.get("https://api.example.com/data", timeout=10.0)
data = response.json()

# With client (recommended for multiple requests)
with httpx2.Client(timeout=30.0) as client:
    response = client.post(
        "https://api.example.com/submit",
        json={"key": "value"},
    )

# Async usage
async with httpx2.AsyncClient(timeout=30.0) as client:
    response = await client.get("https://api.example.com/data")

Required patterns:

Always specify timeout (never rely on defaults)
Use Client/AsyncClient context managers for connection reuse
Prefer async in async codebases

Database Access

Decision Matrix

| Use Case | Library | Notes | |-------------------------------|--------------------------|-------------------------------------------------------| | ODBC connections (any DB) | pyodbc | Most widely used, stable, works with any ODBC driver | | MySQL (simple tasks) | mysql-connector-python | Official Oracle driver, pure Python, no external libs | | MySQL (ORM / complex) | SQLAlchemy | Use mysql-connector-python as MySQL backend | | Async database access | SQLAlchemy (async) | With asyncio extension + async driver |

Usage Patterns

# Direct ODBC connection (any database with ODBC driver)
import pyodbc

conn = pyodbc.connect(connection_string)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
rows = cursor.fetchall()
conn.close()

# Direct MySQL connection (simple tasks)
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", database="mydb", user="user", password="pass"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
rows = cursor.fetchall()
conn.close()

# SQLAlchemy with mysql-connector-python backend (ORM / complex queries)
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+mysqlconnector://user:pass@host/db",
    pool_pre_ping=True,
    pool_recycle=3600,
)

Rules

Use pyodbc for ad-hoc or cross-database ODBC access
Use mysql-connector-python directly for simple MySQL tasks
Use SQLAlchemy (with mysql-connector-python as backend) for ORM, migrations, or complex queries
Always use parameterized queries — never string-format SQL
Always configure connection pooling for long-running applications

Testing

| Component | Library | Notes | |----------------------|-------------------|-------------------------------------------| | Test framework | pytest | Preferred over unittest | | Coverage | pytest-cov | Coverage reporting integrated with pytest | | Mocking | unittest.mock | stdlib — no extra dependency needed | | Async tests | pytest-asyncio | For testing async code | | Parameterization | pytest built-in | @pytest.mark.parametrize | | Fixtures | pytest built-in | Prefer fixtures over setUp/tearDown |

Rules:

Use pytest style (functions + fixtures), not unittest style (classes + setUp/tearDown)
Use pytest.raises for exception testing, not try/except
Name test files test_*.py, test functions test_*
Use conftest.py for shared fixtures

Compression (gzip alternatives)

Use high-performance compression libraries instead of stdlib gzip.

| Use Case | Library | Why | |-------------------|------------------------|----------------------------------------------------------| | Web transport | isal (ISA-L / igzip) | Intel-optimized, 2-4x faster compression/decompression | | Storage | libdeflate | Higher compression ratios, optimized for single-shot ops |

# Web transport: ISA-L (igzip) — streaming, fast
import isal.igzip as igzip

# Compress for HTTP response
compressed = igzip.compress(data, compresslevel=1)  # level 0-3

# Decompress incoming
decompressed = igzip.decompress(compressed)

# Streaming with file-like objects
with igzip.open("file.gz", "wb") as f:
    f.write(data)

# Storage: libdeflate — single-shot, high ratio
import libdeflate

# Compress for archival (levels 1-12, default 6)
compressor = libdeflate.Compressor(compression_level=9)
compressed = compressor.compress(data)

# Decompress
decompressor = libdeflate.Decompressor()
decompressed = decompressor.decompress(compressed, len(original_data))

Selection guide:

Streaming / chunked data -> isal.igzip
HTTP compression -> isal.igzip (compatible with standard gzip)
Archive / backup -> libdeflate (better ratios)
Unknown decompressed size -> isal.igzip (handles streaming)

Date & Time

Use datetime (stdlib) with timezone awareness enforced. Use zoneinfo (stdlib) for named timezones.

Always use timezone-aware datetimes — never naive datetimes
Use zoneinfo (Python 3.9+) for named timezones — do not use pytz
Store/transmit as UTC; convert to local only for display

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Always timezone-aware
now_utc = datetime.now(timezone.utc)
now_berlin = datetime.now(ZoneInfo("Europe/Berlin"))

# Convert between timezones
local_time = now_utc.astimezone(ZoneInfo("Europe/Berlin"))

# Parse ISO format
parsed = datetime.fromisoformat("2025-01-15T10:30:00+01:00")

# Format for storage/transmission (ISO 8601 UTC)
timestamp_str = now_utc.isoformat()

Anti-patterns

datetime.now() without timezone (creates naive datetime)
Using pytz (replaced by zoneinfo in stdlib since Python 3.9)
Storing naive datetimes in databases
Comparing naive and aware datetimes (raises TypeError)

Directory & Filepath Handling

Always use pathlib.Path and ensure cross-platform compatibility.

Critical Rule: Accept Linux-style paths (//share/directory, /path/to/file) even on Windows.

UNC paths (//server/share) must work on all platforms
Never use os.path for new code — prefer pathlib
Use Path.as_posix() when storing/transmitting paths
Use forward slashes (/) in configuration files and APIs

from pathlib import Path
import sys

def normalize_path(path_str: str) -> Path:
    """Convert any path string to a native Path, preserving UNC paths.

    Args:
        path_str: Path string (Linux or Windows style).

    Returns:
        Native Path object.

    Examples:
        >>> normalize_path('//server/share/file.txt')  # UNC path
        >>> normalize_path('/home/user/file.txt')      # Linux path
    """
    # On Windows, convert forward-slash UNC paths to native backslash form
    if sys.platform == "win32" and path_str.startswith("//"):
        return Path(path_str.replace("/", "\\"))
    return Path(path_str)

# For storage/transmission, always use POSIX style
def path_to_string(path: Path) -> str:
    """Convert Path to portable string representation."""
    return path.as_posix()

# Accept both styles in function signatures
def process_file(file_path: str | Path) -> None:
    path = Path(file_path) if isinstance(file_path, str) else file_path
    # ... process

Configuration & API Guidelines:

Store paths as POSIX strings (/path/to/file, //server/share)
Parse with Path() or normalize_path() at runtime
Never hardcode backslashes in source code or configs

.env File Reading

Use python-dotenv for loading .env files.

Use find_dotenv() to walk up parent directories automatically
Do not commit .env files to version control

from dotenv import load_dotenv, find_dotenv

# find_dotenv() walks up directories to find .env
load_dotenv(find_dotenv(usecwd=True))

# Or with explicit path
load_dotenv("/path/to/project/.env")

# Access loaded values
import os
db_host = os.getenv("DB_HOST", "localhost")

Integration with Pydantic BaseSettings (pip install pydantic-settings):

from pydantic import ConfigDict
from pydantic_settings import BaseSettings

class AppSettings(BaseSettings):
    db_host: str = "localhost"
    db_port: int = 5432
    debug: bool = False

    model_config = ConfigDict(env_file=".env", env_file_encoding="utf-8")

settings = AppSettings()  # auto-reads .env + environment variables

Configuration Management

Use lib_layered_config for application configuration.

Supports layered config: defaults -> config file -> environment variables -> CLI args
Integrates with lib_log_rich for logging setup (see Logging section)

For simpler cases or libraries, use pydantic.BaseSettings (see .env section above).

Subprocess / External Commands

Use subprocess.run (stdlib) for running external commands.

Always use list form for arguments, not shell=True
Set check=True to raise on non-zero exit codes
Set capture_output=True when you need stdout/stderr
Set timeout to prevent hanging

import subprocess

result = subprocess.run(
    ["git", "status", "--porcelain"],
    capture_output=True,
    text=True,
    check=True,
    timeout=30,
)
print(result.stdout)

Anti-patterns

shell=True with unsanitized input (command injection risk)
os.system() (no output capture, no error handling)
Missing timeout on external calls

Type Checking

Use mypy for static type checking.

Run as part of CI/CD pipeline
Use strict mode (--strict) for new projects
Use py.typed marker file for typed libraries (PEP 561)

# pyproject.toml
[tool.mypy]
python_version = "3.10"
strict = true
warn_return_any = true
warn_unused_configs = true

Python Library Usage Guidelines

Overview

Standardized library choices for all Python projects. Ensures consistency, avoids reinventing the wheel, and enforces preferred tools over alternatives.

General Rule: Always prefer existing library functions over custom implementations.

Library Selection Policy:

The first entry in each list is the preferred option.
Alternatives may be used only if they offer specific advantages or required functionality.
Propose new functions to our internal libraries (lib_*) if needed — we can change them.

Quick Reference Table

| Category | Preferred Library | Do NOT Use | |----------------------|-----------------------------------------|--------------------------------------| | CLI exit/traceback | lib_cli_exit_tools | manual sys.exit | | Logging (apps) | lib_log_rich | print(), bare logging | | Logging (libs) | logging stdlib | lib_log_rich | | Terminal output | rich | colorama (fallback only) | | TUI | textual | curses | | Domain models | dataclasses | dict, Pydantic | | Boundary validation | pydantic | manual parsing | | Enums | IntEnum / StrEnum | plain Enum, magic strings | | TOML | rtoml | tomllib, tomli | | JSON | orjson | json stdlib | | YAML | ruamel.yaml | PyYAML | | HTTP | httpx2 | requests, httpx | | Compression (stream) | isal | gzip stdlib | | Compression (store) | libdeflate | gzip stdlib | | Paths | pathlib.Path | os.path | | .env files | python-dotenv | manual parsing | | Config management | lib_layered_config | manual argparse + env | | Database (ODBC) | pyodbc | raw ODBC bindings | | Database (MySQL) | mysql-connector-python / SQLAlchemy | PyMySQL, mysqlclient | | ORM / query builder | SQLAlchemy | custom ORM, raw SQL for complex apps | | Testing | pytest | unittest | | Date/time | datetime + zoneinfo | pytz | | Subprocess | subprocess.run | os.system | | Type checking | mypy | no type checking |

CLI Tools

lib_cli_exit_tools — for handling CLI exit codes, tracebacks, and related utilities. Propose new functions if needed.

Logging

Why the distinction:

Libraries must not configure logging (leave that to the consuming application)
Libraries install NullHandler to prevent "no handler found" warnings
Applications control output format, level, and destination via lib_log_rich

Library pattern:

# src/<pkg>/__init__.py
import logging
logging.getLogger(__name__).addHandler(logging.NullHandler())

Application pattern (basic):

import lib_log_rich
from lib_log_rich.runtime import attach_std_logging

# Initialize with RuntimeConfig
lib_log_rich.init(lib_log_rich.RuntimeConfig(
    service="my-app",
    environment="dev",
    console_level="INFO",      # INFO and above to console
    backend_level="WARNING",   # WARNING and above to journald/EventLog
))

# Bridge stdlib logging so existing logging.* calls work
attach_std_logging()

# Use lib_log_rich's LoggerProxy for new code
logger = lib_log_rich.getLogger(__name__)
logger.info("Application started", extra={"port": 8080})

# Or use stdlib logging (bridged automatically)
import logging
logging.info("This also works!")

# Clean shutdown (flushes queues, closes adapters)
lib_log_rich.shutdown()

Application pattern (with lib_layered_config):

# logging_setup.py - centralized logging initialization
from lib_layered_config import Config
import lib_log_rich.config
import lib_log_rich.runtime

def init_logging(config: Config) -> None:
    """Initialize logging from layered config [lib_log_rich] section."""
    if not lib_log_rich.runtime.is_initialised():
        lib_log_rich.config.enable_dotenv()  # optional: load LOG_* from .env
        log_config = dict(config.get("lib_log_rich", default={}))
        log_config.setdefault("service", "my-app")
        log_config.setdefault("environment", "prod")
        runtime_config = lib_log_rich.runtime.RuntimeConfig(**log_config)
        lib_log_rich.runtime.init(runtime_config)
        lib_log_rich.runtime.attach_std_logging()

Key RuntimeConfig options:

service (required): Logical service name
environment (required): Deployment label (dev/stage/prod)
console_level: Threshold for Rich console (default: INFO)
backend_level: Threshold for journald/EventLog (default: WARNING)
queue_enabled: Background queue for async delivery (default: True)
enable_journald/enable_eventlog/enable_graylog: Backend adapters

Shutdown (required for clean exit):

# Synchronous shutdown - flushes queues, drains adapters, clears state
lib_log_rich.shutdown()

# Async shutdown - for asyncio applications
await lib_log_rich.shutdown_async()

Important: Always call shutdown() before process exit to ensure:

Background queue is drained (pending log events are delivered)
Graylog/journald/EventLog adapters flush their buffers
Global runtime state is cleared

Safe to call multiple times after initialization.

Propose new functions to lib_log_rich if needed — we can change that library.

Colored Terminal Output

rich — preferred for beautiful, formatted console output.
colorama — may be used if rich is unavailable or not suitable.

Text-based User Interfaces (TUI)

textual — for building rich, interactive TUIs in the terminal.

Data Models: Dataclasses vs Pydantic

Rule: Prefer structured types (dataclasses or Pydantic) over dicts. Data flowing between modules must use typed models, not raw dicts.

Decision Matrix

Preference Order

dataclasses (stdlib) — default for domain/application layer.
- Always use @dataclass(frozen=True, slots=True) for immutability and efficiency.
- Use for internal data that doesn't need validation or format conversion.
pydantic.BaseModel — use at boundaries where data enters/exits the system.
- Parsing untrusted input (HTTP requests, CLI args, config files).
- Type coercion needed (string -> int, string -> datetime).
- Serialization to JSON, dict, or other formats required.
- Use model_config = ConfigDict(frozen=True) for immutability.
pydantic.dataclasses — when you want dataclass syntax with Pydantic validation.
- Useful for gradual migration from stdlib dataclasses.

Dict Usage Policy

Default: Use typed structures (dataclass, Pydantic) for all data.

Dicts are acceptable only when ALL of these apply:

Truly dynamic/schema-less data (e.g., arbitrary JSON passthrough with no business logic)
Fewer than 3 keys
No cross-module flow (contained within a single function/helper)
No business logic operates on the dict

Never acceptable:

Passing dict[str, Any] between modules or layers
Using dict keys in conditionals or business logic
Dict as function return type for structured data

When in doubt: Convert to a typed structure at the boundary.

Anti-patterns

Passing dict[str, Any] between modules
Using Pydantic in the domain layer
Using dataclasses for untrusted external input
Mixing dict-based and model-based data in the same flow

Data Flow Pattern

External Input -> Pydantic (validate) -> Dataclass (domain logic) -> Pydantic (serialize) -> External Output

Enums: Prefer IntEnum

Rule: Replace string literals representing categories, statuses, modes, event types, or command names with Enum classes. Prefer IntEnum over Enum where possible.

Why IntEnum?

When to use each

Example

from enum import IntEnum, StrEnum, auto

# Preferred for numeric categories
class TaskStatus(IntEnum):
    PENDING = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3

# Works seamlessly with JSON, databases, comparisons
assert TaskStatus.RUNNING == 1  # True
assert TaskStatus.RUNNING > TaskStatus.PENDING  # True

# Use StrEnum for string-based identifiers
class HttpMethod(StrEnum):
    GET = auto()
    POST = auto()
    PUT = auto()
    DELETE = auto()

Anti-patterns

Using plain Enum when values need JSON serialization
Magic strings: if status == "pending" instead of if status == Status.PENDING
Magic numbers: if code == 1 instead of if code == ErrorCode.INVALID_INPUT

Serialization Formats

TOML

Use rtoml for all TOML parsing operations.

Preferred over stdlib tomllib for consistency across Python versions
Preferred over tomli (third-party read-only)
Provides both read and write capabilities

import rtoml

# Reading from file
config = rtoml.load(path)          # path as str or Path

# Writing to file
rtoml.dump(data, path)

# Reading from string
config = rtoml.loads(toml_string)

# Writing to string
toml_string = rtoml.dumps(data)

JSON

Use orjson for all JSON serialization/deserialization.

Significantly faster than stdlib json (10-50x for large payloads)
Native support for dataclasses, datetime, UUID, numpy arrays
Returns bytes (not str) — more efficient for I/O
Strict by default (no NaN/Infinity, sorted keys optional)

import orjson

# Reading
data = orjson.loads(json_bytes)  # accepts bytes or str

# Writing
json_bytes = orjson.dumps(data)

# With options (pretty print, sorted keys, datetime as ISO)
json_bytes = orjson.dumps(
    data,
    option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS
)

# Dataclass serialization (native support)
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class User:
    name: str
    created: datetime

orjson.dumps(User("Alice", datetime.now(timezone.utc)))  # works directly

YAML

Use ruamel.yaml for all YAML operations.

Preserves comments and formatting on round-trip (read -> modify -> write)
Full YAML 1.2 support (unlike PyYAML which only supports YAML 1.1)
Preferred over PyYAML

from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True

# Reading
with open("config.yaml") as f:
    data = yaml.load(f)

# Writing (preserves comments and formatting)
with open("config.yaml", "w") as f:
    yaml.dump(data, f)

HTTP/REST Calls

Use httpx2 for all HTTP requests. Do not use requests or the original httpx.

httpx2 is the actively maintained, Pydantic-stewarded continuation of httpx (drop-in API; only the import name changes)
Modern async-first design with sync support
HTTP/1.1 and HTTP/2 support
Connection pooling and keep-alive by default
Type-annotated API
Timeout configuration required (no silent hangs)

import httpx2

# Synchronous usage
response = httpx2.get("https://api.example.com/data", timeout=10.0)
data = response.json()

# With client (recommended for multiple requests)
with httpx2.Client(timeout=30.0) as client:
    response = client.post(
        "https://api.example.com/submit",
        json={"key": "value"},
    )

# Async usage
async with httpx2.AsyncClient(timeout=30.0) as client:
    response = await client.get("https://api.example.com/data")

Required patterns:

Always specify timeout (never rely on defaults)
Use Client/AsyncClient context managers for connection reuse
Prefer async in async codebases

Database Access

Decision Matrix

Usage Patterns

# Direct ODBC connection (any database with ODBC driver)
import pyodbc

conn = pyodbc.connect(connection_string)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
rows = cursor.fetchall()
conn.close()

# Direct MySQL connection (simple tasks)
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", database="mydb", user="user", password="pass"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
rows = cursor.fetchall()
conn.close()

# SQLAlchemy with mysql-connector-python backend (ORM / complex queries)
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+mysqlconnector://user:pass@host/db",
    pool_pre_ping=True,
    pool_recycle=3600,
)

Rules

Use pyodbc for ad-hoc or cross-database ODBC access
Use mysql-connector-python directly for simple MySQL tasks
Use SQLAlchemy (with mysql-connector-python as backend) for ORM, migrations, or complex queries
Always use parameterized queries — never string-format SQL
Always configure connection pooling for long-running applications

Testing

Rules:

Use pytest style (functions + fixtures), not unittest style (classes + setUp/tearDown)
Use pytest.raises for exception testing, not try/except
Name test files test_*.py, test functions test_*
Use conftest.py for shared fixtures

Compression (gzip alternatives)

Use high-performance compression libraries instead of stdlib gzip.

# Web transport: ISA-L (igzip) — streaming, fast
import isal.igzip as igzip

# Compress for HTTP response
compressed = igzip.compress(data, compresslevel=1)  # level 0-3

# Decompress incoming
decompressed = igzip.decompress(compressed)

# Streaming with file-like objects
with igzip.open("file.gz", "wb") as f:
    f.write(data)

# Storage: libdeflate — single-shot, high ratio
import libdeflate

# Compress for archival (levels 1-12, default 6)
compressor = libdeflate.Compressor(compression_level=9)
compressed = compressor.compress(data)

# Decompress
decompressor = libdeflate.Decompressor()
decompressed = decompressor.decompress(compressed, len(original_data))

Selection guide:

Streaming / chunked data -> isal.igzip
HTTP compression -> isal.igzip (compatible with standard gzip)
Archive / backup -> libdeflate (better ratios)
Unknown decompressed size -> isal.igzip (handles streaming)

Date & Time

Use datetime (stdlib) with timezone awareness enforced. Use zoneinfo (stdlib) for named timezones.

Always use timezone-aware datetimes — never naive datetimes
Use zoneinfo (Python 3.9+) for named timezones — do not use pytz
Store/transmit as UTC; convert to local only for display

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Always timezone-aware
now_utc = datetime.now(timezone.utc)
now_berlin = datetime.now(ZoneInfo("Europe/Berlin"))

# Convert between timezones
local_time = now_utc.astimezone(ZoneInfo("Europe/Berlin"))

# Parse ISO format
parsed = datetime.fromisoformat("2025-01-15T10:30:00+01:00")

# Format for storage/transmission (ISO 8601 UTC)
timestamp_str = now_utc.isoformat()

Anti-patterns

datetime.now() without timezone (creates naive datetime)
Using pytz (replaced by zoneinfo in stdlib since Python 3.9)
Storing naive datetimes in databases
Comparing naive and aware datetimes (raises TypeError)

Directory & Filepath Handling

Always use pathlib.Path and ensure cross-platform compatibility.

Critical Rule: Accept Linux-style paths (//share/directory, /path/to/file) even on Windows.

UNC paths (//server/share) must work on all platforms
Never use os.path for new code — prefer pathlib
Use Path.as_posix() when storing/transmitting paths
Use forward slashes (/) in configuration files and APIs

from pathlib import Path
import sys

def normalize_path(path_str: str) -> Path:
    """Convert any path string to a native Path, preserving UNC paths.

    Args:
        path_str: Path string (Linux or Windows style).

    Returns:
        Native Path object.

    Examples:
        >>> normalize_path('//server/share/file.txt')  # UNC path
        >>> normalize_path('/home/user/file.txt')      # Linux path
    """
    # On Windows, convert forward-slash UNC paths to native backslash form
    if sys.platform == "win32" and path_str.startswith("//"):
        return Path(path_str.replace("/", "\\"))
    return Path(path_str)

# For storage/transmission, always use POSIX style
def path_to_string(path: Path) -> str:
    """Convert Path to portable string representation."""
    return path.as_posix()

# Accept both styles in function signatures
def process_file(file_path: str | Path) -> None:
    path = Path(file_path) if isinstance(file_path, str) else file_path
    # ... process

Configuration & API Guidelines:

Store paths as POSIX strings (/path/to/file, //server/share)
Parse with Path() or normalize_path() at runtime
Never hardcode backslashes in source code or configs

.env File Reading

Use python-dotenv for loading .env files.

Use find_dotenv() to walk up parent directories automatically
Do not commit .env files to version control

from dotenv import load_dotenv, find_dotenv

# find_dotenv() walks up directories to find .env
load_dotenv(find_dotenv(usecwd=True))

# Or with explicit path
load_dotenv("/path/to/project/.env")

# Access loaded values
import os
db_host = os.getenv("DB_HOST", "localhost")

Integration with Pydantic BaseSettings (pip install pydantic-settings):

from pydantic import ConfigDict
from pydantic_settings import BaseSettings

class AppSettings(BaseSettings):
    db_host: str = "localhost"
    db_port: int = 5432
    debug: bool = False

    model_config = ConfigDict(env_file=".env", env_file_encoding="utf-8")

settings = AppSettings()  # auto-reads .env + environment variables

Configuration Management

Use lib_layered_config for application configuration.

Supports layered config: defaults -> config file -> environment variables -> CLI args
Integrates with lib_log_rich for logging setup (see Logging section)

For simpler cases or libraries, use pydantic.BaseSettings (see .env section above).

Subprocess / External Commands

Use subprocess.run (stdlib) for running external commands.

Always use list form for arguments, not shell=True
Set check=True to raise on non-zero exit codes
Set capture_output=True when you need stdout/stderr
Set timeout to prevent hanging

import subprocess

result = subprocess.run(
    ["git", "status", "--porcelain"],
    capture_output=True,
    text=True,
    check=True,
    timeout=30,
)
print(result.stdout)

Anti-patterns

shell=True with unsanitized input (command injection risk)
os.system() (no output capture, no error handling)
Missing timeout on external calls

Type Checking

Use mypy for static type checking.

Run as part of CI/CD pipeline
Use strict mode (--strict) for new projects
Use py.typed marker file for typed libraries (PEP 561)

# pyproject.toml
[tool.mypy]
python_version = "3.10"
strict = true
warn_return_any = true
warn_unused_configs = true

Adoption

bitranox/bx-python-libraries-to-use

$ install --global

Security Scan Results

SKILL.md

Python Library Usage Guidelines

Overview

Quick Reference Table

CLI Tools

Logging

Colored Terminal Output

Text-based User Interfaces (TUI)

Data Models: Dataclasses vs Pydantic

Decision Matrix

Preference Order

Dict Usage Policy

Anti-patterns

Data Flow Pattern

Enums: Prefer IntEnum

Why IntEnum?

When to use each

Example

Anti-patterns

Serialization Formats

TOML

JSON

YAML

HTTP/REST Calls

Database Access

Decision Matrix

Usage Patterns

Rules

Testing

Compression (gzip alternatives)

Date & Time

Anti-patterns

Directory & Filepath Handling

.env File Reading

Configuration Management

Subprocess / External Commands

Anti-patterns

Type Checking

Related Skills

bitranox/bx-textual

bitranox/bx-skill-writer

bitranox/bx-rpyc

bitranox/bx-python-clean-architecture

bitranox/bx-python-libraries-to-use

$ install --global

Security Scan Results

SKILL.md

Python Library Usage Guidelines

Overview

Quick Reference Table

CLI Tools

Logging

Colored Terminal Output

Text-based User Interfaces (TUI)

Data Models: Dataclasses vs Pydantic

Decision Matrix

Preference Order

Dict Usage Policy

Anti-patterns

Data Flow Pattern

Enums: Prefer IntEnum

Why IntEnum?

When to use each

Example

Anti-patterns

Serialization Formats

TOML

JSON

YAML

HTTP/REST Calls

Database Access

Decision Matrix

Usage Patterns

Rules

Testing

Compression (gzip alternatives)