altinity-expert-clickhouse/skills/altinity-expert-clickhouse-kafka/SKILL.md
Diagnose ClickHouse Kafka engine health, consumer status, thread pool capacity, and consumption issues. Use for Kafka lag, consumer errors, and thread starvation.
Install this skill globally with one command: `npx skillsauth add altinity/skills altinity-expert-clickhouse-kafka`

Works with Claude Code, Cursor, and Windsurf.
Run all queries from checks.sql in this skill's directory and analyze the results.
Check if consumers are stuck by comparing exception time vs activity times:
- `last_exception_time >= last_poll_time OR last_exception_time >= last_commit_time` → consumer stuck on error, not progressing

The `exceptions` column is a tuple of arrays with matching indices: `exceptions.time[-1]` and `exceptions.text[-1]` give the most recent error.
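The stuck-consumer rule above can be sketched in Python for rows that have already been fetched from ClickHouse. This is a minimal illustration, not part of checks.sql: the function names are hypothetical, and it assumes `last_exception_time` is taken from `exceptions.time[-1]` and is `None` when the consumer has recorded no exceptions.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def latest_exception(
    times: Sequence[datetime], texts: Sequence[str]
) -> Optional[Tuple[datetime, str]]:
    """Return the most recent (time, text) pair, or None if there are none.

    The two sequences share indices, so the last element of each
    describes the same error.
    """
    if not times:
        return None
    return times[-1], texts[-1]


def is_stuck(
    last_exception_time: Optional[datetime],
    last_poll_time: datetime,
    last_commit_time: datetime,
) -> bool:
    """Apply the rule: an exception at or after the last poll or the last
    commit means the consumer has not progressed since the error."""
    if last_exception_time is None:
        return False  # no recorded error, so nothing to be stuck on
    return (
        last_exception_time >= last_poll_time
        or last_exception_time >= last_commit_time
    )
```

A consumer whose last exception is newer than its last commit is flagged even if it is still polling, which matches the OR in the rule.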
- `kafka_consumers > mb_pool_size` → thread starvation: consumers are waiting for available threads.
- The pool size is set by `background_message_broker_schedule_pool_size` (default: 16).
- Consumers that exceed `max.poll.interval.ms` between polls get kicked from the group.
- Watch for multiple `JSONExtract` calls re-parsing the same JSON blob; parse once with `JSONExtract(json, 'Tuple(...)') AS parsed` and read fields with `tupleElement()` (see troubleshooting.md).

For deeper investigation, run queries from advanced_checks.sql in this skill's directory.
Important: rdkafka_stat is not enabled by default in ClickHouse. It requires <statistics_interval_ms> in the Kafka engine settings. See advanced_checks.sql for setup instructions.
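Once statistics are enabled, `rdkafka_stat` holds a JSON document produced by librdkafka. The sketch below shows one way to total consumer lag per topic from that JSON. It assumes the standard librdkafka statistics layout (`topics` → `partitions` → `consumer_lag`); the function name is hypothetical, and the queries in advanced_checks.sql may extract different fields.

```python
import json
from typing import Dict


def consumer_lag_by_topic(rdkafka_stat: str) -> Dict[str, int]:
    """Sum per-partition consumer_lag for each topic in a stats JSON string.

    Assumptions: partition "-1" is librdkafka's internal unassigned
    partition and is skipped; a negative lag means "unknown" and is ignored.
    """
    if not rdkafka_stat:
        return {}  # statistics not enabled or not yet collected
    stats = json.loads(rdkafka_stat)
    lag: Dict[str, int] = {}
    for topic, topic_info in stats.get("topics", {}).items():
        total = 0
        for partition_id, partition in topic_info.get("partitions", {}).items():
            if partition_id == "-1":
                continue
            value = partition.get("consumer_lag", -1)
            if value > 0:
                total += value
        lag[topic] = total
    return lag
```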
For troubleshooting common errors and configuration guidance, see troubleshooting.md.
| Finding | Load Module | Reason |
|---------|-------------|--------|
| Slow MV inserts | altinity-expert-clickhouse-ingestion | Insert pipeline analysis |
| High merge memory | altinity-expert-clickhouse-merges | Merge patterns |
| Query-level issues | altinity-expert-clickhouse-reporting | Query optimization |
| Schema concerns | altinity-expert-clickhouse-schema | Table design |
| Setting | Scope | Notes |
|---------|-------|-------|
| background_message_broker_schedule_pool_size | Server | Thread pool for Kafka/RabbitMQ/NATS consumers (default: 16) |
| kafka_num_consumers | Table | Parallel consumers per table (limited by cores) |
| kafka_thread_per_consumer | Table | Required for parallel inserts (= 1) |
| kafka_handle_error_mode | Table | stream (21.6+) or dead_letter (25.8+) |
| max_poll_interval_ms | librdkafka | Max time between polls before consumer is kicked (default: 300s) |
| statistics_interval_ms | librdkafka | Enable rdkafka_stat collection (disabled by default) |
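The thread-starvation comparison (`kafka_consumers > mb_pool_size`) can be expressed as a small helper against the pool-size setting in the table above. This is a sketch under stated assumptions: the function names are hypothetical, and how the total consumer count is obtained is left to the caller (checks.sql is the authoritative source for that number).

```python
DEFAULT_MB_POOL_SIZE = 16  # default of background_message_broker_schedule_pool_size


def pool_headroom(kafka_consumers: int, mb_pool_size: int = DEFAULT_MB_POOL_SIZE) -> int:
    """Threads left in the message broker pool; negative means a shortfall."""
    return mb_pool_size - kafka_consumers


def is_thread_starved(kafka_consumers: int, mb_pool_size: int = DEFAULT_MB_POOL_SIZE) -> bool:
    """True when there are more consumers than threads in the pool."""
    return kafka_consumers > mb_pool_size
```

For example, 20 consumers against the default pool gives a headroom of -4, so at least four consumers are waiting for a thread at any given time.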