skills/chdb-sql/SKILL.md
Use when the user wants to run SQL — especially analytical SQL — on local files (parquet/csv/json), URLs, S3 paths, or remote databases (Postgres, MySQL, MongoDB, ClickHouse Cloud, Iceberg, Delta Lake) without setting up a server. Provides chDB — embedded ClickHouse SQL in Python with 1000+ functions, Session for stateful multi-step pipelines, parametrized queries, and cross-source joins via `s3()`, `mysql()`, `postgresql()`, `iceberg()`, `deltaLake()`, `remoteSecure()` table functions. TRIGGER when: user wants SQL on parquet/csv/files or across remote analytical sources; uses ClickHouse SQL features (window functions, windowFunnel, geoToH3, JSON path ops, Session, parametrized queries); imports `chdb` or calls `chdb.query()`. SKIP this skill for pandas-style DataFrame method-chaining (use chdb-datastore instead) or ClickHouse server administration.
npx skillsauth add ClickHouse/agent-skills chdb-sqlInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Run ClickHouse SQL directly in Python — no server needed. Query local files, remote databases, and cloud storage with full ClickHouse SQL power.
pip install chdb
1. One-off query on files or databases → chdb.query()
2. Multi-step analysis with tables → Session
3. DB-API 2.0 connection → chdb.connect()
4. Pandas-style DataFrame operations → Use chdb-datastore skill instead
import chdb
chdb.query("SELECT * FROM file('data.parquet', Parquet) WHERE price > 100 LIMIT 10") # local files
chdb.query("SELECT * FROM mysql('db:3306', 'shop', 'orders', 'root', 'pass')") # databases
chdb.query("SELECT * FROM s3('s3://bucket/data.parquet', NOSIGN) LIMIT 10") # cloud storage
chdb.query("SELECT * FROM deltaLake('s3://bucket/delta/table', NOSIGN) LIMIT 10") # data lakes
# Cross-source join
chdb.query("""
SELECT u.name, o.amount FROM mysql('db:3306', 'crm', 'users', 'root', 'pass') AS u
JOIN file('orders.parquet', Parquet) AS o ON u.id = o.user_id ORDER BY o.amount DESC
""")
data = {"name": ["Alice", "Bob"], "score": [95, 87]}
chdb.query("SELECT * FROM Python(data) ORDER BY score DESC") # Python data
df = chdb.query("SELECT * FROM numbers(10)", "DataFrame") # output formats
chdb.query("SELECT toDate({d:String}) + number FROM numbers({n:UInt64})",
"DataFrame", params={"d": "2025-01-01", "n": 30}) # parametrized
Table functions → table-functions.md | SQL functions → sql-functions.md | Full API → api-reference.md
from chdb import session as chs
sess = chs.Session("./analytics_db") # persistent; Session() for in-memory
sess.query("CREATE TABLE users ENGINE=MergeTree() ORDER BY id AS SELECT * FROM mysql('db:3306','crm','users','root','pass')")
sess.query("CREATE TABLE events ENGINE=MergeTree() ORDER BY (ts,user_id) AS SELECT * FROM s3('s3://logs/events/*.parquet',NOSIGN)")
sess.query("""
SELECT u.country, count() AS cnt, uniqExact(e.user_id) AS users
FROM events e JOIN users u ON e.user_id = u.id
WHERE e.ts >= today() - 7 GROUP BY u.country ORDER BY cnt DESC
""", "Pretty").show()
sess.close()
from chdb import dbapi
conn = dbapi.connect()
cur = conn.cursor()
cur.execute("SELECT * FROM file('data.parquet', Parquet) WHERE value > 100")
print(cur.fetchall())
cur.close()
conn.close()
| Problem | Fix |
|---------|-----|
| ImportError: No module named 'chdb' | pip install chdb |
| DB::Exception: FILE_NOT_FOUND | Check file path; use absolute path or verify cwd |
| DB::Exception: Unknown table function | Check function name spelling (e.g., deltaLake not deltalake) |
| Connection refused to remote DB | Check host:port format; ensure remote DB allows connections |
| Environment check | Run python scripts/verify_install.py (from skill directory) |
Note: This skill teaches how to use chdb SQL. For pandas-style operations, use the
chdb-datastoreskill. For contributing to chdb source code, see CLAUDE.md in the project root.
tools
Use when a user wants to build an application with ClickHouse, set up a local ClickHouse development environment, install ClickHouse, create a local server, create tables, or start developing with ClickHouse. Covers the full flow from zero to a working local ClickHouse setup.
tools
Use when the user has tabular data (pandas DataFrame, parquet, csv, Arrow, json) and wants to filter, group, aggregate, join, or speed up slow pandas. Provides chDB DataStore — same pandas API, ClickHouse engine underneath. Also handles reading from S3, MySQL, PostgreSQL, MongoDB, ClickHouse Cloud, Iceberg, Delta Lake as DataFrames and joining across sources. TRIGGER when: user mentions DataFrame, parquet, csv, "fast pandas", "speed up pandas", or cross-source DataFrame joins; user imports `chdb.datastore` or `from datastore import DataStore`. SKIP this skill for raw SQL syntax (use chdb-sql instead), ClickHouse server administration, or non-Python DataStore API work.
tools
Write idiomatic application code with the ClickHouse Node.js client (`@clickhouse/client`). Use this skill whenever a user is *building* against the Node.js client — configuring the client, pinging, inserting rows in JSON or raw formats, selecting and parsing results, binding query parameters, managing sessions and temporary tables, working with data types or customizing JSON parsing. Do NOT use for browser/Web client code.
tools
MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 31 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.